Python

Crawling CodeSnipers

In Detecting Dead Links, I looked at some simple crawler principles using the example of a script that checks a web page for broken links. Today I want to look at a more dynamic example. It is actually fairly easy to create a crawler that visits a broad range of pages, jumping from site to site and collecting data. In this example, however, I want to take specific measures to avoid that: I only want to look at pages that are accessible within a given set of base URLs.
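
A minimal sketch of that restriction might look like the following, using only the Python standard library. The base URL is just a placeholder, and the startswith() check is one simple (and admittedly crude) way of deciding whether a link stays inside the allowed set; the original script may well handle this differently.

from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collects the href values of all anchor tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(base_urls, max_pages=100):
    """Visit pages reachable from base_urls, following only links
    that stay within one of the given base URLs."""
    queue = deque(base_urls)
    seen = set(base_urls)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            with urlopen(url) as response:
                html = response.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # skip pages that fail to load
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            # Resolve relative links and strip fragments.
            absolute, _ = urldefrag(urljoin(url, link))
            # The key restriction: only queue links under one of the base URLs.
            if absolute not in seen and any(absolute.startswith(b) for b in base_urls):
                seen.add(absolute)
                queue.append(absolute)
    return pages

if __name__ == "__main__":
    results = crawl(["http://example.com/"])  # placeholder base URL
    print("Fetched %d pages" % len(results))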

Extracting Emails

In last week's post I talked about a simple application of web crawling features. This week I want to discuss another application that came to mind while I was thinking about web crawling.

Let's talk about extracting email addresses from a web page.
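
As a rough illustration of the idea, here is a small sketch: fetch a page and run a simple regular expression over its text. The URL and the pattern are only placeholders; real-world addresses (and deliberately obfuscated ones) are messier than this pattern admits.

import re
from urllib.request import urlopen

# A deliberately simple pattern covering the common user@host.tld shape.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def extract_emails(url):
    """Fetch a page and return the unique email addresses found in it."""
    with urlopen(url) as response:
        text = response.read().decode("utf-8", errors="replace")
    return sorted(set(EMAIL_RE.findall(text)))

if __name__ == "__main__":
    for address in extract_emails("http://example.com/contact"):  # placeholder URL
        print(address)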

Detecting Dead Links

I have been spending some of my free time lately on the theory and practice of web crawling, searching, and so forth. Let's talk about a very quick and easy application: a script to check for dead links on a web site. It's probably easy to come up with various use cases for such a script, so this one not only incorporates some simple crawler elements, it also does something useful!
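
A stripped-down version of such a link checker could look something like this: parse the anchor tags on one page and probe each target with a HEAD request. The details here (HEAD instead of GET, the timeout, how errors are reported) are my own assumptions for the sketch, not necessarily what the post's script does.

from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import Request, urlopen

class LinkParser(HTMLParser):
    """Collects the href values of all anchor tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def find_dead_links(page_url):
    """Return (url, reason) pairs for links on page_url that fail."""
    with urlopen(page_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    parser = LinkParser()
    parser.feed(html)
    dead = []
    for link in parser.links:
        target = urljoin(page_url, link)
        try:
            # HEAD keeps the check cheap; some servers reject it, which
            # this simplified version does not try to work around.
            with urlopen(Request(target, method="HEAD"), timeout=10):
                pass
        except HTTPError as err:
            dead.append((target, err.code))
        except URLError as err:
            dead.append((target, str(err.reason)))
    return dead

if __name__ == "__main__":
    for url, reason in find_dead_links("http://example.com/"):  # placeholder URL
        print(url, reason)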

Sharing my cool toys

I was berated the other day by Keith. He told me about PLEAC and I said yeah, I know!! He said no fair, you didn't share your cool toys! So for all you remaining coders out there, I'm sharing! Here are a few handy code snippet sites that I'll review for you today.

PLEAC
This site uses the Perl Cookbook as its basis (the Perl source is freely available), and volunteers rewrite the snippets in other languages where possible. Very handy if you know one language and wonder how you would do the same thing in another.