How-To

Indexing Experiment - Queries With Parentheses

As indicated in Indexing Experiment - Expressing Queries, I want to allow for more interesting queries with this week's post. Using last week's sample data, I'd like to form expressions, such as

  • Python and PHP
  • Python or PHP
  • (Python not PHP) or Javascript
  • ((Python not PHP) and Buildbot) or Javascript
  • Python and (PHP and AJAX)

In short, last week's example will be expanded to allow usage of parentheses.
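One way to handle parentheses is a small recursive-descent evaluator over a word-to-documents index. The sketch below is illustrative only: the index contents, document IDs, and the exact precedence of `and`/`not` over `or` are my assumptions, not the post's implementation.

```python
# Hedged sketch: evaluate parenthesized boolean queries against a
# word -> set-of-documents index. The INDEX contents are made up for
# illustration; the real index would be built from crawled URLs.
import re

INDEX = {
    "python": {"doc1", "doc2", "doc3"},
    "php": {"doc2", "doc4"},
    "javascript": {"doc4", "doc5"},
    "buildbot": {"doc1"},
}

def tokenize(query):
    """Split '(Python not PHP) or Javascript' into tokens."""
    return re.findall(r"\(|\)|\w+", query.lower())

def evaluate(query):
    """Recursive descent: 'or' binds loosest, 'and'/'not' tighter, parens tightest."""
    tokens = tokenize(query)
    pos = [0]  # mutable cursor shared by the nested helpers

    def peek():
        return tokens[pos[0]] if pos[0] < len(tokens) else None

    def advance():
        tok = tokens[pos[0]]
        pos[0] += 1
        return tok

    def parse_atom():
        if peek() == "(":
            advance()              # consume '('
            result = parse_or()
            advance()              # consume ')'
            return result
        return INDEX.get(advance(), set())

    def parse_and():
        result = parse_atom()
        while peek() in ("and", "not"):
            op = advance()
            right = parse_atom()
            result = result - right if op == "not" else result & right
        return result

    def parse_or():
        result = parse_and()
        while peek() == "or":
            advance()
            result = result | parse_and()
        return result

    return parse_or()
```

With the toy index above, `evaluate("(Python not PHP) or Javascript")` yields the documents containing Python but not PHP, unioned with those containing Javascript.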

Indexing Experiment - Expressing Queries

In last week's post, I started this indexing experiment by creating a simple index based on the words found at specified URLs. Today, I want to look at querying such an index. Using last week's example, my goal is to find all indexed documents containing the words

  • Python and PHP
  • Python or PHP
  • Python not PHP

And so forth. Let's take a look.
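These three query types map directly onto Python's set operations. A minimal sketch, assuming a tiny hand-made index in place of the one built from real URLs:

```python
# Hedged sketch: and/or/not queries as set intersection, union, and
# difference. The index below is a stand-in for the word -> documents
# mapping built in the indexing post.
index = {
    "python": {"page_a", "page_b"},
    "php": {"page_b", "page_c"},
}

def query_and(word1, word2):
    """Documents containing both words: set intersection."""
    return index.get(word1, set()) & index.get(word2, set())

def query_or(word1, word2):
    """Documents containing either word: set union."""
    return index.get(word1, set()) | index.get(word2, set())

def query_not(word1, word2):
    """Documents containing word1 but not word2: set difference."""
    return index.get(word1, set()) - index.get(word2, set())
```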

Stupidly Easy App Walkthrough


One day, the phone rings.
Joe: "Hey!! Can you make a website for my Candy Wrapper Club?"
You: "Candy Wrapper Club?!?"
Joe: "Yes! We collect candy wrappers! We just need a little bit about the group, how to contact us, and a membership list. It will probably only take you a week. My cousin has a webserver with PHP and MySQL. How about it?"
You: <sigh> "Sureeeeeeee, I've been wanting to try this MVC thing with PHP."
This is a long one, so grab a cup of coffee!

Indexing Experiment

Not that the discussion of web crawling is over - far from it - but I thought it would be nice to start tinkering with indexing a little bit. This post presents a very simple example of creating such an index. The example is intentionally simple to show how easy it is to get started on writing an indexing scheme in Python.
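The core idea can be sketched in a few lines: map each word to the set of documents it appears in. The in-memory documents and the `\w+` tokenizer below are simplifying assumptions; the post itself works from words found at specified URLs.

```python
# Hedged sketch: build a word -> set-of-documents index from plain text.
import re

def build_index(documents):
    """documents: {doc_id: text}. Returns {word: set of doc_ids}."""
    index = {}
    for doc_id, text in documents.items():
        # Lowercase and split on word characters; real tokenization
        # would need to strip HTML first.
        for word in re.findall(r"\w+", text.lower()):
            index.setdefault(word, set()).add(doc_id)
    return index
```

For example, indexing `{"a": "Python is fun", "b": "PHP and Python"}` gives `index["python"] == {"a", "b"}`.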

Micro ISV Blogging: Summary

Over the last few weeks I posted a series on Micro ISV blogging; here is a brief summary of the five articles so far.

Stupidly Easy MVC - Directory Structure

There are probably many ways to set up a structure using MVC. This article will talk about two of the ways I have done it, explain why I did it that way, and describe how I think it worked out after the fact.

Structure 1
For this project, I used Ruby on Rails as inspiration (imitation is the best form of flattery, yes?) and used a separate directory for each of the Model, View and Controller. I knew I was going to have multiple models as well. I used inline PHP/HTML for the view for this project since it was only a few forms and we didn't want to do a template system yet. It would be easy to change to templates later, of course, by just editing the view methods to call a template instead of spitting out HTML.

Micro ISV Blogging: Useful Extras

After my slightly underwhelmed take on affiliate programs and advertising for Micro ISV blogs last week, I’m really feeling the pressure to include something good this week. Fortunately, there are a couple of really useful services that I haven’t touched on yet in this series: Technorati and Feedburner.

Crawling CodeSnipers

In Detecting Dead Links, I was looking at some simple crawler principles by example of a script that checks a webpage for broken links. Today I want to look at a more dynamic example. It is actually fairly easy to create a crawler that manages to visit a broad range of pages, jumping from site to site, collecting data. In this example, however, I want to take specific measures to avoid that. I want to look only at those pages that are accessible within a given set of base URLs.
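The "stay within the base URLs" restriction boils down to a prefix check before a link is queued. A minimal sketch, with made-up base URLs and link lists for illustration:

```python
# Hedged sketch: filter discovered links so a crawler only visits pages
# under a given set of base URLs. The URLs below are illustrative.
def within_bases(url, base_urls):
    """True if url lives under one of the allowed base URLs."""
    return any(url.startswith(base) for base in base_urls)

def filter_links(links, base_urls):
    """Keep only the links a restricted crawler is allowed to visit."""
    return [url for url in links if within_bases(url, base_urls)]
```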

More Stupidly Easy MVC in PHP

If you've been living under a rock and missed the previous two articles about this EASY 3-class framework, go read Part 1 and Part 2.

After my initial project, where I first created the simple framework, I have since used it on two other projects. I didn't even use a template solution for one of them, making it even MORE simple. So I've had a chance to really collect my thoughts on this and put it through the wringer. I've had requests for more examples on using this, and hopefully this will answer some of your questions.

Extracting Emails

In last week's post I talked about a simple application of web crawling features. This week I want to discuss another application that came to mind as I was thinking about web crawling.

Let's talk about extracting email addresses from a web page.
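A common starting point is a regular expression over the page text. The pattern and sample text below are my assumptions, kept deliberately simple; it will miss some exotic but valid address forms.

```python
# Hedged sketch: pull email addresses out of page text with a simple
# regular expression, keeping each unique address in order of appearance.
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def extract_emails(text):
    """Return the unique email addresses found in text, in order seen."""
    seen = []
    for match in EMAIL_RE.findall(text):
        if match not in seen:
            seen.append(match)
    return seen
```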