Alex Bendig's blog

Everyday CS?

I expect that a lot of people who spend time reading this or other programming-related blogs are technically minded and tend to have a background in computer programming or computer science. I am curious, though: To what extent do you apply your CS knowledge to your personal life? Bonus question: What if no computer is available?

Regarding Schwag

These are exciting times for technology enthusiasts of all types. It appears that a lot of people are in the business of working on something really fun or trying to convince others of the merits of a particular project. Just very recently, Om Malik announced he wants to move on and start his own business. Also, Robert Scoble is leaving Microsoft.

This is of course already old news, and there are many other examples. These days, it seems that startups are hot. Everyone is talking about web 2.0 and, the occasional controversy aside, tech firms, especially web 2.0 tech firms, are sort of the in thing. I am not kidding: there appears to be a certain appeal to the very idea of these firms that I have not witnessed before.

notes from startup school

Startup school was in session at Stanford University a little over a week ago, on Saturday, April 29th. I commuted six hours to check out the event. If you're at all interested in starting your own company or participating in a startup, you will really benefit from this sort of venue.

work, work and life balance

I have been quiet here for the last couple of weeks, more a reader than a writer. For me, one of the more interesting articles of late was Wife of the mISV - Surviving the Business. When I showed it to my own wife, we shared a laugh and a moment of knowing silence - yeah, this one hits home.

Imagine the following. You're working as a software engineer on some pretty deadline-driven projects. You typically have a lot to do during the regular 9 to 5, enough in fact that you've been known to bring some of it home over the weekend, you know, just in case you get some downtime. You have a significant other, and your combined schedules are such that both of you are always looking forward to the weekends, when you are both available at least most of the time.

Then, an opportunity turns up, one involving work and circumstances that make it very intriguing for you. The only caveat: The way to make it work is to do both jobs together for a while. There is your day job (while telling yourself not to bring work home on the weekends anymore) and then the new job, on a contract basis, in the evenings and on weekends. What would you do? And for how long?

Indexing Experiment - Queries With Parentheses

As indicated in Indexing Experiment - Expressing Queries, I want to allow for more interesting queries with this week's post. Using last week's sample data, I'd like to form expressions, such as

  • Python and PHP
  • Python or PHP
  • (Python not PHP) or Javascript
  • ((Python not PHP) and Buildbot) or Javascript
  • Python and (PHP and AJAX)

In short, last week's example will be expanded to allow usage of parentheses.
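As a rough illustration of what such queries involve, here is a minimal sketch of a query evaluator with parentheses. It is not the code from the post. The sample index, the names `SAMPLE_INDEX`, `tokenize` and `evaluate`, and the choice to treat "not" as a binary "in the left set but not the right" operator with all operators evaluated left to right are assumptions made for this example.

```python
import re

# Hypothetical inverted index: lowercase word -> set of document ids.
SAMPLE_INDEX = {
    "python": {1, 2, 3},
    "php": {2, 4},
    "javascript": {4, 5},
    "buildbot": {1},
    "ajax": {2, 5},
}

# Assumption: all three operators are binary and share one precedence level.
OPERATORS = {
    "and": lambda a, b: a & b,   # documents in both sets
    "or": lambda a, b: a | b,    # documents in either set
    "not": lambda a, b: a - b,   # documents in a but not in b
}


def tokenize(query):
    """Split a query into parentheses and lowercase words."""
    return re.findall(r"[()]|[^\s()]+", query.lower())


def evaluate(query, index):
    """Evaluate a query string against the index and return a set of ids."""
    tokens = tokenize(query)
    pos = 0

    def parse_operand():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if token == ")" or token in OPERATORS:
            raise ValueError("unexpected token: %r" % token)
        # Unknown words simply match no documents.
        return set(index.get(token, set()))

    def parse_expression():
        nonlocal pos
        result = parse_operand()
        # Apply operators left to right until a ")" or the end is reached.
        while pos < len(tokens) and tokens[pos] in OPERATORS:
            operator = OPERATORS[tokens[pos]]
            pos += 1
            result = operator(result, parse_operand())
        return result

    result = parse_expression()
    if pos != len(tokens):
        raise ValueError("unexpected token: %r" % tokens[pos])
    return result
```

With this sketch, `evaluate("(Python not PHP) or Javascript", SAMPLE_INDEX)` first resolves the parenthesized part and then combines it with the documents for "javascript". One known limitation: the words "and", "or" and "not" themselves cannot be searched for.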

Indexing Experiment - Expressing Queries

In last week's post, I started this indexing experiment by creating a simple index based on the words found at specified URLs. Today, I want to look at querying such an index. Using last week's example, my goal is to find all indexed documents containing the words

  • Python and PHP
  • Python or PHP
  • Python not PHP

And so forth. Let's take a look.
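One way to picture these three queries is as plain set operations on an inverted index. The sketch below assumes the index maps each lowercase word to a set of document names; the sample data and the function names are made up for illustration and are not taken from the post.

```python
# Hypothetical inverted index: lowercase word -> set of document names.
INDEX = {
    "python": {"doc1", "doc2", "doc3"},
    "php": {"doc2", "doc4"},
}


def docs(word, index=INDEX):
    """Return the documents containing the word (empty set if unknown)."""
    return index.get(word.lower(), set())


def query_and(left, right, index=INDEX):
    """Documents containing both words."""
    return docs(left, index) & docs(right, index)


def query_or(left, right, index=INDEX):
    """Documents containing at least one of the words."""
    return docs(left, index) | docs(right, index)


def query_not(left, right, index=INDEX):
    """Documents containing the left word but not the right one."""
    return docs(left, index) - docs(right, index)
```

For example, `query_not("Python", "PHP")` yields the documents that mention Python without mentioning PHP.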

Indexing Experiment

Not that the discussion of web crawling is over - far from it - but I thought it would be nice to start tinkering with indexing a little bit. This post presents a very simple example of creating such an index. The example is intentionally simple to show how easy it is to get started on writing an indexing scheme in Python.
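To give a flavor of what a very simple index might look like, here is a minimal sketch. It assumes the page text has already been downloaded and is passed in as a dictionary of URL to text; the function name `build_index` and the word-splitting rule are assumptions for this example, not the post's actual code.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Build an inverted index from a dict of URL -> page text.

    Returns a dict mapping each lowercase word to the set of URLs
    whose text contains that word.
    """
    index = defaultdict(set)
    for url, text in documents.items():
        # Assumption: a "word" is any run of ASCII letters or digits.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return dict(index)
```

Looking up `build_index(pages)["python"]` then returns every URL in `pages` whose text mentions Python.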

Crawling CodeSnipers

In Detecting Dead Links, I looked at some simple crawler principles using the example of a script that checks a webpage for broken links. Today I want to look at a more dynamic example. It is actually fairly easy to create a crawler that visits a broad range of pages, jumping from site to site and collecting data. In this example, however, I want to take specific measures to avoid that. I want to look only at those pages that are accessible within a given set of base URLs.
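The restriction to a set of base URLs can be sketched roughly as follows. This is an illustration under stated assumptions, not the crawler from the post: the names `LinkParser`, `within` and `crawl` are hypothetical, "within a base URL" is taken to mean a simple prefix match, and the `fetch` function that returns a page's HTML (or None on failure) is supplied by the caller.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def within(url, base_urls):
    """Assumption: a URL is in scope if it starts with one of the base URLs."""
    return any(url.startswith(base) for base in base_urls)


def crawl(base_urls, fetch, limit=100):
    """Breadth-first crawl that never leaves the given base URLs.

    fetch(url) must return the page HTML as a string, or None on failure.
    Returns the list of successfully fetched URLs in visiting order.
    """
    queue = deque(base_urls)
    seen = set(base_urls)
    visited = []
    while queue and len(visited) < limit:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link = urldefrag(urljoin(url, href))[0]
            if link not in seen and within(link, base_urls):
                seen.add(link)
                queue.append(link)
    return visited
```

Because the scope check is a plain prefix match, base URLs should end in a slash; otherwise "http://example.com" would also match "http://example.com.other.example/".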

Extracting Emails

In last week's post, I talked about a simple application of web crawling features. This week I want to discuss another application that came to mind as I was thinking about web crawling.

Let's talk about extracting email addresses from a web page.
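As a first approximation, a regular expression can pull addresses out of the raw page text. The pattern below is a deliberately simplified assumption that covers common addresses only and does not implement the full address syntax; `EMAIL_RE` and `extract_emails` are names chosen for this sketch.

```python
import re

# Simplified pattern (an assumption): local part, "@", then a dotted domain.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text):
    """Return the unique email addresses found in text, lowercased and sorted."""
    return sorted({match.lower() for match in EMAIL_RE.findall(text)})
```

Since the function works on any string, it can be fed the HTML of a page directly, which also catches addresses inside mailto: links.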

Detecting Dead Links

I have been spending some of my free time lately on the theory and practice of web crawling, searching and so forth. Let's talk about a very quick and easy application: a script to check for dead links on a web site. It's probably easy to come up with various use cases for such a script, so this not only incorporates some simple crawler elements, it also does something useful!
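A dead link check along these lines might be sketched as below. This is not the script from the post. The names `LinkParser`, `http_status` and `find_dead_links` are hypothetical, a link is assumed to be dead when the request fails or returns a status of 400 or higher, and only http and https links are checked. Note that some servers reject HEAD requests, so a real script may need to fall back to GET.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
    except (urllib.error.URLError, OSError):
        return None


def find_dead_links(page_url, html, get_status=http_status):
    """Return (url, status) pairs for links on the page that look dead."""
    parser = LinkParser()
    parser.feed(html)
    dead = []
    checked = set()
    for href in parser.links:
        url = urljoin(page_url, href)
        # Skip mailto:, javascript: and similar, and avoid re-checking a URL.
        if urlparse(url).scheme not in ("http", "https") or url in checked:
            continue
        checked.add(url)
        status = get_status(url)
        if status is None or status >= 400:
            dead.append((url, status))
    return dead
```

The `get_status` parameter makes it possible to try the function with a lookup table of fake statuses instead of real network requests.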