Building Clean URLs Into a Site

I wrote about building a site with clean URLs, but that's useless to you. No, you've got a creaking, hulking monster of a site that coughs up URLs like "render.php?action=list_mailbox&id=42189", was built "to meet an accelerated schedule", and eats summer interns whole.

This article tells you how to put clean, human-usable URLs on top of the site without even editing your underlying scripts. All these examples mention PHP, but it doesn't matter what you coded the site in. You just need to be running Apache with mod_rewrite enabled and have a little familiarity with regular expressions.

So we have two goals. First, requests for the new URLs are internally rewritten to call the existing scripts, without users ever knowing the scripts exist. Second, requests for the old URLs get a 301 redirect to the new URLs, so that search engines and bookmarks that follow redirects switch to the new URLs.

Let's work through an example .htaccess file. We take apart the new URLs and map them internally to the old URLs:

RewriteEngine on

RewriteRule ^new/(.*)/(.*)$ /old.php?action=$1&id=$2 [L]
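
A side note on the pattern: (.*) also matches slashes and empty strings, so a URL like new/a/b/c gets split at its last slash, and new// matches with both values empty. Here is a stricter sketch, assuming (and this is only an assumption about your site) that actions are lowercase letters or underscores and ids are digits:

```apache
RewriteEngine on

# Assumed formats: action = lowercase letters/underscores, id = digits only.
# Anything else falls through and is handled as an ordinary request.
RewriteRule ^new/([a-z_]+)/([0-9]+)$ /old.php?action=$1&id=$2 [L]
```

The rest of the examples keep the looser (.*) version for brevity.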

This works great, so we dive into the 301 redirects:

RewriteEngine on

RewriteRule ^new/(.*)/(.*)$ /old.php?action=$1&id=$2 [L]
RewriteCond %{QUERY_STRING} ^action=([a-z]+)&id=([0-9]+)
RewriteRule ^old\.php$ /new/%1/%2? [R=301,L]

Arrrgh! We test this and find a problem: every request for a new URL gets 301 redirected right back to the same new URL. Apache rewrites new to old just fine, but in a .htaccess file it then runs the rewritten URL through the rules again, where the old URL matches our redirect rule, so we're stuck in an infinite loop of redirects. (The [L] flag only ends the current pass through the rules; it doesn't stop Apache from starting another pass on the result.) We need to turn it up to 11 and tell Apache "No, really, stop rewriting URLs now" by setting a flag that says it already rewrote the URL.

RewriteEngine on
# This rule just keeps DirectoryIndex working (so requests for / go to /index.php or whatever)
RewriteRule ^$ - [L]
# Initialize the environment variable REWROTE to 0: no rewriting has happened yet
RewriteRule ^(.*)$ $1 [E=REWROTE:0]

# Rewrite the new URL to the old script and flag that we rewrote it
RewriteRule ^new/(.*)/(.*)$ /old.php?action=$1&id=$2 [L,E=REWROTE:1]
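# Only redirect the old URL if we didn't just rewrite to it. After an internal
# redirect Apache renames environment variables with a REDIRECT_ prefix, so
# check that name as well.
RewriteCond %{ENV:REWROTE} !^1$
RewriteCond %{ENV:REDIRECT_REWROTE} !^1$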
RewriteCond %{QUERY_STRING} ^action=([a-z]+)&id=([0-9]+)
RewriteRule ^old\.php$ /new/%1/%2? [R=301,L]
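
If the flag feels fragile, here is an alternative sketch (a different technique, not the one this article builds on). It tests %{THE_REQUEST}, the raw request line the browser sent, which internal rewrites never change. It assumes the .htaccess file sits in the document root, so the old script is requested as /old.php:

```apache
RewriteEngine on

# New URL -> old script, invisible to the user
RewriteRule ^new/(.*)/(.*)$ /old.php?action=$1&id=$2 [L]

# Redirect only when the browser itself asked for /old.php.
# THE_REQUEST looks like: GET /old.php?action=list&id=42 HTTP/1.1
RewriteCond %{THE_REQUEST} \s/old\.php\?action=([a-z]+)&id=([0-9]+)
RewriteRule ^old\.php$ /new/%1/%2? [R=301,L]
```

Either way the visible behavior is the same; the rest of this article sticks with the REWROTE flag.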

This set of rewrite rules accomplishes both our goals, and the new URLs are all we'll ever see in the address bar. You can set up any number of rewritten URLs this way, editing the rules for your particular GET variables and layout; there's no need to repeat the lines that turn on rewriting and initialize the REWROTE flag.
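
As a sketch of what a second pair of rules might look like, suppose the site also has a script called as /profile.php?user=alice that should appear as /people/alice. Both names are hypothetical, made up for this illustration. These lines would go below the existing rules. The extra REDIRECT_REWROTE condition is there because Apache renames environment variables with a REDIRECT_ prefix when it performs an internal redirect:

```apache
# Hypothetical second mapping: /people/alice <-> /profile.php?user=alice
RewriteRule ^people/([a-z0-9_]+)$ /profile.php?user=$1 [L,E=REWROTE:1]

# Redirect direct requests for the old URL, but not our own rewrite.
# The QUERY_STRING condition comes last so that %1 refers to its capture.
RewriteCond %{ENV:REWROTE} !^1$
RewriteCond %{ENV:REDIRECT_REWROTE} !^1$
RewriteCond %{QUERY_STRING} ^user=([a-z0-9_]+)$
RewriteRule ^profile\.php$ /people/%1? [R=301,L]
```

Each additional script needs only its own pair like this one.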

Proudly displaying our shiny new URLs, we can send surgical teams into the site's source code file by file, slowly and carefully replacing instances of the old URLs with the new ones. Once all the URLs are replaced, you can watch your server logs to see usage of the old URLs fall off. The way is now prepared for you to further beautify your site inside and out.