A Tasty Pixel » Web

Smart 404 for WordPress

Save visitors to your WordPress site from unhelpful 404 errors!

When a page cannot be found, Smart 404 will use the current URL to attempt to find a matching page, and redirect to it automatically. Smart 404 also supplies template tags which provide a list of suggestions, for use on a 404.php template page if a matching post can’t be immediately discovered.

Instead of quickly giving up when a visitor reaches a page that doesn’t exist, make an effort to guess what they were after in the first place. This plugin will perform a search of your posts, tags and categories, using keywords from the requested URL. If there’s a match, redirect to that page instead of showing the error. If there’s more than one match, the 404 template can use some template tags to provide a list of suggestions to the visitor.

This plugin is also useful if you have recently changed your permalink structure: With minimal or no adjustment, old permalinks will still work.

Taking the 404 further

Update: I have now written a WordPress plugin that does all of this for you. Please use this plugin instead.

I’ve just changed my permalink structure for my blog to something a bit prettier. In the process, I realised that some previously-working permalinks weren’t operating any more, despite having a plugin set up to maintain old permalinks.

WordPress is fairly good at figuring out what viewers are requesting when a post can’t be found immediately – for example, if you’re using a permalink structure with an ID number in it, and the requested ID is incorrect, WordPress seems to be able to redirect to the correct address.

However, it’s not 100%, as I recently realised.

Consequently, a few pages were heading to the 404 page, which isn’t ideal. I changed my template’s 404 page to do a search for what the viewer was really after, and redirect them there. If it can’t find an exact match, it’ll perform a search with keywords extracted from the URL. If it finds a single result, it’ll redirect, otherwise it’ll put up a few results as suggestions on the 404 page.
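The keyword step described above can be sketched roughly as follows. This is a minimal illustration, not the code from my template: the function names `keywords_from_url` and `choose_404_action` are made up for the example, and the rule of skipping numbers and very short words is an assumption.

```php
<?php
// Hypothetical helper: turn a requested URL into a list of search keywords.
function keywords_from_url(string $requestUri): array
{
    // Keep only the path; the query string and fragment are dropped.
    $path = urldecode((string) parse_url($requestUri, PHP_URL_PATH));
    // Remove a trailing file extension such as ".html".
    $path = preg_replace('/\.[a-z0-9]{2,5}$/i', '', $path);
    // Split on anything that is not a letter or digit.
    $words = preg_split('/[^a-z0-9]+/i', strtolower($path), -1, PREG_SPLIT_NO_EMPTY);
    // Assumption: purely numeric pieces (dates, IDs) and very short words are noise.
    $words = array_filter($words, function ($word) {
        return strlen($word) > 2 && !ctype_digit($word);
    });
    return array_values(array_unique($words));
}

// Decide what the 404 page does with the search results:
// exactly one result means redirect, anything else means show suggestions.
function choose_404_action(array $results): string
{
    return count($results) === 1 ? 'redirect' : 'suggest';
}
```

The keywords would then be handed to whatever search the site already has, and the result list passed to `choose_404_action`.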

It also works as a nice search shortcut. Try it: http://atastypixel.com/wordpress 404 redirect

Adding Twinkle to Twitter posting lists

Twinkle from Tapulous is an iPhone Twitter application with a twist – it uses location information to introduce local social networking, and it supports photos, a great extension to the standard Twitter recipe.

Twinkle handles photographs by uploading them to Tapulous’ server, then appending a ‘snipurl’ to the Twitter message which points to a page displaying the image. That works fairly well, but wouldn’t it be nice to actually see images inline, when viewing Twitter posts?

Well, now you can, if you have a self-hosted Twitter post list. See my Twitter feed on the right for a demo (although there may not be a post there with an image).

Smart redirects

After migrating my blog over to WordPress, files have moved around a bit. Sites that linked to files on my site (including Google Images, as a prominent example) now link into the void, which is a bit of a pain. As a partial solution, I’ve added some .htaccess rules to help route visitors to the right place:

Redirect 301 /feeds/index.rss2 http://atastypixel.com/feed
Redirect 301 /feeds/atom.xml http://atastypixel.com/feed

However, that doesn’t solve all problems – after I reorganised my uploads, files are located at a very different path to their original home. So, I added a short script which does a search for requested files if they can’t be found, then redirects visitors to the right file.

Here is the script, and an accompanying .htaccess file needed to hook into it:

redirection.zip

To use it, put the script and the htaccess file into a folder on your site where you want it to take effect (for me, it went into ‘wp-content/uploads’). Then, modify the htaccess file – just set the RewriteBase field to the directory you’re in – and rename it to ‘.htaccess’.
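The script itself lives in the zip, so what follows is only a rough sketch of the idea, not its actual contents. It assumes the lookup is done by file name alone, and `find_moved_file` is a hypothetical name.

```php
<?php
// Hypothetical helper: look for a file with the requested name anywhere under $root.
// Returns its path relative to $root, or null if nothing matches.
function find_moved_file(string $root, string $requestedPath): ?string
{
    $root = rtrim($root, '/');
    $wanted = basename($requestedPath);
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->isFile() && $file->getFilename() === $wanted) {
            // Strip the root prefix and normalise separators for use in a URL.
            $relative = substr($file->getPathname(), strlen($root) + 1);
            return str_replace(DIRECTORY_SEPARATOR, '/', $relative);
        }
    }
    return null;
}
```

A handler hooked in via the rewrite rule could then send a `Location` header with a 301 status pointing at the returned path, and fall back to the normal 404 page when the result is null.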

Re-organising/assembling uploads for WordPress

Migrating from Serendipity, I’ve had a few hiccups in data migration. It doesn’t help that I’m extremely OCD when it comes to data, so everything’s gotta be perfect.

I had a problem with the way my uploads (the resources linked into the blog) were organised from the old Serendipity days – all in one big ol’ folder – and wanted to re-arrange them the way WordPress does it, which is to arrange them into folders by year and month.

This is also useful if you’re changing your domain, or splitting up one blog into two. The entries link to the old domain/blog, and thus the alternative is to keep the old site around or muck around with clever redirects.

My way’s neater.

So, I put together a script which goes through all posts, pulls out links to uploaded files, and moves/copies them into a better arrangement, fixing the link in the process. If it can’t find the files, it’ll use a ‘find’ command to attempt to locate them.
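The link-fixing part can be sketched like this. It is a simplified illustration rather than the script from the download: the function names, the example URLs in the usage note and the choice to file each upload under the post's own date are all assumptions, and the actual move/copy of files is left out.

```php
<?php
// Hypothetical helper: build a WordPress-style "YYYY/MM/file" path from a post date.
function dated_upload_path(string $filename, string $postDate): string
{
    $date = new DateTimeImmutable($postDate);
    return $date->format('Y/m') . '/' . basename($filename);
}

// Rewrite every link under $oldBase found in $content so that it points
// at the dated location under $newBase.
function rewrite_upload_links(string $content, string $oldBase, string $newBase, string $postDate): string
{
    // Match the old base URL followed by a path, stopping at quotes, whitespace or angle brackets.
    $pattern = '#' . preg_quote($oldBase, '#') . '/([^"\'\s<>]+)#';
    return preg_replace_callback($pattern, function ($match) use ($newBase, $postDate) {
        return $newBase . '/' . dated_upload_path($match[1], $postDate);
    }, $content);
}
```

For a post dated 2008-05-17, a link to `http://old.example/uploads/photo.jpg` would become `http://new.example/wp-content/uploads/2008/05/photo.jpg`. A dry run would simply print the old and new paths instead of touching anything.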

In case it’s useful to others, the script is here: reorganise-wp-media.php.zip

It goes in the WordPress webroot; put it there, then edit it to apply your settings. Give it a ‘dry run’ test, make sure all is well, then go for it. Beware, although I used it and it worked okay, I offer no guarantee that it won’t mangle your data, delete random files or eat your dog.

Serendipity (S9Y) importer for WordPress

Update 2: Jon reports that he’s improved on Carsten’s 1.3 version, and has released Version 1.4!

Update: Carsten Dobschat has continued the fine tradition and improved the importer further, adding support for extended posts, and tags. Grab S9Y importer version 1.3 over at his site.

In the process of merging this site from Serendipity over to WordPress, I came across an importer which lets me migrate the data across. Unfortunately it was a bit buggy, not properly assigning categories and timing out when processing the post data.

I made some improvements, and here’s the new version, if anyone else finds it useful:

Version 1.2 of Serendipity importer for WordPress (patch)

Are you leaking search queries?

Recently, AOL leaked 20 million search queries to the world (as covered in a NY Times article). It listed search queries alongside user numbers, which links search terms to individuals. Although no names are listed, it is often not difficult to determine (or at least narrow down) an individual’s identity, given their search history, as demonstrated in the above article.

The privacy ramifications here are extremely worrying. Web searching is something we all do, and for some of us, often reveals all kinds of intimate details about our lives. That this search history is recorded at all is, if I may say so, an abomination; but that this was carelessly leaked by AOL is very worrying. And yes, even Google record search queries (they’re just a little more careful with it).

The appeal to government and ‘law enforcement’ groups is obvious, which makes this even more dangerous. While Google resisted the Justice Department’s subpoena, the other search engines capitulated.

Particularly worrisome is the international nature of many of these search engines, with data centres located outside regions that are protected by various privacy laws. Only recently, Yahoo cooperated with the Chinese government in revealing the identity of a Chinese journalist who had distributed a warning from the Chinese government about reporting on sensitive local issues; the journalist is now serving a 10-year prison sentence.

Yahoo was forced by the local laws to cooperate with the Chinese government. While one may assume that this only affects local users, it’s quite common for data to be mirrored at several sites, ensuring adequate redundancy should a site be compromised and its data lost. Thus, it’s quite possible that Australian and American search history is stored in regions where the Government has complete control over access to this data.

The Chinese incident is sinister enough, but this is probably just the beginning.

Consequently, the EFF (Electronic Frontier Foundation) have published an article with a few notes on how to maximise your privacy when using search engines.

Some points are:

Don’t put personally-identifying information in your searches
Don’t log in to a search engine account
Don’t accept cookies from your search engine

They are worth reading through; in particular, there are some directions for Firefox users on how to configure an extension to increase your anonymity with Google searches.

Even if you don’t follow those suggestions, I recommend regularly clearing out your Google cookies. These cookies provide Google and other search engines with a link between your search queries – an identifier that ties them together. By clearing these cookies out regularly, you sever the link between past queries and future queries. In particular, do this before embarking on a particularly sensitive search.

To remove cookies in Firefox:

Bring up Firefox Preferences (on a Mac, click the ‘Firefox’ menu at the top left, then ‘Preferences’)
Click the ‘Privacy’ icon
Click the ‘Cookies’ tab
Click ‘View Cookies’, bottom left
Type ‘google’ in the search bar, and remove all related cookies by selecting them and clicking ‘Remove Cookies’

In Safari:

Bring up Safari Preferences (‘Safari’ menu, ‘Preferences’)
Click the ‘Security’ icon
Click ‘Show Cookies’
Scroll down to the Google cookies, select them, and press ‘Remove’

Instructions for Internet Explorer are here (If you are using Internet Explorer, it is recommended that you switch to Firefox, which offers increased security and stability. Honestly.).

Alternatively, just delete all cookies, which probably can’t hurt.

URLs, the last great ‘what the’

Happily surfing the Microsoft website today (don’t ask), I was amused to note the ridiculous URLs in use. Here are some examples:

http://www.microsoft.com/downloads/details.aspx?FamilyId=4C254E3F-79D5-4012-8793-D2D180A42DFA&displaylang=en
http://www.microsoft.com/downloads/Browse.aspx?displaylang=en&productID=4289AE77-4CBA-4A75-86F3-9FF96F68E491
http://www.microsoft.com/downloads/info.aspx?na=63&p=&SrcDisplayLang=en&SrcCategoryId=&SrcFamilyId=9996B314-0364-4623-9EDE-0B5FBB133652&u=%2fgenuine%2fdownloads%2fWhyValidate.aspx%3ffamilyid%3d9996B314-0364-4623-9EDE-0B5FBB133652%26displaylang%3den

Whatever happened to friendly URLs? If I was going to point someone to an article or a download from Microsoft’s website, I’d need several weeks just to recite it! (I will admit that Apple is no better – take their link to the MacBook Pro on their store site: http://store.apple.com/133-622/WebObjects/australiastore.woa/80505/wo/3s3D4l85iljb2mPXrPH2pCTDMcy/0.SLID?nclm=MacBookPro&mco=7C576790)

Crazy long URLs force site users to work through the navigation instead of being able to point each other to pages: Imagine reciting such a URL over the phone – it would never happen – "slash, 3, lowercase s, 3, capital D, 4, lowercase l, 8, 5…". Instead, one would tend to point to apple.com, and give directions from there. There is also no way anyone could work out what each URL points to by looking at it. It’s crazy!

In the case of the four URLs above, these really should be something like:

http://www.microsoft.com/downloads/Worldwide_English/ActiveSync_4.1
http://www.microsoft.com/office
http://www.microsoft.com/Windows_Genuine_Advantage
http://store.apple.com/au/MacBookPro

These days, with ‘404 handlers’ and such things in common use (this site uses one!), it really is very easy to make user-friendly URLs. Having a decent URL for a site’s users means they’re more likely to be able to point each other to a site (Keep It Simple, Stupid), and thus more likely to bring more visitors to the site.

URL handlers are very easy to write – using Apache, one just needs a .htaccess file sitting in the webroot, which directs all URLs to a handler page:

RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) /index.php

Then, have a page (in this case, index.php) which processes the URL and provides an appropriate page. SiteComponents has:

// Get URL request
$s = substr($_SERVER["REQUEST_URI"], strrpos($_SERVER["SCRIPT_NAME"], "/")+1);
// Strip off GET parameters and anchors
if ( strpos($s, "?") !== false ) $s = substr($s, 0, strpos($s, "?"));
if ( strpos($s, "#") !== false ) $s = substr($s, 0, strpos($s, "#"));
// Run site
$site->Run(urldecode($s));

The ‘Run’ method within $site will then handle the URL, and return an appropriate page (or suggest one if no exact match is found).
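What such a handler might do with the path can be sketched as below. This is not SiteComponents' actual ‘Run’ method: `resolve_page` is a hypothetical stand-in, the list of known pages is assumed to be a flat array of lowercase paths, and picking the closest name by edit distance is just one plausible way to come up with a suggestion.

```php
<?php
// Hypothetical stand-in for a URL handler: exact match first,
// otherwise suggest the known page whose name is closest to the request.
function resolve_page(string $path, array $knownPages): array
{
    $path = strtolower(trim($path, '/'));
    if (in_array($path, $knownPages, true)) {
        return ['status' => 'found', 'page' => $path];
    }
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($knownPages as $page) {
        // levenshtein() counts the single-character edits between two strings.
        $distance = levenshtein($path, $page);
        if ($distance < $bestDistance) {
            $best = $page;
            $bestDistance = $distance;
        }
    }
    return ['status' => 'suggest', 'page' => $best];
}
```

With known pages `office` and `downloads`, a request for `/Office/` resolves directly, while a mistyped `/ofice` comes back as a suggestion for `office`.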