Everything is evolving. Everything is falling apart.
Web sites are constantly changing, with old files moving or being deleted and new files being created. This usually leaves a lot of hyperlinks pointing to where a file used to be, and the server answers each of them with the dreaded status code 404: file not found.
The reason the 404 status code is so dreaded is that it's ambiguous. The file wasn't found, but the server has no idea whether it's been temporarily moved, permanently moved or permanently deleted. Hence the server can't guide visitors (either people or robots) as to how or even if they should update the link that brought them to the out-of-date location.
This needn't be the case, however, as adding a few lines to an .htaccess file can point visitors in the right direction, either sending them to the file's new location or telling them that it has permanently gone. This guide will talk you through setting up these more specific server responses.
First, let's take an example site with some out-of-date files. It used to have a thriving blog archived at /archive, which had hundreds of files that have now been removed. It also had some static HTML files, /index.html, /blog.html and /about_me/index.html, which have all been replaced with dynamic PHP scripts.
- Check access.log for status code 404
  The first step is to check your latest log file. If you search it for requests that were answered with status code 404, you can find out exactly which missing locations visitors are trying to reach. Although this step seems obvious, you might be surprised by some ancient files you deleted years ago that are still being requested. Make a note of every missing file which visitors are still requesting.

  In this example, the log file confirms that visitors are still trying to access files in the /archive directory, and still trying to get the plain HTML versions of the other files.
- Add Redirect directives to /.htaccess
  Now that you know which locations are duds, you need to tell Apache that the files aren't there, and also tell it what happened to them. This is done using the .htaccess file in the web site's root directory. There are four kinds of Redirect line we can add to this file, each named by the keyword that follows Redirect: permanent (status code 301), temp (temporary, 302), seeother ("see other", 303) and gone (410). Of these, gone is the only one which doesn't end with a URL pointing to the file's new location.

  Let's look at the example again. The whole /archive directory has been permanently deleted, so let's tell Apache by adding the following line:

      Redirect gone /archive/

  Now any visitors trying to access that directory will be told that the resource has been permanently removed, rather than being told that it is simply missing.

  Next, we can add the lines to redirect the plain HTML files:

      Redirect permanent /index.html http://www.example.site/index.php

  Note that the last part of the line is a full URL, not simply the filename. This makes it easy to tell visitors that you've changed domains, for example, by redirecting them on a file-by-file basis to the new domain name. Similarly, the next line is

      Redirect permanent /blog.html http://www.example.site/blog.php

  and the last line is

      Redirect permanent /about_me/index.html http://www.example.site/about_me/index.php

  Even though this file is in a subdirectory, it is still handled in the same .htaccess file as all the others.

  That's the bulk of the work done. People will be redirected to a different page (probably without even noticing), and robots should automatically update their databases to point to the file's new location or, in the case of permanently removed files, take them out of their databases altogether.
- Add an ErrorDocument directive for status code 410
  As a 410 response isn't one that visitors see very often, it's a good idea to write a custom error document that explains what's going on. Just write a regular HTML file explaining that the resource the visitor just tried to access has been permanently deleted, and that they should update their bookmarks. Let's say that you've called this error document gone.html and put it in the root directory of the site. Simply add this last line to .htaccess:

      ErrorDocument 410 /gone.html

  Now whenever someone tries to access an old file that's been deleted, they'll know that it's been permanently removed and that they're supposed to see the error document, rather than thinking the site's full of broken links that the administrator doesn't know about.
- Test it
  Try accessing a few of the old files, and make sure that you're redirected to the new files properly, or that you see the right error document. Once everything's working perfectly, it's ready to upload to the live server.
- Repeat the process a few days later
  After a few days, check the access.log file again. Are there any new 404s? If so, add those resources to your .htaccess file with the others.
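The log check in the first and last steps, and the test in the fourth, can be scripted rather than done by hand. The following is only a rough Python sketch, not a finished tool. It assumes the log uses Apache's common or combined log format; the function names missing_paths and fetch_status are made up for this illustration, and the sample log lines are invented. Matching on the status field, rather than searching for the bare text 404, avoids false hits such as a response that happens to be 4040 bytes long.

```python
import http.client
import re
from collections import Counter

# Request line and status code in Apache's common/combined log format, e.g.
#   ... "GET /archive/post1.html HTTP/1.1" 404 209
# (assumption: your server has not been given a custom LogFormat)
LOG_LINE = re.compile(r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def missing_paths(log_lines):
    """Count how often each requested path was answered with status 404."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts


def fetch_status(host, path):
    """Send a HEAD request and return (status code, Location header).

    http.client does not follow redirects, so a 301 or 410 is reported
    as-is instead of being hidden behind the final page.
    """
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


# Invented sample lines for illustration; real ones come from access.log.
sample_log = [
    '203.0.113.5 - - [01/Jan/2024:10:00:00 +0000] "GET /archive/post1.html HTTP/1.1" 404 209',
    '203.0.113.5 - - [01/Jan/2024:10:00:05 +0000] "GET /index.php HTTP/1.1" 200 4040',
    '203.0.113.9 - - [01/Jan/2024:10:01:00 +0000] "GET /archive/post1.html HTTP/1.1" 404 209',
    '203.0.113.9 - - [01/Jan/2024:10:02:00 +0000] "GET /blog.html HTTP/1.1" 404 209',
]

# Most requested missing paths first.
for path, hits in missing_paths(sample_log).most_common():
    print(hits, path)
# prints:
# 2 /archive/post1.html
# 1 /blog.html
```

To run it against a real log, pass an open file instead of the sample list, for example missing_paths(open("access.log")). Once the directives from the example are in place, fetch_status("www.example.site", "/archive/") should report 410 with no Location header, and fetch_status("www.example.site", "/blog.html") should report 301 with the new URL as its Location.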
After all that, your visitors should never have to see a 404 error document on your site again. The people visiting your web site will find what they're looking for more easily, and the robots will be able to keep their databases (such as search engines) up to date, so fewer people will request the old resources in the first place. That has to be a good thing.