The HTML5 validator error

The charset problem

Until recently it was impossible to automatically validate the HTML of the pages on my website. The reason was that I had moved from XHTML 1.0 Strict to HTML5 (to make working with the canvas easier and to be at the bleeding edge of available technologies). I had made the recommended changes, so that the first few lines of the source looked like:

<!DOCTYPE html>
<html xmlns='http://www.w3.org/1999/xhtml' xml:lang='en' lang='en' charset="utf-8">
<head>
<meta charset="utf-8" />
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />

This should have fixed things, with a streamlined DOCTYPE to make things even easier, but instead I faced the following error message:

The HTML5 validator error

The reason was that the parser has to know the charset of the document, but the charset was specified only in the document itself (as was common practice for earlier versions of HTML). This obviously doesn’t make sense, since the parser needs the charset in order to read the very declaration that names it, although there are some workarounds. If the charset declaration appears early enough in the document (within the first 1024 bytes in the current HTML specification; early HTML5 drafts said 512), the parser should be able to find it, which is useful when documents are saved out of context, but it is still not an ideal solution. After browsing online I found no suitable answer using only HTML, and it turns out that the best practice is to send an HTTP header that specifies the charset:

<?php header('Content-Type: text/html; charset=UTF-8'); ?>

Of course this header has to be sent before the rest of the document to give the parser a chance to read the document properly in the first place. Once that was added the validator returned no errors:

Success!

(The warning refers to an experimental feature of the validator known as the HTML5 Conformance Checker, and does not reflect any warnings associated with the page itself.)
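
As a rough illustration of the two ideas above, here is a minimal PHP sketch. Both function names are hypothetical and were not part of my original fix: send_charset_header() sends the header only while PHP can still modify the response headers, and charset_declared_early() checks whether a meta charset declaration ends within a given number of bytes. The regular expression is a simplification and not a full implementation of how a parser sniffs the encoding.

```php
<?php
// Hypothetical helper (an assumption for illustration, not from the post):
// send the charset header only while that is still possible.
function send_charset_header(string $charset = 'UTF-8'): bool
{
    // headers_sent() returns true once output has already been sent,
    // after which header() can no longer change the response headers.
    if (headers_sent()) {
        return false;
    }
    header('Content-Type: text/html; charset=' . $charset);
    return true;
}

// Hypothetical check: does a <meta ... charset=...> tag end within the
// first $limit bytes of the markup? The limit is supplied by the caller.
function charset_declared_early(string $html, int $limit): bool
{
    $found = preg_match(
        '/<meta\b[^>]*\bcharset\s*=[^>]*>/i',
        $html,
        $m,
        PREG_OFFSET_CAPTURE
    );
    // $m[0][1] is the byte offset of the match, $m[0][0] the matched tag.
    return $found === 1 && ($m[0][1] + strlen($m[0][0])) <= $limit;
}
```

Calling send_charset_header() as the very first statement of a page, before any markup or whitespace is output, corresponds to the one-line fix shown earlier.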

This is a small fix to a simple problem, very much like the one I described in a previous post about changing standards and HTTP requests. These changes are a little frustrating, but they serve to remind us that we are working in a living medium that is constantly updated and that people take standards and security seriously. It also brings a little joy into my day to make my whole website a little better than it was before. One day I’ll be satisfied with my website, but it’s been in development for about a decade now, and I don’t see that process coming to an end any time soon.
