Validating Australian Museum Online
Russ Weakley
15-Feb-03
How do you go about validating approximately 10,000 pages? The Australian Museum web team decided it was possible, and worthwhile. This article explains the steps we took. But first, a little about validation:
What is valid code?
Validation is a process of checking your documents against a formal standard, like those published by the W3C. A document that has been checked and passed is considered valid.
Why use valid code?
- Valid code will render faster than code with errors
- Valid code will render better than invalid code
- Browsers are becoming more standards compliant, and it is becoming increasingly necessary to write valid and standards compliant HTML
Global changes
The first step we undertook was a wide range of global changes across the entire site. This meant that we were sometimes changing over 10,000 pages at a time. Changes included:
Global change 1: Adding a Doctype
Many of our HTML pages had the old HTML 3.2 and HTML 4.0 Doctypes. The HTML 4.0 Doctype threw validation errors on the "name" attribute in image tags, which our JavaScript image rollovers require. So, we replaced this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
With this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
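The article does not say which tool performed the global changes. As a minimal sketch, assuming a scripted search and replace in Python, the Doctype swap could look like the following. The names OLD_DOCTYPE, NEW_DOCTYPE and upgrade_doctype are hypothetical, and the pattern also rewrites HTML 4.0 Strict or Frameset Doctypes to Transitional, which may not be what every site wants.

```python
import re

# Matches an HTML 3.2 or HTML 4.0 Doctype, but not HTML 4.01.
OLD_DOCTYPE = re.compile(
    r'<!DOCTYPE\s+HTML\s+PUBLIC\s+"-//W3C//DTD HTML (?:3\.2|4\.0(?!1))[^"]*"[^>]*>',
    re.IGNORECASE,
)

# The replacement Doctype shown in the article.
NEW_DOCTYPE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"\n'
    '"http://www.w3.org/TR/html4/loose.dtd">'
)


def upgrade_doctype(html):
    """Replace the first old Doctype in a page; leave other pages untouched."""
    return OLD_DOCTYPE.sub(lambda match: NEW_DOCTYPE, html, count=1)
```

Running the function a second time on an already upgraded page changes nothing, so the same script can safely be re-run across the site.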
Global change 2: Adding Character encoding
It is very important that the character encoding of any XML or (X)HTML document is clearly labelled. If a user agent (e.g. a browser) is unable to detect the character encoding used in a Web document, the user may be presented with unreadable text. So, we added the character set below to all files:
<meta http-equiv="content-type" content="text/html; charset=utf-8">
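A charset label only helps if the file's bytes really are in that encoding. The sketch below, which is an assumption about how such a change could be scripted and not the team's actual method, decodes each file strictly as UTF-8 before inserting the tag after the opening head tag. The names META, HEAD_OPEN, HAS_CHARSET and add_charset are hypothetical.

```python
import re

META = '<meta http-equiv="content-type" content="text/html; charset=utf-8">'
HEAD_OPEN = re.compile(r"<head(?:\s[^>]*)?>", re.IGNORECASE)
HAS_CHARSET = re.compile(r"<meta[^>]+charset=", re.IGNORECASE)


def add_charset(raw):
    """Insert the charset meta tag into a page given as bytes.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8, so a
    mislabelled file is reported instead of silently tagged.
    """
    text = raw.decode("utf-8")
    if HAS_CHARSET.search(text):
        return raw  # already labelled; leave the page alone
    updated = HEAD_OPEN.sub(lambda m: m.group(0) + "\n" + META, text, count=1)
    return updated.encode("utf-8")
```

Files that raise UnicodeDecodeError are candidates for the invalid-character clean-up described later in this article.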
Global change 3: JavaScript Tag
Our <script> elements had no type attribute and used the deprecated "language" attribute instead. The type attribute specifies the scripting language of the element's contents. So, we changed this:
<script language="JavaScript">
To this:
<script type="text/javascript">
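One way this change might be scripted is sketched below in Python. It is an illustration only: the function name fix_script_tags and the patterns are hypothetical, and the sketch deliberately touches only script tags whose "language" value starts with "JavaScript", leaving other scripting languages alone.

```python
import re

SCRIPT_TAG = re.compile(r"<script\b([^>]*)>", re.IGNORECASE)
# language="JavaScript", language=JavaScript1.2, language='javascript', ...
JS_LANGUAGE = re.compile(
    r"""\s+language\s*=\s*["']?javascript[\d.]*["']?""", re.IGNORECASE
)
TYPE_ATTR = re.compile(r"\stype\s*=", re.IGNORECASE)


def fix_script_tags(html):
    """Swap language="JavaScript" for type="text/javascript" in script tags."""

    def fix(match):
        attrs = match.group(1)
        if not JS_LANGUAGE.search(attrs):
            return match.group(0)  # no JavaScript language attribute
        attrs = JS_LANGUAGE.sub("", attrs)
        if not TYPE_ATTR.search(attrs):
            attrs = ' type="text/javascript"' + attrs
        return "<script" + attrs + ">"

    return SCRIPT_TAG.sub(fix, html)
```

Other attributes, such as src, are kept in place.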
Global change 4: Invalid <body> attributes
We removed the invalid body attributes that are used to force page content into the top left corner of the page. These attributes:
leftmargin="0" topmargin="0" marginwidth="0" marginheight="0"
were removed and replaced with an additional rule in the CSS file:
body
{
padding: 0;
margin: 0;
}
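Removing the four attributes from every body tag is another change that lends itself to a script. A minimal Python sketch follows, assuming regular expressions are good enough for the site's markup; the name strip_body_margins is hypothetical.

```python
import re

BODY_TAG = re.compile(r"<body\b[^>]*>", re.IGNORECASE)
# One margin attribute with a quoted or unquoted value.
MARGIN_ATTR = re.compile(
    r"""\s+(?:leftmargin|topmargin|marginwidth|marginheight)"""
    r"""\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)


def strip_body_margins(html):
    """Remove the invalid margin attributes from the body tag only."""
    return BODY_TAG.sub(lambda m: MARGIN_ATTR.sub("", m.group(0)), html)
```

Valid body attributes on the same tag are left untouched; the CSS rule above then takes over the job of removing the page margin.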
Global change 5: Bold and Italic tags
Bold and italic tags are presentational rather than structural (although they are not formally deprecated in HTML 4.01), so we made global changes across all of our sites and replaced this:
<b></b>
with this:
<strong></strong>
and this:
<i></i>
with this:
<em></em>
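A plain text replace of "<b" would also hit tags such as <body> and <br>, so a scripted version needs to match whole tag names. The following Python sketch shows one way to do that; TAG_MAP and replace_bold_italic are hypothetical names, and the approach is an assumption, not a record of what we ran.

```python
import re

TAG_MAP = {"b": "strong", "i": "em"}
# Opening or closing <b>/<i>, with optional attributes; not <body>, <br>, <img>.
BOLD_ITALIC = re.compile(r"<(/?)(b|i)(\s[^>]*)?>", re.IGNORECASE)


def replace_bold_italic(html):
    """Rename b to strong and i to em, keeping any attributes."""

    def swap(match):
        slash, name, attrs = match.group(1), match.group(2), match.group(3)
        return "<" + slash + TAG_MAP[name.lower()] + (attrs or "") + ">"

    return BOLD_ITALIC.sub(swap, html)
```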
Section-by-section changes
There were many changes that could not be done globally. These were generally done by downloading a section of the site at a time and doing mini global changes. Site-section changes included:
Section change 1: Image-based submit buttons
Many of our image-based submit buttons had width, height and border attributes, none of which are valid on input elements in HTML 4.01. So, we removed all of these attributes from the submit buttons.
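For a downloaded section, a change like this could be scripted along the following lines. This is a hedged Python sketch with hypothetical names (strip_image_button_sizes and the three patterns); it only edits input tags whose type is "image".

```python
import re

INPUT_TAG = re.compile(r"<input\b[^>]*>", re.IGNORECASE)
IS_IMAGE = re.compile(r"""\stype\s*=\s*["']?image\b""", re.IGNORECASE)
SIZE_ATTR = re.compile(
    r"""\s+(?:width|height|border)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)


def strip_image_button_sizes(html):
    """Drop width, height and border from image-based submit buttons."""

    def fix(match):
        tag = match.group(0)
        if not IS_IMAGE.search(tag):
            return tag  # text fields, checkboxes and so on stay as they are
        return SIZE_ATTR.sub("", tag)

    return INPUT_TAG.sub(fix, html)
```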
Section change 2: Invalid characters
Because we often pulled content from MS Word into HTML editors, we frequently found invalid characters that needed to be replaced or removed (as well as hundreds of horrid local styles). Some of the more common invalid characters include:
- †: replace with a space
- í: replace with ' (single quote mark)
- &: replace with &amp; (only in non-HTML text)
- ñ: replace with - (dash)
- ë: replace with ' (single quote mark)
- … (three dots within a single character): replace with ... (three dots)
- ‘: replace with ' (single quote mark)
- ’: replace with ' (single quote mark)
- –: replace with - (dash)
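The typographic characters in this list can be swapped in a single pass. The Python sketch below covers only the curly quotes, the en dash and the ellipsis. The accented look-alikes (í, ñ, ë) are left out on purpose, because replacing them blindly would damage legitimate accented text. REPLACEMENTS and clean_word_characters are hypothetical names.

```python
# Typographic characters commonly pasted in from word processors,
# mapped to the plain ASCII replacements listed in the article.
REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u2013": "-",    # en dash
    "\u2026": "...",  # horizontal ellipsis (three dots in one character)
}

_TABLE = str.maketrans(REPLACEMENTS)


def clean_word_characters(text):
    """Replace the listed typographic characters with ASCII equivalents."""
    return text.translate(_TABLE)
```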
Section change 3: CSS changes
Finally, there were many CSS files that needed to be edited by hand. Changes included:
- We added "background-color" whenever "color" was specified and vice versa. Often this meant setting the background colour to "transparent". This is recommended by the W3C.
- We added quotes around font names containing white space. The CSS specification recommends quoting such names; without quote marks, white space in a font name is not reliably preserved.
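We made these CSS edits by hand, but the font-name rule is simple enough to illustrate in code. The Python sketch below takes the value of a font-family declaration and quotes any name that contains a space. The function name quote_font_families is hypothetical, and the sketch assumes names are separated by commas with no commas inside a name.

```python
def quote_font_families(value):
    """Quote each font name in a font-family value that contains a space."""
    quoted = []
    for name in value.split(","):
        name = name.strip()
        # Skip names that are already quoted.
        if " " in name and name[0] not in "\"'":
            name = '"' + name + '"'
        quoted.append(name)
    return ", ".join(quoted)
```

Generic family keywords such as serif contain no spaces, so they are left unquoted, as they should be.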
Final results
There were other minor adjustments we made throughout our site during this process. However, it should be mentioned that our files were reasonably close to valid when we began. We always used quote marks around attribute values and all images had "alt" attributes, so our task was not as large as it could have been.
The bottom line is that making a large site 100% valid can be done. Apart from a few sections that we are still working on, our site is currently 100% valid.
How do you check if your code is valid?
The tools we used included:

