
Validating Australian Museum Online

Russ Weakley
15-Feb-03

How do you go about validating approximately 10,000 pages? The Australian Museum web team decided it was possible, and worthwhile. This article explains the steps we took. But first, a little about validation:

What is valid code?

Validation is a process of checking your documents against a formal standard, like those published by the W3C. A document that has been checked and passed is considered valid.

Why use valid code?

Global changes

The first step we undertook was a wide range of global changes across the entire site. This meant that we were sometimes changing over 10,000 pages at a time. Changes included:

Global change 1: Adding a Doctype

Many of our HTML pages had the old HTML 3.2 and HTML 4.0 Doctypes. The latter threw validation errors on the "name" attribute in image tags, which is required for JavaScript image rollovers, so we replaced this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

With this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

Global change 2: Adding Character encoding

It is very important that the character encoding of any XML or (X)HTML document is clearly labelled. If a user agent (e.g. a browser) is unable to detect the character encoding used in a Web document, the user may be presented with unreadable text. So, we added the character set below to all files:

<meta http-equiv="content-type" content="text/html; charset=utf-8">
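As an illustration only, the insertion could be scripted along these lines. The function name add_charset_meta is hypothetical, and the sketch assumes every page has a <head> tag and that the file really is saved as UTF-8 (the meta tag only labels the encoding, it does not convert anything).

```python
import re

META = '<meta http-equiv="content-type" content="text/html; charset=utf-8">'

# Opening <head> tag, with or without attributes. \b keeps <header> out.
HEAD_OPEN = re.compile(r'<head\b[^>]*>', re.IGNORECASE)

# Any existing content-type meta tag, so the tag is never added twice.
HAS_CONTENT_TYPE = re.compile(
    r'<meta\b[^>]*http-equiv\s*=\s*["\']?content-type', re.IGNORECASE
)


def add_charset_meta(html: str) -> str:
    """Insert the content-type meta tag straight after <head> if missing."""
    if HAS_CONTENT_TYPE.search(html):
        return html
    return HEAD_OPEN.sub(lambda m: m.group(0) + '\n' + META, html, count=1)
```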

Global change 3: JavaScript Tag

Our <script> elements had no type attribute and used the deprecated "language" attribute instead. The type attribute specifies the scripting language of the element's contents. So, we changed this:

<script language="JavaScript">

To this:

<script type="text/javascript">
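A search-and-replace for this change needs to cope with <script> tags that carry other attributes, such as src. The sketch below is one hedged way to do it in Python; fix_script_tags is a hypothetical name, and the pattern only treats language values beginning with "JavaScript" (for example "JavaScript1.2") as JavaScript.

```python
import re

SCRIPT_TAG = re.compile(r'<script\b[^>]*>', re.IGNORECASE)

# A language attribute whose value starts with "javascript", any case.
LANG_ATTR = re.compile(
    r'\s+language\s*=\s*["\']?javascript[\d.]*["\']?', re.IGNORECASE
)
HAS_TYPE = re.compile(r'\stype\s*=', re.IGNORECASE)


def _fix_tag(match):
    # Drop the language attribute and count how many were removed.
    tag, removed = LANG_ATTR.subn('', match.group(0))
    # Leave the tag alone if it was not JavaScript or already has a type.
    if removed == 0 or HAS_TYPE.search(tag):
        return tag
    # "<script" is seven characters; insert the type attribute after it.
    return tag[:7] + ' type="text/javascript"' + tag[7:]


def fix_script_tags(html: str) -> str:
    """Replace language="JavaScript" with type="text/javascript"."""
    return SCRIPT_TAG.sub(_fix_tag, html)
```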

Global change 4: Invalid <body> attributes

We removed the invalid body attributes that are used to force page content into the top left corner of the page. These attributes:

leftmargin="0" topmargin="0" marginwidth="0" marginheight="0"

were removed and replaced with an additional rule in the CSS file:

body
{
padding: 0;
margin: 0;
}
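Removing the attributes themselves is a mechanical job. A rough Python sketch follows, with the hypothetical function name strip_body_margins; it assumes attribute values do not contain a ">" character and keeps every other body attribute as it was.

```python
import re

BODY_TAG = re.compile(r'<body\b[^>]*>', re.IGNORECASE)

# One of the four margin attributes with a quoted or unquoted value,
# together with the whitespace in front of it.
MARGIN_ATTR = re.compile(
    r'\s+(?:leftmargin|topmargin|marginwidth|marginheight)'
    r'\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)',
    re.IGNORECASE,
)


def strip_body_margins(html: str) -> str:
    """Remove the invalid margin attributes from every <body> tag."""
    return BODY_TAG.sub(lambda m: MARGIN_ATTR.sub('', m.group(0)), html)
```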

Global change 5: Bold and Italic tags

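The bold and italic tags are presentational rather than structural (they are still permitted in HTML 4.01, but say nothing about meaning), so we made global changes across all of our sites and replaced this: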

<b></b>

with this:

<strong></strong>

and this:

<i></i>

with this:

<em></em>
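The risk with a naive replace here is catching other tags that start with the same letter, such as <body>, <br> or <img>. The Python sketch below (replace_bold_italic is a hypothetical name) avoids that by requiring whitespace or ">" straight after the tag name. It is still only a text substitution, so it would also rewrite matching strings inside scripts or comments.

```python
import re

TAG_MAP = {'b': 'strong', 'i': 'em'}

# "<b", "</b", "<i" or "</i" followed by whitespace or ">", so that
# <body>, <br>, <base> and <img> are not matched.
BOLD_ITALIC = re.compile(r'<(/?)(b|i)(?=[\s>])', re.IGNORECASE)


def replace_bold_italic(html: str) -> str:
    """Swap <b> for <strong> and <i> for <em>, keeping any attributes."""
    return BOLD_ITALIC.sub(
        lambda m: '<' + m.group(1) + TAG_MAP[m.group(2).lower()], html
    )
```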

Section-by-section changes

There were many changes that could not be made globally. These were generally done by downloading one section of the site at a time and making smaller global changes within it. Site-section changes included:

Section change 1: Image-based submit buttons

Many of our image-based submit buttons had width, height and border attributes, which are not valid on the input element. So, we removed all these attributes from the submit buttons.
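As a sketch of how such a clean-up might be automated (clean_image_buttons is a hypothetical name, and this is a regular-expression approximation rather than a real HTML parser), the Python below strips the three attributes from input tags of type "image" only, and leaves other inputs untouched.

```python
import re

INPUT_TAG = re.compile(r'<input\b[^>]*>', re.IGNORECASE)
IS_IMAGE = re.compile(r'\stype\s*=\s*["\']?image\b', re.IGNORECASE)

# width, height or border with a quoted or unquoted value.
SIZE_ATTR = re.compile(
    r'\s+(?:width|height|border)\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)',
    re.IGNORECASE,
)


def clean_image_buttons(html: str) -> str:
    """Remove width, height and border from <input type="image"> tags."""
    def fix(match):
        tag = match.group(0)
        return SIZE_ATTR.sub('', tag) if IS_IMAGE.search(tag) else tag
    return INPUT_TAG.sub(fix, html)
```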

Section change 2: Invalid characters

As we often pulled content from MS Word into HTML editors, we regularly found invalid characters that needed to be replaced or removed (as well as hundreds of horrid local styles). Some of the more common invalid characters include:

Section change 3: CSS changes

Finally, there were many CSS files that needed to be edited by hand, including:

Final results

There were other minor adjustments we made throughout our site during this process. However, it should be mentioned that our files were reasonably close to valid when we began. We always used quote marks around attribute values and all images had "alt" attributes, so our task was not as large as it could have been.

The bottom line is that making a large site 100% valid can be done. Apart from a few sections that we are still working on, our site is currently 100% valid.

How do you check if your code is valid?

The tools we used included: