The state of the Art in Australian web development
Abstract
Author: John Allsopp
History: Presented by John at WE05, 30th September, 2005.
Westciv's John Allsopp takes a good hard look at just exactly how major Australian sites are developed, and how well (or otherwise) they adhere to best practices.
Current practices in Web Development in Major Australian Sites - a survey
How are major companies and government departments in Australia developing their sites today? Are they adhering to best practices in development and accessibility? This presentation looks at major Australian sites, to determine whether they are using best practices, and where they are falling down. We'll see what patterns emerge, where things are going well, or otherwise. And we'll conclude with some recommendations based on this cold hard evidence.
Methodology
My aim with this survey was first to develop an objective, if a little unsophisticated, measure of a site's adherence to current best practices in web development and accessibility. This could then be used to gauge sites against one another, and over time.
Now, if we are going to investigate best practices, we'll first need to define what these are. The main areas which standards based developers are concerned with, and which perhaps differentiate the standards based approach to development from more "traditional" approaches, are
- Validity of XHTML
- Use of and validity of CSS
- Use of semantic, structural HTML, and separation of content structure from presentation
- Accessibility
Each of these areas is, to varying degrees, reasonably non-contentious. Each of them is, to a greater or lesser extent, amenable to machine checking, hopefully making any survey like this at least reasonably objective. We'll look at the criteria and method of assessment in a moment, but one reasonably glaring omission from the list above is usability. It was excluded for a couple of main reasons. Firstly, unlike the others, it's not easily amenable to machine checking, so any objective testing would be prohibitively expensive and time consuming, if possible at all. It's also an area where there is perhaps less consensus as to just what best practices are, and certainly how you might objectively measure them.
I only assessed the front page of the sites - front pages arguably represent the "best effort" of a developer. You might also argue it's where every man and his dog (marketing, senior management, and so on) gets to put their oar in, and so, in some ways, it is less likely to reflect a developer's best effort. We need to start somewhere however, so this is as good a place as any.
So the areas I used to assess best practices were
- HTML/XHTML
- CSS
- Semantic and Structural HTML
- Accessibility
All these areas are scored out of 5, giving a total of 20 points. Here is how they are allocated
HTML/XHTML
This assesses the extent to which a site uses valid XHTML/HTML.
The criteria
Doctype
Valid pages require a document type. The W3C recommended doctypes are HTML 4.01, XHTML 1.0 and XHTML 1.1. If no doctype is specified (or a doctype prior to HTML 4.01), subtract 2 points.
Validation Errors
For each type of validation error, subtract 1 point - to a minimum of 0 points.
For example, not quoting attribute values where required, even if done 50 times, is a deduction of one point.
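To make the arithmetic concrete, here is a minimal sketch of how this score might be computed. The function name scoreHtml and its two parameters are hypothetical, chosen for illustration only; the survey itself was scored by hand from the validator output.

```javascript
// Hypothetical helper: scores the HTML/XHTML area out of 5.
// hasRecommendedDoctype: true if the page declares HTML 4.01, XHTML 1.0 or XHTML 1.1.
// errorTypes: names of the validation errors found; repeats of a type count once.
function scoreHtml(hasRecommendedDoctype, errorTypes) {
  let score = 5;
  if (!hasRecommendedDoctype) {
    score -= 2; // missing doctype, or one older than HTML 4.01
  }
  const distinctTypes = new Set(errorTypes);
  score -= distinctTypes.size; // one point per type of error
  return Math.max(0, score); // never below 0
}

// The same error made 50 times still costs a single point.
const fiftyUnquoted = new Array(50).fill("unquoted attribute value");
console.log(scoreHtml(true, fiftyUnquoted)); // 4
```

Counting types rather than occurrences is what keeps a page with one repeated slip from scoring the same as a page with many different problems.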
CSS
The criteria
No use of CSS - 0 points
Validation Errors
For each type of CSS error, deduct 1 point (this is rather generous). We also deduct a point for not following the recommended practice of using a generic font family when specifying fonts.
Semantic and Structural HTML
The criteria
- Use of Tables for layout - deduct 2 points
- Use of font elements - deduct 1 point
- Use of presentational attributes - deduct 1 point
- Each use of a type of element in a semantically inappropriate way - deduct 1 point
- Use of inline style - subtract 1 point
The first point might be a little contentious to some. Tough. While it is a reasonably hard line to take, it is clear from several angles that using tables for layout has for some time been far from a best practice. Similarly, penalizing the use of inline CSS appears harsh, given that it is a perfectly acceptable practice according to the W3C recommendations. This section looks at the issue of separating content and appearance, and inline CSS certainly is far from perfect in that regard.
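The deductions for this area can be sketched in the same way. The function name scoreStructure and the shape of the findings object are hypothetical, and the floor of 0 is an assumption carried over from the HTML criteria rather than something stated for this area.

```javascript
// Hypothetical helper: scores the semantic and structural area out of 5.
// findings describes what was observed on the page (all field names are illustrative).
function scoreStructure(findings) {
  let score = 5;
  if (findings.layoutTables) score -= 2;
  if (findings.fontElements) score -= 1;
  if (findings.presentationalAttributes) score -= 1;
  if (findings.inlineStyle) score -= 1;
  // One point per type of element used in a semantically inappropriate way.
  score -= (findings.misusedElementTypes || []).length;
  return Math.max(0, score); // assumed floor of 0
}

console.log(scoreStructure({
  layoutTables: true,
  fontElements: false,
  presentationalAttributes: true,
  inlineStyle: false,
  misusedElementTypes: ["b"],
})); // 1
```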
Accessibility
Accessibility is notoriously difficult to assess, particularly mechanically. The aim of this survey is not to comprehensively determine the accessibility of a site, but rather to gauge its basic adherence to core accessibility recommendations. The reasoning is that failure to adhere to even simple accessibility guidelines indicates a deeper failure in this area.
The criteria
Deduct 1 point for each non trivial, non controversial, type of accessibility error reported by Cynthia Says, such as lack of alt attributes, and so on.
The Sites
The aim of the survey was to look at how major Australian sites are adhering to best practices. We know that there will be many sites, built by developers who adhere to standards and best practices, but how far do these best practices extend to the most significant sites on the web in terms of their user base?
To this end, the following sites were chosen. They represent the biggest companies in Australia, with market capitalizations of between 2 and 70 billion dollars. They represent major government sites that millions of people use every day. They represent the most visited Australian sites by Australians. In short, they represent the mainstream of web development.
ASX top 50
Major banking sites
- NAB
- Commonwealth
- St George
- Westpac
- ING Direct
- ANZ
Popular travel sites
- Flight Centre
- Virgin
- Qantas
- Jetstar
All TV channels
- SBS
- ABC
- 7
- 9
- 10
- Fox
Main government sites for several states and federally
- australia.gov.au
- Centrelink
- nsw.gov.au
- vic.gov.au
- qld.gov.au
- ATO
Other Australian sites among the Top 100 sites visited by Australians, such as Sensis and Yellow Pages.
The results
If you are a developer who pays more than a passing interest in the issues we have discussed, ask yourself how well or badly you think these major sites are doing.
To be honest, I thought the result of this survey would be "nothing to see here people, move along". That after a few sites I'd be left with the realization that it was all a bit pointless, that everyone out there had digested the lessons, used the validators, cared about (or at least had some basic understanding of) accessibility issues, and I could actually go ahead and forget the whole issue.
And it all started so well.
I began with the ASX top 50. I felt that if these sites did well, well surely other more popular sites would be doing it right as well.
The biggest company in Australia, "The Big Australian", is BHP Billiton. And lo and behold, their website is, in the terms of the criteria, a good effort.
- Valid XHTML 1.0 Transitional!
- Valid CSS!
- Nice structural and semantic XHTML
- A couple of minor accessibility issues
18/20
Things were looking pretty good. From then on it was almost literally downhill. Only one other site scored higher, and one other as high. The next two weeks became a nightmare where every spare moment was spent validating mangled and broken HTML and CSS, wading through at times dire source code, and despairing at the thought of all the lawyers who were going to make a fortune out of the Disability Discrimination Act of 1992.
A total of 6669 HTML validation errors on just 83 pages. Only 9 of the 83 sites validated to any doctype.
I had wanted to do 100 sites but simply ran out of time and enthusiasm, indeed stamina at 83. I had seen and had enough.
Frankly, I never want to see the Firefox toolbar again. I never want to see the results of the W3C's validators, or Cynthia Says. I don't want to look at the source code of one more table based site. It's a shame that most developers don't seem to want to either.
So what can we learn from this carnage?
Well, that we still have some way to go until even the most basic aspects of the best practices that the W3C and others have been developing for a decade are even moderately adopted.
But in a more practical sense, a number of very strong patterns emerged from all this, which I think we can draw some good lessons from.
And in part, it's a bit like driving past a car wreck. Few manage not to take a look.
HTML
Doctypes
Let's start at the top. Doctypes. What doctypes (if any) are major sites using?
Well, the surprising thing is that they are using any at all. Given only 9 of the surveyed sites actually validate, and the average number of errors on each page is over 80, I find it a little surprising that 52 of the 83 sites actually have specified doctypes (and a couple of others try to).
You'll see from this graph that it's loose all the way. 1 each for XHTML Strict, XHTML 1.1 and HTML 4.01 Strict (only the XHTML 1.0 Strict one actually validates, for what it's worth).
Hats off too to the HTML 3.2 hold out. Now that is old school. If you are unfortunate enough to have to buy petrol, you will most likely have done so from them of late. Probably leaded.
So what kind of errors are people making?
It's interesting to look at the frequency of the errors. In fact, while there are several thousand errors among the sites surveyed, and an average of 80 per page, fewer than 25 turn up with any significant frequency (3 or more times across the sites). Only 15 turn up on 10 or more sites. People are making the same 15 to 20 mistakes over and over again. Let's take a look in some more detail at these major, oft repeated errors. On the whole they are simple to fix, but some of them are rooted in common misunderstandings that need to be cleared up.
Does it really matter?
Before we get into practicalities, does it all really matter? After all, these pages work in browsers (that is, IE), right?
Here are some of the companies and organizations surveyed.
- Commonwealth Serum Laboratories
- Qantas
- ATO
- Commonwealth Bank
Technology is essential, fundamental to what they do. Lives and livelihoods depend on their technology. Is it not reasonable to conclude that if they fail to adhere to even basic best practices when it comes to the most public of their technologies, we have the right to be skeptical of how well they adhere to best practices in other technological areas?
That alone is a significant reason why it matters.
I imagine too that many if not most of the organizations whose sites I surveyed are ISO 9001 compliant. So they clearly feel best practices are important. Why does this not carry over to the web?
The errors
Here, in order of their frequency, are the most common of the HTML errors, with some simple ways in which most of them can be cleared up.
Missing alt attributes
Not only are alt attributes a WAI guideline (1.1) for many elements, they are also required for these elements by document types.
More than 50% of sites fail this validation test.
Fixing this problem is essentially trivial, and its prevalence suggests a lack of knowledge, coupled with the failure to even attempt validating a page. This conclusion will be drawn, sadly, again and again.
Script (and style) elements with no type
Well over half of all sites surveyed had this problem with script elements. The type attribute is required. Often script elements included only the language attribute; often they included neither attribute.
At first glance, this extraordinary rash of scripting (if I were to redo this survey I would look at how many of these sites used scripting) appeared to achieve little that could not be achieved without scripting. Given the significant problems scripting caused for validation (and javascript problems don't stop here) I'd suggest scripting be used a lot less frequently than it is.
I'm also sorry I didn't include a count of how many sites used browser sniffing via javascript. If I had a lot of graduate students working for me I'd do that.
Style elements without type were much less frequent, but worth noting. We'll see how script and style elements can get us in trouble in other ways a little later.
TOPMARGIN
If anything indicates the belts and braces, kitchen sink, old ways are the best ways, she'll be right attitude of many, many developers, it is the incredible frequency of the pseudo HTML attributes TOPMARGIN, LEFTMARGIN, MARGINHEIGHT and MARGINWIDTH.
I never want to see these again as long as I live. OK!
As far as I am aware, these were never part of any published specification, and indicate a developer who really needs some brushing up on their modern web development practices. Like from the beginning.
38 out of the 83 sites used these attributes. That's 38 too many.
Unescaped ampersands
To some extent, many of the problems we see recurring, I suspect, are as much a function of CMS and tools as they are of hand coders. I suspect a lot of old school CMS and tools are adding TOPMARGIN and other attributes like it. That does not exonerate the developers so much as explain the prevalence of such poor practices.
35 of the 83 sites have unescaped ampersands. Very often these occur in URLs (much less commonly in text, which suggests that developers on the whole know about the problem, but ignore it in URLs). I suspect it is often poorly developed apps that are producing these URLs.
In HTML, even in URLs, & must be written as &amp; for a page to validate.
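Since I suspect applications rather than people produce most of these URLs, the fix belongs in the code that writes them into the page. What follows is a minimal sketch, not a complete HTML escaper; the function name escapeAmpersands is hypothetical. It rewrites each bare & as &amp; and leaves anything that already looks like an entity reference alone, so running it twice is harmless.

```javascript
// Hypothetical helper: escapes bare ampersands in a URL before it is
// written into an href or src attribute.
function escapeAmpersands(url) {
  // Match "&" unless it already starts a named (&amp;), decimal (&#38;)
  // or hexadecimal (&#x26;) reference ending in a semicolon.
  const bareAmpersand = /&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)/g;
  return url.replace(bareAmpersand, "&amp;");
}

console.log(escapeAmpersands("search?q=cats&page=2")); // search?q=cats&amp;page=2
```

Note that this only deals with ampersands; characters such as < and quotes inside attribute values need escaping too.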
HTML Help has a good article on common validation errors which covers this issue, and a number of the others we encounter here. There is a link to this article in the resources section.
Malformed documents
Above all else, HTML, and particularly XHTML, documents must be well formed. To be well formed, elements must be properly nested, and properly closed. These are the absolute foundation of good HTML.
Far too many sites fall down in this area. Sites displayed missing start or end tags. Self closing elements like meta elements that have close tags. Overlapping elements. In short, this basic, fundamental aspect of HTML is a shemozzle.
In addition to this many sites fall foul of the containment rules of HTML/XHTML.
- Inline elements may only contain other inline elements
- Paragraphs are an exception to the basic rule that block elements may contain any other kind of element
- List items must be contained within a list
- Inline elements must not appear directly inside the body but must be contained within a block element
- Forms may not be contained directly within a table, but within TD elements
To illustrate the problem, here are some of the containment errors I found
- <p> elements inside <p> elements
- <div> elements inside <p> elements
- <div> elements inside <a> elements
- <body> elements inside <table> elements
- <table> elements inside <a> elements
- <h4> elements inside <a> elements
- <form> elements inside <p> elements
and many more.
Together, fundamental problems like these appear in the majority of documents.
30 of the 83 sites are generally malformed documents, or break basic containment rules. In addition, 14 of the 83 have containment problems associated with tables and forms. As well, 20 documents are malformed specifically in their use of tables - these malformations in addition to any general problems mentioned above. When it comes to a valid page, nothing impacts on the likelihood of a problem more than using tables. As we will see shortly, 71 of the 83 sites use tables for layout to some extent. 20 of these 71 have problems with malformed tables (and in addition 14 have problems with forms contained in tables).
For heaven's sake, almost 10% of the sites (7 out of 83) even display malformed comments!
And there are other specific containment issues we'll get to shortly.
Style and link elements and containment
Style and link elements must be contained in the head of a document.
10 sites have style elements in the body of their document. Another 3 have link elements in their bodies.
Again, it may "work" but it is not valid HTML and is far from a best practice.
XHTML and HTML
XHTML causes a number of different, significant problems for developers.
Case
We'll start with XHTML and case. As XML, XHTML is case sensitive. Element and attribute names are all lower case. <HEAD> is not an XHTML element start tag. HREF is not an XHTML attribute. 19 documents use an XHTML document type of some description. 10 of these have problems with case.
XHTML also requires a slightly different syntax, particularly for self closing elements like meta and link. These must end with />
Effectively half, or 9 of the 19 XHTML documents make syntax errors along these lines. In effect, they use HTML syntax.
While it is poor practice to have invalid HTML documents, it is potentially disastrous to have invalid XHTML documents. A choice to use XHTML should be accompanied by an absolute commitment to valid documents. Why? XHTML is XML. An XML parser, on encountering a well-formedness error, must stop parsing and report an error. IE does not parse these pages as XML, so it will continue to handle sloppy XHTML as it has done HTML. Other, newer browsers won't be so lenient. Now you might rejoin that your pages work well in Safari and Mozilla browsers, so what's the problem? At present, unless XHTML pages are served as application/xhtml+xml, they are in effect treated as HTML by these browsers. Should the pages ever be served as application/xhtml+xml (and that may be something outside your control) your pages will look something like this
The beige screen of death.
Be very careful with XHTML. Using it is in essence a commitment to a fully valid page. Make sure you honor that commitment.
HTML
It's usually said that XHTML is backwards compatible with HTML. So using XHTML syntax with an HTML doctype will give you a valid page. But this is not necessarily the case.
XHTML syntax for link and meta elements in the head of a document, namely, <link ... /> or <meta ... /> will cause an error. This is not a problem for self closing elements in the body of a page.
This is a very little known error, and really not alluded to in the general documentation, official or otherwise, associated with the backwards compatibility of XHTML.
If using XHTML syntax with HTML doctypes, don't use the XHTML syntax for self closing elements in the <head> of the document.
Unquoted attribute values
It is good practice, always valid (for HTML and XHTML), and so recommended for consistency, to always quote attribute values. In HTML, "quotes are optional if the attribute value consists solely of letters in the range A-Z and a-z, digits (0-9), hyphens ("-"), and periods (".")" - HTMLHelp
In XHTML, all attribute values must be quoted.
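The HTML rule as quoted from HTMLHelp is simple enough to express as a single test. Here is a minimal sketch, with a hypothetical function name, that reports whether a value could legally go unquoted in HTML under that wording. For XHTML the answer is always no.

```javascript
// Hypothetical helper: true if an HTML attribute value may be left unquoted,
// following the rule as quoted above (letters, digits, hyphens and periods only).
function canOmitQuotesInHtml(value) {
  return /^[A-Za-z0-9.-]+$/.test(value);
}

console.log(canOmitQuotesInHtml("center"));  // true
console.log(canOmitQuotesInHtml("100%"));    // false, % is not allowed
console.log(canOmitQuotesInHtml("#ffffff")); // false, # is not allowed
```

Widths in percent, hex colors and URLs all fail the test, which is why the simplest policy is to quote everything.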
20 of the 83 sites had errors associated with unquoted attribute values.
Unescaped Javascript
"In HTML end tags are recognized within SCRIPT elements, but other kinds of markup--such as start tags and comments--are not" - HTMLHelp
"Authors should therefore escape "</" within the content. Escape mechanisms are specific to each scripting or style sheet language" HTML 4.01 specification
http://www.w3.org/TR/html4/appendix/notes.html#notes-specifying-data
So
<SCRIPT type="text/javascript"> document.write ("<EM>This won't work</EM>") </SCRIPT>
is invalid, while
<SCRIPT type="text/javascript"> document.write ("<EM>This will work<\/EM>") </SCRIPT>
is valid
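If script content is generated by a tool or a template, the escaping can be applied mechanically. This is a minimal sketch with a hypothetical function name; it simply rewrites every "</" in a piece of Javascript source as "<\/". It assumes the "</" sequences of interest sit inside string literals, as in the document.write example above, where the backslash is harmless.

```javascript
// Hypothetical helper: escapes "</" in Javascript source that is about to be
// placed inside an inline SCRIPT element of an HTML document.
function escapeEndTags(scriptSource) {
  return scriptSource.replace(/<\//g, "<\\/");
}

const unsafe = 'document.write("<EM>This will work</EM>")';
console.log(escapeEndTags(unsafe)); // document.write("<EM>This will work<\/EM>")
```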
In XHTML, unlike HTML the contents of a script element are not CDATA. We can solve this problem in the following way
<script type="text/javascript">
<![CDATA[
// Javascript
]]>
</script>
But this causes problems with some older browsers, which we can get around like this
<script type="text/javascript">
/* <![CDATA[ */
// content of your Javascript goes here
/* ]]> */
</script>
http://javascript.about.com/library/blxhtml.htm
Easier and better still is to link to external Javascript files, as well as CSS files.
TR and TD elements with background and border properties
The background and border attributes are not part of any W3C specification for TD or TR elements, regardless of what certain browsers might think. Many of these errors are a function of the "it works" mentality of 1995 (2005-1995= a long time ago ok) which went out with flying cars and jet packs. Validators and not browsers tell us what is right.
20 of the sites demonstrate this problem.
If anything indicates the "she'll be right it works in [a certain browser]" attitude (and plenty does) it is this.
The dregs
The remainder of the reasonably common errors are a bit of a grab bag. We'll go through them quickly. All of them could be quickly and easily identified using the validator, and fixed with minimal fuss.
Forms with no action attribute
Forms require an action attribute. 10 of the 83 sites fail to have this required attribute for form elements.
Repeated ID attribute values
A given ID value must appear at most once in a given document. 14 of the 83 pages reuse an id value. I suspect that this is often associated with tool or CMS generated content.
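Where a CMS assembles pages from fragments, a quick check for repeated ids before publishing can catch this. The following is a rough sketch only: the function name is hypothetical, and it scans the markup with a regular expression rather than a real HTML parser, so it can be fooled by id-like text inside comments or scripts. The validator remains the authority.

```javascript
// Hypothetical helper: returns the id values that occur more than once
// in a string of HTML. Regex based, so only an approximation.
function findDuplicateIds(html) {
  // Matches id="x", id='x' and unquoted id=x, preceded by whitespace.
  const idPattern = /\sid\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/gi;
  const counts = new Map();
  for (const match of html.matchAll(idPattern)) {
    const id = match[1] || match[2] || match[3] || "";
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  return [...counts].filter(([, n]) => n > 1).map(([id]) => id);
}

const page = '<div id="nav"></div><p id="intro"></p><ul id=nav></ul>';
console.log(findDuplicateIds(page)); // [ 'nav' ]
```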
Illegal characters
The most common of these is the use of illegal characters in HTML documents. 9 of the 83 documents had this problem. For documents using a small number of non roman characters, this is easily alleviated using character entity references. That is the case for all of the documents surveyed. Proper internationalization for non roman character sets is a considerably more difficult issue and way beyond the scope of this survey and discussion.
Malformed color values
5% of documents feature malformed hex color values in their HTML, with a missing #.
We'll return to this issue with CSS (where color values actually belong), and the issue of the use of presentational attributes in HTML.
Deprecated attributes, invalid attributes and values
Many sites feature deprecated attributes, or even invalid (invented) attributes, and values for attributes. I literally gave up counting the incidence of these, so common were they. Another strong example of the "but it works" approach.
The envelope please
So, how did Australian sites fare? On our scale of 0 to 5, we have the following results.
- 9 of the 83 sites validated. All in all, probably better than expected.
- 64 of the 83 sites got a score of 0. This means they featured 5 or more validation errors of different types.
- A scattering of sites received 1 to 4 out of 5. This perhaps indicates a less than perfect way of measuring relative validity. But the results are useful none the less. The particular usefulness is in seeing strong patterns of specific errors occurring, and being able to recommend a number of simple approaches to fix these problems.
The full results are available as a table here
These sites validated
- BHP Billiton
- Origin Energy
- Yellow Pages
- Brambles
- Bureau of Meteorology
- australia.gov.au
- ABC
- Telecom Corporation of New Zealand
- Rio Tinto
Honorable mention
mycareer, with a single error (the use of a name attribute with an image) which I suspect some errant tool (software, not human) threw in.
CSS
Fortunately, on the whole, sites fared quite a bit better with CSS than HTML. In part this perhaps reflects less opportunity to make errors, and the relative simplicity (at least for basic styling and syntax) of CSS compared with HTML. As we shall see, most of these sites still rely overwhelmingly on tables for layout, and make significant use of font and other presentational HTML elements and attributes. The conclusion is that CSS is at present being used in a reasonably simplistic way: for basic text styling, and only sparingly for page layout and more sophisticated purposes.
Only one of the sites failed to use CSS at all, which is encouraging to see.
As we will see a little later, 50 of them use inline CSS, mostly extensively.
The errors
21 of the sites were without any CSS error, and on the whole, the median and average score for CSS was higher than for any other area. This does of course leave 62, or 75% of all sites surveyed with CSS errors, most of which could easily be identified using a CSS validator (or good tool) and fixed in a few moments. Often the most difficult thing with CSS is spotting the errors that creep in.
The majority of CSS errors appear infrequently, and so are difficult to categorize. Two broad categories however are
- invented properties
- invented or mistaken property values
Time and again, properties like font-color (for color), align (for vertical-align), and other less common non CSS properties (often simply presentational attributes from HTML like alink, bgcolor and so on) appear.
Don't make up properties please.
Similarly, legitimate properties and values are mixed up (so we see text-decoration: bold )
Further, property names or values are misspelled (text-tranfrom, for instance).
All of these are simply spotted using a validator (and not made in the first place when using a good CSS development tool).
Some broad categories of difficulty also emerge.
- 39 of the 83 sites fail to supply the recommended generic font family for the font-family property.
- 22 of the 83 sites use cursor: hand. This is not CSS; hand is not a valid value for the cursor property (the standard value is pointer). I am sure that it works in certain browsers though.
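The generic font family check in the first point above is easy to automate. Here is a minimal sketch with hypothetical names; it takes the value of a font-family declaration and reports whether the final entry is one of the five generic families defined in CSS 2. It does a naive split on commas, so a family name that itself contains a comma would confuse it.

```javascript
// The five generic font families defined in CSS 2.
const GENERIC_FAMILIES = ["serif", "sans-serif", "cursive", "fantasy", "monospace"];

// Hypothetical helper: true if a font-family value ends with a generic family.
function endsWithGenericFamily(fontFamilyValue) {
  const families = fontFamilyValue
    .split(",")
    .map((name) => name.trim().toLowerCase());
  return GENERIC_FAMILIES.includes(families[families.length - 1]);
}

console.log(endsWithGenericFamily("Verdana, Arial, sans-serif")); // true
console.log(endsWithGenericFamily("Verdana, Arial"));             // false
```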
Several of the sites use scroll bar properties, specific to IE. Several also use IE specific expression and filter "properties". Only one site uses a Mozilla specific property, -moz-opacity, which follows the convention of using the -browser prefix for browser specific properties.
Color, which we saw causing difficulty in HTML (where, in best practice terms, it does not belong) causes even more difficulty with CSS. seagreen, indianred, limegreen and lightgrey are not CSS color keyword values (though are doubtless supported by some browsers). I didn't have the heart to check. As with HTML, hexadecimal colors must start with the # character, and may contain only 3 or 6 hexadecimal digits.
10 of the 83 sites demonstrated one or both of these problems. Easily spotted and rectified, or avoided with validators or good tools.
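The hexadecimal rule described above fits in one regular expression. This is a minimal sketch with a hypothetical function name, checking only the # prefix and the digit count; keyword and rgb() values are outside its scope.

```javascript
// Hypothetical helper: true if the value is a well formed CSS hex color,
// that is a # followed by exactly 3 or exactly 6 hexadecimal digits.
function isValidHexColor(value) {
  return /^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/.test(value);
}

console.log(isValidHexColor("#fff"));    // true
console.log(isValidHexColor("#336699")); // true
console.log(isValidHexColor("336699"));  // false, missing #
console.log(isValidHexColor("#3366"));   // false, 4 digits
```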
The dregs
!important has no space. Since user !important declarations in CSS 2 take precedence over author !important declarations when of the same specificity, it's arguable that author style sheets really ought not use !important declarations, as these exist to override user style sheets, and frankly if a user has a declaration (with or without !important) there is probably a very good reason for that.
Selector groups don't have a comma after the last selector.
CSS in external style sheets must not be wrapped in a style element.
CSS comments are different from HTML comments.
All of these occurred at least a small number of times, and so are worth noting.
The envelope please
As mentioned, on the whole, sites scored much more highly for CSS than any other area. A number of sites were saved from very low scores simply because their CSS was reasonably valid.
However, as mentioned, the CSS used is on the whole reasonably unsophisticated, largely reserved for text styling. CSS for layout is still some way off being mainstream, as we will see in a moment.
Structural and Semantic HTML
I imagine the most contentious "head" of best practice which I have decided upon is this one. The other three fall squarely under W3C recommendations. This is more amorphous, bringing together recommended practices which to an extent cut across all three.
In essence, it has long been recognized (even before the WWW) that separating content from its presentation is a valuable practice. We don't have the time to go into the reasons for this here.
Similarly, it is recognized that there is great value in using semantic markup.
Both of these are recognized explicitly in the WAI WCAG 1.0 accessibility guidelines and have been guiding principles in the ongoing development of HTML and XHTML going back over several years at least.
But how to measure whether sites embody this approach?
I must repeat here that my aim was not to develop an exhaustive methodology, but rather a threshold one: one which is relatively easily and quickly administered, with little ambiguity. This would allow its use for comparative surveys across sites and time by independent individuals and groups.
The methodology I propose here is a score out of 5, like for the other heads, with points deducted as follows.
- Use of Tables for layout - deduct 2 points
- Use of font elements - deduct 1 point
- Use of presentational attributes - deduct 1 point
- Use of elements in a semantically inappropriate way - deduct 1 point
- Use of inline style - subtract 1 point
The first point might be a little contentious to some. Tough. While it is a reasonably hard line to take, it is clear from several angles that using tables for layout has for some time been far from a best practice. Similarly, penalizing the use of inline CSS appears harsh, given that it is a perfectly acceptable practice according to the W3C recommendations. This section looks in part at the issue of separating content and appearance, and inline CSS certainly is far from perfect in that regard.
Tables and Layout
Dispiritingly, 71 of the 83 sites used tables for layout to some extent. A small number of these used them sparingly, associated with small parts of a page. For the most part though, all layout is still being created using tables in mainstream big league development.
Font elements
29 of the 83 sites, over a third, used font elements.
Do I need to comment at all on this?
Presentational HTML
47 of the sites used presentational HTML attributes (and I ignored basic table attributes like height, border and so on). These were attributes like background, border, bgcolor and so on.
33 sites used presentational elements, most notably <b> but also <i> and <u>. While <b> and <i> are not deprecated, and indeed are part of the strict doctypes, these were invariably used in place of emphasis or headings, and so were semantically inappropriate. If something is a heading, mark it up as such.
Inline CSS
50 of the 83 sites, 60%, used inline CSS. As mentioned, you might argue that since it is perfectly valid, then this is nitpicking at best. However, given that this quite strongly violates a well accepted principle of separating the presentation of content from the content itself, I'd argue that it is a reasonable criticism.
The envelope please
With a median of 1, and an average of just .66 out of 5, this was a weak area in the survey.
Of all the areas too, it would require the most effort to improve, potentially requiring, unlike the other three areas, a complete overhaul of the underlying code to remove layout tables and presentational HTML and transfer these to CSS.
Accessibility
The criteria I used for this section were simple, mechanical, and essentially uncontroversial. Priority 1 and 2 checkpoints from the WAI WCAG, as determined by Cynthia Says. Again, I'll admit that the results are not necessarily perfect, but they do, I argue, offer a reasonably objective, repeatable benchmark for how well a site adheres to the very basics of accessibility best practices.
As with other areas, a pattern of shortcomings emerged. As with HTML and CSS, these shortcomings, once understood and tested for, are on the whole quite readily fixed.
The problems
In order of frequency, the following major issues emerged.
WAI WCAG checkpoint 11.2 Avoid deprecated features of W3C technologies
76 of the 83 sites had this problem. It's easily fixed. Don't use deprecated HTML elements or attributes.
WAI WCAG checkpoint 1.1 Provide a text equivalent for every non-text element
68 of the sites fell down in some respect here. Missing alt attributes were extremely common. Validators, or Cynthia Says will let you know when a required alt attribute is missing. As a note, although alt attributes are required on image elements, they need not have any value. In fact, where an image is entirely decorative, alt="" is the recommended attribute, as this means screen readers won't read aloud the contents of the alt attribute. Also, not only img elements require alt attributes.
WAI WCAG checkpoint 3.4 Use relative rather than absolute units in markup language attribute values and style sheet property values
45 of the 83 sites specified font sizes in CSS using pixels or points.
Use ems or percentages for specifying font sizes (but be mindful of IE 5, 5.5 and 6 in quirks mode, where font sizes specified as less than 1em appear very, very small). At present, percentages are recommended for font sizes under 100% (1em).
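Pixel and point font sizes are also simple to find in a style sheet. Here is a rough sketch, assuming the CSS is available as a string; the name absolute_font_sizes is invented for this example. It only looks at font-size declarations, so sizes set through the font shorthand would be missed.

```python
import re

# Matches font-size declarations whose value is a number followed by px or pt.
ABSOLUTE_FONT_SIZE = re.compile(
    r"font-size\s*:\s*(\d*\.?\d+)(px|pt)\b", re.IGNORECASE
)


def absolute_font_sizes(css):
    """Return font-size values given in pixels or points, e.g. ['12px']."""
    return [
        number + unit.lower()
        for number, unit in ABSOLUTE_FONT_SIZE.findall(css)
    ]
```

Values in ems or percentages are ignored, so an empty result suggests the font-size declarations already use relative units.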
WAI WCAG checkpoint 12.4 Associate labels explicitly with their controls
With 51 of the 83 sites reporting this error, clearly it is a significant problem (as are forms more generally, with their common lack of an action attribute, and their common containment problems within tables, both as noted above).
The WAI guidelines suggest the following approach to both implicitly and explicitly associating a form element with its label.
<LABEL for="firstname">First name: <INPUT type="text" id="firstname" tabindex="1"> </LABEL>
http://www.w3.org/TR/WAI-WEBCONTENT-TECHS/#tech-associate-labels
http://www.w3.org/TR/WCAG10-HTML-TECHS/#forms-labels
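Explicit association can be tested by matching each control's id against the for attributes of the labels on the page. The following is a small sketch of that idea using Python's standard html.parser; the names UNLABELLED_TYPES and unlabelled_controls are assumptions of this example, and it deliberately ignores implicit association by nesting alone.

```python
from html.parser import HTMLParser

# Input types that do not take a visible text label.
UNLABELLED_TYPES = {"hidden", "submit", "reset", "button", "image"}


class LabelChecker(HTMLParser):
    """Gathers label targets and form controls from a document."""

    def __init__(self):
        super().__init__()
        self.label_targets = set()  # values of <label for="...">
        self.controls = []          # (tag, id) for each control found

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "label" and attributes.get("for"):
            self.label_targets.add(attributes["for"])
        elif tag in ("select", "textarea"):
            self.controls.append((tag, attributes.get("id")))
        elif tag == "input":
            input_type = (attributes.get("type") or "text").lower()
            if input_type not in UNLABELLED_TYPES:
                self.controls.append((tag, attributes.get("id")))


def unlabelled_controls(html):
    """Return (tag, id) for controls with no matching <label for>."""
    checker = LabelChecker()
    checker.feed(html)
    checker.close()
    return [
        (tag, control_id)
        for tag, control_id in checker.controls
        if control_id is None or control_id not in checker.label_targets
    ]
```

Labels are collected for the whole document before comparing, so a label that appears after its control still counts.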
WAI WCAG checkpoint 3.2 Create documents that validate to published formal grammars
Well. The less said about this the better.
WAI WCAG checkpoint 7.4 Until user agents provide the ability to stop the refresh, do not create periodically auto-refreshing pages
6 of the sites fell down in this department.
WAI WCAG checkpoint 6.3 Ensure that pages are usable when scripts, applets, or other programmatic objects are turned off or not supported
5 of the sites were reported by Cynthia Says as failing this checkpoint. I suspect that the actual number might be somewhat higher, given the number of document.write pieces of script I saw while looking at code validation issues.
The envelope please
Accessibility is the area in which sites did the worst. A median score of 0, an average of just .45 out of 5, and well over half the sites scoring zero for accessibility is not a great result.
One site, australia.gov.au, received 5; none received 4; and only 10 received even 3.
Certainly building accessible sites is not easy, but nor is it so difficult as the results, and folklore, suggest. I would argue they reflect as much a lack of will and consequent effort as anything else. This really has to change. Ironically, the Disability Discrimination Act essentially mandates adherence to accessibility best practices, whereas all the other areas of best practice are voluntary. It is disappointing that we are doing so badly.
Overall
So how do things stack up overall? We've seen the individual categories. What about the scores out of 20?
Well, it should come as no great surprise that things don't look too good.
An average of about 5.6, a median of 5, and only 15 of 83 sites scoring 10 or more.
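These summary figures can be recomputed from the Site Scores table. The sketch below uses Python's standard statistics module; the totals list is transcribed by hand from the Total column of that table, so any transcription slip would change the output.

```python
from statistics import mean, median

# Totals out of 20, transcribed from the Site Scores table (83 sites),
# highest first. Repeated low scores are written as counts for brevity.
totals = (
    [19, 18, 16, 15, 15, 15, 13, 12, 12, 12, 11, 11, 11, 10, 10, 9]
    + [8] * 4 + [7] * 6 + [6] * 8 + [5] * 11
    + [4] * 6 + [3] * 7 + [2] * 12 + [1] * 9 + [0] * 4
)

site_count = len(totals)
average_score = mean(totals)
median_score = median(totals)
ten_or_more = sum(1 for score in totals if score >= 10)

print(f"{site_count} sites, mean {average_score:.2f}, "
      f"median {median_score}, {ten_or_more} scoring 10 or more")
```

On these transcribed totals the mean comes to roughly 5.6, the median to 5, and 15 sites reach 10 or more.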
A detailed breakdown and count of the errors can be found here
The top 10 sites were
Site | Score/20 |
---|---|
australia.gov.au | 19 |
BHP Billiton | 18 |
ABC | 16 |
Bureau of Meteorology | 15 |
Yellow Pages | 15 |
Rio Tinto | 15 |
NAB | 13 |
qld.gov.au | 12 |
Telecom New Zealand | 12 |
Brambles | 12 |
But overall, we really ought to be doing much, much better. The tests were not overly exacting, and were designed only to determine adherence to core, on the whole well understood and non-controversial practices in web development, most of which can be objectively machine tested using free, readily available tools.
What's going wrong?
I suspect that, at bottom, what is going wrong is that the ancient, entrenched attitude of "but it works in my browser" is still central to many web developers' thinking. There is little evidence of the HTML, CSS and accessibility validators being used at all, borne out by even the most basic errors being repeated in HTML, CSS and accessibility, and by typographical errors riddling many pages and style sheets. I wonder whether the content writers used spell checkers (and suspect they did). So why aren't we doing the equivalent as developers? Arrogance? Ignorance? Apathy?
In the area of structural and semantic HTML, the widespread use of font elements, and of tables for layout, also underscores a problem of philosophy. Developers and designers simply aren't taking modern web practices to heart when designing and developing their sites. As long as "it works" then "she'll be right" seems to be the order of the day.
Conclusions
In a way, I found the results somewhat depressing. I had expected quite a bit better, to be frank.
In terms of validation, structural and semantic HTML and accessibility, there is little evidence that the significant majority of sites are doing things any differently than half a decade ago.
But on reflection, if we had done this survey five years ago, we would have found little if any CSS, few if any doctype declarations, even fewer alt attributes, and even more use of images for text.
At least we are moving in the right direction.
Let's hope when we do this survey in the future, we'll find more to be upbeat about.
References and resources
- A basic introduction to the hows and whys of validating CSS and HTML
- A good overview of major validation problems and what to do about them
- The W3C Web Accessibility Initiative home
- The W3C Accessibility Guidelines and Techniques
- Specifying non HTML (Javascript and CSS) data in HTML documents
- Javascript, HTML and XHTML from About.com
- Form element and label guidelines and techniques from the W3C WAI
Results
- HTML Errors
- CSS Errors
- Semantic/Structural Problems
- Accessibility Errors
- Site Scores
- Breakdown by sector
- Sector Results
HTML Errors
Error | Count |
---|---|
missing alt attribute | 46 |
script element no type | 45 |
TOPMARGIN | 38 |
unescaped &s | 35 |
generally malformed documents | 30 |
unescaped script content | 29 |
unquoted attributes values | 20 |
malformed tables | 20 |
background/border on TR/TD | 20 |
form/table containment | 14 |
ID value reused | 14 |
XHTML in HTML head | 13 |
form with no action | 10 |
style in body | 10 |
xhtml case problems | 10 |
xhtml using html syntax | 9 |
illegal characters | 9 |
malformed comments | 7 |
nobr element | 5 |
color missing # | 4 |
link in body | 3 |
style no type | 3 |
image with name | 3 |
class class | 3 |
CSS Errors
Error | Count |
---|---|
generic font family | 39 |
cursor hand | 22 |
syntax problems | 7 |
malformed color | 6 |
poorly formed comments | 4 |
font-color | 4 |
expression/filters | 4 |
scrollbar properties | 3 |
! important | 2 |
style element | 2 |
malformed groups | 2 |
problems with property and value names | myriad |
Structural/Semantic HTML
Problem | Count |
---|---|
tables for layout | 71 |
inline CSS | 50 |
presentational attributes (not tables) | 47 |
presentational elements | 33 |
font element | 29 |
Accessibility
Error | Count |
---|---|
11.2 Avoid deprecated features of W3C technologies | 76 |
1.1 Provide a text equivalent for every non-text element | 68 |
12.4 Associate labels explicitly with their controls | 51 |
13.1 Clearly identify the target of each link | 42 |
3.2 Create documents that validate to published formal grammars | 27 |
7.4 Until user agents provide the ability to stop the refresh, do not create periodically auto-refreshing pages | 6 |
6.3 Ensure that pages are usable when scripts, applets, or other programmatic objects are turned off or not supported | 5 |
Site Scores
Organisation | HTML | CSS | Structure | Accessibility | Total | HTML Errors | Doctype |
---|---|---|---|---|---|---|---|
australia.gov.au | 5 | 4 | 5 | 5 | 19 | 0 | HTML 4.01 Tr. |
BHP Billiton | 5 | 5 | 5 | 3 | 18 | 0 | XHTML 1.0 Tr. |
ABC | 5 | 5 | 4 | 2 | 16 | 0 | HTML 4.0 Tr. |
Rio Tinto | 5 | 5 | 2 | 3 | 15 | 0 | XHTML 1.0 Tr. |
Bureau of Meteorology | 5 | 5 | 2 | 3 | 15 | 0 | HTML 4.01 Tr. |
Yellow Pages | 5 | 4 | 5 | 1 | 15 | 0 | XHTML 1.0 Tr. |
NAB | 0 | 5 | 5 | 3 | 13 | 39 | HTML 4.01 Tr. |
qld.gov.au | 0 | 5 | 4 | 3 | 12 | 29 | HTML 4.01 Tr. |
Telecom NZ | 0 | 5 | 5 | 2 | 12 | 193 | HTML 4.0 Tr. |
Brambles | 0 | 5 | 5 | 2 | 12 | 16 | XHTML 1.0 Tr. |
nsw.gov.au | 3 | 4 | 1 | 3 | 11 | 1 | HTML 4.0 Tr. |
mycareer.com.au | 4 | 5 | 2 | 0 | 11 | 1 | XHTML 1.0 Tr. |
whereis.com.au | 3 | 3 | 2 | 3 | 11 | 4 | HTML 4.0 Tr. |
Origin Energy | 5 | 3 | 1 | 1 | 10 | 0 | XHTML 1.0 strict |
Jetstar | 0 | 5 | 5 | 0 | 10 | 66 | HTML 4.01 Tr. |
AGIMO | 0 | 5 | 1 | 3 | 9 | 21 | XHTML 1.0 Tr. |
Qantas | 0 | 5 | 1 | 2 | 8 | 39 | HTML 4.0 Tr. |
vic.gov.au | 3 | 0 | 2 | 3 | 8 | 4 | HTML 4.01 Tr. |
Centrelink | 0 | 4 | 2 | 2 | 8 | 11 | HTML 4.0 Tr. |
Tabcorp | 2 | 5 | 1 | 0 | 8 | 27 | HTML 4.01 Tr. |
Commonwealth Bank | 0 | 3 | 4 | 0 | 7 | 15 | XHTML 1.0 Tr. |
Virgin Blue | 0 | 4 | 3 | 0 | 7 | 36 | XHTML 1.0 Tr. |
Fosters Group | 2 | 2 | 0 | 3 | 7 | 23 | HTML 4.01 Tr. |
Santos | 0 | 5 | 1 | 1 | 7 | 50 | HTML 4.0 Tr. |
Telstra | 0 | 3 | 2 | 2 | 7 | 62 | XHTML 1.0 Tr. |
Wotif | 0 | 3 | 2 | 2 | 7 | 24 | XHTML 1.0 Tr. |
Woolworths | 0 | 5 | 0 | 1 | 6 | 171 | HTML 4.0 Tr. |
Sensis | 0 | 4 | 1 | 1 | 6 | 44 | HTML 4.01 Tr. |
smh.com.au | 0 | 4 | 0 | 2 | 6 | 264 | XHTML 1.0 Tr. |
Macquarie Bank | 0 | 5 | 1 | 0 | 6 | 66 | no doctype |
Coles Myer | 0 | 5 | 1 | 0 | 6 | 181 | no doctype |
St George Bank | 1 | 5 | 0 | 0 | 6 | 27 | XHTML 1.0 Tr. |
3 Mobile | 1 | 4 | 1 | 0 | 6 | 55 | HTML 4.01 |
CSL | 1 | 5 | 0 | 0 | 6 | 9 | HTML 4.01 Tr. |
Alumina | 0 | 5 | 0 | 0 | 5 | 219 | no doctype |
CSR | 0 | 4 | 1 | 0 | 5 | 41 | no doctype |
Macquarie Infrastructure | 0 | 2 | 2 | 1 | 5 | 128 | XHTML 1.0 Tr. |
Boral | 0 | 4 | 0 | 1 | 5 | 221 | XHTML 1.0 Tr. |
AGL | 0 | 4 | 1 | 0 | 5 | 31 | no doctype |
Foxtel | 0 | 4 | 1 | 0 | 5 | 76 | no doctype |
Westfield | 0 | 2 | 1 | 2 | 5 | 74 | XHTML 1.0 Tr. |
Suncorp | 0 | 4 | 1 | 0 | 5 | 52 | HTML 4.01 Strict |
Bigpond | 0 | 5 | 0 | 0 | 5 | 229 | XHTML 1.0 Tr. |
AMP | 0 | 4 | 1 | 0 | 5 | 46 | no doctype |
ING Direct | 0 | 1 | 2 | 2 | 5 | 15 | HTML 4.01 Tr. |
Optus | 0 | 3 | 1 | 0 | 4 | 52 | HTML 4.01 Tr. |
SBS | 0 | 3 | 0 | 1 | 4 | 539 | HTML 4.0 Tr. |
Flight Centre | 0 | 2 | 1 | 1 | 4 | 47 | XHTML 1.1 |
Tradingpost | 0 | 3 | 1 | 0 | 4 | 74 | no doctype |
ANZ | 0 | 4 | 0 | 0 | 4 | 67 | no doctype |
Channel 10 | 0 | 3 | 1 | 0 | 4 | 17 | HTML 4.0 Tr. |
Coca Cola Amatil | 0 | 2 | 1 | 0 | 3 | 21 | no doctype |
greengrocer.com.au | 0 | 3 | 0 | 0 | 3 | 85 | no doctype |
Fletcher Building | 0 | 1 | 0 | 2 | 3 | 267 | HTML 4.0 Tr. |
Promina | 0 | 2 | 0 | 1 | 3 | 31 | no doctype |
Westpac | 0 | 2 | 0 | 1 | 3 | 189 | no doctype |
Caltex | 0 | 1 | 1 | 1 | 3 | 49 | HTML 3.2 |
NRL | 0 | 2 | 1 | 0 | 3 | 191 | no doctype |
ATO | 0 | 1 | 0 | 1 | 2 | 101 | no doctype |
baggygreen.com.au | 0 | 2 | 0 | 0 | 2 | 151 | no doctype |
AXA | 0 | 1 | 0 | 1 | 2 | 144 | no doctype |
Orica | 0 | 1 | 0 | 1 | 2 | 18 | HTML 4.0 |
Greater Union | 0 | 2 | 0 | 0 | 2 | 339 | no doctype |
shopfast.com.au | 0 | 2 | 0 | 0 | 2 | 33 | no doctype |
whitepages.com.au | 0 | 1 | 0 | 1 | 2 | 155 | no doctype |
Paperlinx | 0 | 0 | 2 | 0 | 2 | 77 | HTML 4.01 Tr. |
Burns Philp | 0 | 0 | 0 | 2 | 2 | 24 | no doctype |
AFL | 0 | 2 | 0 | 0 | 2 | 315 | HTML 4.0 Tr. |
IAG | 0 | 0 | 2 | 0 | 2 | 22 | no doctype |
QBE insurance | 0 | 1 | 1 | 0 | 2 | 21 | no doctype |
Lendlease | 0 | 1 | 0 | 0 | 1 | 147 | no doctype |
Channel Seven | 0 | 0 | 1 | 0 | 1 | 21 | no doctype |
Bluescope Steel | 0 | 1 | 0 | 0 | 1 | 67 | no doctype |
PBL | 0 | 0 | 0 | 1 | 1 | 37 | no doctype |
Woodside Petroleum | 0 | 1 | 0 | 0 | 1 | 103 | no doctype |
General Property Trust | 0 | 1 | 0 | 0 | 1 | 142 | no doctype |
Amcor | 0 | 1 | 0 | 0 | 1 | 41 | HTML 4.0 Tr. |
ozemail | 0 | 1 | 0 | 0 | 1 | 110 | XHTML 1.0 Tr. |
Wesfarmers | 0 | 1 | 0 | 0 | 1 | 74 | no doctype |
news.com.au | 0 | 0 | 0 | 0 | 0 | 149 | HTML 4.01 Tr. |
yahoo.com.au | 0 | 0 | 0 | 0 | 0 | 65 | HTML 4.01 Tr. |
Ticketek | 0 | 0 | 0 | 0 | 0 | 30 | HTML 4.0 Tr. |
9msn | 0 | 0 | 0 | 0 | 0 | 44 | no doctype |
Breakdown by sector
Organisation | Score/20 |
---|---|
Financial Services | |
NAB | 13 |
Commonwealth | 7 |
Macquarie Bank | 6 |
St George | 6 |
suncorp | 5 |
AMP | 5 |
ING | 5 |
ANZ | 4 |
Promina | 3 |
Westpac | 3 |
AXA | 2 |
Orica | 2 |
IAG | 2 |
QBE | 2 |
Telcos | |
Telstra | 7 |
3Mobile | 6 |
Optus | 4 |
Media | |
ABC | 16 |
smh.com.au | 6 |
foxtel | 5 |
SBS | 4 |
Trading Post | 4 |
Channel 10 | 4 |
Greater Union | 2 |
Channel 7 | 1 |
News Limited | 0 |
Online | |
Yellow Pages | 15 |
mycareer | 11 |
whereis | 11 |
wotif | 7 |
sensis | 6 |
bigpond | 5 |
whitepages.com.au | 2 |
ozemail | 1 |
yahoo.com.au | 0 |
9MSN | 0 |
Retail | |
Tabcorp | 8 |
Woolworths | 6 |
Coles myer | 6 |
Westfield | 5 |
Greengrocer | 3 |
shopfast.com.au | 2 |
Ticketek | 0 |
Energy Mining | |
BHP Billiton | 18 |
Rio Tinto | 15 |
Origin Energy | 10 |
AGL | 5 |
Caltex | 3 |
Woodside | 1 |
Government | |
australia.gov.au | 19 |
Bureau of Meteorology | 15 |
qld.gov.au | 12 |
nsw.gov.au | 11 |
AGIMO | 9 |
vic.gov.au | 8 |
Centrelink | 8 |
ATO | 2 |
Travel | |
Jetstar | 10 |
Qantas | 8 |
Virgin | 7 |
Flight Centre | 4 |
Industrial | |
Brambles | 12 |
Fosters | 7 |
Santos | 7 |
CSL | 6 |
Alumina | 5 |
CSR | 5 |
Macquarie Infrastructure | 5 |
Boral | 5 |
Coca Cola Amatil | 3 |
Fletcher Building | 3 |
Paperlinx | 2 |
Burns Philp | 2 |
Lendlease | 1 |
Bluescope | 1 |
PBL | 1 |
General Property Trust | 1 |
Amcor | 1 |
Wesfarmers | 1 |
Sport | |
NRL | 3 |
baggygreen.com.au | 2 |
AFL | 2 |
Sector Results
Sector | Count | Average | Median |
---|---|---|---|
Financial | 14 | 4.6 | 4 |
Telcos | 3 | 5.6 | 6 |
Media | 9 | 4.6 | 4 |
Online | 10 | 5.8 | 5 |
Retail | 7 | 4.3 | 5 |
Energy | 6 | 8.7 | 5 |
Government | 8 | 10.5 | 9 |
Industrial | 18 | 4 | 3 |
Sport | 3 | 2.3 | 2 |
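The sector figures can be checked against the breakdown by sector in the same way. This sketch recomputes two rows, Energy and Online, from the scores listed there; the function name summarise is made up for the example. The source does not say how the Median column was calculated. For these two even-sized sectors the published value matches the lower of the two middle scores (Python's median_low) rather than their midpoint, so both are shown.

```python
from statistics import mean, median, median_low

# Scores out of 20, copied from the breakdown by sector.
sectors = {
    "Energy": [18, 15, 10, 5, 3, 1],
    "Online": [15, 11, 11, 7, 6, 5, 2, 1, 0, 0],
}


def summarise(scores):
    """Return count, average and both kinds of median for a sector."""
    return {
        "count": len(scores),
        "average": round(mean(scores), 1),
        "median": median(scores),          # midpoint of the two middle values
        "median_low": median_low(scores),  # lower of the two middle values
    }


for name, scores in sectors.items():
    print(name, summarise(scores))
```

For sectors with an odd number of sites the two medians are the same, so the distinction only matters for even-sized sectors.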
Similar Work
Miles Burke at Port 80 has done a similar study of Western Australian sites. His results are here
Roger Johansson at 456 Berea Street reports on two surveys of public sector sites and HTML validity. One of Swedish sites (in Swedish), and one of U.S. public sector sites
Discuss this?
If you have any observations, notes or criticisms, please feel free to discuss this at my blog
John Allsopp is a director at westciv and the lead developer of Style Master CSS editor. He writes widely on web standards and software development issues and maintains the blog dog or higher.