a more meaningful web with microformats

John Allsopp

Western Civilisation pty. ltd :: westciv

My interest in microformats

More rich semantic vocabularies for XHTML

In the spirit of "the microformats way"

microformatting SGML and XML applications

We'll concentrate on the second project today and

Why microformats are interesting

What's the problem?

Is it really a big deal?

Before we continue, is it that big a deal?

After all, we can already convert SGML/XML to HTML, LaTeX, or PDF with DSSSL or XSL, so why bother?

Some reasons for bothering

What do we need to solve the problem?

We need a standardized way to map content marked up in SGML and XML applications (e.g. DocBook) to valid HTML/XHTML, preserving as much of its semantics as possible.

How might we do this?

1. Modularization of XHTML - embedding non-XHTML XML in XHTML documents

This is far from ideal because

How might we do this?

2. Microformats

Challenges for this approach

The microformats specification says that microformats should

How microformats are right for this project

The microformats specification also says they should

How might we do this?

3. A "third way" that is like microformats, but more ambitious

given what we said before, this is not ideal, as microformats already

Microformatting an application of XML/SGML

Some of the issues and challenges in mapping an application of XML or SGML onto XHTML using microformats.

We'll use some simple examples from DocBook.

DocBook is like "HTML on steroids".

This very similarity poses some interesting issues.

Outline of a strategy for mapping to XHTML

  1. take an application of SGML or XML, e.g. DocBook
  2. identify the elements in that application which need to be mapped into XHTML (this may be all of them or a subset; we'll discuss attributes later, as they present challenges)
  3. decide which XHTML elements or attributes to map these elements onto
  4. create an XMDP profile
  5. create XSLT/DSSSL/... to automatically transform existing documents
  6. ????
  7. Profit
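Step 4 can be sketched in code. The following is a minimal illustration, assuming the profile only needs to list class values in the nested definition-list shape that XMDP uses; the function name `xmdp_profile` and the description text are hypothetical placeholders, not part of any existing mapping.

```python
import xml.etree.ElementTree as ET

def xmdp_profile(definitions):
    """Build an XMDP-style profile listing class values and their meanings.

    `definitions` maps a class value (here, a source element name) to a
    human-readable description. Both are illustrative assumptions.
    """
    profile = ET.Element("dl", {"class": "profile"})
    # The outer term names the XHTML attribute whose values are being defined.
    ET.SubElement(profile, "dt", {"id": "class"}).text = "class"
    values = ET.SubElement(ET.SubElement(profile, "dd"), "dl")
    for name, description in definitions.items():
        ET.SubElement(values, "dt", {"id": name}).text = name
        ET.SubElement(values, "dd").text = description
    return ET.tostring(profile, encoding="unicode")

# Placeholder description; a real profile would take it from the source vocabulary's documentation.
profile = xmdp_profile({"para": "A paragraph."})
print(profile)
```

Generating the profile from the same table that drives the transformation keeps the two from drifting apart.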

The devil is in the details

How to map elements to XHTML

Possibility #1. Always map DocBook element names to class attribute values in HTML

The benefit of this is that the microformat can be machine-generated from doctypes

e.g. <para> elements map to <div class="para">
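A minimal sketch of possibility #1, assuming every source element becomes a `<div>` whose class is the original element name. The function name `to_generic_xhtml` and the sample document are hypothetical, and attributes are deliberately dropped here.

```python
import xml.etree.ElementTree as ET

def to_generic_xhtml(element):
    """Map any element to <div class="ELEMENT-NAME">, recursing into children."""
    out = ET.Element("div", {"class": element.tag})
    out.text = element.text
    out.tail = element.tail
    for child in element:
        out.append(to_generic_xhtml(child))
    return out

# Hypothetical DocBook fragment used only for illustration.
docbook = ET.fromstring(
    "<section><title>Intro</title><para>A paragraph.</para></section>"
)
xhtml = ET.tostring(to_generic_xhtml(docbook), encoding="unicode")
print(xhtml)
```

Nothing in the function is specific to DocBook, which is the point of this possibility: the same code would accept any XML vocabulary.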

The devil is in the details

How to map elements to XHTML

Possibility #2. Where the mapping is "close", map to an element

e.g. <para> maps to <p>, <simpara> and <formalpara> to <p class="simpara"> and <p class="formalpara">

While this is more "semantic", it also requires more complex decision-making and smarter transformations. (With possibility #1, the same transformation process works for most, if not all, applications of XML and SGML.)
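A minimal sketch of possibility #2, assuming a hand-written lookup table drives the mapping. Only the three paragraph rows come from the example above; the table name `CLOSE_MATCHES`, the function name, and the fallback to `<div class="...">` for unlisted elements are assumptions.

```python
import xml.etree.ElementTree as ET

# Source element name -> (XHTML tag, class value or None).
# This table is the per-vocabulary decision-making that possibility #1 avoids.
CLOSE_MATCHES = {
    "para": ("p", None),
    "simpara": ("p", "simpara"),
    "formalpara": ("p", "formalpara"),
}

def to_semantic_xhtml(element):
    """Use a close XHTML element where one is listed, else fall back to a generic div."""
    tag, cls = CLOSE_MATCHES.get(element.tag, ("div", element.tag))
    out = ET.Element(tag, {"class": cls} if cls else {})
    out.text = element.text
    out.tail = element.tail
    for child in element:
        out.append(to_semantic_xhtml(child))
    return out

# Hypothetical DocBook fragment used only for illustration.
docbook = ET.fromstring(
    "<section><para>First.</para><simpara>Second.</simpara></section>"
)
xhtml = ET.tostring(to_semantic_xhtml(docbook), encoding="unicode")
print(xhtml)
```

The transformation code stays generic, but someone has to write and agree on the table for each vocabulary.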

How to map elements to XHTML

A particular challenge is lists. DocBook (and other such applications) may feature more than one kind of list. DocBook features lists, glosslists, ordered lists, and segmented lists.

Should these all be mapped to a generic list (<ul>) or, where appropriate, to ordered and definition lists as well?
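To make the second option concrete, here is a minimal sketch that maps DocBook's `itemizedlist`, `orderedlist`, and `glosslist` onto `<ul>`, `<ol>`, and `<dl>`, keeping the source element name as a class. The table, the `UNWRAP` set, and the function name are assumptions; the sketch also assumes `<glossentry>` has to be flattened, because an XHTML `<dl>` has no element for grouping a term with its definition.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from DocBook list markup to the closest XHTML element.
LIST_MAP = {
    "itemizedlist": "ul",
    "orderedlist": "ol",
    "listitem": "li",
    "glosslist": "dl",
    "glossterm": "dt",
    "glossdef": "dd",
    "para": "p",
}
# Wrappers with no XHTML counterpart: their children are hoisted into the parent.
UNWRAP = {"glossentry"}

def convert(element):
    """Return the list of XHTML elements that replace one source element."""
    if element.tag in UNWRAP:
        return [node for child in element for node in convert(child)]
    # Anything not in the table falls back to a generic div.
    out = ET.Element(LIST_MAP.get(element.tag, "div"), {"class": element.tag})
    out.text = element.text
    for child in element:
        out.extend(convert(child))
    return [out]

# Hypothetical glossary fragment used only for illustration.
docbook = ET.fromstring(
    "<glosslist><glossentry><glossterm>term</glossterm>"
    "<glossdef><para>definition</para></glossdef></glossentry></glosslist>"
)
xhtml = ET.tostring(convert(docbook)[0], encoding="unicode")
print(xhtml)
```

Even this small case needs a structural change (dropping `<glossentry>`), which a pure name-to-class mapping never has to make.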

How to map elements to XHTML

Which of these approaches is more in conformance with microformats?

Here is what the microformat specification says that is relevant:

How to map elements to XHTML

This suggests the second approach conforms more closely to microformat principles, but it does make for a slightly more complex process.

Whichever approach is chosen, DSSSL or XSLT could be used to do the actual transformations.

Issues and difficulties

What happens to attributes?

Where an element maps to a generic element with a class value, how can we also map its attributes and their values?

How important is this?

Is the loss of attribute values a significant or even fatal problem?

Issues and difficulties

Collisions

If multiple namespaces are used in the transformed document, then collisions might occur.
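One way to picture a collision, assuming it shows up as two vocabularies using the same class value: class names carry no namespace, so once both are flattened into one XHTML document the names are indistinguishable. The two sets below are small illustrative subsets (DocBook element names and hCard class names), and the variable names are hypothetical.

```python
# Class values a DocBook mapping might emit (subset of DocBook element names).
docbook_classes = {"para", "title", "author", "section"}

# Class values already used by the hCard microformat (subset).
hcard_classes = {"vcard", "fn", "org", "title"}

# Any name in both sets is ambiguous in a document that uses both vocabularies.
collisions = sorted(docbook_classes & hcard_classes)
print(collisions)
```

Here `title` means a section heading in one vocabulary and a job title in the other, so a consumer cannot tell which is meant from the class value alone.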

Questions for discussion

Is this a problem that needs solving?

Are microformats really the right way to solve this?

How should the transformations be made - to generic elements and class values, or to elements where possible?

Who would make the mapping choices, and how would they be made?