a more meaningful web with microformats
John Allsopp
Western Civilisation pty. ltd :: westciv
My interest in microformats
Richer semantic vocabularies for XHTML
- specifying web patterns as used by developers
- bringing semantically rich SGML/XML content to the web while maintaining semantics in a useful way
In the spirit of "the microformats way"
- reuse of ideas, schemas, current practices
- extension of existing technologies
microformatting SGML and XML applications
We'll concentrate on the second project today, and
- consider whether microformats do in fact provide an appropriate mechanism for this project
- consider how this process might work, and what challenges this approach faces
This is a reasonably "theoretical" presentation; apologies if that is not what you expected.
Why microformats are interesting
- a framework for expressing rich XHTML semantics in a formalized way (XMDP)
- a mechanism for proposing solutions - the Microformats Technorati developer wiki
- the support of an organization - Technorati, and many very smart people
- traction in the market place
What's the problem?
- loads of semantically rich complex information - libraries, organizations
- largely hidden from the web, its users, from search engines, and other software, even most browsers
- conversion is usually ad hoc, and loses semantic content along with other potential benefits, such as the ability to reuse style sheets for specific purposes (for instance accessibility)
Is it really a big deal?
Before we continue, is it that big a deal?
After all, we can already convert SGML/XML to HTML, LaTeX or PDF with DSSSL or XSL, so why bother?
Some reasons for bothering
- standardization of data formats on the web
- reuse/standardization of style sheets and presentation
- potential for more interesting search applications - a la XHTML Friends Network (XFN)
What do we need to solve the problem?
We need a standardized way to map content marked up in SGML and XML applications (e.g. Docbook) to valid HTML/XHTML, preserving as much of its semantics as possible.
How might we do this?
1. Modularization of XHTML - embedding non-XHTML-based XML in XHTML documents
This is far from ideal because
- SGML is not supported; this only works for applications of XML
- support is largely nonexistent in browsers
- there is little benefit for machines (Google, PHP apps etc.), as most of these are XHTML/HTML oriented
- It is not what developers do, and is unlikely to be so any time soon
How might we do this?
2. Microformats
Challenges for this approach
- more "ambitious" than microformats
- perhaps not entirely in conformance with microformats principles
The microformats specification says that microformats should
- be "designed for humans first, machines second" (this project is probably more machine than human focused)
- use the most accurately precise semantic X(HT)ML building block for each object etc - this is where we will have some fun
How microformats are right for this project
The microformats specification also says they should
- reuse existing microformats and well established schemas (e.g. IETF RFCs) as building blocks
- [be] design[ed] to be reused and embedded inside existing formats and microformats
- enable and encourage decentralized development and services
- build on top of X(HT)ML
- reuse the schema (names, objects, properties, values, types, hierarchies, constraints) as much as possible from pre-existing, established, well-supported standards by reference.
How might we do this?
3. A "third way" that is like microformats, but more ambitious
Given what we said before, this is not ideal, as microformats already
- provide a framework for expressing rich XHTML semantics in a formalized way (XMDP)
- provide a mechanism for proposing solutions - Microformats
- have the support of an organization - Technorati, and many very smart people
- have traction in the market place
Microformatting an application of XML/SGML
Some of the issues and challenges in mapping an application of XML or SGML onto XHTML using microformats.
Use some simple examples from Docbook.
Docbook is like "HTML on steroids".
This very similarity poses some interesting issues.
Outline of a strategy for mapping to XHTML
- take an application of SGML or XML e.g. Docbook
- identify the elements in that application which need to be mapped into XHTML (this may be all or a subset) (we'll discuss attributes later, they present challenges)
- decide which XHTML elements or attributes to map these elements onto
- create an XMDP profile
- create XSLT/DSSSL/... to automatically transform existing documents
- ????
- Profit
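The "create an XMDP profile" step can itself be automated once the class names are chosen. The following is a minimal sketch in Python, assuming the usual XMDP layout of a definition list with class "profile" whose terms carry ids; the function name `xmdp_profile` and the two descriptions are hypothetical, not part of any specification.

```python
from html import escape

def xmdp_profile(class_values):
    """Build an XMDP-style profile fragment.

    class_values maps each proposed class name to a human-readable
    description (both are illustrative assumptions, not a published profile).
    """
    lines = [
        '<dl class="profile">',
        ' <dt id="class">class</dt>',
        ' <dd>',
        '  <dl>',
    ]
    for name, description in class_values.items():
        # Each permitted class value becomes a term with a matching id.
        lines.append(f'   <dt id="{escape(name)}">{escape(name)}</dt>')
        lines.append(f'   <dd>{escape(description)}</dd>')
    lines += ['  </dl>', ' </dd>', '</dl>']
    return "\n".join(lines)

profile = xmdp_profile({
    "simpara": "A paragraph without block content (Docbook simpara).",
    "formalpara": "A paragraph with a title (Docbook formalpara).",
})
print(profile)
```

Because the profile is generated from the same table of names that drives the transformation, the two are less likely to drift apart.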
The devil is in the details
How to map elements to XHTML
Possibility #1. Always map Docbook element names to class attribute values in HTML
The benefit of this is that the microformat can be machine-generated from doctypes
e.g. <para> elements map to <div class="para">
- inline elements map to span elements
- block to div
- list items to li
The devil is in the details
How to map elements to XHTML
Possibility #2. where the mapping is "close" map to an element
e.g. <para> maps to p, <simpara> and <formalpara> to <p class="simpara"> and <p class="formalpara">
While this is more "semantic", it also requires more complex decision making and smarter transformations (with possibility #1, the same transformation process works for most if not all applications of XML and SGML)
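A sketch of this second possibility, again in Python and purely illustrative: the `CLOSE_MATCHES` table is hand-written, which is exactly the extra decision making mentioned above. Only the three paragraph mappings come from the slide; the fallback to a generic div is an assumption.

```python
import xml.etree.ElementTree as ET

# Docbook name -> (XHTML element, class value or None).
# Anything not listed falls back to a generic div carrying the name as class.
CLOSE_MATCHES = {
    "para": ("p", None),
    "simpara": ("p", "simpara"),
    "formalpara": ("p", "formalpara"),
}

def map_element(src):
    """Map to a close XHTML element where one is listed, else to a div."""
    tag, cls = CLOSE_MATCHES.get(src.tag, ("div", src.tag))
    out = ET.Element(tag)
    if cls is not None:
        out.set("class", cls)
    out.text = src.text
    for child in src:
        mapped = map_element(child)
        mapped.tail = child.tail
        out.append(mapped)
    return out

section = ET.fromstring(
    "<section><para>One.</para><simpara>Two.</simpara></section>"
)
result = ET.tostring(map_element(section), encoding="unicode")
print(result)
# <div class="section"><p>One.</p><p class="simpara">Two.</p></div>
```

The table has to be written per application by someone who knows both vocabularies, whereas the generic mapping needs no such knowledge.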
How to map elements to XHTML
A particular challenge is lists. Docbook (and other such applications) may feature more than one kind of list. Docbook features lists, glosslists, ordered lists and segmented lists.
Should these all be mapped to a generic list (ul) or, where appropriate, to ordered and definition lists as well?
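The two options differ only in the lookup table used. A small Python sketch makes the choice concrete; the element names are Docbook's, but the particular pairings in `SPECIFIC` are assumptions for illustration, not an agreed mapping.

```python
# Hypothetical pairings of Docbook list elements with XHTML list elements.
SPECIFIC = {
    "itemizedlist": "ul",
    "orderedlist": "ol",
    "glosslist": "dl",
}

def list_start_tag(docbook_name, specific=True):
    """Return the XHTML start tag for a Docbook list element.

    With specific=False every list becomes a ul; with specific=True the
    closest XHTML list type is used where one is listed, else ul.
    """
    tag = SPECIFIC.get(docbook_name, "ul") if specific else "ul"
    return f'<{tag} class="{docbook_name}">'

print(list_start_tag("orderedlist"))                  # <ol class="orderedlist">
print(list_start_tag("orderedlist", specific=False))  # <ul class="orderedlist">
print(list_start_tag("segmentedlist"))                # <ul class="segmentedlist">
```

In both cases the class value keeps the original list type, so the generic option loses less than it might first appear; what it gives up is the meaning browsers and other software already attach to ol and dl.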
How to map elements to XHTML
Which of these approaches is more in conformance with microformats?
Here is what the microformat specification says that is relevant:
- microformats reuse the schema (names, objects, properties, values, types, hierarchies, constraints) as much as possible from pre-existing, established, well-supported standards by reference.
- microformats use the most accurately precise semantic X(HT)ML building block for each object etc.
- otherwise microformats use a generic structural element (e.g. <span> or <div>), or the appropriate contextual element (e.g. an <li> inside a <ul> or <ol>).
How to map elements to XHTML
This suggests the second approach is more in conformance with microformat principles, but it does make for a slightly more complex process.
Whichever approach is chosen, DSSSL or XSLT could be used to do the actual transformations.
Issues and difficulties
What happens to attributes?
Where an element maps to a generic element with a class value, how can we also map any attributes and their values?
How important is this?
Is the loss of attribute values a significant or even fatal problem?
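To make the question concrete, here is one conceivable answer sketched in Python: fold each attribute into an extra class token of the form name-value. This convention is an assumption invented for illustration, not something the microformats principles propose, and it breaks down for values that contain spaces.

```python
import xml.etree.ElementTree as ET

def with_attribute_classes(src):
    """Map an inline element to a span, encoding attributes as class tokens.

    The first token is the element name; each attribute adds a hypothetical
    "name-value" token. Values containing whitespace cannot be represented.
    """
    tokens = [src.tag]
    for name, value in sorted(src.attrib.items()):
        tokens.append(f"{name}-{value}")
    out = ET.Element("span", {"class": " ".join(tokens)})
    out.text = src.text
    return out

src = ET.fromstring('<emphasis role="strong">never</emphasis>')
encoded = ET.tostring(with_attribute_classes(src), encoding="unicode")
print(encoded)
# <span class="emphasis role-strong">never</span>
```

Whether such an encoding is acceptable depends on how much of a vocabulary's meaning lives in its attributes, which is the open question on this slide.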
Issues and difficulties
Collisions
If multiple namespaces are used in the transformed document, then collisions might occur.
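A short Python sketch shows how such a collision arises and one possible way around it. The second vocabulary name ("recipe") and the prefixing convention are hypothetical, chosen only to illustrate the problem.

```python
def class_name(vocabulary, element, prefixed=False):
    """Return the class value for an element of the given vocabulary.

    Without a prefix, same-named elements from different vocabularies
    produce the same class value. The "vocabulary-element" prefix is an
    illustrative convention, not an established one.
    """
    return f"{vocabulary}-{element}" if prefixed else element

# Two vocabularies that both define a title element collide ...
plain_a = class_name("docbook", "title")
plain_b = class_name("recipe", "title")
print(plain_a == plain_b)   # True

# ... unless the class value records where the name came from.
safe_a = class_name("docbook", "title", prefixed=True)
safe_b = class_name("recipe", "title", prefixed=True)
print(safe_a, safe_b)       # docbook-title recipe-title
```

Prefixing trades collisions for longer, less readable class names, which sits uneasily with the "humans first" principle.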
Questions for discussion
Is this a problem that needs solving?
Are microformats really the right way to solve this?
How should the transformations be made - to generic elements and class values, or to elements where possible?
Who would make the mapping choices, and how would they be made?