Chapter 18. Using Microformats and RDFa As Embeddable Data Formats

Table of Contents

Using Operator to Learn About Microformats
adr (Addresses)
hCard (Contacts)
hCalendar (Events)
geo (Locations)
tag (Tagspaces)
Definitions and Design Goals of Microformats
Microformats Design Patterns
Examples of Microformats
hCard and adr
Other Microformats
Microformats in Practice
Programming with Microformats
Language-Specific Libraries
Writing an Operator Script
Studying the Tutorial Script
Writing a Geocoding Script
Resources (RDFa): A Promising Complement to Microformats
Reference for Further Study

The central problem that we will study in this chapter is how to embed information in web pages in a way that is easy to understand by both humans and computer programs. The solution that we will consider in depth is microformats, little chunks of structured data that are seamlessly embedded in web pages. (X)HTML is designed primarily to produce a human user interface (via a web browser). However, by carefully following certain conventions (ones that constitute microformats), you can produce (X)HTML out of which data can be unambiguously extracted. The consequence is that it is relatively easy to write computer programs to parse microformats so that the data can be reused in other contexts—giving rise to plenty of mashup possibilities. Moreover, having data embedded right in the context of a user interface is helpful. A user can decide what to do with this data (how to “operate” on a given piece of data) while in the context of normal browsing.

Now, the previous paragraph is a bit abstract. In this chapter, I will walk you through some concrete examples before describing microformats in general. We will use Operator, a Firefox extension, to help parse and view microformats and to create scripts that enable users to take specific actions in response to microformats. Specifically, we will do the following:

Using Operator to Learn About Microformats

Installing the Operator add-on in Firefox and seeing it in action is a good way to learn about microformats. You can download it from here:

As of this writing, the latest stable version is 0.8, which is the one I will use and describe in this chapter. (Note that there is also a 0.9 beta version.[305])

When using Operator, you should have on hand the closest thing to official documentation for the extension:

Now let’s see what Operator can do for you once it is installed. Let’s look at Operator in action by loading a page from (the event aggregation site you learned about in Chapter 15):

Figure 18-1 shows what happens in the Operator toolbar. I’ve chosen this page to show you an example of microformats in use “in the wild.” (Later in this chapter, I’ll show you an example HTML page I created to show examples of microformats.)

You will notice in the Operator toolbar a list of data formats recognized by Operator, along with the number of instances of each format. By default, these formats (listed by their descriptive and formal names) are as follows:

By default, the Operator toolbar uses the descriptive names. You can instead display the formal names of the data formats in the Operator toolbar (by going to the General tab and unchecking Use Descriptive Names under Data Formats). Toggling that option allows you to correlate the formal and descriptive names of the data formats.

I will cover the individual data formats in detail later in the chapter. Continuing with the example from, note that Operator indicates the presence of instances for the following formats: adr, hCard, hCalendar, geo, and tag. What do these microformats have to do with the event in question (Mashup Camp IV)? The UI for gives you many options to package information about the event:

  • You can send it to a number of calendars.

  • You can download the event information in the iCalendar format.

  • You can use the API (as explained in Chapter 15) to extract the event information from

For instance, examine the event in iCalendar format, which you can access from

         X-WR-CALNAME:Upcoming Event: Mashup Camp IV
         PRODID:-// ICS//EN
         SUMMARY:Mashup Camp IV
         DESCRIPTION: [Full details at ] From Mass
         Events Labs\, the organizers of the wildly successfully Mashup Camp unconferences\,
         comes Mashup Camp IV.  Back on the West Coast (the Computer History Museum in
         Mountain View\, CA)\, with the same great people\,  great conversations and
         discussions.  Same fun\, hacking\, and networking in an Open Space format.
               Have a mashup you'd like to show off?  Enter it in the Best Mashup contest and
         see if you can survive the grueling SpeedGeeking session.  Event submitted by on behalf of chris_radcliff .
         LOCATION;VENUE-UID="":Computer History Museum @
         1401 N Shoreline Blvd.\, Mountain View\, California 94043 US
         NAME:Computer History Museum
         ADDRESS:1401 N Shoreline Blvd.
         CITY:Mountain View
         COUNTRY;X-ABBREV=us:United States
         URL;X-LABEL=Venue Info:

As the Operator toolbar indicates, the event information is also embedded in the (X)HTML source at the following location as a series of microformats:

Let’s take a look at each of these example microformats in turn. I’ll give a more formal discussion of each one in the following sections.


 You can use Operator to help in this exercise by checking the Debug Mode option (on the General tab) in Operator so that you have access to the Debug action for each microformat instance. The Debug action lists the (X)HTML source fragment containing the microformat instance.

adr (Addresses)

From the web page, you can read the address for the event: 1401 N Shoreline Blvd., Mountain View, California, 94043. Operator picks out the address as an instance of the adr data format, with the corresponding (X)HTML source fragment:

            <div class="address adr">
              <span class="street-address">1401 N Shoreline Blvd.</span><br />
              <span class="locality">Mountain View</span>,
              <span class="region">California</span> <span class="postal-code">94043</span>
            </div>

Note the use of the <div> tag to wrap the address and class attributes to separate and name the parts of the address. This (X)HTML fragment meets two goals simultaneously: it displays an address naturally and appropriately for a human reader of the web page, and it uses (X)HTML elements and attributes to enable programs (such as Operator) to reliably parse an address from the (X)HTML. You will see this design goal of satisfying human and computer readers repeated among all the microformats.
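To make the "computer reader" side of this concrete, here is a minimal sketch in Python of how a program might pull the parts of an address out of the fragment above. It is not how Operator itself is implemented; the class name AdrParser and the SAMPLE string are assumptions made for illustration, and the sketch ignores complications such as markup nested inside a single property.

```python
from html.parser import HTMLParser

# Property names defined by the adr microformat.
ADR_FIELDS = {"post-office-box", "extended-address", "street-address",
              "locality", "region", "postal-code", "country-name"}
# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class AdrParser(HTMLParser):
    """Collect adr properties found inside elements whose class includes "adr"."""

    def __init__(self):
        super().__init__()
        self.addresses = []   # one dict per adr instance found
        self._depth = 0       # nesting depth inside the current adr element
        self._field = None    # adr property whose text is being read

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1
            found = ADR_FIELDS.intersection(classes)
            if found:
                self._field = found.pop()
        elif "adr" in classes:
            self._depth = 1
            self.addresses.append({})

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth -= 1
            self._field = None

    def handle_data(self, data):
        if self._depth and self._field:
            current = self.addresses[-1]
            current[self._field] = current.get(self._field, "") + data.strip()


# Hypothetical sample input, modeled on the fragment quoted above.
SAMPLE = """
<div class="address adr">
  <span class="street-address">1401 N Shoreline Blvd.</span><br />
  <span class="locality">Mountain View</span>,
  <span class="region">California</span> <span class="postal-code">94043</span>
</div>
"""

parser = AdrParser()
parser.feed(SAMPLE)
address = parser.addresses[0]
```

The point of the sketch is that the parser keys on class names only; the choice of div and span elements, and the punctuation between them, is free to serve the human reader.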

With the adr microformat parsed out, you as a user can then apply an action to the address. Operator has by default two actions (in addition to Debug) that you can apply to an address: Find with Google Maps and Find with Yahoo! Maps. Selecting the first action, for instance, loads the following into the browser:


This action, in effect, enables Operator to perform a mashup of the event site and Google Maps and, more generally, of any web site that has adr microformat data with Google Maps. Note also how Operator enables the user to invoke this action in the context of web browsing. It is Firefox with Operator, and not a third-party web application, that joins a web site with an adr microformat to Google Maps.
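An action of this kind amounts to turning the parsed address into a query URL. The following Python sketch shows the idea; the function name google_maps_url is hypothetical, and the URL pattern (a q parameter on maps.google.com) is an assumption about Google Maps rather than the exact URL that Operator generates.

```python
from urllib.parse import urlencode


def google_maps_url(adr):
    """Build a map-search URL from a dict of adr properties.

    The endpoint and the "q" parameter are assumptions for illustration.
    """
    keys = ("street-address", "locality", "region", "postal-code")
    query = ", ".join(adr[key] for key in keys if adr.get(key))
    return "https://maps.google.com/maps?" + urlencode({"q": query})


venue_address = {
    "street-address": "1401 N Shoreline Blvd.",
    "locality": "Mountain View",
    "region": "California",
    "postal-code": "94043",
}
maps_url = google_maps_url(venue_address)
```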

Operator allows you to add other actions. Later in the chapter, I will show you how to add other user scripts to Operator and to write a basic user script to geocode addresses.

hCard (Contacts)

The hCard data format is meant to represent a person or organization, specifically contact information for the entity. The (X)HTML source for the embedded hCard microformat is as follows:

            <div class="venue location vcard">
              <span class="fn org">
                <a href="/venue/259/">Computer History Museum</a>
              </span>
              <br />
              <div class="address adr">
                <span class="street-address">1401 N Shoreline Blvd.</span><br />
                <span class="locality">Mountain View</span>,
                <span class="region">California</span>
                <span class="postal-code">94043</span>
              </div>
              <span class="geo" style="display: none">
                <span class="latitude">37.4149</span>,
                <span class="longitude">-122.078</span>
              </span>
            </div>

You might be wondering why you will see vcard (instead of hcard) as a class attribute. The reason is that hCard is derived from the vCard standard. You can compare the vCard data that Operator creates for this page to the (X)HTML source to see the similarities:

            PRODID:-// 0.8//EN
            NAME:Mashup Camp IV at Computer History Museum (Wednesday, July 18, 2007) - Upcoming
            ORG;CHARSET=UTF-8:Computer History Museum
            FN;CHARSET=UTF-8:Computer History Museum
            ADR;CHARSET=UTF-8:;;1401 N Shoreline Blvd.;Mountain View;California;94043;
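The correspondence between hCard class names and vCard properties can be sketched in a few lines of Python. This is an illustration of the mapping, not Operator's own export code: the function names are hypothetical, the input is assumed to be an already-parsed dictionary, and the VERSION and N lines reflect my reading of what vCard 3.0 requires rather than anything shown in the output above.

```python
# Order of the components in a vCard ADR value.
ADR_ORDER = ("post-office-box", "extended-address", "street-address",
             "locality", "region", "postal-code", "country-name")


def vcard_escape(text):
    """Escape characters that act as delimiters inside vCard values."""
    return text.replace("\\", "\\\\").replace(";", "\\;").replace(",", "\\,")


def hcard_to_vcard(card):
    """Serialize a parsed hCard (a plain dict) as vCard text."""
    adr = card.get("adr", {})
    adr_value = ";".join(vcard_escape(adr.get(key, "")) for key in ADR_ORDER)
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        "FN:" + vcard_escape(card["fn"]),
        "N:;;;;",  # structured name, left empty here because the entity is an organization
        "ORG:" + vcard_escape(card.get("org", "")),
        "ADR:" + adr_value,
        "END:VCARD",
    ]
    return "\r\n".join(lines) + "\r\n"


venue = {
    "fn": "Computer History Museum",
    "org": "Computer History Museum",
    "adr": {
        "street-address": "1401 N Shoreline Blvd.",
        "locality": "Mountain View",
        "region": "California",
        "postal-code": "94043",
    },
}
vcard_text = hcard_to_vcard(venue)
```

Note how the class="fn org" markup in the hCard yields both the FN and ORG lines, and how the adr properties fill fixed positions in the semicolon-separated ADR value.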

Among the default actions in Operator for hCard is Add to Yahoo! Contacts, which, when invoked for this page, loads the following URL into the browser:


hCalendar (Events)

The hCalendar microformat represents events and is roughly speaking the iCalendar format transformed into a microformat. (See Chapter 15 for a discussion of iCalendar.) The (X)HTML source for the hCalendar microformat is a large fragment that I will not quote here. To find it, you can use Operator or look at the source and find a <div> element that begins with this:

<div id="calendarContainer" class="vcalendar"> <!-- Begin vCalendar -->

and ends some lines later with this:

</div> <!-- End vCalendar -->

The pieces of (X)HTML in between contain event data, such as this:

            <abbr class="dtstart" title="20070718T130000">Wednesday, July 18, 2007

and the following:

<abbr class="dtend" title="20070719T140000">
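In both fragments, the human-readable date (when present) is the element's text, while the machine-readable value sits in the title attribute. As a rough sketch, assuming the values always use the compact YYYYMMDDTHHMMSS form seen here, a Python program could read them as follows. The class name EventTimeParser and the sample string are hypothetical.

```python
from datetime import datetime
from html.parser import HTMLParser


class EventTimeParser(HTMLParser):
    """Read dtstart and dtend values from the title attribute of <abbr> elements."""

    def __init__(self):
        super().__init__()
        self.times = {}

    def handle_starttag(self, tag, attrs):
        if tag != "abbr":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        title = attrs.get("title")
        for prop in ("dtstart", "dtend"):
            if prop in classes and title:
                # No time zone is given, so the result is a naive datetime.
                self.times[prop] = datetime.strptime(title, "%Y%m%dT%H%M%S")


# Hypothetical sample built from the two fragments quoted above.
EVENT_SAMPLE = (
    '<abbr class="dtstart" title="20070718T130000">Wednesday, July 18, 2007</abbr>'
    '<abbr class="dtend" title="20070719T140000"></abbr>'
)

event_parser = EventTimeParser()
event_parser.feed(EVENT_SAMPLE)
```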

As in the case of hCard, you might wonder why the hCalendar format uses class="vcalendar" and not class="hcalendar". vCalendar was the precursor to iCalendar, a fact that is reflected in the iCalendar standard. If you look at the iCalendar data for the event listed earlier, you have the following structure:


Among the default actions associated with hCalendar are ones that send the event data to calendar services such as Google Calendar and Yahoo! Calendar. Compare the way you moved event data with APIs in Chapter 15 with this approach of extracting microformat data and sending it to other services via an HTTP GET request.
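Sending microformat data via an HTTP GET request means encoding the extracted fields as query parameters on a URL that the browser then loads. The Python sketch below shows the pattern only: the endpoint calendar.example.com and the parameter names title, start, end, and where are hypothetical and do not correspond to any real calendar service.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; a real calendar service defines its own URL and parameters.
CALENDAR_ENDPOINT = "https://calendar.example.com/add"


def calendar_get_url(summary, dtstart, dtend, location):
    """Encode event fields as the query string of a GET request URL."""
    params = {
        "title": summary,
        "start": dtstart,
        "end": dtend,
        "where": location,
    }
    return CALENDAR_ENDPOINT + "?" + urlencode(params)


event_url = calendar_get_url(
    "Mashup Camp IV",
    "20070718T130000",
    "20070719T140000",
    "Computer History Museum",
)
```

Because everything travels in the URL, no API key or server-side code is needed; the trade-off is that the receiving service must accept event data in this form.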

geo (Locations)

The geo data format represents a geospatial location, specifically a latitude and longitude. The (X)HTML source for the geo instance is as follows:

            <span class="geo" style="display: none">
              <span class="latitude">37.4149</span>,
              <span class="longitude">-122.078</span>
            </span>

With Operator, you can map this location to Google Maps and Yahoo! Maps, or you can export it as KML.
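To give a sense of what a KML export involves, here is a minimal Python sketch that wraps a latitude and longitude in a KML Placemark. It is an assumption about the general shape of such an export, not a reproduction of Operator's output, and the function name geo_to_kml is hypothetical. One detail worth noticing is that KML lists longitude before latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def geo_to_kml(name, latitude, longitude):
    """Return a KML document containing a single named point."""
    ET.register_namespace("", KML_NS)  # serialize with a default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    placemark = ET.SubElement(kml, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML coordinate order is longitude,latitude.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{longitude},{latitude}"
    return ET.tostring(kml, encoding="unicode")


kml_text = geo_to_kml("Computer History Museum", "37.4149", "-122.078")
```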

tag (Tagspaces)

The site supports the tagging of individual events. For instance, among the tags for the example event is mashup. You can find this tag marked up using the tag microformat in the (X)HTML source. For example:

<a href="/tag/mashup/" rel="tag" class="category">mashup</a>

You’ll see from the following discussion that rel="tag" on an <a> element indicates a tag microformat and that the last path component of the URL (that is, mashup) is the text of the tag. By default, Operator has actions to look this tag up on web sites such as Flickr and YouTube.
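The rule just described, rel="tag" plus the last path component of the URL, is simple enough to sketch directly. The following Python example is an illustration under that rule; the class name RelTagParser and the sample string are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit


class RelTagParser(HTMLParser):
    """Collect tag names from <a> elements whose rel attribute includes "tag"."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rel_values = (attrs.get("rel") or "").split()
        href = attrs.get("href")
        if "tag" in rel_values and href:
            # The tag text is the last path component of the link target.
            path = urlsplit(href).path.rstrip("/")
            self.tags.append(unquote(path.rsplit("/", 1)[-1]))


TAG_SAMPLE = '<a href="/tag/mashup/" rel="tag" class="category">mashup</a>'

tag_parser = RelTagParser()
tag_parser.feed(TAG_SAMPLE)
```

Note that the tag comes from the URL and not from the link text, so the two can differ without changing what a parser extracts.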

[305] See the blog entry announcing 0.9b. You can download the latest development version of Operator from

[306] The format called tag in Operator is known as rel-tag on