Chapter 18. Using Microformats and RDFa As Embeddable Data Formats

Table of Contents

Using Operator to Learn About Microformats
adr (Addresses)
hCard (Contacts)
hCalendar (Events)
geo (Locations)
tag (Tagspaces)
Definitions and Design Goals of Microformats
Microformats Design Patterns
rel-design-pattern
class-design-pattern
abbr-design-pattern
include-pattern
Examples of Microformats
rel-license
rel-tag
xfn
xFolk
geo
hCard and adr
hCalendar
Other Microformats
Microformats in Practice
Programming with Microformats
Language-Specific Libraries
Writing an Operator Script
Studying the Tutorial Script
Writing a Geocoding Script
Resource Description Framework in Attributes (RDFa): A Promising Complement to Microformats
Reference for Further Study
Summary

The central problem we will study in this chapter is how to embed information in web pages in a way that both humans and computer programs can easily understand. The solution we will consider in depth is microformats: little chunks of structured data seamlessly embedded in web pages. (X)HTML is designed primarily to produce a human user interface (via a web browser). However, by carefully following certain conventions (the ones that constitute microformats), you can produce (X)HTML from which data can be unambiguously extracted. As a result, it is relatively easy to write computer programs that parse microformats so that the data can be reused in other contexts, which opens up plenty of mashup possibilities. Moreover, embedding data directly in a user interface is helpful in its own right: while browsing normally, a user can decide what to do with a piece of data (how to “operate” on it).

The previous paragraph is a bit abstract, so in this chapter I will walk you through concrete examples of microformats. We will use Operator, a Firefox extension, to parse and view microformats and to create scripts that let users take specific actions in response to them. Specifically, we will do the following:

Using Operator to Learn About Microformats

Installing the Operator add-on in Firefox and seeing it in action is a good way to learn about microformats. You can download it from here:

https://addons.mozilla.org/en-US/firefox/addon/4106

At the time of writing, the latest stable version is 0.8, which is the version I use and describe in this chapter. (There is also a 0.9 beta version.)[305]

When using Operator, you should have on hand the closest thing to official documentation for the extension:

http://www.kaply.com/weblog/operator/

Once Operator is installed, let’s look at it in action by loading a page from Upcoming.yahoo.com (the event aggregation site you learned about in Chapter 15):

http://upcoming.yahoo.com/event/144855

Figure 18-1 shows what happens in the Operator toolbar. I’ve chosen this page as an example of microformats in use “in the wild.” (Later in this chapter, I’ll show you an HTML page I created to illustrate microformats.)


You will notice in the Operator toolbar a list of data formats recognized by Operator, along with the number of instances of each format. By default, these formats (listed by their descriptive and formal names) are as follows:

By default, the Operator toolbar uses the descriptive names. You can instead display the formal names of the data formats (go to the General tab and uncheck Use Descriptive Names under Data Formats). Toggling that option lets you correlate the formal and descriptive names of the data formats.

I will cover the individual data formats in detail later in the chapter. Continuing with the example from Upcoming.yahoo.com, note that Operator indicates the presence of instances for the following formats: adr, hCard, hCalendar, geo, and tag. What do these microformats have to do with the event in question (Mashup Camp IV)? The UI for Upcoming.yahoo.com gives you many options to package information about the event:

  • You can send it to a number of calendars.

  • You can download the event information in the iCalendar format.

  • You can use the Upcoming.yahoo.com API (as explained in Chapter 15) to extract the event information from Upcoming.yahoo.com.

For instance, examine the event in iCalendar format, which you can access from http://upcoming.yahoo.com/calendar/v2/event/144855:

         BEGIN:VCALENDAR
         VERSION:2.0
         X-WR-CALNAME:Upcoming Event: Mashup Camp IV
         PRODID:-//Upcoming.org/Upcoming ICS//EN
         CALSCALE:GREGORIAN
         METHOD:PUBLISH
         BEGIN:VEVENT
         DTSTART:20070718T130000
         DTEND:20070718T140000
         RRULE:FREQ=DAILY;INTERVAL=1;UNTIL=20070720T000000
         TRANSP:TRANSPARENT
         SUMMARY:Mashup Camp IV
         DESCRIPTION: [Full details at http://upcoming.yahoo.com/event/144855/ ] From Mass
         Events Labs\, the organizers of the wildly successfully Mashup Camp unconferences\,
         comes Mashup Camp IV.  Back on the West Coast (the Computer History Museum in
         Mountain View\, CA)\, with the same great people\,  great conversations and
         discussions.  Same fun\, hacking\, and networking in an Open Space format.
               Have a mashup you'd like to show off?  Enter it in the Best Mashup contest and
         see if you can survive the grueling SpeedGeeking session.  Event submitted by
         Eventful.com on behalf of chris_radcliff .
         URL;VALUE=URI:http://upcoming.yahoo.com/event/144855/
         UID:http://upcoming.yahoo.com/event/144855/
         DTSTAMP:20070125T124529
         LAST-UPDATED:20070125T124529
         CATEGORIES:Other
         ORGANIZER;CN=chris_radcliff:X-ADDR:http://upcoming.yahoo.com/user/19139/
         LOCATION;VENUE-UID="http://upcoming.yahoo.com/venue/259/":Computer History Museum @
         1401 N Shoreline Blvd.\, Mountain View\, California 94043 US
         END:VEVENT
         BEGIN:VVENUE
         X-VVENUE-INFO:http://evdb.com/docs/ical-venue/draft-norris-ical-venue.html
         NAME:Computer History Museum
         ADDRESS:1401 N Shoreline Blvd.
         CITY:Mountain View
         REGION;X-ABBREV=ca:California
         COUNTRY;X-ABBREV=us:United States
         POSTALCODE:94043
         GEO:37.4149;-122.078
         URL;X-LABEL=Venue Info:http://www.computerhistory.org/
         END:VVENUE
         END:VCALENDAR
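Each line of this data is an iCalendar content line of the form NAME;PARAM=VALUE:VALUE. The following is a minimal Python sketch of how a program might split such lines, under two assumptions. First, long lines use RFC 5545 folding, in which a continuation line starts with a space or tab; the listing above is reflowed for print and does not show this. Second, the line is split on the first colon that is not inside a double-quoted parameter value, which matters for lines such as LOCATION;VENUE-UID="http://...":.... The sketch does not handle semicolons inside quoted parameters and leaves backslash escapes such as \, in text values untouched. The function names are hypothetical.

```python
def unfold(text):
    """Join folded lines: a line starting with a space or tab continues the previous one."""
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]   # drop the single folding whitespace character
        else:
            lines.append(raw)
    return lines


def parse_content_line(line):
    """Split 'NAME;P1=V1;P2="V:2":VALUE' into (name, params, value)."""
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == ":" and not in_quotes:
            head, value = line[:i], line[i + 1:]
            break
    else:
        raise ValueError("no unquoted colon in content line: " + line)
    name, *param_parts = head.split(";")
    params = {}
    for part in param_parts:
        key, _, val = part.partition("=")
        params[key.upper()] = val.strip('"')   # remove surrounding quotes
    return name.upper(), params, value


line = 'LOCATION;VENUE-UID="http://upcoming.yahoo.com/venue/259/":Computer History Museum @'
print(parse_content_line(line))
# ('LOCATION', {'VENUE-UID': 'http://upcoming.yahoo.com/venue/259/'}, 'Computer History Museum @')
```

A naive line.split(":", 1) would break the LOCATION line at the colon inside "http://", which is why the sketch tracks quoting.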
      

As the Operator toolbar indicates, the event information is also embedded in the (X)HTML source at the following location as a series of microformats:

http://upcoming.yahoo.com/event/144855

Let’s take a look at each of these example microformats in turn. I’ll give a more formal discussion of each one in the following sections.

Tip

You can use Operator to help in this exercise by checking the Debug Mode option (on the General tab) so that you have access to the Debug action for each microformat instance. The Debug action lists the (X)HTML source fragment containing the microformat instance.

adr (Addresses)

From the web page, you can read the address for the event: 1401 N Shoreline Blvd., Mountain View, California, 94043. Operator picks out the address as an instance of the adr data format, with the corresponding (X)HTML source fragment:

            <div class="address adr">
              <span class="street-address">1401 N Shoreline Blvd.</span><br />
              <span class="locality">Mountain View</span>,
              <span class="region">California</span> <span class="postal-code">94043</span>
            </div>
         

Note the use of the <div> tag to wrap the address, and of class attributes to separate and name the parts of the address. This (X)HTML fragment meets two goals simultaneously: it displays an address naturally for a human reader of the web page, and it uses (X)HTML elements and attributes that let programs (such as Operator) reliably parse an address from the (X)HTML. You will see this design goal of serving both human and computer readers repeated across all the microformats.
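To make the second goal concrete, here is a minimal sketch, using only Python's standard html.parser module, of how a program might pull the adr properties out of the fragment above. It assumes the simplified case shown here: well-formed XHTML in which every element is closed, and each property is a single element whose class attribute names the property and whose text is the value. Real adr parsing, such as Operator's, handles many more cases. The names AdrParser and parse_adr are hypothetical.

```python
from html.parser import HTMLParser

# adr property names (post-office-box and extended-address omitted for brevity).
ADR_PROPERTIES = {"street-address", "locality", "region",
                  "postal-code", "country-name"}


class AdrParser(HTMLParser):
    """Collect adr properties from class-annotated XHTML (simplified sketch)."""

    def __init__(self):
        super().__init__()
        self.stack = []     # class sets of the currently open elements
        self.address = {}   # adr property name -> accumulated text

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <br /> call this and handle_endtag,
        # so the stack stays balanced.
        self.stack.append(set((dict(attrs).get("class") or "").split()))

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Ignore text that is not inside an element marked class="adr".
        if not any("adr" in classes for classes in self.stack):
            return
        # The text belongs to whichever adr property the innermost element names.
        for prop in self.stack[-1] & ADR_PROPERTIES:
            self.address[prop] = self.address.get(prop, "") + data


def parse_adr(fragment):
    parser = AdrParser()
    parser.feed(fragment)
    parser.close()
    # Collapse internal whitespace and trim each value.
    return {prop: " ".join(text.split()) for prop, text in parser.address.items()}


fragment = """
<div class="address adr">
  <span class="street-address">1401 N Shoreline Blvd.</span><br />
  <span class="locality">Mountain View</span>,
  <span class="region">California</span> <span class="postal-code">94043</span>
</div>
"""
print(parse_adr(fragment))
# {'street-address': '1401 N Shoreline Blvd.', 'locality': 'Mountain View', 'region': 'California', 'postal-code': '94043'}
```

The punctuation between the spans (the comma after the locality, for instance) is there for human readers and is ignored, because it sits directly inside the adr <div> rather than inside a property element.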

With the adr microformat parsed out, you as a user can then apply an action to the address. Operator has by default two actions (in addition to Debug) that you can apply to an address: Find with Google Maps and Find with Yahoo! Maps. Selecting the first action, for instance, loads the following into the browser:

            http://maps.google.com/maps?q=1401%20N%20Shoreline%20Blvd.,%20California,%20Mountain%20View,%2094043
         

In effect, this action lets Operator mash up Upcoming.yahoo.com with Google Maps; more generally, it can mash up any web site that contains adr microformat data with Google Maps. Note also that Operator lets the user invoke this action in the context of ordinary web browsing: it is Firefox with Operator, not a third-party web application, that joins a web site containing an adr microformat to Google Maps.
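Under the hood, an action like Find with Google Maps amounts to string building: join the address parts and percent-encode them into the q parameter. The sketch below reproduces the URL shown earlier with Python's urllib.parse.quote. The function name is hypothetical, and the part order (street, region, locality, postal code) simply mirrors the URL Operator generated for this page rather than any documented rule. Commas are left unescaped to match that URL.

```python
from urllib.parse import quote


def google_maps_url(adr):
    """Build a Google Maps search URL from a dict of adr properties (hypothetical helper)."""
    # Part order copied from the URL Operator produced for this page.
    keys = ["street-address", "region", "locality", "postal-code"]
    query = ", ".join(adr[k] for k in keys if k in adr)
    # safe="," keeps commas literal; spaces become %20.
    return "http://maps.google.com/maps?q=" + quote(query, safe=",")


adr = {"street-address": "1401 N Shoreline Blvd.", "locality": "Mountain View",
       "region": "California", "postal-code": "94043"}
print(google_maps_url(adr))
# http://maps.google.com/maps?q=1401%20N%20Shoreline%20Blvd.,%20California,%20Mountain%20View,%2094043
```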

Operator allows you to add other actions. Later in the chapter, I will show you how to add other user scripts to Operator and to write a basic user script to geocode addresses.

hCard (Contacts)

The hCard data format is meant to represent a person or organization, specifically contact information for the entity. The (X)HTML source for the embedded hCard microformat is as follows:

            <div class="venue location vcard">
              <span class="fn org">
                <a href="/venue/259/">Computer History Museum</a>
              </span>
              <br />
              <div class="address adr">
                <span class="street-address">1401 N Shoreline Blvd.</span><br />
                <span class="locality">Mountain View</span>,
                <span class="region">California</span>
                <span class="postal-code">94043</span>
              </div>
              <span class="geo" style="display: none">
                <span class="latitude">37.4149</span>,
                <span class="longitude">-122.078</span>
              </span>
            </div>
         

You might be wondering why the class attribute is vcard (instead of hcard). The reason is that hCard is derived from the vCard standard. You can compare the vCard data that Operator creates for this page with the (X)HTML source to see the similarities:

            BEGIN:VCARD
            PRODID:-//kaply.com//Operator 0.8//EN
            SOURCE:http://upcoming.yahoo.com/event/144855
            NAME:Mashup Camp IV at Computer History Museum (Wednesday, July 18, 2007) - Upcoming
            VERSION:3.0
            N:;;;;
            ORG;CHARSET=UTF-8:Computer History Museum
            FN;CHARSET=UTF-8:Computer History Museum
            UID:
            ADR;CHARSET=UTF-8:;;1401 N Shoreline Blvd.;Mountain View;California;94043;
            GEO:37.4149;-122.078
            END:VCARD
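The mapping from hCard to vCard is largely mechanical, as a minimal Python sketch shows. It assumes the hCard has already been parsed into a dict (the keys fn, org, adr, and geo are hypothetical choices for this sketch), emits only the core properties (no PRODID, SOURCE, or CHARSET parameters), and follows the vCard 3.0 conventions of CRLF line endings, seven semicolon-separated ADR components, and backslash-escaping of commas and semicolons in text values.

```python
def escape(text):
    """Escape a vCard text value (backslash, semicolon, comma, newline)."""
    return (text.replace("\\", "\\\\").replace(";", "\\;")
                .replace(",", "\\,").replace("\n", "\\n"))


def hcard_to_vcard(card):
    """Serialize a dict of hCard properties as vCard 3.0 text (simplified sketch)."""
    lines = ["BEGIN:VCARD", "VERSION:3.0",
             "N:;;;;",                        # empty N, as in Operator's output above
             "FN:" + escape(card["fn"])]
    if "org" in card:
        lines.append("ORG:" + escape(card["org"]))
    adr = card.get("adr")
    if adr:
        # ADR components: PO box; extended; street; locality; region; postal code; country
        parts = ["", "", adr.get("street-address", ""), adr.get("locality", ""),
                 adr.get("region", ""), adr.get("postal-code", ""),
                 adr.get("country-name", "")]
        lines.append("ADR:" + ";".join(escape(p) for p in parts))
    if "geo" in card:
        lat, lon = card["geo"]
        lines.append(f"GEO:{lat};{lon}")      # vCard 3.0 GEO is latitude;longitude
    lines.append("END:VCARD")
    return "\r\n".join(lines) + "\r\n"


card = {"fn": "Computer History Museum", "org": "Computer History Museum",
        "adr": {"street-address": "1401 N Shoreline Blvd.", "locality": "Mountain View",
                "region": "California", "postal-code": "94043"},
        "geo": ("37.4149", "-122.078")}
print(hcard_to_vcard(card))
```

The ADR line this produces, ;;1401 N Shoreline Blvd.;Mountain View;California;94043;, has the same component layout as the line in Operator's output.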
         

Among the default actions in Operator for hCard is Add to Yahoo! Contacts, which, when invoked for this page, loads the following URL into the browser:

            http://address.yahoo.com/?fn=Computer%20History%20Museum&co=Computer%20History%20Museum&ha1=1401%20N%20Shoreline%20Blvd.&hc=Mountain%20View&hs=California&hz=94043&A=C
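Like the map action, Add to Yahoo! Contacts is just a GET URL assembled from the parsed hCard. In the sketch below, the parameter names (fn, co, ha1, hc, hs, hz, A) are copied from the example URL. The comments on their meanings are inferred from the values they carry, not taken from Yahoo! documentation, and the helper name is hypothetical.

```python
from urllib.parse import quote, urlencode


def yahoo_contacts_url(card):
    """Build an Add to Yahoo! Contacts URL from parsed hCard data (hypothetical helper)."""
    adr = card["adr"]
    params = {
        "fn": card["fn"],              # formatted name
        "co": card.get("org", ""),     # organization (inferred)
        "ha1": adr["street-address"],  # street address (inferred)
        "hc": adr["locality"],         # city (inferred)
        "hs": adr["region"],           # state or region (inferred)
        "hz": adr["postal-code"],      # postal code (inferred)
        "A": "C",                      # copied verbatim from the example URL
    }
    # quote_via=quote encodes spaces as %20, matching the example, rather than "+".
    return "http://address.yahoo.com/?" + urlencode(params, quote_via=quote)


card = {"fn": "Computer History Museum", "org": "Computer History Museum",
        "adr": {"street-address": "1401 N Shoreline Blvd.", "locality": "Mountain View",
                "region": "California", "postal-code": "94043"}}
print(yahoo_contacts_url(card))
```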
         

hCalendar (Events)

The hCalendar microformat represents events and is, roughly speaking, the iCalendar format transformed into a microformat. (See Chapter 15 for a discussion of iCalendar.) The (X)HTML source for the hCalendar microformat is a large fragment that I will not quote here. To find it, you can use Operator or look through the page source for a <div> element that begins with this:

<div id="calendarContainer" class="vcalendar"> <!-- Begin vCalendar -->

and ends many lines later with this:

</div> <!-- End vCalendar -->

The pieces of (X)HTML in between contain event data, such as this:

            <abbr class="dtstart" title="20070718T130000">Wednesday, July 18, 2007
            </abbr>
         

and the following:

<abbr class="dtend" title="20070719T140000">
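In these <abbr> elements, the human-readable date is the element's text, while the machine-readable value sits in the title attribute in compact ISO 8601 form. A program can therefore ignore the text and parse the title directly. The following is a minimal sketch with Python's datetime, assuming the compact format used here (YYYYMMDDTHHMMSS, with no time zone); the function name is hypothetical.

```python
from datetime import datetime


def parse_abbr_datetime(title):
    """Parse a compact ISO 8601 value such as '20070718T130000' (no time zone)."""
    return datetime.strptime(title, "%Y%m%dT%H%M%S")


start = parse_abbr_datetime("20070718T130000")   # dtstart title from the page
end = parse_abbr_datetime("20070719T140000")     # dtend title from the page
print(start.strftime("%A, %B %d, %Y %H:%M"))     # Wednesday, July 18, 2007 13:00
print(end - start)                               # 1 day, 1:00:00
```

Values carrying a UTC designator (a trailing Z) or a numeric offset would need a different format string; this sketch does not handle them.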

As in the case of hCard, you might wonder why the hCalendar format uses class="vcalendar" and not class="hcalendar". The reason is that vCalendar was the precursor to iCalendar, a heritage still visible in the iCalendar format itself. If you look at the iCalendar data for the Upcoming.yahoo.com event listed earlier, you will see the following structure:

            BEGIN:VCALENDAR
            [...]
            DTSTART:20070718T130000
            DTEND:20070718T140000
            [...]
            END:VCALENDAR
         

Among the default actions associated with hCalendar are ones to send the event data to Google Calendar, Yahoo! Calendar, and 30boxes.com. Compare how you moved event data with APIs in Chapter 15 with this approach of extracting microformat data and sending that data to other services via an HTTP GET request.
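As a concrete illustration of the microformat approach, here is a sketch that takes values already extracted from the page and packs them into the query string of a GET request. The endpoint http://calendar.example.com/add and its parameter names are hypothetical placeholders. They are not the URLs Operator uses for Google Calendar, Yahoo! Calendar, or 30boxes.com, each of which defines its own parameters.

```python
from urllib.parse import urlencode


def calendar_add_url(event, endpoint="http://calendar.example.com/add"):
    """Build a GET URL carrying hCalendar data (hypothetical endpoint and parameters)."""
    params = {
        "summary": event["summary"],
        "dtstart": event["dtstart"],   # compact ISO 8601, as in the abbr title attribute
        "dtend": event["dtend"],
        "location": event["location"],
    }
    # urlencode's default quoting turns spaces into "+".
    return endpoint + "?" + urlencode(params)


event = {"summary": "Mashup Camp IV", "dtstart": "20070718T130000",
         "dtend": "20070719T140000", "location": "Computer History Museum"}
print(calendar_add_url(event))
# http://calendar.example.com/add?summary=Mashup+Camp+IV&dtstart=20070718T130000&dtend=20070719T140000&location=Computer+History+Museum
```

Unlike the API calls in Chapter 15, nothing here requires an API key or a server-side program. The browser simply follows a link built from data already on the page.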

geo (Locations)

The geo data format represents a geospatial location, specifically a latitude and longitude. The (X)HTML source for the geo instance is as follows:

            <span class="geo" style="display: none">
              <span class="latitude">37.4149</span>,
              <span class="longitude">-122.078</span>
            </span>
         

With Operator, you can map this location to Google Maps and Yahoo! Maps, or you can export it as KML.
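Exporting a geo instance as KML is mostly a matter of reordering: geo lists the latitude first, while a KML <coordinates> element expects longitude,latitude (optionally followed by altitude). The following is a minimal sketch using Python's xml.etree.ElementTree. It targets the KML 2.2 namespace and is not necessarily identical to the file Operator writes; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)   # serialize KML as the default namespace


def geo_to_kml(name, latitude, longitude):
    """Return a KML document (as a string) with one Placemark for a geo point."""
    kml = ET.Element(f"{{{KML_NS}}}kml")
    placemark = ET.SubElement(kml, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML wants longitude first, which is the reverse of the geo microformat.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{longitude},{latitude}"
    return ET.tostring(kml, encoding="unicode")


print(geo_to_kml("Computer History Museum", "37.4149", "-122.078"))
```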

tag (Tagspaces)

Upcoming.yahoo.com supports the tagging of individual events. For instance, among the tags for the example event is mashup. You can find this tag marked up using the tag microformat in the (X)HTML source. For example:

<a href="/tag/mashup/" rel="tag" class="category">mashup</a>

You’ll see from the following discussion that the combination of rel=tag in an <a> element is indicative of a tag microformat and that the last path component of the URL (that is, mashup) is the text of the tag. By default, there are actions in Operator to look this tag up in such web sites as del.icio.us, Flickr, Upcoming.yahoo.com, and YouTube.
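Because the tag text comes from the URL rather than from the link text, extracting a rel-tag value comes down to taking the last non-empty path segment of the href and URL-decoding it. Here is a sketch with urllib.parse; the function names are hypothetical. Relative hrefs such as /tag/mashup/ work because only the path is examined, and the trailing slash is skipped because empty segments are dropped.

```python
from urllib.parse import unquote, urlparse


def is_rel_tag(rel_attr):
    """rel can hold several space-separated values, e.g. rel="tag nofollow"."""
    return "tag" in (rel_attr or "").lower().split()


def rel_tag_value(href):
    """Return the tag named by a rel-tag href: its last path segment, URL-decoded."""
    path = urlparse(href).path               # ignores any query string or fragment
    segments = [s for s in path.split("/") if s]
    return unquote(segments[-1]) if segments else None


print(rel_tag_value("/tag/mashup/"))                        # mashup
print(rel_tag_value("http://example.com/tags/web%20apps"))  # web apps
```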



[305] See the blog entry announcing the 0.9 beta (http://www.kaply.com/weblog/2007/12/03/operator-09-beta-available/). You can download the latest development version of Operator from http://www.kaply.com/operator/operator.xpi.

[306] The format called tag in Operator is known as rel-tag on http://microformats.org.