The central problem that we will study in this chapter is how to embed information in web pages in a way that is easy for both humans and computer programs to understand. The solution that we will consider in depth is microformats, little chunks of structured data that are seamlessly embedded in web pages. (X)HTML is designed primarily to produce a human user interface (via a web browser). However, by carefully following certain conventions (ones that constitute microformats), you can produce (X)HTML out of which data can be unambiguously extracted. The consequence is that it is relatively easy to write computer programs to parse microformats so that the data can be reused in other contexts, giving rise to plenty of mashup possibilities. Moreover, having data embedded right in the context of a user interface is helpful: a user can decide what to do with this data (how to “operate” on a given piece of data) while in the context of normal browsing.
Now, the previous paragraph is a bit abstract. What I will do in this chapter is walk you through some concrete examples describing microformats in general. We will use Operator, a Firefox extension, to help parse and view microformats and to create scripts that enable users to take specific actions in response to microformats. Specifically, we will do the following:
Study specific examples of microformats
Look at how to use the Firefox add-on Operator to jump-start your study of microformats
Look at programmatic approaches to consuming and creating microformats
Compare microformats to leading alternatives such as RDFa as a way to embed data in human-readable contexts
Installing the Operator add-on in Firefox and seeing it in action is a good way to learn about microformats. You can download it from here:
https://addons.mozilla.org/en-US/firefox/addon/4106
As of this writing, the latest stable version is 0.8, the one I will use and describe in this chapter. (Note that there is also a 0.9 beta version.[305])
When using Operator, you should have on hand the closest thing to official documentation for the extension:
http://www.kaply.com/weblog/operator/
Once Operator is installed, let’s see what it can do for you by loading a page from Upcoming.yahoo.com (the event aggregation site you learned about in Chapter 15):
http://upcoming.yahoo.com/event/144855
Figure 18-1 shows what happens in the Operator toolbar. I’ve chosen this page to show you an example of microformats in use “in the wild.” (Later in this chapter, I’ll show you an example HTML page I created to show examples of microformats.)
Figure 18-1. Operator toolbar showing microformats embedded in a page from Upcoming.yahoo.com. Actions available for the location microformat are shown boxed. (Reproduced with permission of Yahoo! Inc. ® 2007 by Yahoo! Inc. YAHOO! and the YAHOO! logo are trademarks of Yahoo! Inc.)
You will notice in the Operator toolbar a list of data formats recognized by Operator, along with the number of instances of each format. By default, these formats (listed by their descriptive and formal names) are as follows:
Addresses (adr)
Contacts (hCard)
Events (hCalendar)
Location (geo)
Tagspaces (tag or rel-tag)[306]
Bookmarks (xFolk)
Resources (RDF)
By default, the Operator toolbar uses the descriptive names. You can instead display the formal names of the data formats in the Operator toolbar (by going to the General tab and unchecking Use Descriptive Names under Data Formats). Toggling that option allows you to correlate the formal and descriptive names of the data formats.
I will cover the individual data formats in detail later in the chapter. Continuing with the example from Upcoming.yahoo.com, note that Operator indicates the presence of instances for the following formats: adr, hCard, hCalendar, geo, and tag. What do these microformats have to do with the event in question (Mashup Camp IV)? The UI for Upcoming.yahoo.com gives you many options to package information about the event:
You can send it to a number of calendars.
You can download the event information in the iCalendar format.
You can use the Upcoming.yahoo.com API (as explained in Chapter 15) to extract the event information from Upcoming.yahoo.com.
For instance, examine the event in iCalendar format, which you can access from http://upcoming.yahoo.com/calendar/v2/event/144855:
BEGIN:VCALENDAR
VERSION:2.0
X-WR-CALNAME:Upcoming Event: Mashup Camp IV
PRODID:-//Upcoming.org/Upcoming ICS//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
BEGIN:VEVENT
DTSTART:20070718T130000
DTEND:20070718T140000
RRULE:FREQ=DAILY;INTERVAL=1;UNTIL=20070720T000000
TRANSP:TRANSPARENT
SUMMARY:Mashup Camp IV
DESCRIPTION: [Full details at http://upcoming.yahoo.com/event/144855/ ] From Mass Events Labs\, the organizers of the wildly successfully Mashup Camp unconferences\, comes Mashup Camp IV. Back on the West Coast (the Computer History Museum in Mountain View\, CA)\, with the same great people\, great conversations and discussions. Same fun\, hacking\, and networking in an Open Space format. Have a mashup you'd like to show off? Enter it in the Best Mashup contest and see if you can survive the grueling SpeedGeeking session. Event submitted by Eventful.com on behalf of chris_radcliff .
URL;VALUE=URI:http://upcoming.yahoo.com/event/144855/
UID:http://upcoming.yahoo.com/event/144855/
DTSTAMP:20070125T124529
LAST-UPDATED:20070125T124529
CATEGORIES:Other
ORGANIZER;CN=chris_radcliff:X-ADDR:http://upcoming.yahoo.com/user/19139/
LOCATION;VENUE-UID="http://upcoming.yahoo.com/venue/259/":Computer History Museum @ 1401 N Shoreline Blvd.\, Mountain View\, California 94043 US
END:VEVENT
BEGIN:VVENUE
X-VVENUE-INFO:http://evdb.com/docs/ical-venue/draft-norris-ical-venue.html
NAME:Computer History Museum
ADDRESS:1401 N Shoreline Blvd.
CITY:Mountain View
REGION;X-ABBREV=ca:California
COUNTRY;X-ABBREV=us:United States
POSTALCODE:94043
GEO:37.4149;-122.078
URL;X-LABEL=Venue Info:http://www.computerhistory.org/
END:VVENUE
END:VCALENDAR
As the Operator toolbar indicates, the event information is also embedded in the (X)HTML source at the following location as a series of microformats:
http://upcoming.yahoo.com/event/144855
Let’s take a look at each of these example microformats in turn. I’ll give a more formal discussion of each one in the following sections.
Tip: You can use Operator to help in this exercise by checking the Debug Mode option (on the General tab) in Operator so that you have access to the Debug action for each microformat instance. The Debug action lists the (X)HTML source fragment containing the microformat instance.
From the web page, you can read the address for the event: 1401 N Shoreline Blvd., Mountain View, California, 94043. Operator picks out the address as an instance of the adr data format, with the corresponding (X)HTML source fragment:
<div class="address adr">
  <span class="street-address">1401 N Shoreline Blvd.</span><br />
  <span class="locality">Mountain View</span>,
  <span class="region">California</span>
  <span class="postal-code">94043</span>
</div>
Note the use of the <div> tag to wrap the address and of class attributes to separate and name the parts of the address. This (X)HTML fragment meets two goals simultaneously: it displays an address naturally and appropriately for a human reader of the web page, and it uses (X)HTML elements and attributes to enable programs (such as Operator) to reliably parse an address from the (X)HTML. You will see this design goal of satisfying both human and computer readers repeated among all the microformats.
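To make the idea of a program reliably parsing the address concrete, here is a minimal Python sketch that pulls the adr properties out of the fragment above using only the standard library. It is an illustration, not how Operator itself works; the names AdrParser, parse_adr, ADR_PROPERTIES, and SAMPLE are my own, and the sketch assumes that property elements are never nested inside one another.

```python
from html.parser import HTMLParser

# adr property names used in the fragment above (the microformat defines a few more).
ADR_PROPERTIES = {"street-address", "locality", "region", "postal-code"}


class AdrParser(HTMLParser):
    """Collect the text of elements whose class attribute names an adr property.

    Simplifying assumption: property elements are not nested in one another.
    """

    def __init__(self):
        super().__init__()
        self.address = {}
        self._current = None  # property whose text is currently being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in ADR_PROPERTIES:
                self._current = name
                self.address.setdefault(name, "")

    def handle_data(self, data):
        if self._current is not None:
            self.address[self._current] += data

    def handle_endtag(self, tag):
        # Any closing tag ends the current property (see the assumption above).
        self._current = None


def parse_adr(fragment):
    """Return a dict mapping adr property names to their text content."""
    parser = AdrParser()
    parser.feed(fragment)
    parser.close()
    return {name: text.strip() for name, text in parser.address.items()}


SAMPLE = """
<div class="address adr">
  <span class="street-address">1401 N Shoreline Blvd.</span><br />
  <span class="locality">Mountain View</span>,
  <span class="region">California</span>
  <span class="postal-code">94043</span>
</div>
"""

print(parse_adr(SAMPLE))
```

The class names do all the work here: the parser ignores the <br /> element and the comma, which exist only for the human reader.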
With the adr microformat parsed out, you as a user can then apply an action to the address. Operator has by default two actions (in addition to Debug) that you can apply to an address: Find with Google Maps and Find with Yahoo! Maps. Selecting the first action, for instance, loads the following into the browser:
http://maps.google.com/maps?q=1401%20N%20Shoreline%20Blvd.,%20California,%20Mountain%20View,%2094043
This action, in effect, enables Operator to perform a mashup of Upcoming.yahoo.com and Google Maps (and, more generally, of Google Maps and any web site that has adr microformat data). Note also how Operator enables the user to invoke this action in the context of web browsing. It is Firefox with Operator, and not a third-party web application, that joins a web site containing an adr microformat to Google Maps.
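The step from parsed address to map URL is simple enough to sketch in Python. The function name google_maps_url and the dictionary layout are hypothetical; the field order (street, region, locality, postal code) is simply copied from the URL shown above, and I am not claiming this is how Operator builds it internally.

```python
from urllib.parse import quote


def google_maps_url(adr):
    """Build a Google Maps search URL from a dict of adr properties."""
    # Field order mirrors the URL shown above: street, region, locality, postal code.
    order = ("street-address", "region", "locality", "postal-code")
    parts = [adr[name] for name in order if adr.get(name)]
    query = ", ".join(parts)
    # Percent-encode spaces but leave commas readable, as in the URL above.
    return "http://maps.google.com/maps?q=" + quote(query, safe=",")


address = {
    "street-address": "1401 N Shoreline Blvd.",
    "locality": "Mountain View",
    "region": "California",
    "postal-code": "94043",
}

print(google_maps_url(address))
```

Because the action is nothing more than a URL template filled in with microformat data, any service that accepts an address in a query string can be targeted the same way.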
Operator allows you to add other actions. Later in the chapter, I will show you how to add other user scripts to Operator and to write a basic user script to geocode addresses.
The hCard data format is meant to represent a person or organization, specifically contact information for the entity. The (X)HTML source for the embedded hCard microformat is as follows:
<div class="venue location vcard">
  <span class="fn org">
    <a href="/venue/259/">Computer History Museum</a>
  </span>
  <br />
  <div class="address adr">
    <span class="street-address">1401 N Shoreline Blvd.</span><br />
    <span class="locality">Mountain View</span>,
    <span class="region">California</span>
    <span class="postal-code">94043</span>
  </div>
  <span class="geo" style="display: none">
    <span class="latitude">37.4149</span>,
    <span class="longitude">-122.078</span>
  </span>
</div>
You might be wondering why you see vcard (instead of hcard) as the class attribute value. The reason is that hCard is derived from the vCard standard. You can compare the vCard data that Operator creates for this page to the (X)HTML source to see the similarities:
BEGIN:VCARD
PRODID:-//kaply.com//Operator 0.8//EN
SOURCE:http://upcoming.yahoo.com/event/144855
NAME:Mashup Camp IV at Computer History Museum (Wednesday, July 18, 2007) - Upcoming
VERSION:3.0
N:;;;;
ORG;CHARSET=UTF-8:Computer History Museum
FN;CHARSET=UTF-8:Computer History Museum
UID:
ADR;CHARSET=UTF-8:;;1401 N Shoreline Blvd.;Mountain View;California;94043;
GEO:37.4149;-122.078
END:VCARD
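The correspondence between hCard class names and vCard properties can be sketched in a few lines of Python. This is my own simplified illustration, not Operator's code: the function names and the dictionary layout are assumptions, the output omits the PRODID, SOURCE, NAME, UID, and CHARSET details seen above, and the empty N property is emitted only because vCard 3.0 requires one.

```python
def escape_text(value):
    """Escape backslash, semicolon, comma, and newline for a vCard 3.0 text value."""
    return (
        value.replace("\\", "\\\\")
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\n", "\\n")
    )


# vCard ADR components in order, named by the matching hCard/adr class names.
ADR_FIELDS = (
    "post-office-box",
    "extended-address",
    "street-address",
    "locality",
    "region",
    "postal-code",
    "country-name",
)


def hcard_to_vcard(card):
    """Serialize a dict of hCard properties as a minimal vCard 3.0 string."""
    lines = ["BEGIN:VCARD", "VERSION:3.0", "N:;;;;"]
    lines.append("FN:" + escape_text(card["fn"]))
    if card.get("org"):
        lines.append("ORG:" + escape_text(card["org"]))
    adr = card.get("adr")
    if adr:
        # Missing components become empty strings so the field count stays at seven.
        lines.append("ADR:" + ";".join(escape_text(adr.get(f, "")) for f in ADR_FIELDS))
    if card.get("geo"):
        latitude, longitude = card["geo"]
        lines.append(f"GEO:{latitude};{longitude}")
    lines.append("END:VCARD")
    return "\r\n".join(lines) + "\r\n"  # vCard lines end with CRLF


card = {
    "fn": "Computer History Museum",
    "org": "Computer History Museum",
    "adr": {
        "street-address": "1401 N Shoreline Blvd.",
        "locality": "Mountain View",
        "region": "California",
        "postal-code": "94043",
    },
    "geo": ("37.4149", "-122.078"),
}

print(hcard_to_vcard(card))
```

The ADR line produced here has the same seven-component layout as the one in Operator's output, with the two leading components and the trailing country left empty.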
Among the default actions in Operator for hCard is Add to Yahoo! Contacts, which, when invoked for this page, loads the following URL into the browser:
http://address.yahoo.com/?fn=Computer%20History%20Museum&co=Computer%20History%20Museum&ha1=1401%20N%20Shoreline%20Blvd.&hc=Mountain%20View&hs=California&hz=94043&A=C
The hCalendar microformat represents events and is, roughly speaking, the iCalendar format transformed into a microformat. (See Chapter 15 for a discussion of iCalendar.) The (X)HTML source for the hCalendar microformat is a large fragment that I will not quote here. To find it, you can use Operator or look at the source and find a <div> element that begins with this:
<div id="calendarContainer" class="vcalendar"> <!-- Begin vCalendar -->
and ends lines later with this:
</div> <!-- End vCalendar -->
The pieces of (X)HTML in between contain event data, such as this:
<abbr class="dtstart" title="20070718T130000">Wednesday, July 18, 2007 </abbr>
and the following:
<abbr class="dtend" title="20070719T140000">
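Note how the machine-readable value lives in the title attribute of the <abbr> element while the human-readable date is the element's text. As a minimal sketch of how a program might read those values, the following Python uses the standard library to collect dtstart and dtend as datetime objects. The class name EventTimeParser is hypothetical, the closing tags in SAMPLE are added so that the excerpt is complete, and the sketch assumes the compact YYYYMMDDTHHMMSS form used on this page (hCalendar allows other date forms that it does not handle).

```python
from datetime import datetime
from html.parser import HTMLParser


class EventTimeParser(HTMLParser):
    """Read dtstart/dtend values from the title attribute of <abbr> elements."""

    def __init__(self):
        super().__init__()
        self.times = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        title = attributes.get("title")
        if tag != "abbr" or not title:
            return
        for name in ("dtstart", "dtend"):
            if name in classes:
                # Assumes the compact form seen on this page, e.g. 20070718T130000.
                self.times[name] = datetime.strptime(title, "%Y%m%dT%H%M%S")


SAMPLE = (
    '<abbr class="dtstart" title="20070718T130000">Wednesday, July 18, 2007 </abbr>'
    '<abbr class="dtend" title="20070719T140000"></abbr>'
)

parser = EventTimeParser()
parser.feed(SAMPLE)
parser.close()
print(parser.times["dtstart"].isoformat())
print(parser.times["dtend"].isoformat())
```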
As in the case of hCard, you might wonder why the hCalendar format would use class="vcalendar" and not class="hcalendar". vCalendar was the precursor to iCalendar, a fact that is reflected in the iCalendar standard itself. If you look at the iCalendar data for the Upcoming.yahoo.com event listed earlier, you will see the following structure:
BEGIN:VCALENDAR
[...]
DTSTART:20070718T130000
DTEND:20070718T140000
[...]
END:VCALENDAR
Among the default actions associated with hCalendar are ones to send the event data to Google Calendar, Yahoo! Calendar, and 30boxes.com. Compare how you moved event data with APIs in Chapter 15 with this approach of extracting microformat data and sending that data to other services via an HTTP GET request.
The geo data format represents a geospatial location, specifically a latitude and longitude. The (X)HTML source for the geo instance is as follows:
<span class="geo" style="display: none">
  <span class="latitude">37.4149</span>,
  <span class="longitude">-122.078</span>
</span>
With Operator, you can map this location to Google Maps and Yahoo! Maps, or you can export it as KML.
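To give a sense of what exporting a geo instance as KML involves, here is a small Python sketch that wraps a latitude and longitude in a KML Placemark. The function name geo_to_kml is my own, and the exact KML that Operator emits may well differ; the one detail worth remembering is that KML lists coordinates as longitude first, then latitude, the reverse of the order in the geo markup.

```python
from xml.sax.saxutils import escape


def geo_to_kml(name, latitude, longitude):
    """Return a minimal KML document with one Placemark for the given point."""
    lat, lon = float(latitude), float(longitude)
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude or longitude out of range")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "  <Placemark>\n"
        f"    <name>{escape(name)}</name>\n"
        # KML coordinate order is longitude,latitude.
        f"    <Point><coordinates>{lon},{lat}</coordinates></Point>\n"
        "  </Placemark>\n"
        "</kml>\n"
    )


print(geo_to_kml("Computer History Museum", "37.4149", "-122.078"))
```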
Upcoming.yahoo.com supports the tagging of individual events. For instance, among the tags for the example event is mashup. You can find this tag marked up using the tag microformat in the (X)HTML source. For example:
<a href="/tag/mashup/" rel="tag" class="category">mashup</a>
You’ll see from the following discussion that rel="tag" on an <a> element indicates a tag microformat and that the last path component of the URL (that is, mashup) is the text of the tag. By default, there are actions in Operator to look this tag up in such web sites as del.icio.us, Flickr, Upcoming.yahoo.com, and YouTube.
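The rule that the tag is the last path component of the link can be sketched in Python as follows. The names tag_from_href and RelTagParser are hypothetical, and the sketch takes a simplified view: it percent-decodes the last non-empty path segment and does nothing special for tags that encode spaces as plus signs.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit


def tag_from_href(href):
    """Return the last non-empty path segment of a URL, percent-decoded."""
    segments = [s for s in urlsplit(href).path.split("/") if s]
    return unquote(segments[-1]) if segments else None


class RelTagParser(HTMLParser):
    """Collect tag names from <a> elements whose rel attribute includes "tag"."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel holds a space-separated list of values, so split before testing.
        rel_values = (attributes.get("rel") or "").split()
        href = attributes.get("href")
        if tag == "a" and "tag" in rel_values and href:
            name = tag_from_href(href)
            if name:
                self.tags.append(name)


SAMPLE = '<a href="/tag/mashup/" rel="tag" class="category">mashup</a>'

parser = RelTagParser()
parser.feed(SAMPLE)
parser.close()
print(parser.tags)
```

Note that the tag comes from the URL and not from the link text, so the two can differ without changing what the microformat means.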
[305] See the blog entry announcing the 0.9 beta (http://www.kaply.com/weblog/2007/12/03/operator-09-beta-available/). You can download the latest development version of Operator from http://www.kaply.com/operator/operator.xpi.
[306] The format called tag in Operator is known as rel-tag on http://microformats.org.