Chapter 2. Uncovering the Mashup Potential of Web Sites

Table of Contents

What Makes Web Sites and Applications Mashable
Ascertaining the Fundamental Entities of the Web Site
Public APIs and Existing Mashups
Use of Ajax
Embedded Scriptability
Browser Plug-Ins
Getting Data In and Out of the Web Site
The Community of Users and Developers
Mobile and Alternative Interfaces and the Skinnability of the Web Site
Documentation
Is the Web Site Run on Open Source?
Intellectual Property, Reusability, and Creative Commons
Tagging, Feeds, and Weblogging
URL Languages of Web Sites
Some Mashups Briefly Revisited
Flickr: The Fundamentally Mashup-Friendly Site
Resources in Flickr
Users and Photos
Data Associated with an Individual Photo
Tags
User’s Archive: Browsing Photos by Date
Sets
Collections
Favorites
A User’s Popular Photos
Contacts
Groups
Account Management
Browsing Through Flickr
Search
Geotagged Photos in Flickr
The Flickr Organizer
Recent Activities
Mailing Interfaces
Interfacing to Weblogs
Syndication Feeds: RSS and Atom
Mobile Access
Third-Party Flickr Apps
Creative Commons Licensing
Cameras
The Mashup-by-URL-Templating-and-Embedding Pattern
Google Maps
URL Language of Google Maps
Viewing KML Files in Google Maps
Connecting Yahoo! Pipes and Google Maps
Other Simple Applications of the Google Maps URL Language
Amazon
Amazon Items
Lists
Tags
Subject Headings
del.icio.us
Screen-Scraping and Bots
Summary

In the previous chapter, you studied several examples of mashups in depth. With the goal of learning how to create your own mashups, you’ll turn now to the raw ingredients of mashups—individual web sites and web applications. Although the focus of this book is on public web application programming interfaces (APIs), you’ll first study the human user interface (UI) of web sites for their mashup potential.

Why not jump straight to using APIs if that’s what you want to use to create mashups? After all, wouldn’t the APIs be the most useful place to begin with since they are especially designed for enabling access to the web site’s data and services? What you learn from studying a web site’s user interface is useful—even essential—to using APIs effectively. When you exercise a web site’s public API, you usually need to understand the overall logic of the web site. For instance, some mashups, such as those created with Greasemonkey (like the Google Maps in Flickr [GMiF] script from Chapter 1), extend the application directly by hooking into and blending with the existing user interface. To create something like GMiF, you would need detailed knowledge of the application you plan to mash up. One of the best ways to uncover potential hooks of a web site is to use the web site as an end user, armed with a developer’s sensibility.

Creating mashups doesn’t always require much programming. It can be as simple as linking to the right part of an application, accessing the appropriate feed, or connecting the web site to a weblog. In this chapter, I will point out how features created for end users can enable you to create mashups with minimal or no programming.

Flickr is the central example in this chapter, one that I analyze extensively. I follow with Google Maps as an important complementary example. Flickr and Google Maps are among the most mashed-up APIs on the Web. I also discuss del.icio.us, a pioneering social bookmarking site, and Amazon, which is an example of an e-commerce platform. To ease you into your study of creating mashups, I have selected highly remixable applications for this chapter rather than web sites that are difficult to recombine.

In this book, I focus mostly on how to use public APIs but briefly mention screen-scraping. APIs often don’t do everything you might want from them. Although you can do a lot with public APIs, screen-scraping provides an important alternative or complementary approach. Nonetheless, you should use the API as the first resort. You can screen-scrape if you need to, but always use a web site’s computational and network resources respectfully, being mindful of the legal ramifications of what you are doing.

What Makes Web Sites and Applications Mashable

I’ll now cover the aspects of web sites and web applications that make them amenable to mashups. Some features are useful regardless of whether you are using the API or whether you are using informal mechanisms for integration. In either case, you are looking for ways to hook into an application. The following sections will help you to analyze a web site for these integration hooks.

Ascertaining the Fundamental Entities of the Web Site

The basic questions to begin with when analyzing a web site are the following: What is the web site fundamentally about? What are the key entities (or resources, to borrow a term from W3C parlance)? How are these entities or resources associated with specific URLs/URIs? A resource is anything with a URI associated with it. A formal definition of a resource comes from “Uniform Resource Identifier (URI): Generic Syntax” (RFC 3986):[27]

This specification does not limit the scope of what might be a resource; rather, the term “resource” is used in a general sense for whatever might be identified by a URI. Familiar examples include an electronic document, an image, a source of information with a consistent purpose (e.g., “today’s weather report for Los Angeles”), a service (e.g., an HTTP-to-SMS gateway), and a collection of other resources. A resource is not necessarily accessible via the Internet; e.g., human beings, corporations, and bound books in a library can also be resources. Likewise, abstract concepts can be resources, such as the operators and operands of a mathematical equation, the types of a relationship (e.g., “parent” or “employee”), or numeric values (e.g., zero, one, and infinity).

The question of resources and their corresponding URIs is not as abstract as it may sound. In fact, looking at resources may seem rather obvious. For example, for Flickr, which is self-described as “almost certainly the best online photo management and sharing application in the world,” important entities are, not surprisingly, photos and users. As you will see later in the chapter, these entities are also resources; you can identify specific photos and users in the URLs produced by Flickr. For example, this URL:

http://www.flickr.com/photos/raymondyee/508341822/

is for photo 508341822, which belongs to user raymondyee. A Flickr photo is addressable via a URL; that is, a URL can lead you right to the photo in question. As experienced users of the Web, we all know the useful things we can do when there are specific URLs. You can bookmark a link, e-mail it, and use it as a reference in a web page. You don’t have to tell someone to go to Flickr and type in the photo number to get to the photo.
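To make the idea of an addressable resource concrete, here is a minimal Python sketch that pulls the user and photo number out of a URL shaped like the one above, and rebuilds the URL from those two pieces. The function names are hypothetical, and the `photos/<user>/<photo number>` layout is an assumption inferred only from this one example URL, not a documented rule.

```python
from urllib.parse import urlparse


def parse_flickr_photo_url(url):
    """Split a Flickr photo page URL into its user and photo ID parts."""
    # Drop empty segments caused by the leading and trailing slashes.
    path_parts = [part for part in urlparse(url).path.split("/") if part]
    # Assumed shape, inferred from the example URL: photos/<user>/<photo_id>
    if len(path_parts) != 3 or path_parts[0] != "photos":
        raise ValueError("not a Flickr photo page URL: " + url)
    _, user, photo_id = path_parts
    return {"user": user, "photo_id": photo_id}


def build_flickr_photo_url(user, photo_id):
    """Rebuild the photo page URL from the two entities (same assumed shape)."""
    return "http://www.flickr.com/photos/{}/{}/".format(user, photo_id)


if __name__ == "__main__":
    info = parse_flickr_photo_url(
        "http://www.flickr.com/photos/raymondyee/508341822/"
    )
    print(info["user"], info["photo_id"])
    print(build_flickr_photo_url(info["user"], info["photo_id"]))
```

Because the entities appear directly in the URL, going from entity to URL and back takes only string handling. No API call is involved.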

As you will see later in this chapter, granular URLs also enable mashups. A major part of this chapter is devoted to studying web sites by analyzing their end-user functionality and how that functionality is reflected in their URI structure (URL language). In the following sections, I discuss in greater detail the notions of addressability, granularity, transparency, and persistence in URLs. I will present a detailed listing of Flickr’s entities and how you can refer to them in URIs, as well as a brief analysis of Google Maps, Amazon, and del.icio.us for comparison.

Public APIs and Existing Mashups

Is there a public API for the web site? A web site’s public API is specifically designed as the official channel of programmatic access to the data and services of the web site. It essentially lets you access and program the web site almost like a local object or database. For a slightly more formal definition of an API, consider the one by John Musser from Programmableweb.com: “a set of functions that one computer program makes available to other programs so they can talk to it directly.”[28]
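As a rough illustration of what “programmatic access” can look like, the following Python sketch assembles the URL for a REST-style API request without sending it. The endpoint, the `flickr.photos.search` method name, and the `format` and `nojsoncallback` parameters are assumptions based on the Flickr API as I understand it, and `YOUR_API_KEY` is a placeholder; check the official API documentation before relying on any of them.

```python
from urllib.parse import urlencode

# Assumed REST endpoint for illustration; confirm it in the API documentation.
FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_api_request_url(method, api_key, **params):
    """Return the URL for one API call; nothing is sent over the network."""
    query = {
        "method": method,        # which API function to invoke
        "api_key": api_key,      # identifies the calling application
        "format": "json",        # assumed parameter: ask for a JSON response
        "nojsoncallback": "1",   # assumed parameter: plain JSON, no JSONP wrapper
    }
    query.update(params)         # method-specific arguments, e.g. tags
    return FLICKR_REST_ENDPOINT + "?" + urlencode(query)


if __name__ == "__main__":
    print(build_api_request_url(
        "flickr.photos.search", "YOUR_API_KEY", tags="flower", per_page="3"
    ))
```

The point of the sketch is the shape of the call: a named function plus arguments, much like calling a function in a local library, except that it is expressed as a URL.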

If there is a public API for a web site, how have people used the API? Looking for what others have done with the API helps you get right into the application without wading through any documentation.

What’s the range of the third-party wrappers available for the API? How many are officially supported by the web site owners? Which ones were developed by the community?

Are there many people working with the API, or is there little evidence that it is being used at all? Have any mashups using the web site been developed? How sophisticated are the mashups? Are they straight-up remixes of the data or presentations of the data in a new context? Do you see some emergent and unexpected property? Surprising mashups often reveal a capacity in the formal API or some integration point that might not be obvious from a quick glance at the documentation.

The more interesting mashups that exist for an application, the more likely it is that the application is amenable to mashups. Look for mashups that contain functionality similar to what you want to include.

Chapter 6 and Chapter 7 present an overview of how to use public web site APIs, starting with a study of the Flickr APIs and then moving on to a survey of other APIs.

Use of Ajax

Does the web site use Ajax and allied JavaScript techniques to integrate data dynamically into the user interface? As you will learn in Chapter 8, the presence of Ajax is an indicator that there is likely an API at work—either a formalized public API or a programmatic structure, though not intended for public interfacing, that might possibly be used for mashup making. (Recall from Chapter 1 how Housingmaps.com placed markers on the first generations of Google Maps by tapping into the programming logic before any public API for Google Maps was released.)

Embedded Scriptability

Can people embed plug-ins, add-ons, or extensions (as opposed to writing external applications) to extend the web site directly? Here are examples of extension frameworks for specific web sites:

For web applications, these are some examples:

For desktop applications/OS environments, take a look at these examples:

If you have the required permissions, you can install or write extensions that incorporate other services into the applications.

Browser Plug-Ins

Are there Firefox add-ons (https://addons.mozilla.org/en-US/firefox/) that supplement or enhance the user interface to the web site? For example:

If you see a form of communication between the add-on and the application, you know there is some form of public or private API. Other browsers have extension mechanisms as well.[29]

Getting Data In and Out of the Web Site

How can you import data into the application? With what protocols? What data or file formats are accepted?

How can you export data from the application? What formats are provided? What protocols are supported?

It’s much easier to make mashups out of widely deployed data formats and protocols (whether they are de jure or de facto standards) than with obscure data formats and protocols.

Can you embed data from the web site elsewhere? An example of such embedding is a JavaScript badge (such as http://www.platial.com/mapkit/faq). What options do you have for customizing the badge? Highly flexible badges can themselves be used to access data for mashups, and they hint at the existence of a feature-rich API.

The Community of Users and Developers

What communities of users and developers have grown around the web site? Where can you go to participate in that community and ask questions? What are members of the community discussing? Which limitations of the application do they want to overcome? What clever solutions or workarounds (hacks) are being popularized in that community, not only among developers but also among nonprogramming power users?

Again, seeing how the API gets used and discussed is a great way to get a handle on what is possible and interesting. And if you don’t see much activity around the API, realize that you are likely to be on your own if you decide to use it.

Why do I stress looking at the community around an application and its API? A vibrant and active community makes a lot of mashup work practical. When making mashups, some things are theoretically possible to do—if you had the time, energy, and resources—but are practically impossible for you as an individual to pull off. A community of developers means there are other people to work with, lots of examples of what other people have done, and often code libraries that you can build upon.

Mobile and Alternative Interfaces and the Skinnability of the Web Site

How many versions of the user interface are there for the web site? Is there a mobile interface? A mobile version is often easier to decipher than the main site and highlights what the web site’s creators believe to be some core logic of the web site. A mobile version might be more easily integrated into a mashup for a phone; there is typically no JavaScript to worry about, and the HTML is easier to parse.
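To suggest why simple HTML is easier to work with, here is a small Python sketch that collects links from a plain page using only the standard library. The HTML snippet, the `exampleuser` paths, and the class and function names are all made up for illustration; a real mobile page will differ, and any such parsing should respect the site’s terms of use.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside an anchor with an href.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


# Hypothetical markup standing in for a simple, script-free mobile page.
SAMPLE_MOBILE_HTML = """
<html><body>
<h1>Recent photos</h1>
<ul>
  <li><a href="/photos/exampleuser/1/">Sunset</a></li>
  <li><a href="/photos/exampleuser/2/">Harbor</a></li>
</ul>
</body></html>
"""


def extract_links(html_text):
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.links


if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_MOBILE_HTML):
        print(href, text)
```

With no JavaScript building the page, everything of interest is already present in the markup that the server returns.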

How difficult is it to change the look of the interface? That is, how “skinnable” is the web site? Easy customizability of the interface for end users is an indicator that the application developers have likely separated the application logic from presentation logic. And if skinnability is available to end users, that functionality might also be programmable. For example, WordPress themes typically allow the owner of a WordPress site to change the set of global styles of the site.

Documentation

Good documentation of the features, the API, the data formats, and any other aspect of the web site makes it much easier to understand and recombine its data and functionality. Are the input and output data documented? If so, are there schemas (in other words, ways to validate the data)? Are the formats properly versioned?

Documentation reduces the amount of guesswork involved. Moreover, it brings certainty to whether a function you uncover through reverse engineering is an official feature or an undocumented hack that has no guarantee of working for any length of time.

Is the Web Site Run on Open Source?

If the web site is powered by free or open source software, you have the option of studying the source directly should reverse engineering—or reading the relevant documentation—not give you the answers you need.

Intellectual Property, Reusability, and Creative Commons

Does the web site allow users to explicitly set the licensing of content and data, under Creative Commons, for instance? Does the web site enable users to search and browse content by license? Explicit licensing of digital content clears away important barriers to creating mashups with that content. A detailed discussion of the Creative Commons is beyond the scope of this book. To learn more, consult the following:

http://creativecommons.org

Tagging, Feeds, and Weblogging

Here I present a series of questions that will be explored at length in the chapters that immediately follow.

Does the web site use tagging? That is, can users tag items and search for items by tags in the web site? Chapter 3 covers tagging and folksonomy in detail and shows how tags provide mashups with hooks within a web site and among web sites.

Are there RSS and Atom feeds available from the site? Do they give you fine-grained access to the web site? (That is, can you get feeds for a specific search term or for a specific part of a web site?) In the absence of a formal API, syndication feeds become a source of structured, easy-to-parse data. See Chapter 4 for detailed coverage of RSS and Atom feeds.
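As a small illustration of why feeds count as easy-to-parse data, the following Python sketch reads the items out of an RSS 2.0 document with the standard library. The sample feed and its example.com links are invented for this sketch; a real feed would be fetched from the site and would usually carry more fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, standing in for a feed of items for one tag.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Photos tagged flower</title>
    <link>http://example.com/tags/flower/</link>
    <item>
      <title>Tulip</title>
      <link>http://example.com/photos/1/</link>
    </item>
    <item>
      <title>Rose</title>
      <link>http://example.com/photos/2/</link>
    </item>
  </channel>
</rss>
"""


def read_rss_items(rss_text):
    """Return a list of {"title", "link"} dicts, one per <item> in the feed."""
    root = ET.fromstring(rss_text)
    items = []
    # In RSS 2.0 the items sit under rss/channel/item.
    for item in root.findall("./channel/item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
        })
    return items


if __name__ == "__main__":
    for entry in read_rss_items(SAMPLE_RSS):
        print(entry["title"], entry["link"])
```

Atom feeds use XML namespaces and different element names, so this sketch would need adjusting for them.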

Does the web site allow you to send content to a weblog or wiki? Studying how the web site is connected to a weblog in this manner is an excellent way to get some practice with configuring APIs without programming. See Chapter 5 for more on blogging and wiki APIs.