Chapter 4. Working with Feeds, RSS, and Atom

A fundamental enabling technology for mashups is the syndication feed, especially feeds packaged in XML. Feeds are documents used to transfer frequently updated digital content to users. This chapter introduces feeds, focusing on the specific examples of RSS and Atom. RSS and Atom are arguably the most widely used XML formats in the world. Indeed, there’s a good chance that any given web site provides some RSS or Atom feed, even if there is no XML-based API for the web site. Although RSS and Atom are the dominant feed formats, other formats are also used to create feeds: JSON, PHP serialization, and CSV. I will also cover those formats in this chapter.

So, why do feeds matter? Feeds give you structured information from applications in a form that is easy to parse and reuse. Not only are feeds readily available, but many applications already consume them, all requiring little or no programming effort from you. Indeed, there is an entire ecology of web feeds (the data formats, applications, producers, and consumers) that provides great potential for the remix and mashup of information, some of which is starting to be realized today.

This chapter covers the following:

What feeds are and how they are used
The semantics and syntax of feeds, with a focus on RSS 2.0, RSS 1.0, and Atom 1.0
The extension mechanism of RSS 2.0 and Atom 1.0
How to get feeds from Flickr and other feed-producing applications and web sites
Feed formats other than RSS and Atom in the context of Flickr feeds
How feed autodiscovery can be used to find feeds
News aggregators for reading feeds and tools for validating and scraping feeds
How to remix and mash up feeds with Feedburner and Yahoo! Pipes

	Note
	In this chapter, I assume you have an understanding of the basics of XML, including XML namespaces and XML schemas. A decent tutorial on XML is available at `http://www.w3schools.com/xml/`. If you are new to the world of XML, working with RSS and Atom is an excellent way to get started with the XML family of technologies.

What Are Feeds, and Why Are They Important?

Feeds are documents used to transfer frequently updated digital content to users. This content includes news items, weblog entries, installments of podcasts, and virtually any other content that can be parceled out in discrete units. In keeping with this functionality, there is some commonly used terminology associated with feeds:

You syndicate, or publish, content by producing a feed to distribute it.
You subscribe to a feed by reading it and using it.
You aggregate feeds by combining feeds from multiple sources.
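
The three terms above can be illustrated with a minimal Python sketch that uses only the standard library. Everything specific in it is an assumption made for illustration: the two tiny RSS 2.0 documents are invented stand-ins for feeds that two publishers might syndicate, the `example.com` URLs are placeholders, and the function names `read_items` and `aggregate` are hypothetical. A real subscriber would fetch the feeds over HTTP and would have to cope with missing or malformed elements.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Syndicate: two made-up RSS 2.0 documents, as two publishers might produce them.
FEED_A = """<rss version="2.0">
  <channel>
    <title>Feed A</title>
    <link>http://example.com/a</link>
    <description>First hypothetical feed</description>
    <item>
      <title>A1</title>
      <link>http://example.com/a/1</link>
      <pubDate>Mon, 01 Jan 2007 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>A2</title>
      <link>http://example.com/a/2</link>
      <pubDate>Wed, 03 Jan 2007 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

FEED_B = """<rss version="2.0">
  <channel>
    <title>Feed B</title>
    <link>http://example.com/b</link>
    <description>Second hypothetical feed</description>
    <item>
      <title>B1</title>
      <link>http://example.com/b/1</link>
      <pubDate>Tue, 02 Jan 2007 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Subscribe: parse one RSS 2.0 document into a list of item dicts."""
    channel = ET.fromstring(feed_xml).find("channel")
    source = channel.findtext("title")
    items = []
    for item in channel.findall("item"):
        items.append({
            "source": source,
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # pubDate uses the RFC 822 date format, which email.utils can parse.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return items


def aggregate(*feeds):
    """Aggregate: combine the items of several feeds, newest first."""
    merged = [entry for feed in feeds for entry in read_items(feed)]
    return sorted(merged, key=lambda entry: entry["published"], reverse=True)


for entry in aggregate(FEED_A, FEED_B):
    print(entry["published"].date(), entry["source"], entry["title"])
```

Sorting the merged items by publication date is only one possible design choice. An aggregator could just as easily group items by source or filter them by keyword.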

Although feeds come in many data formats, I focus in the following sections on three formats that you are likely to see in current web sites: RSS 2.0, Atom 1.0, and RSS 1.0. (Later in the chapter, I will mention other feed formats.) The formats share fundamental conceptual and structural similarities but also differ in important ways. In addition, they have a complicated, interdependent, and contested history, which I do not untangle here.

The examples of the three feed formats are adapted from the RSS 2.0 feed of new books from Apress (`http://www.apress.com/rss/whatsnew.xml`). They are meant to be (as much as possible) the same data packaged in different formats. They are minimalist, though not absolutely minimal, examples that illustrate the core of each format. For instance, the description elements have embedded HTML. Also, I show two items to illustrate that a channel (a feed, in Atom terms) can contain more than one item (entry). I discuss extensions to RSS and Atom later in the chapter.