Chapter 4. Working with Feeds, RSS, and Atom

Table of Contents

What Are Feeds, and Why Are They Important?
RSS 2.0
RSS 1.0
Atom 1.0
Extensions to RSS 2.0 and Atom 1.0
Feeds from Flickr
Flickr Feed Parameters
Examining the Flickr Feeds
Exchange Formats Other Than RSS and Atom
Feeds from Other Web Sites
Finding Feeds and Feed Autodiscovery
Feeds from Weblogs
Wikipedia Feeds
Google and Yahoo! News
News Aggregators: Showing Flickr Feeds Elsewhere
Validating Feeds
Scraping Feeds Using GUI Tools
Remixing Feeds with Feedburner
Remixing Feeds with Yahoo! Pipes
A Simple First Pipe with Yahoo! News
Google News and Refactoring Pipes
Wikinews and NY Times: Filtering Feeds
Pulling the Feeds Together
Summary

Syndication feeds, especially those packaged in XML, are a fundamental enabling technology for mashups. Feeds are documents used to transfer frequently updated digital content to users. This chapter introduces feeds, focusing on the specific examples of RSS and Atom, which are arguably the most widely used XML formats in the world. Indeed, there’s a good chance that any given web site provides some RSS or Atom feed, even if the site offers no XML-based API. Although RSS and Atom are the dominant feed formats, other formats are also used to create feeds, among them JSON, PHP serialization, and CSV. I cover those formats in this chapter as well.
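To make the contrast with XML concrete, here is a minimal sketch (in Python, which this chapter does not itself prescribe) of the same two hypothetical feed items serialized as JSON and as CSV. The item data is invented for illustration, and PHP serialization is left out because Python's standard library has no serializer for it.

```python
import csv
import io
import json

# Hypothetical feed items, already reduced to plain Python data.
ITEMS = [
    {"title": "Book One", "link": "http://example.com/books/1"},
    {"title": "Book Two", "link": "http://example.com/books/2"},
]


def to_json(items):
    """Wrap the items in a JSON object, roughly how JSON feeds package them."""
    return json.dumps({"items": items}, indent=2)


def to_csv(items):
    """Write one CSV row per item, preceded by a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "link"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()


print(to_json(ITEMS))
print(to_csv(ITEMS))
```

JSON keeps the nesting of the data, while CSV flattens each item into a row; neither carries the channel-level metadata that RSS and Atom define.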

So, why do feeds matter? Feeds give you structured information from applications that is easy to parse and reuse. Not only are feeds readily available, but many applications already consume them, requiring little or no programming effort from you. Indeed, there is an entire ecosystem of web feeds (the data formats, applications, producers, and consumers) that offers great potential for remixing and mashing up information, some of which is starting to be realized today.
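As an illustration of how little code it takes to reuse a feed, here is a minimal sketch that pulls the title and link out of every item in an RSS 2.0 document. It assumes Python's standard xml.etree.ElementTree module, and the feed text is a hypothetical sample, not the actual Apress feed.

```python
import xml.etree.ElementTree as ET

# A tiny, hand-written RSS 2.0 document (hypothetical sample data).
RSS_SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example: New Books</title>
    <link>http://example.com/books</link>
    <description>A hypothetical list of new titles</description>
    <item>
      <title>Book One</title>
      <link>http://example.com/books/1</link>
    </item>
    <item>
      <title>Book Two</title>
      <link>http://example.com/books/2</link>
    </item>
  </channel>
</rss>"""


def rss2_items(xml_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    # RSS 2.0 elements are not in an XML namespace, so plain paths work.
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iterfind("channel/item")]


for title, link in rss2_items(RSS_SAMPLE):
    print(title, link)
```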

This chapter covers the following:

Note

In this chapter, I assume you have an understanding of the basics of XML, including XML namespaces and XML schemas. A decent tutorial on XML is available at http://www.w3schools.com/xml/. If you are new to the world of XML, working with RSS and Atom is an excellent way to get started with the XML family of technologies.
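Namespaces matter in practice because Atom 1.0 (and RSS 1.0) put their elements in a namespace, so code must match on the namespace URI rather than on whatever prefix a document happens to use. The following sketch, assuming Python's xml.etree.ElementTree, parses the same deliberately incomplete Atom fragment written with a default namespace and with an arbitrary prefix; the fragments are hypothetical and not valid Atom feeds on their own.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# The same Atom fragment written two ways: with a default namespace
# and with an arbitrary prefix "a".
DEFAULT_NS_DOC = '<feed xmlns="http://www.w3.org/2005/Atom"><entry><title>Hello</title></entry></feed>'
PREFIXED_DOC = ('<a:feed xmlns:a="http://www.w3.org/2005/Atom">'
                '<a:entry><a:title>Hello</a:title></a:entry></a:feed>')


def atom_titles(xml_text):
    """Return entry titles, matching elements by namespace URI, not prefix."""
    root = ET.fromstring(xml_text)
    ns = {"atom": ATOM_NS}  # the local prefix "atom" is our own choice
    return [entry.findtext("atom:title", namespaces=ns)
            for entry in root.iterfind("atom:entry", ns)]


# ElementTree reports names as "{namespace-URI}local-name".
print(ET.fromstring(PREFIXED_DOC).tag)  # → {http://www.w3.org/2005/Atom}feed
print(atom_titles(DEFAULT_NS_DOC), atom_titles(PREFIXED_DOC))  # → ['Hello'] ['Hello']
# A bare local name does not match a namespaced element:
print(ET.fromstring(DEFAULT_NS_DOC).find("entry"))  # → None
```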

What Are Feeds, and Why Are They Important?

Feeds are documents used to transfer frequently updated digital content to users. This content includes news items, weblog entries, podcast episodes, and virtually any other content that can be parceled out in discrete units. Some common terminology is associated with feeds:

  • You syndicate, or publish, content by producing a feed to distribute it.

  • You subscribe to a feed by reading it and using it.

  • You aggregate feeds by combining feeds from multiple sources.
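Subscribing and aggregating can be sketched in a few lines. The following is a minimal illustration, assuming Python's standard library and two hypothetical RSS 2.0 feeds: it reads the items of each feed, drops duplicates that share a link, and merges everything into one list, newest first, using the RFC 822 style dates found in RSS 2.0 pubDate elements. Real aggregators must also cope with missing or malformed dates, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two hypothetical RSS 2.0 feeds from different sources.
FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>A1</title><link>http://example.com/a1</link>
<pubDate>Mon, 02 Jul 2007 10:00:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>B1</title><link>http://example.org/b1</link>
<pubDate>Tue, 03 Jul 2007 09:30:00 GMT</pubDate></item>
<item><title>B0</title><link>http://example.org/b0</link>
<pubDate>Sun, 01 Jul 2007 08:00:00 GMT</pubDate></item>
</channel></rss>"""


def aggregate(*feeds):
    """Merge items from several RSS 2.0 feeds, newest first, one per link."""
    merged = {}
    for xml_text in feeds:
        channel = ET.fromstring(xml_text).find("channel")
        source = channel.findtext("title")
        for item in channel.iterfind("item"):
            link = item.findtext("link")
            # Keep the first copy of an item seen under a given link.
            merged.setdefault(link, {
                "source": source,
                "title": item.findtext("title"),
                "link": link,
                "published": parsedate_to_datetime(item.findtext("pubDate")),
            })
    return sorted(merged.values(), key=lambda e: e["published"], reverse=True)


for entry in aggregate(FEED_A, FEED_B):
    print(entry["published"].date(), entry["source"], entry["title"])
```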

Although feeds come in many data formats, in the following sections I focus on three that you are likely to see on current web sites: RSS 2.0, Atom 1.0, and RSS 1.0. (Later in the chapter, I will mention other feed formats.) The formats share fundamental conceptual and structural similarities but also differ in important ways. In addition, they have a complicated, interdependent, and contested history, which I do not untangle here.
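One structural difference shows up immediately at the root element. An RSS 2.0 document has an un-namespaced rss element (with a version attribute) wrapping a single channel; RSS 1.0 is RDF, with an rdf:RDF root and channel and item elements in the http://purl.org/rss/1.0/ namespace; Atom 1.0 has a feed root in the http://www.w3.org/2005/Atom namespace. The following Python sketch uses only those facts to guess a document's format. It is a simplification (for instance, older RSS 0.9x versions come back as "unknown"), and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS1_NS = "http://purl.org/rss/1.0/"
ATOM_NS = "http://www.w3.org/2005/Atom"


def feed_format(xml_text):
    """Guess which of the three feed formats a document uses from its root."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss" and root.get("version") == "2.0":
        return "RSS 2.0"
    # RSS 1.0: an RDF root containing a channel in the RSS 1.0 namespace.
    if root.tag == "{%s}RDF" % RDF_NS and root.find("{%s}channel" % RSS1_NS) is not None:
        return "RSS 1.0"
    if root.tag == "{%s}feed" % ATOM_NS:
        return "Atom 1.0"
    return "unknown"


# Hypothetical, heavily trimmed sample documents.
RSS2_DOC = '<rss version="2.0"><channel><title>t</title></channel></rss>'
RSS1_DOC = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
            'xmlns="http://purl.org/rss/1.0/"><channel rdf:about="http://example.com/"/></rdf:RDF>')
ATOM_DOC = '<feed xmlns="http://www.w3.org/2005/Atom"><title>t</title></feed>'

for doc in (RSS2_DOC, RSS1_DOC, ATOM_DOC):
    print(feed_format(doc))
```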

The examples of the three feed formats are adapted from the RSS 2.0 feed of new books from Apress (http://www.apress.com/rss/whatsnew.xml). They are meant to package (as much as possible) the same data in different formats. They are minimal, though not absolutely minimal, examples that illustrate the core of each format. For instance, the description elements contain embedded HTML. Also, I show two items to illustrate that channels (feeds) can contain multiple items (entries). I discuss extensions to RSS and Atom later in the chapter.
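Embedded HTML in an RSS 2.0 description element usually arrives either entity-escaped or wrapped in a CDATA section. The following Python sketch, using hypothetical data rather than the Apress items, parses a channel with two items written each way and shows that an XML parser hands back the same HTML string in both cases. That string is still markup, so a consumer has to render or sanitize it separately.

```python
import xml.etree.ElementTree as ET

# Two hypothetical items whose <description> carries the same HTML,
# once entity-escaped and once wrapped in a CDATA section.
SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>Escaped</title>
<description>&lt;p&gt;A &lt;b&gt;new&lt;/b&gt; book&lt;/p&gt;</description></item>
<item><title>CDATA</title>
<description><![CDATA[<p>A <b>new</b> book</p>]]></description></item>
</channel></rss>"""

root = ET.fromstring(SAMPLE)
items = root.findall("channel/item")
print(len(items))  # → 2
for item in items:
    # The parser undoes the escaping (or unwraps the CDATA), so both
    # items yield the identical HTML string.
    print(item.findtext("title"), item.findtext("description"))
```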