Feeds from Other Web Sites

Chapter 4. Working with Feeds, RSS, and Atom

Feeds are extremely helpful in creating mashups because they are packaged in formats designed to be accurately and automatically parsed by software. Not only do they require no programming to use, they are also widely available, much more so than web APIs.

Nonetheless, feeds are still sometimes difficult to find. I first revisit the question of how to find feeds, including the topic of feed autodiscovery. I then provide examples of feeds that are available from some specific web sites: a selection of blogs, Wikipedia, Google News, and Yahoo! News. These examples show how web sites other than Flickr use feeds. I have focused on news-oriented web sites because I draw upon such sites in the feed mashups I create with Yahoo! Pipes later in the chapter.

Finding Feeds and Feed Autodiscovery

In the context of Flickr, I mentioned two ways of finding feeds that are also applicable to other web sites:

Looking in the user interface for features such as the common orange icon or the words feed, RSS, subscribe, and so on
Finding documentation for a web site’s feeds

Let’s explore some other approaches to finding feeds. There are specialized feed directories and search engines, such as the following one, which also has an API (in case you find it useful):

http://www.syndic8.com/

Some of this feed search functionality has been incorporated into feed aggregators (which I describe more in a moment). For instance, you can browse and search for feeds from within Google Reader. This search functionality is also available from the Google AJAX Feed API.^[75]^[76]

It seems sensible that if you know the URL of a web page, you should be able to easily figure out the URL for any feeds that are associated with it. Indeed, a mechanism called RSS autodiscovery (or more generally, feed autodiscovery) has become a de facto standard in associating web pages with feeds. To connect a web page to a feed, you add <link> elements to the <head> element, making appropriate use of the rel, href, and type (and optionally title) attributes of <link>:

rel is set to the value alternate.
href is the URL of the feed.
type is set to the MIME type of the feed (either application/rss+xml or application/atom+xml).
title is optionally set to be a title of the feed.

For example, in the <head> element of the following page:

http://news.yahoo.com

you find the following <link>, which points to a corresponding RSS feed at http://rss.news.yahoo.com/rss/topstories:

            <link rel="alternate" type="application/rss+xml" title="Yahoo! News - Top Stories" 
            href="http://rss.news.yahoo.com/rss/topstories" />

Many modern browsers support feed autodiscovery. If you use one of those browsers to go to a web page that links to its feeds, you’ll see an icon that leads to those feeds.

Autodiscovery is similarly useful for creators of mashups. For example, if your program is fed the URL of a web page, you could look for the presence of associated feeds that might give you the data you need by using feed autodiscovery.
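As a rough illustration of how a mashup might apply autodiscovery, the following Python sketch scans an HTML document for <link> elements whose rel includes alternate and whose type is one of the two feed MIME types listed earlier. The names FeedLinkParser and discover_feeds are hypothetical, chosen for this example only. The sketch works on an HTML string that you have already fetched and makes no attempt to handle every variation found in real pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# The two MIME types named in the text for RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects <link rel="alternate"> elements that advertise a feed.

    Hypothetical helper class for illustration; not part of any library.
    """

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        attr = {name: (value or "") for name, value in attrs}
        rel_values = attr.get("rel", "").lower().split()
        mime_type = attr.get("type", "").strip().lower()
        href = attr.get("href", "")
        if "alternate" in rel_values and mime_type in FEED_TYPES and href:
            self.found.append({
                # Resolve relative feed URLs against the page URL.
                "href": urljoin(self.base_url, href),
                "type": mime_type,
                "title": attr.get("title", ""),
            })


def discover_feeds(html_text, base_url=""):
    """Return a list of feeds advertised in html_text."""
    parser = FeedLinkParser(base_url)
    parser.feed(html_text)
    return parser.found


# A cut-down page head containing the <link> element shown earlier.
SAMPLE_HEAD = """<html><head>
<link rel="alternate" type="application/rss+xml" title="Yahoo! News - Top Stories"
 href="http://rss.news.yahoo.com/rss/topstories" />
<link rel="stylesheet" type="text/css" href="/style.css" />
</head><body></body></html>"""

feeds = discover_feeds(SAMPLE_HEAD, "http://news.yahoo.com")
for feed in feeds:
    print(feed["type"], feed["href"], feed["title"])
```

Passing the page URL as base_url lets the sketch return absolute feed URLs even when a page uses a relative href.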

Official Standardization of Feed Autodiscovery?

Even though feed autodiscovery has been widely implemented, there is currently no de jure standard for this practice. Autodiscovery started as a collaboration carried out through weblogs (such as http://diveintomark.org/archives/2002/06/02/important_change_to_the_link_tag), progressed to being discussed as an IETF draft (whose last expired version was http://www.ietf.org/internet-drafts/draft-snell-atompub-autodiscovery-00.txt), and now is being considered in the context of standardization as part of HTML 5 (http://www.whatwg.org/specs/web-apps/current-work/#alternate).

In the meantime, some of the current practice around feed autodiscovery is documented in places such as the wiki at http://www.feedautodiscovery.org/doku.php.

Feeds from Weblogs

Weblogs are a major source of feeds because almost all modern weblog software produces feeds, which are often turned on by default. For example:

Blogspot weblogs have Atom feeds^[77] (for example, http://googleblog.blogspot.com/?atom.xml and http://googleblog.blogspot.com/feeds/posts/default).
WordPress blogs have feeds^[78] (for example, http://blog.mashupguide.net/feed/ and http://blog.mashupguide.net/feed/atom/).
TypePad blogs support feeds.^[79]

Wikipedia Feeds

Let’s look at what Wikipedia has in the way of feeds, both to supplement Flickr as an example and to be of use in the following case studies. Wikipedia is a great source of information about the news and publishes its feeds in both RSS and Atom. Here are the main feeds:

You can get a feed for the history of any regular page here:

http://en.wikipedia.org/w/index.php?title={page-name}&action=history&feed={format}

For example:

http://en.wikipedia.org/w/index.php?title=Hurricane_Katrina&action=history&feed=atom

http://en.wikipedia.org/w/index.php?title=Mashup_%28web_application_hybrid%29&action=history&feed=atom

Two of Wikipedia’s special pages also have feeds. The first is of all recent changes to Wikipedia (which tends to have way too much data because Wikipedia is extremely active):

http://en.wikipedia.org/wiki/Special:Recentchanges?feed={format}

and the other lets you track the creation of new pages:

http://en.wikipedia.org/wiki/Special:Newpages?feed={format}

If you want to track news using Wikipedia, you might want to use Wikinews (http://en.wikinews.org/wiki/Main_Page), which has an RSS feed:

http://feeds.feedburner.com/WikinewsLatestNews

Finally, you can get at your Wikipedia watch list (when logged in) here:

http://en.wikipedia.org/w/api.php?action=feedwatchlist&feedformat={format}

where format is rss or atom.
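Because the page-history feed URL follows a fixed pattern, it is easy to generate programmatically. The following Python sketch fills in the template above using the standard library; the function name history_feed_url is hypothetical, and the sketch only builds the URL (it does not fetch the feed or check that the page exists).

```python
from urllib.parse import urlencode

# Base URL taken from the history-feed template in the text.
WIKIPEDIA_INDEX = "http://en.wikipedia.org/w/index.php"
FEED_FORMATS = ("rss", "atom")


def history_feed_url(page_name, feed_format="atom"):
    """Build the history-feed URL for a Wikipedia page (hypothetical helper)."""
    if feed_format not in FEED_FORMATS:
        raise ValueError("feed_format must be 'rss' or 'atom'")
    # urlencode percent-encodes characters such as parentheses in the title.
    query = urlencode({
        "title": page_name,
        "action": "history",
        "feed": feed_format,
    })
    return WIKIPEDIA_INDEX + "?" + query


katrina_url = history_feed_url("Hurricane_Katrina")
mashup_url = history_feed_url("Mashup_(web_application_hybrid)")
print(katrina_url)
print(mashup_url)
```

The two calls reproduce the example URLs given earlier in this section, including the %28 and %29 escapes for the parentheses in the second page name.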

Google and Yahoo! News

The feeds for Google News are documented here:

http://news.google.com/intl/en_us/news_feed_terms.html

You can access a variety of U.S.-oriented feeds here:

http://news.google.com/news?ned=us&topic={topic}&output={format}

where format is rss or atom and where topic is one of the values listed in Table 4-3.

Table 4-3. Possible Values for topic in Google News Feeds
Topic	Coverage
h	Top news
w	World
n	United States
b	Business
t	Science/technology
m	Health
s	Sports
e	Entertainment

For example, you can get the top news in RSS here:

http://news.google.com/news?ned=us&topic=h&output=rss

You can also get international news here:

http://news.google.com/news?ned={region}&topic=n&output={format}

where region is one of the values listed in Table 4-4.

Table 4-4. Possible Values for region in Google News Feeds
Region	Country
au	Australia
ca	Canada
in	India
ie	Ireland
nz	New Zealand
en_za	South Africa
uk	United Kingdom

In addition to feeds for general topics, you can generate a feed for a specific search term in Google News (an extremely useful feature you will use when constructing targeted feeds later in the chapter):

http://news.google.com/news?q={query}&output={format}

For example, to follow news on mashups, use this:

http://news.google.com/news?q=mashup&output=rss
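The Google News URL patterns above can be wrapped in two small helpers, one for topic feeds and one for search feeds. This is a minimal Python sketch based only on the parameters described in this section (ned, topic, q, and output); the function names are hypothetical, and the sketch assumes the URL patterns are still as documented here, which may not hold if the service has changed.

```python
from urllib.parse import urlencode

GOOGLE_NEWS = "http://news.google.com/news"

# Topic codes as listed in the table of topic values in the text.
TOPICS = {
    "h": "Top news",
    "w": "World",
    "n": "United States",
    "b": "Business",
    "t": "Science/technology",
    "m": "Health",
    "s": "Sports",
    "e": "Entertainment",
}


def topic_feed_url(topic, region="us", feed_format="rss"):
    """Build a topic feed URL (hypothetical helper)."""
    if topic not in TOPICS:
        raise ValueError("unknown topic code: " + topic)
    query = urlencode({"ned": region, "topic": topic, "output": feed_format})
    return GOOGLE_NEWS + "?" + query


def search_feed_url(query_text, feed_format="rss"):
    """Build a feed URL for a search term (hypothetical helper)."""
    query = urlencode({"q": query_text, "output": feed_format})
    return GOOGLE_NEWS + "?" + query


top_news_url = topic_feed_url("h")
mashup_news_url = search_feed_url("mashup")
print(top_news_url)
print(mashup_news_url)
```

Keeping the topic codes in a dictionary gives the sketch a cheap way to reject codes that the text does not list.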

Yahoo! News has some similarities to Google News. In addition to getting feeds by large categories, listed here:

http://news.yahoo.com/rss

you can also get feeds by keywords via http://news.search.yahoo.com/news/rss?p={search-term}. For example:

http://news.search.yahoo.com/news/rss?p=Hurricane+Katrina
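The keyword feed URL can be generated the same way. In this short Python sketch, the function name yahoo_news_search_feed_url is hypothetical; the only parameter used is p, as shown in the pattern above. Note that urlencode turns the space in a multiword search term into a plus sign.

```python
from urllib.parse import urlencode

YAHOO_NEWS_SEARCH_RSS = "http://news.search.yahoo.com/news/rss"


def yahoo_news_search_feed_url(search_term):
    """Build a Yahoo! News keyword feed URL (hypothetical helper)."""
    return YAHOO_NEWS_SEARCH_RSS + "?" + urlencode({"p": search_term})


katrina_feed_url = yahoo_news_search_feed_url("Hurricane Katrina")
print(katrina_feed_url)
```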

^[75]http://code.google.com/apis/ajaxfeeds/

^[76]http://www.xml.com/pub/a/2004/02/11/googlexml.html

^[77]http://help.blogger.com/bin/topic.py?topic=8927

^[78]http://codex.wordpress.org/WordPress_Feeds

^[79]http://support.typepad.com/cgi-bin/typepad.cfg/php/enduser/std_adp.php?p_faqid=86
