Scraping Feeds Using GUI Tools

Feeds are available for many applications, but by no means for all. Because feeds are so useful, some services have arisen to generate feeds out of unstructured web sites. The goal of these services is to let you construct feeds more easily than you could by screen-scraping the pages yourself, which, as I discuss in Chapter 2, is an option in the absence of APIs and feeds. Let’s briefly consider one usage scenario to which we will apply two services. (I return to the topic of feed-scraping in Chapter 11.)

As I mention elsewhere in this book, perhaps the single most useful site on the Web for tracking web APIs is Programmableweb.com. Currently, it does not have an API and does not have a feed to represent all the APIs tracked by the site, but there is a feed for the latest changes in the list of APIs. The scenario I explore is creating an RSS or Atom feed out of the full list of APIs at the following URL:

http://programmableweb.com/apis/directory

Here I apply two services to this problem. The first is a specialized feed-creation web site:

http://www.feedity.com/

You can use Feedity to generate an RSS feed:

http://feedity.com/?http://programmableweb.com/apis/directory%40%40%40CAT%40%40%406

The result is a perfectly fine feed except for the ads embedded in it. To get rid of the ads, you need the Pro (for-fee) level of service.

As a second approach, I used Openkapow.com’s RoboMaker to generate a feed. RoboMaker is a desktop visual tool for creating bots, hosted on Openkapow.com, that generate feeds and APIs for web sites. In Chapter 11, I analyze RoboMaker and other tools that simplify mashup making. Here, I simply point out the end product of the Openkapow.com bot that converts the list of APIs into an RSS 2.0 feed:

http://service.openkapow.com/rdhyee/programmablewebapis.rss

There is a small image for Openkapow.com in the feed but no advertisements buried in the items themselves.
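
To make concrete what these services do on your behalf, here is a minimal Python sketch of the general idea: pull the links out of an HTML page and wrap them in an RSS 2.0 document. It is not how Feedity or RoboMaker is implemented. The HTML snippet is a made-up stand-in for the directory page (whose real markup is not shown here), and the names LinkCollector and build_rss are hypothetical helpers introduced only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Hypothetical markup: an assumption standing in for the real directory page.
SAMPLE_HTML = """
<table>
  <tr><td><a href="/apis/example-maps">Example Maps</a></td></tr>
  <tr><td><a href="/apis/example-photos">Example Photos</a></td></tr>
</table>
"""

BASE_URL = "http://programmableweb.com/apis/directory"


class LinkCollector(HTMLParser):
    """Collect (link text, absolute URL) pairs for every <a href> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def build_rss(title, link, description, items):
    """Return an RSS 2.0 document (as a string) with one <item> per pair."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
    return ET.tostring(rss, encoding="unicode")


collector = LinkCollector(BASE_URL)
collector.feed(SAMPLE_HTML)
feed_xml = build_rss("APIs (sample)", BASE_URL,
                     "Scraped list of APIs", collector.links)
print(feed_xml)
```

A real page would need more selective parsing than "every link," which is exactly the tedium that tools such as Feedity and RoboMaker aim to spare you.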

As you will see in the next section, being able to generate feeds for sites that don’t have the feeds you want enables you to use the many tools that accept feeds as input.