Chapter 7. Exploring Other Web APIs


Table of Contents

XML-RPC

What’s Happening on the Wire?
Using Wireshark and curl to Analyze and Formulate HTTP Messages
Parsing XML-RPC Traffic

SOAP

The Dream: Plug-and-Go Functionality Through WSDL and SOAP
geocoder.us
Amazon ECS
The Flickr API via SOAP

Learning About Specific Web APIs

Programmableweb.com
YouTube
GData and the Blogger API
Using the Blogger API As a Uniform Interface Based on HTTP Methods

Summary

In Chapter 6, you examined the Flickr API in great detail, so I’ll turn now to other web APIs. Studying the Flickr API in depth is obviously useful if you plan to use it in your mashups, but I argue here that it’s useful in your study of other APIs because you can draw from your understanding of the Flickr API as a point of comparison. (I’ll cover the subject of HTTP web APIs bound to a JavaScript context in the next chapter. You’ll take what you learn in Chapter 6 and this chapter and study the specific context of working within the modern web browser using JavaScript.)

How do you generalize from what you know about the Flickr API to other web APIs? I will use three major axes/categories for organizing my presentation of web APIs. (I’m presenting some heuristics for thinking about the subject rather than a watertight formula. This scheme won’t magically enable you to instantly understand all the various APIs out there.) The categories I use are as follows:

The protocols used by the API. Some questions that I’ll discuss include the following: Is the API available with a REST interface? Does it use SOAP or XML-RPC?
The popularity or influence of the API. It’s helpful to understand some of the more popular APIs because of their influence on the field in general and also because popularity is an indicator of some utility. We’ll look at how you might figure out what’s popular.
The subject matter of the APIs. Since APIs are often tied to specific subject matter, you’ll naturally need to understand the basics of the subject to make sense of the APIs. What are some of those subject areas?

You don’t have to spend long in the field of web services before hearing about REST vs. SOAP as a great divide, hence the impetus for classifying web services by the protocols they use. You already saw the terms REST and SOAP (as well as XML-RPC) in Chapter 6, where they described the request and response formats available to developers using the Flickr API. I focused on the Flickr REST formats because they are not only the easiest ones to work with but also the most helpful for learning other APIs.

In this chapter, I’ll cover what XML-RPC and SOAP are about. Understanding just Flickr’s REST request/response structure can get you far—but there are web APIs that have only XML-RPC or SOAP interfaces. So, I’ll start by discussing XML-RPC and SOAP and show you the basics of how to use those two protocols. Also, I’ll lay out tips for dealing with the practical complexities that sometimes arise in consuming SOAP services.

Note

The term REST (an acronym for Representational State Transfer) was coined by Roy Fielding to describe a set of architectural principles for networks. In Fielding’s usage, REST is not specifically tied to HTTP or the Web. At the same time, a popular usage has arisen for REST to refer to exchanging messages over HTTP without using such protocols as SOAP and XML-RPC, which introduce an additional envelope around these messages. These two different usages of the term REST have caused confusion since it is possible to use HTTP to exchange messages without additional envelopes in a way that nonetheless does not conform to REST principles. If a creator of a service associates the service with the term REST (such as the Flickr REST interface), I will also refer to it as REST in this chapter.

Once you have a good understanding of the protocols and architectural issues behind HTTP web services, you’re in a good position to consume any web API you come across—at least on a technical level. You still have to understand what a service is about and which services you might want to use. I will cover how to use Programmableweb.com as a great resource to learn about APIs in general. Programmableweb.com helps you understand which are the popular APIs as well as how APIs can be categorized by subject matter. I conclude the chapter with a study of two APIs: the API for YouTube as a simple REST interface and the Blogger API as a specific case of an entire class of APIs that share a uniform interface based on a strict usage of the HTTP methods.

XML-RPC

Although Flickr provides the option of using the XML-RPC and SOAP request and response formats in addition to REST, I wrote all my examples using the Flickr REST request format in Chapter 6. I’ll show you how to use the XML-RPC protocol in this section and cover SOAP in the following section.

	Tip
	Before taking on this section, it might be helpful to review Chapter 6’s “A Refresher on HTTP” section and remind yourself of the structure of an HTTP request and response and the variety of HTTP request methods.

XML-RPC is defined at http://www.xmlrpc.com/ as “remote procedure calling using HTTP as the transport and XML as the encoding.” XML-RPC specifies how to form remote procedure calls in terms of requests and responses, each of which has parameters composed of some basic data types. There are XML-RPC libraries written in many languages, including PHP and Python.
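To make the idea of "basic data types" concrete, here is a minimal sketch that serializes a few Python values into an XML-RPC request document. It assumes Python 3, where the xmlrpclib module discussed in this chapter lives on as xmlrpc.client in the standard library. The method name example.echo and the values are made up for illustration; nothing is sent over the network.

```python
import xmlrpc.client

# Illustrative values only; "example.echo" is a made-up method name.
params = (
    3,                                  # serialized as <int>
    "flower",                           # serialized as <string>
    True,                               # serialized as <boolean>
    1.5,                                # serialized as <double>
    ["a", "b"],                         # serialized as <array>
    {"tags": "flower", "per_page": 3},  # serialized as <struct>
)

# dumps() only builds the XML document; it does not contact any server.
request_xml = xmlrpc.client.dumps(params, methodname="example.echo")
print(request_xml)
```

Reading the printed document next to the Python values is a quick way to see which XML-RPC type each native value maps to.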

A central point of having an XML-RPC interface for a web API is akin to that of an API kit: getting an interface that is a closer fit to the native structures found in the programming language you are using. Let’s consider a specific example to make this point.

Recall from Chapter 6 how to use the Flickr REST interface to search for public photos. You do an HTTP GET request on the following URL:

http://api.flickr.com/services/rest/?method=flickr.photos.search&api_key={api-key}&tags=flower&per_page=3

and parse the resulting XML (using, say, the libcurl and simpleXML libraries in PHP). Let’s see how you do the same query using XML-RPC in Python and PHP for comparison. In Python, you can use xmlrpclib, which is part of the standard Python distribution and is documented at

http://docs.python.org/lib/module-xmlrpclib.html

Here’s a program that illustrates how to make a single call to Flickr, to flickr.photos.search. Note how parameters are passed in and how you can use the ElementTree library to parse the output. To use xmlrpclib to make this call, you need to know that the XML-RPC server endpoint URL is as follows:

http://api.flickr.com/services/xmlrpc/

and you need to name your parameters and stick them into a dictionary. When I ran the following:

API_KEY = "[API-KEY]"

from xmlrpclib import ServerProxy, Error, Fault
server = ServerProxy("http://api.flickr.com/services/xmlrpc/")

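try:
    from xml.etree import ElementTree as et    # bundled with Python 2.5 and later
except ImportError:
    from elementtree import ElementTree as et  # stand-alone package for older Pythons

# call flickr.photos.search

args = {'api_key': API_KEY, 'tags':'flower', 'per_page':3}
try:
    rsp = server.flickr.photos.search(args)
except Fault, f:
    print "Error code %s: %s" % (f.faultCode, f.faultString)
    raise SystemExit(1)  # rsp is undefined after a fault, so stop here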

# show a bit of XML parsing using elementtree
# useful examples:  http://www.amk.ca/talks/2006-02-07/
# context page for photo: http://www.flickr.com/photos/{user-id}/{photo-id}

# fixes parsing errors when accented characters are present
rsp = rsp.encode('utf-8')
print rsp
tree = et.XML(rsp)
print "total number of photos: %s" %(tree.get('total'))
for p in tree.getiterator('photo'):
    print "%s: http://www.flickr.com/photos/%s/%s" % (p.get("title"),
p.get("owner"), p.get("id"))

I got this:

<photos page="1" pages="485798" perpage="3" total="1457392">
  <photo id="1236197537" owner="7823684@N06" secret="f58310acf3"
         server="1178" farm="2" title="Rainbow over flower" ispublic="1"
         isfriend="0" isfamily="0" />
  <photo id="1236134903" owner="27238986@N00" secret="fa461fb8e3" server="1036"
         farm="2" title="Watercolor" ispublic="1" isfriend="0"
         isfamily="0" />
  <photo id="1237043346" owner="33121739@N00" secret="7a116ff4af" server="1066"
         farm="2" title="Flowers" ispublic="1" isfriend="0" isfamily="0" />
</photos>

total number of photos: 1457392
Rainbow over flower: http://www.flickr.com/photos/7823684@N06/1236197537
Watercolor: http://www.flickr.com/photos/27238986@N00/1236134903
Flowers: http://www.flickr.com/photos/33121739@N00/1237043346

Note how the xmlrpclib library takes care of packaging the request and unwrapping the response, handing you back the XML payload (which doesn’t have the <rsp> root node that is in the Flickr REST response). However, you still have to parse the XML payload yourself. You can judge for yourself whether XML-RPC or REST is more convenient.
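If you are working in Python 3, the parsing step looks slightly different, so here is a hedged sketch of it in isolation. It runs offline against the sample payload printed above rather than against a live response. The helper name photo_page_urls is my own, and the sketch uses iter() because getiterator() is no longer available in recent Python versions.

```python
import xml.etree.ElementTree as et

# Sample payload copied from the output shown above.
payload = """<photos page="1" pages="485798" perpage="3" total="1457392">
  <photo id="1236197537" owner="7823684@N06" secret="f58310acf3"
         server="1178" farm="2" title="Rainbow over flower" ispublic="1"
         isfriend="0" isfamily="0" />
  <photo id="1236134903" owner="27238986@N00" secret="fa461fb8e3" server="1036"
         farm="2" title="Watercolor" ispublic="1" isfriend="0"
         isfamily="0" />
  <photo id="1237043346" owner="33121739@N00" secret="7a116ff4af" server="1066"
         farm="2" title="Flowers" ispublic="1" isfriend="0" isfamily="0" />
</photos>"""

def photo_page_urls(xml_text):
    """Return the total count and a list of (title, context-page URL) pairs."""
    tree = et.fromstring(xml_text)
    pairs = [
        (p.get("title"),
         "http://www.flickr.com/photos/%s/%s" % (p.get("owner"), p.get("id")))
        for p in tree.iter("photo")  # iter() replaces the older getiterator()
    ]
    return tree.get("total"), pairs

total, photos = photo_page_urls(payload)
print("total number of photos: %s" % total)
for title, url in photos:
    print("%s: %s" % (title, url))
```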

Let’s take a look at how some PHP code looks. There are two major PHP libraries for XML-RPC:

http://phpxmlrpc.sourceforge.net/
http://pear.php.net/package/XML_RPC/

Here I show how to use the PEAR::XML_RPC package. You can install it using PEAR:

pear install XML_RPC

The following program shows how to use PEAR::XML_RPC to do a number of things:

You can retrieve the current time by making a call that requires no parameters (currentTime.getCurrentTime) from http://time.xmlrpc.com.
In search_example(), you can make a specific call to flickr.photos.search.
The class flickr_client shows how to generalize search_example() to handle more of the Flickr methods.

Here’s the program:

<?php

// flickr_xmlrpc.php
// This code demonstrates how to use XML-RPC using the PEAR::XML_RPC library.
// gettime() is the simple example that involves
// calling a timeserver without passing in any parameters.
// search_example() shows a specific case of how to pass in some parameters
// for flickr.photos.search
// the flickr_client class generalizes search_example() to handle Flickr methods
// in general.

require_once('XML/RPC.php');
$API_KEY ='[API-KEY]';

function process_xmlrpc_resp($resp) {
  if (!$resp->faultCode()) {
      $val = $resp->value()->scalarval();
      return $val;
  } else {
    $errormsg = 'Fault Code: ' . $resp->faultCode() . "\n" . 'Fault Reason: ' .
      $resp->faultString() . "\n";
    throw new Exception ($errormsg);
  }
}

class flickr_client {

  protected $api_key;
  protected $server;

  public function __construct($api_key, $debug) {
    $this->api_key = $api_key;
    $this->server =
      new XML_RPC_Client('/services/xmlrpc','http://api.flickr.com',80);
    $this->server->setDebug($debug);
  }

  public function call($method,$params) {

    # add the api_key to $params
    $params['api_key'] = $this->api_key;

    # build the struct parameter needed
    $xrv_array = array();
    foreach ($params as $key=>$val) {
      $xrv_array[$key] = new XML_RPC_Value($val,"string");
    }
    $xmlrpc_val = new XML_RPC_Value ($xrv_array,'struct');

    $msg = new XML_RPC_Message($method, array($xmlrpc_val));
    $resp = $this->server->send($msg);

    return process_xmlrpc_resp($resp);

  } //call

} //class flickr_client

function search_example () {
  GLOBAL $API_KEY;
  $server = new XML_RPC_Client('/services/xmlrpc','http://api.flickr.com',80);
  $server->setDebug(0);

  $myStruct = new XML_RPC_Value(array(
      "api_key" => new XML_RPC_Value($API_KEY, "string"),
      "tags" => new XML_RPC_Value('flower',"string"),
      "per_page" => new XML_RPC_Value('2',"string"),
      ), "struct");

  $msg = new XML_RPC_Message('flickr.photos.search', array($myStruct));
  $resp = $server->send($msg);

  return process_xmlrpc_resp($resp);
}

function gettime() {

  # http://www.xmlrpc.com/currentTime
  $server = new XML_RPC_Client('/RPC2','http://time.xmlrpc.com',80);
  $server->setDebug(0);

  $msg = new XML_RPC_Message('currentTime.getCurrentTime');
  $resp = $server->send($msg);

  return process_xmlrpc_resp($resp);

}

print "current time: ".gettime()."\n";
print "output from search_example \n" . search_example(). "\n";

$flickr = new flickr_client($API_KEY,0);

print "output from generalized Flickr client using XML-RPC\n";
print $flickr->call('flickr.photos.search',array('tags'=>'dog','per_page'=>'2'));
?>
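The fault check in process_xmlrpc_resp() has a close counterpart in Python, sketched below for comparison. It assumes Python 3's xmlrpc.client and works entirely offline: the fault document is built locally, and the fault code and message are illustrative values, not output captured from a real server. The helper name unwrap is hypothetical.

```python
import xmlrpc.client

# Build a fault response locally; the code and message are illustrative values.
fault_xml = xmlrpc.client.dumps(xmlrpc.client.Fault(100, "Invalid API Key"))

def unwrap(response_xml):
    """Return the single return value, or raise RuntimeError on a fault."""
    try:
        # loads() raises xmlrpc.client.Fault if the document contains a <fault>.
        params, _method_name = xmlrpc.client.loads(response_xml)
    except xmlrpc.client.Fault as f:
        raise RuntimeError("Fault Code: %s\nFault Reason: %s"
                           % (f.faultCode, f.faultString))
    return params[0]

try:
    unwrap(fault_xml)
except RuntimeError as err:
    message = str(err)
    print(message)
```

As in the PHP version, the caller sees either a plain value or an exception and never has to inspect the fault structure directly.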

What’s Happening on the Wire?

XML-RPC is meant to abstract away how a remote procedure call is translated into an exchange of XML documents over HTTP so that you as a user of XML-RPC don’t have to understand the underlying process. That’s the theory with XML-RPC, and even more so with SOAP, which originally evolved out of XML-RPC and greatly elaborates on it. In practice, with the right tools and under certain circumstances, consuming services with XML-RPC or SOAP is a very simple, trouble-free experience.

At other times, however, you’ll find yourself having to know more about the underlying protocol than you would like. For that reason, in the following sections I’ll show you techniques for making sense of what XML is actually being exchanged and how it’s being exchanged over HTTP. This discussion is meant as an explication of XML-RPC in its own right but also as preparation for studying the yet more complicated SOAP later in the chapter. But first, let’s look at two tools that I use to analyze XML-RPC and SOAP: Wireshark and curl.

Using Wireshark and curl to Analyze and Formulate HTTP Messages

Wireshark (http://www.wireshark.org/) is an open source network protocol analyzer that runs on Windows, OS X, and Linux. With it, you can analyze network traffic flowing through your computer, including any HTTP traffic—making it incredibly useful for seeing what’s happening when you are using web APIs (or, if you are curious, merely surfing the Web). Refer to the Wireshark site for instructions about how to install and run Wireshark for your platform.

	Tip
	With Wireshark, I found it helpful to turn off the Capture Packets in Promiscuous Mode option. Also, for studying web service traffic, I filter for only HTTP traffic—otherwise, there is too much data to view.

curl (http://curl.haxx.se/) is another highly useful tool, in this case a command-line one, for working with HTTP, among many other things:

curl is a command line tool for transferring files with URL syntax, supporting FTP, FTPS, HTTP, HTTPS, SCP, SFTP, TFTP, TELNET, DICT, FILE and LDAP. curl supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form based upload, proxies, cookies, user+password authentication (Basic, Digest, NTLM, Negotiate, kerberos . . . ), file transfer resume, proxy tunneling, and a busload of other useful tricks.

Go to http://curl.haxx.se/download.html to find a package for your platform. Be sure to look for packages that support SSL, because you’ll need it when you come to some examples later in this chapter. Keep the following documentation in mind in particular:

http://curl.haxx.se/docs/manpage.html is the man page for curl.
http://curl.haxx.se/docs/httpscripting.html is the most helpful page in many ways because it gives concrete examples.

To learn these tools, I suggest using curl to issue an HTTP request and using Wireshark to analyze the resulting traffic. For instance, you can start with the following:

curl http://www.yahoo.com

to see how to retrieve the contents of a web page. To see the details about the HTTP request and response, turn on the verbose option and make explicit what was implicit (that fetching the content of http://www.yahoo.com uses the HTTP GET method):

curl -v -X GET http://www.yahoo.com

You can get more practice studying Wireshark and the Flickr API by performing some function in the Flickr UI or in the Flickr API Explorer and seeing what HTTP traffic is exchanged. Try operations that don’t require any Flickr permissions, and then try ones that require escalating levels of permissions. You can certainly see the Flickr API being invoked, when Flickr uses HTTP GET vs. HTTP POST, and specifically what is being sent back and forth.

I’ll teach you more about curl in the context of the following examples.

Parsing XML-RPC Traffic

When you look at the documentation for the XML-RPC request format for Flickr (http://www.flickr.com/services/api/request.xmlrpc.html) and for the response format (http://www.flickr.com/services/api/response.xmlrpc.html), you’ll find confirmation that the transport mechanism is indeed HTTP (just as it is for the REST request and response). However, the request parameters and response are wrapped in many layers of XML tags. I’ll show you how to use Wireshark and curl to confirm for yourself what’s happening when you use XML-RPC.

Here I use Wireshark to monitor what happens when I run the Python example that uses the flickr.photos.search method and then use curl to manually duplicate the same request to show how you can formulate XML-RPC requests without calling an XML-RPC library per se. Again, I’m not advocating this as a practical way of using XML-RPC but as a way of understanding what’s happening when you do use XML-RPC.

When I ran the Python program and monitored the HTTP traffic, I saw the following request (an HTTP POST to /services/xmlrpc/):

POST /services/xmlrpc/ HTTP/1.0

It had the following HTTP request headers:

Host: api.flickr.com
User-Agent: xmlrpclib.py/1.0.1 (by www.pythonware.com)
Content-Type: text/xml
Content-Length: 415

and the following request body (reformatted here for clarity):

<?xml version='1.0'?>
<methodCall>
  <methodName>flickr.photos.search</methodName>
  <params>
    <param>
      <value><struct>
        <member>
          <name>per_page</name>
          <value><int>3</int></value>
        </member>
        <member>
          <name>api_key</name>
          <value><string>[API-KEY]</string></value>
        </member>
        <member>
          <name>tags</name>
          <value><string>flower</string></value>
        </member>
      </struct></value>
    </param>
  </params>
</methodCall>

The HTTP response (edited here for clarity) was as follows:

HTTP/1.1 200 OK
Date: Sun, 26 Aug 2007 04:33:29 GMT
Server: Apache/2.0.52
[...some cookies....]
Content-Length: 1044
Connection: close
Content-Type: text/xml; charset=utf-8

<?xml version="1.0" encoding="utf-8" ?>
<methodResponse>
  <params>
    <param>
      <value>
        <string>
          &lt;photos page=&quot;1&quot; pages=&quot;485823&quot;
          perpage=&quot;3&quot; total=&quot;1457468&quot;&gt;
          &lt;photo id=&quot;1237314286&quot; owner=&quot;41336703@N00&quot;
          secret=&quot;372291c5f7&quot; server=&quot;1088&quot; farm=&quot;2&quot;
          title=&quot;250807 047&quot; ispublic=&quot;1&quot; isfriend=&quot;0&quot;
          isfamily=&quot;0&quot; /&gt;
          &lt;photo id=&quot;1236382563&quot; owner=&quot;70983346@N00&quot;
          secret=&quot;459e79fde3&quot; server=&quot;1376&quot; farm=&quot;2&quot;
          title=&quot;Darling daisy necklace&quot; ispublic=&quot;1&quot;
          isfriend=&quot;0&quot; isfamily=&quot;0&quot; /&gt;
          &lt;photo id=&quot;1237257850&quot; owner=&quot;39312862@N00&quot;
          secret=&quot;fa9d15f9c3&quot; server=&quot;1272&quot; farm=&quot;2&quot;
          title=&quot;Peperomia species&quot; ispublic=&quot;1&quot;
          isfriend=&quot;0&quot; isfamily=&quot;0&quot; /&gt;
          &lt;/photos&gt;
        </string>
      </value>
    </param>
  </params>
</methodResponse>

To make sense of the interchange, it’s useful to study the XML-RPC specification (http://www.xmlrpc.com/spec) to learn that the Flickr XML-RPC request is passing in one struct that holds all the parameters. The request uses HTTP POST. What comes back in the response is an entity-encoded XML <photos> element (the results that we wanted from the API), wrapped in a series of XML elements used in the XML-RPC protocol to encapsulate the response. This process of serializing the request and deserializing the response is what an XML-RPC library does for you.
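The two layers in that response (the XML-RPC envelope, then the entity-encoded payload inside it) can be peeled apart by hand, as in the following sketch. It assumes Python 3's xmlrpc.client and uses a trimmed-down copy of the response above, with one photo and fewer attributes, so it runs without any network access.

```python
import xmlrpc.client
import xml.etree.ElementTree as et

# A trimmed-down stand-in for the response shown above: the <photos> payload
# travels as an entity-encoded string inside the XML-RPC envelope.
response_xml = """<?xml version="1.0" encoding="utf-8" ?>
<methodResponse>
  <params>
    <param>
      <value>
        <string>
          &lt;photos page=&quot;1&quot; pages=&quot;485823&quot;
          perpage=&quot;3&quot; total=&quot;1457468&quot;&gt;
          &lt;photo id=&quot;1237314286&quot; owner=&quot;41336703@N00&quot;
          title=&quot;250807 047&quot; /&gt;
          &lt;/photos&gt;
        </string>
      </value>
    </param>
  </params>
</methodResponse>"""

# Layer 1: strip the XML-RPC envelope. loads() returns (params, method_name);
# method_name is None because this is a response, not a call.
(payload,), method_name = xmlrpc.client.loads(response_xml)

# Layer 2: the payload is itself XML, so it needs a second parse.
photos = et.fromstring(payload.strip())
print(photos.get("total"))
for p in photos.iter("photo"):
    print(p.get("id"), p.get("title"))
```

The first step is what an XML-RPC library does for you automatically; the second step remains your job whichever library you use.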

We can take this study of XML-RPC one step further. You can use curl (or another HTTP client) to confirm that you can synthesize an XML-RPC request yourself, without an XML-RPC library to handle the work for you. This is not a convenient way to do things, and it defeats the purpose of using a protocol such as XML-RPC, but the technique is helpful for proving to yourself that you understand what is really happening with a protocol.

To wit, to call flickr.photos.search using XML-RPC, you need to send an HTTP POST request to http://api.flickr.com/services/xmlrpc/ whose body is the same as what I pulled out using Wireshark. The call, formulated as an invocation of curl, is as follows:

curl -v -X  POST --data-binary "<?xml version='1.0' encoding='UTF-8'?>
  <methodCall><methodName>flickr.photos.search</methodName><params><param><value>
  <struct><member><name>per_page</name><value><int>3</int></value></member><member>
  <name>api_key</name><value><string>[API-KEY]</string></value></member><member>
  <name>tags</name><value><string>flower</string></value></member></struct></value>
  </param></params></methodCall>"  http://api.flickr.com/services/xmlrpc/

	Note
	To write `curl` invocations that work from the command line of Windows, OS X, and Linux, I rewrote the XML to use single quotes to allow me to use double quotes to wrap the XML.

You can issue this request through curl to convince yourself that you are now speaking XML-RPC and understanding the responses!
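The same exercise works with any HTTP client. As a rough Python 3 counterpart to the curl invocation, the sketch below writes the request body by hand and wraps it in an HTTP POST using urllib.request, without sending it. The helper name build_search_request is hypothetical, and the Content-Type header is set to text/xml to match the request captured with Wireshark. The XML-RPC library appears only at the end, as a sanity check that the hand-built body parses.

```python
import urllib.request
import xmlrpc.client
from xml.sax.saxutils import escape

ENDPOINT = "http://api.flickr.com/services/xmlrpc/"  # endpoint quoted in the text

# The body is written by hand, mirroring the request captured with Wireshark.
BODY_TEMPLATE = (
    "<?xml version='1.0'?>"
    "<methodCall><methodName>flickr.photos.search</methodName>"
    "<params><param><value><struct>"
    "<member><name>per_page</name><value><int>{per_page}</int></value></member>"
    "<member><name>api_key</name><value><string>{api_key}</string></value></member>"
    "<member><name>tags</name><value><string>{tags}</string></value></member>"
    "</struct></value></param></params></methodCall>"
)

def build_search_request(api_key, tags, per_page):
    """Hypothetical helper: assemble the POST without sending it."""
    body = BODY_TEMPLATE.format(
        per_page=int(per_page), api_key=escape(api_key), tags=escape(tags)
    ).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT, data=body, headers={"Content-Type": "text/xml"}, method="POST"
    )

req = build_search_request("[API-KEY]", "flower", 3)
print(req.get_method(), req.full_url)

# Sanity check: a real XML-RPC parser should read the hand-built body back.
params, method_name = xmlrpc.client.loads(req.data)
print(method_name, params)

# To send it for real (network access and a valid key assumed):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```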

An XML-RPC library is supposed to hide the details you just looked at from you. One of the major practical problems that I have run into when using XML-RPC (and SOAP) is understanding, for a given language and library, exactly how to formulate a request. Notice some important lines from the examples. A stripped-down rendition of the Python example is as follows:

server = ServerProxy("http://api.flickr.com/services/xmlrpc/")
args = {'api_key': API_KEY, 'tags':'flower', 'per_page':3}
rsp = server.flickr.photos.search(args)
rsp = rsp.encode('utf-8')
tree = et.XML(rsp)
print "total number of photos: %s" %(tree.get('total'))

Besides the mechanics of calling the right libraries, you had to know how to pass in the URL endpoint of the XML-RPC server—which is usually not too hard—but also how to package up the parameters. Here, I had to use a Python dictionary, whose keys are the names of the Flickr parameters. I then call flickr.photos.search as a method of server and get back XML.
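One way to check how a library packages your parameters, without Wireshark and without touching the network, is to give it a stand-in transport that records the request. The sketch below does this for Python 3's xmlrpc.client. RecordingTransport is a hypothetical class written for this illustration, and the canned result is a placeholder string rather than real Flickr output.

```python
import xmlrpc.client

class RecordingTransport(xmlrpc.client.Transport):
    """Hypothetical stand-in transport: records the call instead of using HTTP."""

    def __init__(self, canned_result):
        super().__init__()
        self.canned_result = canned_result
        self.calls = []

    def request(self, host, handler, request_body, verbose=False):
        # Decode what ServerProxy serialized so it can be inspected.
        params, method_name = xmlrpc.client.loads(request_body)
        self.calls.append((host, handler, method_name, params))
        return (self.canned_result,)  # a one-item tuple, like a parsed response

transport = RecordingTransport('<photos total="0" />')
server = xmlrpc.client.ServerProxy(
    "http://api.flickr.com/services/xmlrpc/", transport=transport
)

args = {"api_key": "[API-KEY]", "tags": "flower", "per_page": 3}
rsp = server.flickr.photos.search(args)  # no network traffic happens here

host, handler, method_name, params = transport.calls[0]
print(host, handler, method_name)
print(params)
print(rsp)
```

The recorded call shows that the dotted attribute access on server becomes the method name flickr.photos.search and that the dictionary arrives as a single struct parameter, which is the shape seen in the Wireshark capture.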

The PHP example can be boiled down to this:

$server = new XML_RPC_Client('/services/xmlrpc','http://api.flickr.com',80);
$myStruct = new XML_RPC_Value(array(
   "api_key" => new XML_RPC_Value($API_KEY, "string"),
   "tags" => new XML_RPC_Value('flower',"string"),
   "per_page" => new XML_RPC_Value('2',"string"),
   ), "struct");
$msg = new XML_RPC_Message('flickr.photos.search', array($myStruct));
$resp = $server->send($msg);
$val = $resp->value()->scalarval();

Again, I knew what I had to tell PHP and the PEAR::XML_RPC library, and once someone provides you with skeletal code like I did here, it’s not hard to use. However, it has been my experience with XML-RPC and especially SOAP that it takes a lot of work to come up with the incantation that works. Complexity is moved from having to process HTTP and XML directly (as you would have using the Flickr REST interface) to understanding how to express methods and their parameters in the way a given higher-level toolkit wants from you.