Google Spreadsheets API

Let’s figure out how to use the Google Spreadsheets API, focusing specifically on PHP and Python wrapper libraries. You can also directly manipulate the feed protocol:

http://code.google.com/apis/spreadsheets/developers_guide_protocol.html

There are also a number of API kits available for Google Spreadsheets. I will first demonstrate how to use the Python API kit by creating a mashup involving the Amazon.com web services. I’ll then show you a simple example of using the Zend PHP API kit to read the spreadsheets generated in the mashup.

Python API Kit

Google provides a Python GData library and sample code to access the Google spreadsheet. You can either download specific releases (from http://code.google.com/p/gdata-python-client/downloads/list) or access the SVN repository:

            svn checkout http://gdata-python-client.googlecode.com/svn/trunk/
            gdata-python-client
         

Note the dependencies on other libraries, especially ElementTree, which was not part of the standard Python libraries until version 2.5.[302]

I highly recommend reading the documentation on the Google site specific to the Python library:

http://code.google.com/apis/spreadsheets/developers_guide_python.html

Once you have the Python GData library installed, you can try some code samples, using the Python interpreter, to teach yourself how it works. First here are the obligatory imports:

import gdata.spreadsheet.service

Let’s then declare some convenience functions and variables:

            GoogleUser = "[your Google email address]"
            GooglePW = "[your password]"
         

Define the following convenience function:

            def GSheetService(user,pwd):
                gd_client = gdata.spreadsheet.service.SpreadsheetsService()
                gd_client.email = user
                gd_client.password = pwd
                gd_client.source = 'amazonWishListToGSheet.py'
                gd_client.ProgrammaticLogin()
            
                return gd_client
         

Instantiate a Google data client for your spreadsheet:

            gs = GSheetService(GoogleUser,GooglePW)
            sheets = gs.GetSpreadsheetsFeed()
         

To get a list of the spreadsheets, their titles, and their IDs, use this:

            print map(lambda e: e.title.text + " : "  + e.id.text.rsplit('/', 1)[1],sheets.entry)
         

This yields something like the following (which is based on my own spreadsheets):

            ['My Amazon WishList : o06341737111865728099.3585145106901556666', 'Udell Mini-
            Symposium May 1, 2007 : o06341737111865728099.1877210150658854761', 'weight.journal
            : o06341737111865728099.6289501454054682788', 'Plan : o10640374522570553588.
            5762564240835257179']
         

Note the key for the spreadsheet “My Amazon WishList.” The spreadsheet that the code I write here will be reading from and writing to is as follows:

o06341737111865728099.3585145106901556666

You will need to create your own Google spreadsheets to work with since you won’t be able to write to mine. Note the ID of your spreadsheet, which you will use here.

In the browser, if I’m logged in as the owner of the spreadsheet, I can access this:

http://spreadsheets.google.com/feeds/spreadsheets/private/full/{GSheetID}

For example:

            http://spreadsheets.google.com/feeds/spreadsheets/private/full/o0634173711186572809
            .3585145106901556666
         

Otherwise, I get a 404 error. Now I need to get the ID of the one worksheet in the “My Amazon Wishlist” spreadsheet. First use this:

gs.GetWorksheetsFeed(key="[GSheetID]").entry[0].id.text

For example, I use this:

            gs.GetWorksheetsFeed(key="o06341737111865728099.3585145106901556666").entry[0].
            id.text
         

to return the URL whose last segment is a worksheet ID—that is, a URL of the following form:

            http://spreadsheets.google.com/feeds/worksheets/{GSheetID}/private/full/
            {worksheetID}
         

For example:

            http://spreadsheets.google.com/feeds/worksheets/o06341737111865728099.35851451069015
            56666/private/full/od6
         

(in which case the worksheet ID is od6).

Now you can get the worksheet ID:

gs.GetWorksheetsFeed(key="[GSheetID]").entry[0].id.text.rsplit('/', 1)[1]

For example:

            gs.GetWorksheetsFeed(key="o06341737111865728099.3585145106901556666").entry[0].
            id.text.rsplit('/', 1)[1]
         

There are two ways to get at the data in a worksheet—either in a ­list-­based way that gets you rows or in a ­cell-­based way that gets you a range of cells. I will show the ­list-­based approach here, which depends on the assumption that the first row is the header row.

For testing purposes, I created a spreadsheet with the header row and one line of data that I entered, as shown in Table 17-1.

Table 17.1. Table 17-1. The Sample Spreadsheet
Date Quantity ?ASIN DetailPageURL Title Author Added Price Desired
1590598385 http://www.amazon.com/ Smart and Gets Things Joel Spolsky 6/5/2007 13.25 1? gp/product/1590598385/ Done: Joel Spolsky’s ? Concise Guide to ? Finding the Best ? Technical Talent ? (Hardcover)

The following returns a feed for the rows (there’s only one):

lfeed = gs.GetListFeed(key="[GSheetID]",wksht_id="[worksheetID]")

For example:

            lfeed =
            gs.GetListFeed(key="o06341737111865728099.3585145106901556666",wksht_id="od6")
         

You can see the content of the row with the following:

lfeed.entry[0].content.text

This results in the following:

            'ASIN: 1590598385, DetailPageURL:
            http://www.amazon.com/gp/product/1590598385/ref=wl_it_dp/103-8266902-5986239?
            ie=UTF8&coliid=I1A0WT8LH796DN&colid=1U5EXVPVS3WP5, Author: Joel Spolsky, Date Added:
            6/5/2007, Price: 13.25, Quantity Desired: 1'
         

The following holds the data that has been mapped from ­namespace-­extended elements in the entry (see http://code.google.com/apis/spreadsheets/developers_?guide_?protocol.? html#listFeedExample):

lfeed.entry[0].custom

Let’s see this in action:

map(lambda e: (e[0],e[1].text), lfeed.entry[0].custom.items())

Running this returns the following:

            [('asin', '1590598385'), ('dateadded', '6/5/2007'), ('detailpageurl',
            'http://www.amazon.com/gp/product/1590598385/ref=wl_it_dp/103-8266902-5986239?
            ie=UTF8&coliid=I1A0WT8LH796DN&colid=1U5EXVPVS3WP5'), ('author', 'Joel
            Spolsky'), ('quantitydesired', '1'), ('price', '13.25'), ('title', "Smart and Gets
            Things Done: Joel Spolsky's Concise Guide to Finding the Best Technical Talent
            (Hardcover) ")]
         

Now let’s look at adding another row of data. Let’s see whether you can just duplicate the row by creating a dictionary of the first row and stick it into the second row:

            h = {}
            for (key,value) in lfeed.entry[0].custom.iteritems():
              h[key] = value.text
         

h now is as follows:

            {'asin': '1590598385', 'dateadded': '6/5/2007', 'detailpageurl':
            'http://www.amazon.com/gp/product/1590598385/ref=wl_it_dp/103-8266902-5986239?
            ie=UTF8&coliid=I1A0WT8LH796DN&colid=1U5EXVPVS3WP5', 'author': 'Joel
            Spolsky', 'quantitydesired': '1', 'price': '13.25', 'title': "Smart and Gets Things
            Done: Joel Spolsky's Concise Guide to Finding the Best Technical Talent (Hardcover)
            "}
         

To add the new row, use this:

gs.InsertRow(row_data=h,key="[GSheetID]",wksht_id="[worksheetID]")

For example:

            gs.InsertRow(row_data=h,key="o06341737111865728099.3585145106901556666",
            wksht_id="od6")
         

To clear the second row you just added, you need to get an update lfeed that reflects the current state of the spreadsheet/worksheet and then issue a delete command:

            lfeed = gs.GetListFeed(key="[GSheetID]",
            wksht_id="[worksheetID]")
            gs.DeleteRow(lfeed.entry[1])
         
[Note]Note

 The Google Spreadsheets API is under active development and is still in the process of maturation.

Mashup: Amazon Wishlist and Google Spreadsheets Mashup

To demonstrate how to use Google Spreadsheets for a simple mashup, I will show you how to write code that will transfer the contents of an Amazon.com wishlist to a Google Spreadsheets spreadsheet. Why do that? I use my wishlist to keep track of books and other stuff that I find interesting. If the wishlist belonged to someone else, I might want to download it into a spreadsheet to make it easier to generate a ­hard-­copy shopping list I could use.

Accessing the Wishlist Through the Amazon.com ECS Web Service

First, a word about how you can use Awszone.com to help you formulate the right Amazon.com ECS query to get the information you want to find. I figured out that I wanted to use the List Lookup query by using this:

http://www.awszone.com/scratchpads/aws/ecs.us/ListLookup.aws

Furthermore, I am using a ListType=WishList and the ListID=1U5EXVPVS3WP5. The URL for web interface to an Amazon.com wishlist is this:

http://www.amazon.com/gp/registry/wishlist/{ListID}/

Substituting your own AccessKeyId, you can get information about the list as a whole:

               http://ecs.amazonaws.com/onca/xml?Service=AWSECommerceService&Version=2007-05-14&
               AWSAccessKeyId=[YourAccessKeyID]&Operation=ListLookup&ListType=WishList&
               ListId=1U5EXVPVS3WP5http://ecs.amazonaws.com/onca/xml?Service=AWSECommerceService&
               Version=2007-10-29&AWSAccessKeyId=[YourAccessKeyID]&Operation=ListLookup&ListType=
               WishList&ListId=1U5EXVPVS3WP5
            

To get a page of the individual items, use the following URL:

               http://ecs.amazonaws.com/onca/xml?Service=AWSECommerceService&Version=2007-10-29&
               AWSAccessKeyId=[YourAccessKeyID]&Operation=ListLookup&ListType=WishList&ListId=1U5
               EXVPVS3WP5&ResponseGroup=ListItems,Medium&ProductPage=2
            

Python Code to Mash Up Amazon.com and Google Spreadsheets

Now you can stitch all of this together with the following code, called amazonWishListtoGSheet.py. (Remember to substitute your own parameters into this code.)

               """
               an example to copy over a public Amazon wishlist to a Google Spreadsheet
               owned by the user based on code at
               http://code.google.com/apis/spreadsheets/developers_guide_python.html
               """
               
               GoogleUser = "[GoogleUSER]"
               GooglePW = "[GooglePASSWORD]"
               GSheet_KEY = "[GSheetID]"
               # GSheet_KEY = "o06341737111865728099.3585145106901556666"
               GWrkSh_ID = "[worksheetID]"
               #GWrkSh_ID = "od6"
               
               AMAZON_LIST_ID = "[LIST_ID_FOR_WISHLIST]"
               # AMAZON_LIST_ID = "1U5EXVPVS3WP5"
               AMAZON_ACCESS_KEY_ID = "[AMAZON_KEY]"
               
               from xml.dom import minidom
               
               import gdata.spreadsheet.service
               
               def getText(nodelist):
                   """
                   convenience function to return all the text in an array of nodes
                   """
                   rc = ""
                   for node in nodelist:
                       if node.nodeType == node.TEXT_NODE:
                           rc = rc + node.data
                   return rc
               
               # a sample row for testing the insertion of a row into the spreadsheet
               GS_Example_Row = {'asin': '1590598385', 'dateadded': '6/5/2007', 'detailpageurl':
               'http://www.amazon.com/gp/product/1590598385/ref=wl_it_dp/103-8266902-5986239?
               ie=UTF8&coliid=I1A0WT8LH796DN&colid=1U5EXVPVS3WP5', 'author': 'Joel Spolsky',
               'quantitydesired': '1', 'price': '13.25', 'title': "Smart and Gets Things Done: Joel
               Spolsky's Concise Guide to Finding the Best Technical Talent (Hardcover) "}
               
               GS_HEADER = ['ASIN', 'DetailPageURL', 'Title', 'Author', 'Date Added', 'Price',
               'Quantity Desired']
               
               GS_KEYS = ['asin', 'detailpageurl', 'title', 'author', 'dateadded', 'price',
               'quantitydesired']
               
               class GSheetForAmazonList:
                   def __init__(self,user=GoogleUser,pwd=GooglePW):
                       gd_client = gdata.spreadsheet.service.SpreadsheetsService()
                       gd_client.email = user
                       gd_client.password = pwd
                       gd_client.source = 'amazonListToGsheet.py'
                       gd_client.ProgrammaticLogin()
                       self.gd_client = gd_client
                   def setKey(self,key):
                       self.key = key
                   def setWkshtId(self,wksht_id):
                       self.wksht_id = wksht_id
                   def listSpreadsheets(self):
                       """
                       return a list with information about the spreadsheets available to the user
                       """
                       sheets = self.gd_client.GetSpreadsheetsFeed()
                       return map(lambda e: (e.title.text , e.id.text.rsplit('/', 1)[1]), 
               sheets.entry)
                   def listWorkSheets(self):
                       wks = self.gd_client.GetWorksheetsFeed(key=self.key)
                       return map(lambda e: (e.title.text , e.id.text.rsplit('/', 1)[1]),wks.entry)
                   def getRows(self):
                       return self.gd_client.GetListFeed(key=self.key,wksht_id=self.wksht_id).entry
                   def insertRow(self,row_data):
                       return self.gd_client.InsertRow(row_data,key=self.key,wksht_id=self.wksht_id)
                   def deleteRow(self,entry):
                       return self.gd_client.DeleteRow(entry)
                   def deleteAllRows(self):
                       entrylist = self.getRows()
                       i = 0
                       for entry in entrylist:
                           self.deleteRow(entry)
                           i += 1
                           print "deleted row ", i
               
               class amazonWishList:
               
               # we can use Python and WSDL
               # http://webservices.amazon.com/AWSECommerceService/AWSECommerceService.wsdl?
               
               # I've been wondering how to introspect using WSDL -- Mark Pilgrim has some answers:
               # http://www.diveintopython.org/soap_web_services/introspection.html
               # well -- the introspection of the input parameters doesn't seem to yield the useful
               # stuff. I was hoping for more info
               
                   def __init__(self,listID=AMAZON_LIST_ID,amazonAccessKeyId=AMAZON_ACCESS_KEY_ID):
                       self.listID = listID
                       self.amazonAccessKeyId = amazonAccessKeyId
                       self.getListInfo()
               
                   def getListInfo(self):
               
                       aws_url = "http://ecs.amazonaws.com/onca/xml?Service=AWSECommerceService&
               Version=2007-10-29&AWSAccessKeyId=%s&Operation=ListLookup&ListType=WishList&ListId
               =%s" % (self.amazonAccessKeyId, self.listID)
                       import urllib
                       f = urllib.urlopen(aws_url)
                       dom = minidom.parse(f)
                       self.title = getText(dom.getElementsByTagName('ListName')[0].childNodes)
                       self.listLength = int(getText(dom.getElementsByTagName('TotalItems')[0].
               childNodes))
                       self.TotalPages = int(getText(dom.getElementsByTagName('TotalPages')[0].
               childNodes))
                       return(self.title, self.listLength, self.TotalPages)
               
                   def ListItems(self):
                       """
                       a generator for the items on the Amazon list
                       """
               
                       import itertools
                       for pageNum in xrange(1,self.TotalPages):
                           aws_url = "http://ecs.amazonaws.com/onca/xml?Service=AWSECommerce
               Service&Version=2007-10-29&AWSAccessKeyId=%s&Operation=ListLookup&ListType=Wish
               List&ListId=%s&ResponseGroup=ListItems,Medium&ProductPage=%s" % (self.amazon
               AccessKeyId,self.listID,pageNum)
                           import urllib
                           f = urllib.urlopen(aws_url)
                           dom = minidom.parse(f)
                           f.close()
                           items = dom.getElementsByTagName('ListItem')
                           for c in xrange(0,10):
                               yield items[c]
               
                   def parseListItem(self,item):
                       from string import join
                       from decimal import Decimal
               
                       itemDict = {}
               
                       itemDict['asin'] = getText(item.getElementsByTagName('ASIN')[0].childNodes)
                       itemDict['dateadded'] = getText(item.getElementsByTagName('DateAdded')[0].
               childNodes)
                       itemDict['detailpageurl'] = getText(item.getElementsByTagName(
               'DetailPageURL')[0].childNodes)
               
                       # join the text of all the author nodes, if they exist
                       authorNodes = item.getElementsByTagName('Author')
                       # blank not allowed
                       itemDict['author'] = join(map(lambda e: getText(e.childNodes), 
               authorNodes), ", ") or ' '
               
                       itemDict['quantitydesired'] = getText(item.getElementsByTagName(
               'QuantityDesired')[0].childNodes)
               
                       titleNodes = item.getElementsByTagName('Title')
                       # blank title not allowed
                       itemDict['title'] = join(map(lambda e: getText(e.childNodes), 
               titleNodes), ", ") or ' '
               
                       # to fix -- not all things have a LowestNewPrice
                       itemDict['price'] = str(Decimal(getText(item.getElementsByTagName(
               'LowestNewPrice')[0].getElementsByTagName('Amount')[0].childNodes))/100) or ' '
               
                       return itemDict
               
               def main():
               
                   gs = GSheetForAmazonList(user=GoogleUser,pwd=GooglePW)
                   gs.setKey(GSheet_KEY)
                   gs.setWkshtId(GWrkSh_ID)
               
                   aWishList = amazonWishList(listID=AMAZON_LIST_ID,amazonAccessKeyId=
               AMAZON_ACCESS_KEY_ID)
                   items = aWishList.ListItems()
                   print "deleting all rows..."
                   gs.deleteAllRows()
                   for item in items:
                       try:
                           h = aWishList.parseListItem(item)
                           print h['asin']
                       except Exception, e:
                           print "Error %s parsing %s" % (e, item.toprettyxml("  "))
                       try:
                           gs.insertRow(h)
                       except Exception, e:
                           print "Error %s inserting %s" % (e, h['asin'])
               
               if __name__ == '__main__':
                   main()
            

Here are some things to note about this code:

  • The GSheetForAmazonList class provides convenience methods for the Google GData library.

  • The error handling is essential since not all wishlist items necessarily have all the pieces of data requested. It’s important for the code to keep moving even if data is missing.

  • At least in the Python GData interface to Google Spreadsheets, you can’t insert blank cells.

  • amazonWishList.ListItems is a Python generator, which creates an iterator to parcel out the Amazon items one at a time. See http://www.ibm.com/developerworks/library/?l- pycon.html?t=gr,lnxw16=PyIntro for a tutorial on Python generators.

  • You can speed up the operation of this code through batch operations (http://code.?google.?com/?apis/gdata/batch.html), which are currently supported in the GData interface and in the Java API kit (but not Python).

Zend PHP API Kit for Google Spreadsheets

In this section, I’ll show you how to use the PHP API kit for Google Spreadsheets to read the contents of the Google Spreadsheets that we’ll generate in the previous mashup. You can download the Zend Framework from here:

http://framework.zend.com/

and read about how to use the Zend Framework to access Google Spreadsheet here:

http://framework.zend.com/manual/en/zend.gdata.spreadsheets.html

and here:

http://code.google.com/apis/spreadsheets/developers_guide_php.html

The following code first lists your spreadsheets and then the rows and named columns of the spreadsheet containing items from your Amazon.com wishlist:

            <?php
            
            # user and password for google spreadsheet
            $user = "[GoogleUSER]";
            $pass = "[GooglePASSWORD]";
            
            # set parameters for your version of "My Amazon WishList" Google Spreadsheet
            $GSheetID = "[GSheetID]";
            $worksheetID="[worksheetID]";
            #$GSheetID = "o06341737111865728099.3585145106901556666";
            #$worksheetID="od6";
            
            # list entries from a spreadsheet
            
            require_once('Zend/Loader.php');
            Zend_Loader::loadClass('Zend_Gdata');
            Zend_Loader::loadClass('Zend_Gdata_ClientLogin');
            Zend_Loader::loadClass('Zend_Gdata_Spreadsheets');
            Zend_Loader::loadClass('Zend_Http_Client');
            
            $service = Zend_Gdata_Spreadsheets::AUTH_SERVICE_NAME;
            $client = Zend_Gdata_ClientLogin::getHttpClient($user, $pass, $service);
            $spreadsheetService = new Zend_Gdata_Spreadsheets($client);
            
            #  the following printFeed shows how to parse various types of feeds
            #  coming from Google Spreadsheets API
            #  function is extracted from
            #  http://code.google.com/apis/spreadsheets/developers_guide_php.html
            
            function printFeed($feed)
            {
              $i = 0;
              foreach($feed->entries as $entry) {
                if ($entry instanceof Zend_Gdata_Spreadsheets_CellEntry) {
                  print $entry->title->text .' '. $entry->content->text . "\n";
                } else if ($entry instanceof Zend_Gdata_Spreadsheets_ListEntry) {
                  print $i .' '. $entry->title->text .' '. $entry->content->text . "\n";
                } else {
                  print $i .' '. $entry->title->text . "\n";
                }
                $i++;
              }
            }
            
            # figuring out how to print rows
            
            function printWorksheetFeed($feed)
            {
              $i = 2;  # the first row of content is row 2
              foreach($feed->entries as $row) {
                print "Row " . $i .' '. $row->title->text . "\t";
                $i++;
                $rowData = $row->getCustom();
                foreach($rowData as $customEntry) {
                  print $customEntry->getColumnName() . " = " . $customEntry->getText(). "\t";
                }
                print "\n";
              }
            }
            
            # first print a list of your Google Spreadsheets
            
            $feed = $spreadsheetService->getSpreadsheetFeed();
            printFeed($feed);
            
            # Print the content of a specific Spreadsheet/Worksheet
            # set a query to return a worksheet and print the contents of the worksheet
            
            $query = new Zend_Gdata_Spreadsheets_ListQuery();
            $query->setSpreadsheetKey($GSheetID);
            $query->setWorksheetId($worksheetID);
            $listFeed = $spreadsheetService->getListFeed($query);
            printWorksheetFeed($listFeed);
            
            ?>
         

A Final Variation: Amazon Wishlist to Microsoft Excel via COM

With code to access the Amazon.com wishlist in hand, you can use COM programming to generate an Excel spreadsheet with the same information. To learn more about the details about how to do so, consult Chapter 12 of Python Programming on Win32.[303]

            from amazonListToGSheet import GS_HEADER, amazonWishList, AMAZON_LIST_ID, 
            AMAZON_ACCESS_KEY_ID, GS_KEYS
            from win32com.client import Dispatch
            
            # fire up the Excel application
            xlApp = Dispatch("Excel.Application")
            xlApp.Visible = 1
            xlApp.Workbooks.Add()
            
            # write the headers
            col = 1
            
            def insertRow(sheet,row,data,keys):
                col = 1
                for k in keys:
                    sheet.Cells(row,col).Value = data[k]
                    col += 1
            
            for h in GS_HEADER:
                xlApp.ActiveSheet.Cells(1,col).Value = h
                col +=1
            # now loop through the amazon wishlist
            
            aWishList =
            amazonWishList(listID=AMAZON_LIST_ID,amazonAccessKeyId=AMAZON_ACCESS_KEY_ID)
            items = aWishList.ListItems()
            
            row = 2
            for item in items:
                try:
                    p = aWishList.parseListItem(item)
                    print p['asin']
                except Exception, e:
                    print "Error %s parsing %s" % (e, item.toprettyxml("  "))
                try:
                    insertRow(xlApp.ActiveSheet,row,p,GS_KEYS)
                    row += 1
                except Exception, e:
                    print "Error %s inserting %s" % (e, p['asin'])