Let’s figure out how to use the Google Spreadsheets API, focusing specifically on PHP and Python wrapper libraries. You can also directly manipulate the feed protocol:
http://code.google.com/apis/spreadsheets/developers_guide_protocol.html
There are also a number of API kits available for Google Spreadsheets. I will first demonstrate how to use the Python API kit by creating a mashup involving the Amazon.com web services. I’ll then show you a simple example of using the Zend PHP API kit to read the spreadsheets generated in the mashup.
Google provides a Python GData library and sample code to access the Google
spreadsheet. You can either download specific releases (from http://code.google.com/p/gdata-python-client/downloads/list
) or
access the SVN repository:
svn checkout http://gdata-python-client.googlecode.com/svn/trunk/ gdata-python-client
Note the dependencies on other libraries, especially ElementTree, which was not part of the standard Python libraries until version 2.5.[302]
I highly recommend reading the documentation on the Google site specific to the Python library:
http://code.google.com/apis/spreadsheets/developers_guide_python.html
Once you have the Python GData library installed, you can try some code samples, using the Python interpreter, to teach yourself how it works. First here are the obligatory imports:
import gdata.spreadsheet.service
Let’s then declare some convenience functions and variables:
GoogleUser = "[your Google email address]" GooglePW = "[your password]"
Define the following convenience function:
def GSheetService(user,pwd): gd_client = gdata.spreadsheet.service.SpreadsheetsService() gd_client.email = user gd_client.password = pwd gd_client.source = 'amazonWishListToGSheet.py' gd_client.ProgrammaticLogin() return gd_client
Instantiate a Google data client for your spreadsheet:
gs = GSheetService(GoogleUser,GooglePW) sheets = gs.GetSpreadsheetsFeed()
To get a list of the spreadsheets, their titles, and their IDs, use this:
print map(lambda e: e.title.text + " : " + e.id.text.rsplit('/', 1)[1],sheets.entry)
This yields something like the following (which is based on my own spreadsheets):
['My Amazon WishList : o06341737111865728099.3585145106901556666', 'Udell Mini- Symposium May 1, 2007 : o06341737111865728099.1877210150658854761', 'weight.journal : o06341737111865728099.6289501454054682788', 'Plan : o10640374522570553588. 5762564240835257179']
Note the key for the spreadsheet “My Amazon WishList.” The spreadsheet that the code I write here will be reading from and writing to is as follows:
o06341737111865728099.3585145106901556666
You will need to create your own Google spreadsheets to work with since you won’t be able to write to mine. Note the ID of your spreadsheet, which you will use here.
In the browser, if I’m logged in as the owner of the spreadsheet, I can access this:
http://spreadsheets.google.com/feeds/spreadsheets/private/full/{GSheetID}
For example:
http://spreadsheets.google.com/feeds/spreadsheets/private/full/o0634173711186572809 .3585145106901556666
Otherwise, I get a 404 error. Now I need to get the ID of the one worksheet in the “My Amazon Wishlist” spreadsheet. First use this:
gs.GetWorksheetsFeed(key="[GSheetID]").entry[0].id.text
For example, I use this:
gs.GetWorksheetsFeed(key="o06341737111865728099.3585145106901556666").entry[0]. id.text
to return the URL whose last segment is a worksheet ID—that is, a URL of the following form:
http://spreadsheets.google.com/feeds/worksheets/{GSheetID}/private/full/ {worksheetID}
For example:
http://spreadsheets.google.com/feeds/worksheets/o06341737111865728099.35851451069015 56666/private/full/od6
(in which case the worksheet ID is od6
).
Now you can get the worksheet ID:
gs.GetWorksheetsFeed(key="[GSheetID]").entry[0].id.text.rsplit('/', 1)[1]
For example:
gs.GetWorksheetsFeed(key="o06341737111865728099.3585145106901556666").entry[0]. id.text.rsplit('/', 1)[1]
There are two ways to get at the data in a worksheet—either in a list-based way that gets you rows or in a cell-based way that gets you a range of cells. I will show the list-based approach here, which depends on the assumption that the first row is the header row.
For testing purposes, I created a spreadsheet with the header row and one line of data that I entered, as shown in Table 17-1.
Date | Quantity ?ASIN | DetailPageURL | Title | Author | Added | Price | Desired | |||||
1590598385 http://www.amazon.com/ Smart and Gets Things Joel Spolsky 6/5/2007 13.25 1? gp/product/1590598385/ Done: Joel Spolsky’s ? Concise Guide to ? Finding the Best ? Technical Talent ? (Hardcover) |
The following returns a feed for the rows (there’s only one):
lfeed = gs.GetListFeed(key="[GSheetID]",wksht_id="[worksheetID]")
For example:
lfeed = gs.GetListFeed(key="o06341737111865728099.3585145106901556666",wksht_id="od6")
You can see the content of the row with the following:
lfeed.entry[0].content.text
This results in the following:
'ASIN: 1590598385, DetailPageURL: http://www.amazon.com/gp/product/1590598385/ref=wl_it_dp/103-8266902-5986239? ie=UTF8&coliid=I1A0WT8LH796DN&colid=1U5EXVPVS3WP5, Author: Joel Spolsky, Date Added: 6/5/2007, Price: 13.25, Quantity Desired: 1'
The following holds the data that has been mapped from namespace-extended
elements in the entry (see http://code.google.com/apis/spreadsheets/developers_?guide_?protocol.?
html#listFeedExample
):
lfeed.entry[0].custom
Let’s see this in action:
map(lambda e: (e[0],e[1].text), lfeed.entry[0].custom.items())
Running this returns the following:
[('asin', '1590598385'), ('dateadded', '6/5/2007'), ('detailpageurl', 'http://www.amazon.com/gp/product/1590598385/ref=wl_it_dp/103-8266902-5986239? ie=UTF8&coliid=I1A0WT8LH796DN&colid=1U5EXVPVS3WP5'), ('author', 'Joel Spolsky'), ('quantitydesired', '1'), ('price', '13.25'), ('title', "Smart and Gets Things Done: Joel Spolsky's Concise Guide to Finding the Best Technical Talent (Hardcover) ")]
Now let’s look at adding another row of data. Let’s see whether you can just duplicate the row by creating a dictionary of the first row and stick it into the second row:
h = {} for (key,value) in lfeed.entry[0].custom.iteritems(): h[key] = value.text
h
now is as follows:
{'asin': '1590598385', 'dateadded': '6/5/2007', 'detailpageurl': 'http://www.amazon.com/gp/product/1590598385/ref=wl_it_dp/103-8266902-5986239? ie=UTF8&coliid=I1A0WT8LH796DN&colid=1U5EXVPVS3WP5', 'author': 'Joel Spolsky', 'quantitydesired': '1', 'price': '13.25', 'title': "Smart and Gets Things Done: Joel Spolsky's Concise Guide to Finding the Best Technical Talent (Hardcover) "}
To add the new row, use this:
gs.InsertRow(row_data=h,key="[GSheetID]",wksht_id="[worksheetID]")
For example:
gs.InsertRow(row_data=h,key="o06341737111865728099.3585145106901556666", wksht_id="od6")
To clear the second row you just added, you need to get an update lfeed
that reflects the current state of the spreadsheet/worksheet and then issue a delete
command:
lfeed = gs.GetListFeed(key="[GSheetID]", wksht_id="[worksheetID]") gs.DeleteRow(lfeed.entry[1])
Note | |
---|---|
The Google Spreadsheets API is under active development and is still in the process of maturation. |
To demonstrate how to use Google Spreadsheets for a simple mashup, I will show you how to write code that will transfer the contents of an Amazon.com wishlist to a Google Spreadsheets spreadsheet. Why do that? I use my wishlist to keep track of books and other stuff that I find interesting. If the wishlist belonged to someone else, I might want to download it into a spreadsheet to make it easier to generate a hard-copy shopping list I could use.
First, a word about how you can use Awszone.com to help you formulate the right
Amazon.com ECS query to get the information you want to find. I figured out that I
wanted to use the List
Lookup
query by using this:
http://www.awszone.com/scratchpads/aws/ecs.us/ListLookup.aws
Furthermore, I am using a ListType=WishList
and the
ListID=1U5EXVPVS3WP5
. The URL for web interface to an Amazon.com
wishlist is this:
http://www.amazon.com/gp/registry/wishlist/{ListID}/
Substituting your own AccessKeyId
, you can get information about the
list as a whole:
http://ecs.amazonaws.com/onca/xml?Service=AWSECommerceService&Version=2007-05-14& AWSAccessKeyId=[YourAccessKeyID]&Operation=ListLookup&ListType=WishList& ListId=1U5EXVPVS3WP5http://ecs.amazonaws.com/onca/xml?Service=AWSECommerceService& Version=2007-10-29&AWSAccessKeyId=[YourAccessKeyID]&Operation=ListLookup&ListType= WishList&ListId=1U5EXVPVS3WP5
To get a page of the individual items, use the following URL:
http://ecs.amazonaws.com/onca/xml?Service=AWSECommerceService&Version=2007-10-29& AWSAccessKeyId=[YourAccessKeyID]&Operation=ListLookup&ListType=WishList&ListId=1U5 EXVPVS3WP5&ResponseGroup=ListItems,Medium&ProductPage=2
Now you can stitch all of this together with the following code, called
amazonWishListtoGSheet.py
. (Remember to substitute your own
parameters into this code.)
""" an example to copy over a public Amazon wishlist to a Google Spreadsheet owned by the user based on code at http://code.google.com/apis/spreadsheets/developers_guide_python.html """ GoogleUser = "[GoogleUSER]" GooglePW = "[GooglePASSWORD]" GSheet_KEY = "[GSheetID]" # GSheet_KEY = "o06341737111865728099.3585145106901556666" GWrkSh_ID = "[worksheetID]" #GWrkSh_ID = "od6" AMAZON_LIST_ID = "[LIST_ID_FOR_WISHLIST]" # AMAZON_LIST_ID = "1U5EXVPVS3WP5" AMAZON_ACCESS_KEY_ID = "[AMAZON_KEY]" from xml.dom import minidom import gdata.spreadsheet.service def getText(nodelist): """ convenience function to return all the text in an array of nodes """ rc = "" for node in nodelist: if node.nodeType == node.TEXT_NODE: rc = rc + node.data return rc # a sample row for testing the insertion of a row into the spreadsheet GS_Example_Row = {'asin': '1590598385', 'dateadded': '6/5/2007', 'detailpageurl': 'http://www.amazon.com/gp/product/1590598385/ref=wl_it_dp/103-8266902-5986239? ie=UTF8&coliid=I1A0WT8LH796DN&colid=1U5EXVPVS3WP5', 'author': 'Joel Spolsky', 'quantitydesired': '1', 'price': '13.25', 'title': "Smart and Gets Things Done: Joel Spolsky's Concise Guide to Finding the Best Technical Talent (Hardcover) "} GS_HEADER = ['ASIN', 'DetailPageURL', 'Title', 'Author', 'Date Added', 'Price', 'Quantity Desired'] GS_KEYS = ['asin', 'detailpageurl', 'title', 'author', 'dateadded', 'price', 'quantitydesired'] class GSheetForAmazonList: def __init__(self,user=GoogleUser,pwd=GooglePW): gd_client = gdata.spreadsheet.service.SpreadsheetsService() gd_client.email = user gd_client.password = pwd gd_client.source = 'amazonListToGsheet.py' gd_client.ProgrammaticLogin() self.gd_client = gd_client def setKey(self,key): self.key = key def setWkshtId(self,wksht_id): self.wksht_id = wksht_id def listSpreadsheets(self): """ return a list with information about the spreadsheets available to the user """ sheets = self.gd_client.GetSpreadsheetsFeed() return map(lambda e: (e.title.text , e.id.text.rsplit('/', 1)[1]), sheets.entry) def listWorkSheets(self): wks = self.gd_client.GetWorksheetsFeed(key=self.key) return map(lambda e: (e.title.text , e.id.text.rsplit('/', 1)[1]),wks.entry) def getRows(self): return self.gd_client.GetListFeed(key=self.key,wksht_id=self.wksht_id).entry def insertRow(self,row_data): return self.gd_client.InsertRow(row_data,key=self.key,wksht_id=self.wksht_id) def deleteRow(self,entry): return self.gd_client.DeleteRow(entry) def deleteAllRows(self): entrylist = self.getRows() i = 0 for entry in entrylist: self.deleteRow(entry) i += 1 print "deleted row ", i class amazonWishList: # we can use Python and WSDL # http://webservices.amazon.com/AWSECommerceService/AWSECommerceService.wsdl? # I've been wondering how to introspect using WSDL -- Mark Pilgrim has some answers: # http://www.diveintopython.org/soap_web_services/introspection.html # well -- the introspection of the input parameters doesn't seem to yield the useful # stuff. I was hoping for more info def __init__(self,listID=AMAZON_LIST_ID,amazonAccessKeyId=AMAZON_ACCESS_KEY_ID): self.listID = listID self.amazonAccessKeyId = amazonAccessKeyId self.getListInfo() def getListInfo(self): aws_url = "http://ecs.amazonaws.com/onca/xml?Service=AWSECommerceService& Version=2007-10-29&AWSAccessKeyId=%s&Operation=ListLookup&ListType=WishList&ListId =%s" % (self.amazonAccessKeyId, self.listID) import urllib f = urllib.urlopen(aws_url) dom = minidom.parse(f) self.title = getText(dom.getElementsByTagName('ListName')[0].childNodes) self.listLength = int(getText(dom.getElementsByTagName('TotalItems')[0]. childNodes)) self.TotalPages = int(getText(dom.getElementsByTagName('TotalPages')[0]. childNodes)) return(self.title, self.listLength, self.TotalPages) def ListItems(self): """ a generator for the items on the Amazon list """ import itertools for pageNum in xrange(1,self.TotalPages): aws_url = "http://ecs.amazonaws.com/onca/xml?Service=AWSECommerce Service&Version=2007-10-29&AWSAccessKeyId=%s&Operation=ListLookup&ListType=Wish List&ListId=%s&ResponseGroup=ListItems,Medium&ProductPage=%s" % (self.amazon AccessKeyId,self.listID,pageNum) import urllib f = urllib.urlopen(aws_url) dom = minidom.parse(f) f.close() items = dom.getElementsByTagName('ListItem') for c in xrange(0,10): yield items[c] def parseListItem(self,item): from string import join from decimal import Decimal itemDict = {} itemDict['asin'] = getText(item.getElementsByTagName('ASIN')[0].childNodes) itemDict['dateadded'] = getText(item.getElementsByTagName('DateAdded')[0]. childNodes) itemDict['detailpageurl'] = getText(item.getElementsByTagName( 'DetailPageURL')[0].childNodes) # join the text of all the author nodes, if they exist authorNodes = item.getElementsByTagName('Author') # blank not allowed itemDict['author'] = join(map(lambda e: getText(e.childNodes), authorNodes), ", ") or ' ' itemDict['quantitydesired'] = getText(item.getElementsByTagName( 'QuantityDesired')[0].childNodes) titleNodes = item.getElementsByTagName('Title') # blank title not allowed itemDict['title'] = join(map(lambda e: getText(e.childNodes), titleNodes), ", ") or ' ' # to fix -- not all things have a LowestNewPrice itemDict['price'] = str(Decimal(getText(item.getElementsByTagName( 'LowestNewPrice')[0].getElementsByTagName('Amount')[0].childNodes))/100) or ' ' return itemDict def main(): gs = GSheetForAmazonList(user=GoogleUser,pwd=GooglePW) gs.setKey(GSheet_KEY) gs.setWkshtId(GWrkSh_ID) aWishList = amazonWishList(listID=AMAZON_LIST_ID,amazonAccessKeyId= AMAZON_ACCESS_KEY_ID) items = aWishList.ListItems() print "deleting all rows..." gs.deleteAllRows() for item in items: try: h = aWishList.parseListItem(item) print h['asin'] except Exception, e: print "Error %s parsing %s" % (e, item.toprettyxml(" ")) try: gs.insertRow(h) except Exception, e: print "Error %s inserting %s" % (e, h['asin']) if __name__ == '__main__': main()
Here are some things to note about this code:
The GSheetForAmazonList
class provides convenience methods for
the Google GData library.
The error handling is essential since not all wishlist items necessarily have all the pieces of data requested. It’s important for the code to keep moving even if data is missing.
At least in the Python GData interface to Google Spreadsheets, you can’t insert blank cells.
amazonWishList.ListItems
is a Python generator, which creates an
iterator to parcel out the Amazon items one at a time. See http://www.ibm.com/developerworks/library/?l-
pycon.html?t=gr,lnxw16=PyIntro
for a tutorial on Python
generators.
You can speed up the operation of this code through batch operations
(http://code.?google.?com/?apis/gdata/batch.html
), which
are currently supported in the GData interface and in the Java API kit (but not
Python).
In this section, I’ll show you how to use the PHP API kit for Google Spreadsheets to read the contents of the Google Spreadsheets that we’ll generate in the previous mashup. You can download the Zend Framework from here:
and read about how to use the Zend Framework to access Google Spreadsheet here:
http://framework.zend.com/manual/en/zend.gdata.spreadsheets.html
and here:
http://code.google.com/apis/spreadsheets/developers_guide_php.html
The following code first lists your spreadsheets and then the rows and named columns of the spreadsheet containing items from your Amazon.com wishlist:
<?php # user and password for google spreadsheet $user = "[GoogleUSER]"; $pass = "[GooglePASSWORD]"; # set parameters for your version of "My Amazon WishList" Google Spreadsheet $GSheetID = "[GSheetID]"; $worksheetID="[worksheetID]"; #$GSheetID = "o06341737111865728099.3585145106901556666"; #$worksheetID="od6"; # list entries from a spreadsheet require_once('Zend/Loader.php'); Zend_Loader::loadClass('Zend_Gdata'); Zend_Loader::loadClass('Zend_Gdata_ClientLogin'); Zend_Loader::loadClass('Zend_Gdata_Spreadsheets'); Zend_Loader::loadClass('Zend_Http_Client'); $service = Zend_Gdata_Spreadsheets::AUTH_SERVICE_NAME; $client = Zend_Gdata_ClientLogin::getHttpClient($user, $pass, $service); $spreadsheetService = new Zend_Gdata_Spreadsheets($client); # the following printFeed shows how to parse various types of feeds # coming from Google Spreadsheets API # function is extracted from # http://code.google.com/apis/spreadsheets/developers_guide_php.html function printFeed($feed) { $i = 0; foreach($feed->entries as $entry) { if ($entry instanceof Zend_Gdata_Spreadsheets_CellEntry) { print $entry->title->text .' '. $entry->content->text . "\n"; } else if ($entry instanceof Zend_Gdata_Spreadsheets_ListEntry) { print $i .' '. $entry->title->text .' '. $entry->content->text . "\n"; } else { print $i .' '. $entry->title->text . "\n"; } $i++; } } # figuring out how to print rows function printWorksheetFeed($feed) { $i = 2; # the first row of content is row 2 foreach($feed->entries as $row) { print "Row " . $i .' '. $row->title->text . "\t"; $i++; $rowData = $row->getCustom(); foreach($rowData as $customEntry) { print $customEntry->getColumnName() . " = " . $customEntry->getText(). "\t"; } print "\n"; } } # first print a list of your Google Spreadsheets $feed = $spreadsheetService->getSpreadsheetFeed(); printFeed($feed); # Print the content of a specific Spreadsheet/Worksheet # set a query to return a worksheet and print the contents of the worksheet $query = new Zend_Gdata_Spreadsheets_ListQuery(); $query->setSpreadsheetKey($GSheetID); $query->setWorksheetId($worksheetID); $listFeed = $spreadsheetService->getListFeed($query); printWorksheetFeed($listFeed); ?>
With code to access the Amazon.com wishlist in hand, you can use COM programming to generate an Excel spreadsheet with the same information. To learn more about the details about how to do so, consult Chapter 12 of Python Programming on Win32.[303]
from amazonListToGSheet import GS_HEADER, amazonWishList, AMAZON_LIST_ID, AMAZON_ACCESS_KEY_ID, GS_KEYS from win32com.client import Dispatch # fire up the Excel application xlApp = Dispatch("Excel.Application") xlApp.Visible = 1 xlApp.Workbooks.Add() # write the headers col = 1 def insertRow(sheet,row,data,keys): col = 1 for k in keys: sheet.Cells(row,col).Value = data[k] col += 1 for h in GS_HEADER: xlApp.ActiveSheet.Cells(1,col).Value = h col +=1 # now loop through the amazon wishlist aWishList = amazonWishList(listID=AMAZON_LIST_ID,amazonAccessKeyId=AMAZON_ACCESS_KEY_ID) items = aWishList.ListItems() row = 2 for item in items: try: p = aWishList.parseListItem(item) print p['asin'] except Exception, e: print "Error %s parsing %s" % (e, item.toprettyxml(" ")) try: insertRow(xlApp.ActiveSheet,row,p,GS_KEYS) row += 1 except Exception, e: print "Error %s inserting %s" % (e, p['asin'])