Chapter 17. Mashing Up Desktop and Web-Based Office Suites

Table of Contents

Mashup Scenarios for Office Suites
The World of Document Markup
The OpenDocument Format
Learning Basic ODF Tags
Create an ODF Text Document Without Any Styling of ODF Elements
Setting the Paragraph Text to text-body
Formatting Lists to Distinguish Between Ordered and Unordered Lists
Getting Bold, Italics, Font Changes, and Color Changes into Text Spans
API Kits for Working with ODF
Leveraging OO.o to Generate ODF
ECMA Office Open XML (OOXML)
Viewers/Validators for OOXML
Comparing ODF and OOXML
Online Office Suites
Usage Scenarios for Programmable Online Spreadsheets
Google Spreadsheets API
Python API Kit
Mashup: Amazon Wishlist and Google Spreadsheets Mashup
Zend PHP API Kit for Google Spreadsheets
A Final Variation: Amazon Wishlist to Microsoft Excel via COM
Zoho APIs

I’ve long been excited about the mashability and reusability of office suite documents (for example, word processor documents, spreadsheets, and slide presentations), the potential of which has gone largely unexploited. There are many office suites, but in this chapter I’ll concentrate on the latest versions of OpenOffice.org, often called OO.o (version 2.x), and Microsoft Office (2007 and 2003). Few people realize that both these applications have not only programming interfaces but also XML-based file formats. In theory, office documents using the respective file formats (OpenDocument and Office Open XML) are easier to reuse and to generate from scratch than older generations of documents, which used opaque binary formats. And as you have seen throughout the book, knowledge of data formats and APIs means having opportunities for mashups. For ages, people have been reverse engineering older Microsoft Office documents, whose formats were not publicly documented; these new formats make recombining office suite documents easier, though not effortless. In this chapter, I will also introduce you to the emerging space of web-based office suites, specifically ones that are programmable. I’ll also briefly cover how to program the desktop office suites.

This chapter does the following:

Mashup Scenarios for Office Suites

Why would mashups of office suite documents be interesting? For one, word processing documents, spreadsheets, and even presentation files hold vast amounts of the information that we communicate to each other. Sometimes they are in narratives (such as documents), and sometimes they are in semistructured forms (such as spreadsheets). To reuse that information, it is sometimes a matter of reformatting a document into another format. Other times, it’s about extracting valuable pieces; for instance, all the references in a word processor document might be extracted into a reference database. Furthermore, not only does knowledge of the file formats enable you to parse documents, but it allows you to generate documents.
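To make the point about parsing concrete, here is a minimal sketch in Python. It relies on the fact that an OpenDocument text file is a ZIP package whose main body lives in a member named content.xml, with paragraphs stored as text:p elements. The function names extract_paragraphs and make_sample_package are my own illustrative choices, and the sample package is a stripped-down stand-in: a real ODF file also carries a mimetype entry and a manifest, which are omitted here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OpenDocument specification.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def extract_paragraphs(odt_file):
    """Return the text of every text:p element found in content.xml.

    odt_file may be a path or a file-like object holding a ZIP package.
    """
    with zipfile.ZipFile(odt_file) as package:
        root = ET.fromstring(package.read("content.xml"))
    # itertext() also collects text nested in child elements such as spans.
    return ["".join(p.itertext()) for p in root.iter("{%s}p" % TEXT_NS)]


def make_sample_package():
    """Build an in-memory stand-in for an ODF text document (hypothetical).

    Only content.xml is written; this is enough for the sketch but is
    not a complete, valid ODF package.
    """
    content = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<office:document-content xmlns:office="%s" xmlns:text="%s">'
        "<office:body><office:text>"
        "<text:p>Hello, ODF</text:p>"
        "<text:p>Second <text:span>paragraph</text:span></text:p>"
        "</office:text></office:body>"
        "</office:document-content>" % (OFFICE_NS, TEXT_NS)
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("content.xml", content)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for paragraph in extract_paragraphs(make_sample_package()):
        print(paragraph)
```

The same extract_paragraphs function should work when given the path of a document saved from OO.o Writer, although I have kept the sketch deliberately small: it ignores styles, lists, tables, and any error handling for packages that lack content.xml.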

Some use case scenarios for the programmatic creation and reuse of office documents include the following:

Reusing PowerPoint: Do you have collections of Microsoft PowerPoint presentations that draw from a common collection of digital assets (pictures and outlines) and complete slides? Can you build a system of personal information management so that PPT presentations are constructed as virtual assemblages of slides, dynamically associated with assets?

Writing once, publishing everywhere: I’m currently writing this manuscript in Microsoft Office 2007. I’d like to republish this book in (X)HTML, DocBook, PDF, and wiki markup. How would I repurpose the Microsoft Word manuscript into those formats?

Transforming data: You could create an educational website in which data is downloaded to spreadsheets, not only as static data elements but as dynamic simulations. There’s plenty of data out there. Can you write programs to translate it into spreadsheets, the dominant data analysis tool, whether on the desktop or in the cloud?

Getting instant PowerPoint presentations from Flickr: I’d like to download a Flickr set as a PowerPoint presentation. (This scenario seems to fit a world in which PowerPoint is the dominant presentation program. Even if Tufte hates it, a Flickr-to-PPT translator might make it easier to show those vacation pictures at your next company presentation.)

There are many other possibilities. This chapter teaches you what you need to know to start building such applications.