I’ve long been excited about the mashability and reusability of office suite documents (word processor documents, spreadsheets, and slide presentations), a potential that has gone largely unexploited. There are many office suites, but in this chapter I’ll concentrate on the latest versions of OpenOffice.org (version 2.x), often called OO.o, and Microsoft Office (2007 and 2003). Few people realize that both suites have not only programming interfaces but also XML-based file formats. In theory, office documents that use these formats (OpenDocument and Office Open XML, respectively) are easier to reuse and to generate from scratch than older generations of documents that use opaque binary formats. As you have seen throughout the book, knowing data formats and APIs opens up opportunities for mashups. For ages, people have been reverse engineering older Microsoft Office documents, whose formats were not publicly documented; the new formats make recombining office documents easier, though not effortless. In this chapter, I will also introduce you to the emerging space of web-based office suites, specifically ones that are programmable, and briefly cover how to program the office suites.
This chapter does the following:
Shows how to do some simple parsing of OpenDocument format (ODF) and Office Open XML documents
Shows how to create a simple document in both ODF and Office Open XML
Demonstrates some simple scripting of OO.o and Microsoft Office
Lays out what else is possible by manipulating the open document formats
Shows how to program Google Spreadsheets and to mash it up with other APIs (such as Amazon E-Commerce Services)
Why would mashups of office suite documents be interesting? For one, word processor documents, spreadsheets, and even presentation files hold vast amounts of the information that we communicate to each other. Sometimes that information is in narrative form (as in word processor documents), and sometimes it is in semistructured form (as in spreadsheets). Reusing it is sometimes a matter of converting a document into another format. Other times, it’s about extracting valuable pieces; for instance, all the references in a word processor document might be extracted into a reference database. Furthermore, knowledge of the file formats lets you not only parse documents but also generate them.
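To make the idea of extracting pieces concrete, here is a minimal sketch in Python. It relies on the fact that an OpenDocument text file is a ZIP package whose body is stored in a member named content.xml, with paragraphs marked up as text:p elements. The names extract_paragraphs and build_sample_package are hypothetical, introduced only for illustration, and the sample package built in memory is a stripped-down stand-in rather than a complete, valid ODF file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces defined by the OpenDocument format.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def extract_paragraphs(odf_file):
    """Return the text of every <text:p> element in an ODF text document.

    odf_file may be a file name or a file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(odf_file) as package:
        # The document body of an ODF package lives in content.xml.
        content = package.read("content.xml")
    root = ET.fromstring(content)
    # itertext() also collects text nested in child elements such as text:span.
    return ["".join(p.itertext()) for p in root.iter("{%s}p" % TEXT_NS)]


def build_sample_package():
    """Build a tiny in-memory stand-in for an .odt file (illustration only)."""
    content = (
        '<office:document-content xmlns:office="%s" xmlns:text="%s">'
        "<office:body><office:text>"
        "<text:p>Hello <text:span>world</text:span></text:p>"
        "<text:p>Second paragraph</text:p>"
        "</office:text></office:body>"
        "</office:document-content>" % (OFFICE_NS, TEXT_NS)
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("content.xml", content)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for paragraph in extract_paragraphs(build_sample_package()):
        print(paragraph)
```

With a real document, you would pass a file name instead of the in-memory sample. The same approach of unzipping the package and then parsing the XML applies to Office Open XML files, although the member names and namespaces differ.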
Some use case scenarios for the programmatic creation and reuse of office documents include the following:
Reusing PowerPoint: Do you have collections of Microsoft PowerPoint presentations that draw from a common collection of digital assets (pictures and outlines) and complete slides? Can you build a system of personal information management so that PPT presentations are constructed as virtual assemblages of slides, dynamically associated with assets?
Writing once, publishing everywhere: I’m currently writing this manuscript in Microsoft Office 2007. I’d like to republish this book in (X)HTML, DocBook, PDF, and wiki markup. How would I repurpose the Microsoft Word manuscript into those formats?
Transforming data: You could create an educational website in which data is downloaded to spreadsheets, not only as static data elements but as dynamic simulations. There’s plenty of data out there. Can you write programs to translate it into spreadsheets, the dominant data analysis tool, whether on the desktop or in the cloud?
Getting instant PowerPoint presentations from Flickr: I’d like to download a Flickr set as a PowerPoint presentation. (This scenario seems to fit a world in which PowerPoint is the dominant presentation program. Even if Tufte hates it, a Flickr-to-PPT translator might make it easier to show those vacation pictures at your next company presentation.)
There are many other possibilities. This chapter teaches you what you need to know to start building such applications.