ECMA Office Open XML (OOXML)

Now we turn to a competing file format: Office Open XML. Wikipedia provides a good overview of the specification that underpins Microsoft Office 2007:

http://en.wikipedia.org/wiki/Office_Open_XML

The Office Open XML specification has been made into an ECMA standard (ECMA-376). You can find the specification here:

http://www.ecma-international.org/publications/standards/Ecma-376.htm

Note that the standard is 6,000 pages, in case you want to read it. ECMA also provides an overview white paper:

http://www.ecma-international.org/news/TC45_current_work/OpenXML%20White%20Paper.pdf

Getting hard, easy-to-digest information on OOXML is challenging. I recommend the following more colloquial overviews that you might find useful:

When working with Office Open XML, it’s good to heed the following warning: “Open XML is a new standard. So new, in fact, that the schemas are still being edited and haven’t been published by ECMA yet. And there are no books out on Open XML development, although that will surely change in the next year.”[298]

The Office Open XML format has a predecessor in the Microsoft Office 2003 XML format. In the book Office 2003 XML (O’Reilly Media, 2004), the following was given as a minimalist Office 2003 XML document:[299]

         <?xml version="1.0"?>
         <?mso-application progid="Word.Document"?>
         <w:wordDocument
           xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
           <w:body>
             <w:p>
               <w:r>
                 <w:t>Hello, World!</w:t>
               </w:r>
             </w:p>
           </w:body>
         </w:wordDocument>
      

Microsoft Office 2007 can actually read this document, though only in “compatibility mode.” Can you turn it into a valid Office Open XML document simply by taking the Microsoft Office 2003 document and updating its namespace? That is, can you just change the namespace bound to the w prefix to the following?

xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"

You therefore have the following:

         <?xml version="1.0"?>
         <?mso-application progid="Word.Document"?>
         <w:wordDocument
           xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
           <w:body>
             <w:p>
               <w:r>
                 <w:t>Hello, World!</w:t>
               </w:r>
             </w:p>
           </w:body>
         </w:wordDocument>
      

Unfortunately, opening this document in Microsoft Office 2007 as an Office Open XML instance causes an error. You can certainly keep pushing in this direction by looking through the specification and schema. However, a more promising lead right now is to see what file gets written out by a short C# script that generates a simple .docx file:

http://blogs.msdn.com/dmahugh/archive/2006/06/27/649007.aspx

I downloaded the Microsoft Visual Studio C# Express Edition to run the script and made a small change to update the namespace from this:

http://schemas.openxmlformats.org/wordprocessingml/2006/3/main

to this:

http://schemas.openxmlformats.org/wordprocessingml/2006/main

With that change, you can generate a simple Office Open XML document file (http://examples.mashupguide.net/ch17/helloworld_simple.1.docx) that is accepted by Microsoft Office 2007. (This doesn’t prove that the file is valid, only that you are on the right track in terms of generating OOXML.)

Unzipping and studying the file gives you insight into what goes into a minimalist instance of OOXML. The list of files is as follows:

         File Name                                             Modified             Size
         word/document.xml                              2007-06-04 16:43:44          246
         [Content_Types].xml                            2007-06-04 16:43:44          346
         _rels/.rels                                    2007-06-04 16:43:44          285
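
Because a .docx file is an ordinary ZIP archive, you can produce a listing like the one above with a few lines of Python instead of an unzip tool. What follows is a minimal sketch using the standard zipfile module. The list_parts function name and the in-memory demonstration archive (filled with placeholder bytes, not real OOXML) are assumptions made for illustration, so the sizes it prints will not match the listing above.

```python
import io
import zipfile


def list_parts(package):
    """Return (name, uncompressed size) pairs for each entry in a package.

    `package` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(package) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]


# Demonstration on a throwaway in-memory archive with placeholder contents,
# so the snippet runs without any .docx file on disk.
PLACEHOLDER = "<placeholder/>"
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    for name in ("word/document.xml", "[Content_Types].xml", "_rels/.rels"):
        zf.writestr(name, PLACEHOLDER)

for name, size in list_parts(demo):
    print(f"{name:<30} {size:>8}")
```

To inspect a real file, pass its path instead, for example list_parts("helloworld_simple.1.docx") if you have downloaded the sample document to the current directory.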
      

Let’s look at the individual files. The first is the document.xml file in the word directory, which holds the content of the document and corresponds most closely to content.xml in ODF.

         <?xml version="1.0" encoding="utf-8"?>
         <w:document
             xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
           <w:body>
             <w:p>
               <w:r>
                 <w:t>Hello World!</w:t>
               </w:r>
             </w:p>
           </w:body>
         </w:document>
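
To see how little is needed to read this part back, here is a small sketch that pulls the text out of document.xml with Python's standard ElementTree module. It assumes the simple paragraph/run/text nesting shown above; the paragraph_texts function name is hypothetical, and real documents contain many more element types that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the document.xml listing above.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

DOCUMENT_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<w:document
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello World!</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>
"""


def paragraph_texts(document_xml):
    """Return one string per w:p element, joining the text of its w:t children."""
    ns = {"w": W_NS}
    root = ET.fromstring(document_xml)
    return [
        "".join(t.text or "" for t in p.iterfind(".//w:t", ns))
        for p in root.iterfind(".//w:p", ns)
    ]


print(paragraph_texts(DOCUMENT_XML))  # ['Hello World!']
```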
      

The .rels file in the _rels directory contains information about relationships among the various files that make up the package of files (a bit like the META-INF/manifest.xml file in ODF):

         <?xml version="1.0" ?>
         <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
           <Relationship Id="rId1" Target="/word/document.xml"
                         Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"/>
         </Relationships>
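
A consumer of the package can use this file to locate the main document part rather than hard-coding word/document.xml. The following sketch shows one way to do that lookup in Python. The main_document_target function name is hypothetical, and stripping the leading slash to get a ZIP entry name is a simplification that works for this listing rather than a full implementation of the packaging rules.

```python
import xml.etree.ElementTree as ET

# Both URIs are taken from the .rels listing above.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOCUMENT = (
    "http://schemas.openxmlformats.org/officeDocument/2006/"
    "relationships/officeDocument"
)

RELS_XML = b"""<?xml version="1.0" ?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Target="/word/document.xml"
                Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"/>
</Relationships>
"""


def main_document_target(rels_xml):
    """Return the ZIP entry name of the officeDocument relationship, or None."""
    root = ET.fromstring(rels_xml)
    for rel in root.iterfind(f"{{{REL_NS}}}Relationship"):
        if rel.get("Type") == OFFICE_DOCUMENT:
            # Targets here are package-absolute ("/word/document.xml");
            # ZIP entry names carry no leading slash.
            return rel.get("Target").lstrip("/")
    return None


print(main_document_target(RELS_XML))  # word/document.xml
```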
      

The final file in the package is [Content_Types].xml:

         <?xml version="1.0" ?>
         <Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
           <Default ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"
                    Extension="xml"/>
           <Default ContentType="application/vnd.openxmlformats-package.relationships+xml"
                    Extension="rels"/>
         </Types>
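
With all three parts in hand, you can try assembling the same package without C#. The sketch below writes the three listings into a ZIP archive using Python's standard zipfile module. The build_minimal_docx function name is hypothetical, and the sketch only mirrors the structure shown above; it does not validate the result, and whether Microsoft Office accepts the output is something you would need to check yourself.

```python
import io
import zipfile

# The three parts, copied from the listings above.
DOCUMENT_XML = """<?xml version="1.0" encoding="utf-8"?>
<w:document
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello World!</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>
"""

RELS_XML = """<?xml version="1.0" ?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Target="/word/document.xml"
                Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"/>
</Relationships>
"""

CONTENT_TYPES_XML = """<?xml version="1.0" ?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"
           Extension="xml"/>
  <Default ContentType="application/vnd.openxmlformats-package.relationships+xml"
           Extension="rels"/>
</Types>
"""


def build_minimal_docx(target):
    """Write the three-part package to `target` (a path or binary file object)."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES_XML)
        zf.writestr("_rels/.rels", RELS_XML)
        zf.writestr("word/document.xml", DOCUMENT_XML)


# Build in memory so the snippet has no side effects on disk.
buffer = io.BytesIO()
build_minimal_docx(buffer)
print(len(buffer.getvalue()), "bytes written")
```

Passing a file name such as "hello.docx" (an arbitrary name chosen here) instead of the in-memory buffer writes the package to disk so that you can try opening it.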
      

These files should give you a feel for what’s in OOXML. To learn more, take a look at the following resources:

Viewers/Validators for OOXML

A major point of OOXML is that you can read and generate documents that open in the latest versions of Microsoft Office without having to manipulate the Microsoft Office object models directly. Still, it is always helpful to have tools other than Microsoft Office 2007 itself for viewing and validating OOXML documents. Some promising tools are as follows:

  • Open XML Package Explorer, which lets you browse and edit Open XML packages and validate against the ECMA final schemas (http://www.codeplex.com/PackageExplorer).

  • If you are using Microsoft Office XP or 2003, you can download a Microsoft Office compatibility pack for the Word, Excel, and PowerPoint 2007 file formats to read and write OOXML.[300][301]