Now we turn to a competing file format: Office Open XML. Wikipedia provides a good overview of the specification that underpins Microsoft Office 2007:
http://en.wikipedia.org/wiki/Office_Open_XML
The Office Open XML specification has been made into an ECMA standard (ECMA-376). You can find the specification here:
http://www.ecma-international.org/publications/standards/Ecma-376.htm
Note that the standard is 6,000 pages—in case you want to read it. ECMA provides an overview white paper:
http://www.ecma-international.org/news/TC45_current_work/OpenXML%20White%20Paper.pdf
Finding solid, easy-to-digest information on OOXML is challenging. I recommend the following, more colloquial overviews:
The “5 Cool Things You Must Know About the New Office 2007 File Formats” article might prove helpful (http://www.devx.com/MicrosoftISV/Article/30907/2046).
http://openxmldeveloper.org/default.aspx has some useful tutorials on the subject.
When working with Office Open XML, it’s good to heed the following warning: “Open XML is a new standard. So new, in fact, that the schemas are still being edited and haven’t been published by ECMA yet. And there are no books out on Open XML development, although that will surely change in the next year.”[298]
The Office Open XML format has a predecessor in the Microsoft Office 2003 XML format. In the book Office 2003 XML (O’Reilly Media, 2004), the following was given as a minimalist Office 2003 XML document:[299]
<?xml version="1.0"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello, World!</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:wordDocument>
This document is actually readable by Microsoft Office 2007, though in “compatibility mode.” Can you get a valid Office Open XML document by taking the Microsoft Office 2003 document and updating its namespace? That is, can you just change the namespace bound to the w prefix to the following?
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
You therefore have the following:
<?xml version="1.0"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello, World!</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:wordDocument>
Unfortunately, opening this document as an Office Open XML instance in Microsoft Office 2007 causes an error. You can certainly keep pushing in this direction by looking through the specification and schema. However, a more promising lead right now is to see what file gets written out by a simple little C# script aimed at generating a simple .docx file:
http://blogs.msdn.com/dmahugh/archive/2006/06/27/649007.aspx
I downloaded the Microsoft Visual Studio C# Express Edition to run the script and made a small change to update the namespace from this:
http://schemas.openxmlformats.org/wordprocessingml/2006/3/main
to this:
http://schemas.openxmlformats.org/wordprocessingml/2006/main
With that change, you can generate a simple Office Open XML document file (http://examples.mashupguide.net/ch17/helloworld_simple.1.docx) that is accepted by Microsoft Office 2007. (This doesn’t prove that the file is valid but only that you are on the right track in terms of generating OOXML.)
Unzipping and studying the file gives you insight into what goes into a minimalist instance of OOXML. The list of files is as follows:
File Name              Modified               Size
word/document.xml      2007-06-04 16:43:44     246
[Content_Types].xml    2007-06-04 16:43:44     346
_rels/.rels            2007-06-04 16:43:44     285
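If you would rather produce such a listing programmatically than with an unzip tool, the following is a minimal sketch using Python's standard zipfile module. The function name list_package is my own invention for illustration, and the sketch assumes only that the .docx file is an ordinary zip archive, as the unzipping step above suggests:

```python
import zipfile
from datetime import datetime


def list_package(source):
    """Return (name, modified, size) for each file in a zipped package.

    `source` may be a path to a .docx file or a binary file-like object.
    The function name is hypothetical, chosen for this example only.
    """
    with zipfile.ZipFile(source) as package:
        return [
            # date_time is a (year, month, day, hour, minute, second) tuple
            (info.filename, datetime(*info.date_time), info.file_size)
            for info in package.infolist()
        ]
```

Calling list_package("helloworld_simple.1.docx") on a local copy of the file should give you one tuple per entry, which you can then format as a table like the one above.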
Let’s look at the individual files. The first is the document.xml file in the word directory, which holds the content of the document and corresponds most closely to content.xml in ODF.
<?xml version="1.0" encoding="utf-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello World!</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>
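To get a feel for how this structure can be read back, here is a small sketch that pulls the text out of each paragraph (w:p) by joining its text elements (w:t). It uses Python's standard xml.etree.ElementTree module; the function name paragraph_texts and the SAMPLE constant are hypothetical names of my own, and the sketch handles only the simple paragraph/run/text nesting shown above, not the full range of WordprocessingML markup:

```python
import xml.etree.ElementTree as ET

# Namespace bound to the w prefix in the document above
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# The minimalist document.xml from the text, as bytes
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello World!</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>"""


def paragraph_texts(document_xml):
    """Return one string per w:p element, joining the text of its w:t children."""
    root = ET.fromstring(document_xml)
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        pieces = [t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return paragraphs


print(paragraph_texts(SAMPLE))
```

Note that ElementTree matches elements by namespace URI rather than by prefix, which is why the namespace change discussed earlier matters more than the w prefix itself.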
The .rels file in the _rels directory contains information about relationships among the various files that make up the package of files (a bit like the META-INF/manifest.xml file in ODF):
<?xml version="1.0" ?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Target="/word/document.xml"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"/>
</Relationships>
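One way to see what the relationships file is for is to use it to locate the main document part instead of hard-coding word/document.xml. The following is a minimal sketch under that assumption; main_document_target and SAMPLE_RELS are hypothetical names introduced here, and the relationship type URI is the one from the listing above:

```python
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOCUMENT = (
    "http://schemas.openxmlformats.org/officeDocument/2006/"
    "relationships/officeDocument"
)

# The _rels/.rels content from the text, as bytes
SAMPLE_RELS = b"""<?xml version="1.0" ?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Target="/word/document.xml"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"/>
</Relationships>"""


def main_document_target(rels_xml):
    """Return the Target of the officeDocument relationship, or None if absent."""
    root = ET.fromstring(rels_xml)
    for relationship in root.findall(f"{{{REL_NS}}}Relationship"):
        if relationship.get("Type") == OFFICE_DOCUMENT:
            return relationship.get("Target")
    return None


print(main_document_target(SAMPLE_RELS))
```

The target here begins with a slash, whereas the corresponding entry in the zip listing does not, so you would strip the leading slash before looking the file up in the archive.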
The final file in the package is [Content_Types].xml:
<?xml version="1.0" ?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"
    Extension="xml"/>
  <Default ContentType="application/vnd.openxmlformats-package.relationships+xml"
    Extension="rels"/>
</Types>
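Since the package is just these three files zipped together, you can sketch the generation step without C#. The following Python version is my own illustration, not the script referenced earlier: write_minimal_docx is a hypothetical name, the XML strings are copied from the listings in this section, and I have not verified here that Microsoft Office 2007 accepts the result, so treat it as a starting point rather than a known-good generator:

```python
import zipfile

CONTENT_TYPES_XML = """<?xml version="1.0" ?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml" Extension="xml"/>
  <Default ContentType="application/vnd.openxmlformats-package.relationships+xml" Extension="rels"/>
</Types>"""

RELS_XML = """<?xml version="1.0" ?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Target="/word/document.xml" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"/>
</Relationships>"""

DOCUMENT_XML = """<?xml version="1.0" encoding="utf-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello World!</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>"""


def write_minimal_docx(target):
    """Write the three-file package to `target` (a path or binary file object)."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as package:
        # writestr encodes str data as UTF-8, matching the XML declarations
        package.writestr("[Content_Types].xml", CONTENT_TYPES_XML)
        package.writestr("_rels/.rels", RELS_XML)
        package.writestr("word/document.xml", DOCUMENT_XML)
```

For example, write_minimal_docx("helloworld.docx") writes a file you can then try opening in Microsoft Office 2007 or inspecting with an unzip tool.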
These files should give you a feel for what’s in OOXML. To learn more, take a look at the following resources:
The “Ecma Office Open XML Format Guide” is an official high-level conceptual/marketing overview of OOXML.
http://openxmldeveloper.org/articles/directory.aspx lists tutorial articles that are gathered by the OOXML community.
http://openxmldeveloper.org/articles/OpenXMLsamples.aspx has sample OOXML documents.
http://msdn2.microsoft.com/en-us/library/bb187361.aspx gives the object model of Microsoft Office 2007.
http://en.wikipedia.org/wiki/User:Flemingr/Microsoft_Office_2003_XML_formats documents the older Office 2003 XML format, which has some family resemblance to OOXML, though an unclear one to me.
Brian Jones of Microsoft has written some clear tutorials on generating spreadsheets in OOXML: http://blogs.msdn.com/brian_jones/archive/2007/05/29/simple-spreadsheetml-file-part-3-formatting.aspx.
A big point of OOXML is being able to read and generate documents that are readable in the latest versions of Microsoft Office without having to directly manipulate the object models of Microsoft Office. Yet, it’s always helpful to have tools that view and validate OOXML documents—other than Microsoft Office 2007 itself. Some promising tools are as follows:
Open XML Package Explorer, which lets you browse and edit Open XML packages and validate against the ECMA final schemas (http://www.codeplex.com/PackageExplorer).
If you are using Microsoft Office XP or 2003, you can download a Microsoft Office compatibility pack for the Word, Excel, and PowerPoint 2007 file formats to read and write OOXML.[300][301]