Introduction to Open XML Format SDK 2.0
Open XML is an open standard for word-processing documents, presentations, and spreadsheets that can be freely implemented by multiple applications on different platforms. Open XML is designed to faithfully represent existing word-processing documents, presentations, and spreadsheets that are encoded in binary formats defined by Microsoft Office applications. The reason for Open XML is simple: billions of documents now exist but, unfortunately, the information in those documents is tightly coupled with the programs that created them. The purpose of the Open XML standard is to decouple documents from the Microsoft Office applications that created them, so that other applications can manipulate them independently of proprietary formats and without loss of data.
Structure of an Open XML Package
An Open XML file is stored in a ZIP archive for packaging and compression. You can view the structure of any Open XML file using a ZIP viewer. An Open XML document is composed of multiple document parts. The relationships between the parts are themselves stored in document parts. The ZIP format supports random access to each part. For example, an application can move a slide from one presentation to another presentation without parsing the slide content. Likewise, an application can strip all of the comments out of a word-processing document without parsing any of its contents.
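As a rough illustration of the ZIP-based packaging described above, the following sketch uses only the Python standard library (not the Open XML SDK). It builds a tiny in-memory archive that mimics the layout of a word-processing package, then lists its parts and reads one of them directly. The helper names (`build_sample_package`, `list_parts`, `read_part`) and the placeholder XML bodies are assumptions made for this example; a real package contains full Open XML markup.

```python
import io
import zipfile


def build_sample_package():
    """Build a tiny ZIP archive that mimics an Open XML package layout.

    The XML bodies are placeholders for illustration only; they are not
    valid Open XML markup.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("_rels/.rels", "<Relationships/>")
        package.writestr("word/document.xml", "<document/>")
        package.writestr("word/comments.xml", "<comments/>")
    return buffer.getvalue()


def list_parts(package_bytes):
    """Return the names of all entries stored in the package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.namelist()


def read_part(package_bytes, part_name):
    """Read a single part by name without touching the other parts."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.read(part_name)


if __name__ == "__main__":
    sample = build_sample_package()
    for name in list_parts(sample):
        print(name)
    print(read_part(sample, "word/comments.xml"))
```

To inspect a real file instead, you could pass a path such as a .docx file to `zipfile.ZipFile` directly; the listing works the same way because the package is an ordinary ZIP archive.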
The document parts in an Open XML package are written as XML markup. Because XML is structured plain text, you can view the contents of a document part in any text reader, or you can query the contents with technologies such as XPath.
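To make the point about querying a part concrete, here is a minimal sketch that extracts paragraph text from a small WordprocessingML fragment. It uses Python's `xml.etree.ElementTree`, which supports only a subset of XPath, and is not the Open XML SDK. The sample markup and the `paragraph_texts` helper are assumptions made for this example; the namespace URI is the WordprocessingML main namespace.

```python
import xml.etree.ElementTree as ET

# Namespace prefix map for WordprocessingML elements.
W_NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}

# A hand-written fragment standing in for the contents of a main document part.
SAMPLE_DOCUMENT_XML = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:r><w:t>Hello</w:t></w:r></w:p>
    <w:p><w:r><w:t>Open </w:t></w:r><w:r><w:t>XML</w:t></w:r></w:p>
  </w:body>
</w:document>"""


def paragraph_texts(document_xml):
    """Return the text of each paragraph, joining the text of its runs."""
    root = ET.fromstring(document_xml)
    texts = []
    # ".//w:p" selects every paragraph element at any depth.
    for paragraph in root.findall(".//w:p", W_NS):
        # ".//w:t" selects every text element inside the paragraph.
        pieces = [t.text or "" for t in paragraph.findall(".//w:t", W_NS)]
        texts.append("".join(pieces))
    return texts


if __name__ == "__main__":
    print(paragraph_texts(SAMPLE_DOCUMENT_XML))
```

Note that every query must carry the namespace prefix; this is the kind of spelling and namespace detail that a strongly typed object model hides from you.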
Structurally, an Open XML document is an Open Packaging Conventions (OPC) package. As stated previously, a package is composed of a collection of document parts. Each part has a part name that consists of a sequence of segments forming a path, such as "/word/theme/theme1.xml". The package contains a [Content_Types].xml part that allows you to determine the content type of every document part in the package. The set of explicit relationships for a source package or part is stored in a relationships part whose name ends with the .rels extension.
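The roles of [Content_Types].xml and the .rels parts can be sketched as follows, again with the Python standard library rather than the SDK. The sample markup is hand-written for this example, and the helpers `content_type_of` and `relationship_targets` are hypothetical names. The lookup is simplified: it assumes that an Override entry for a part name takes precedence over a Default entry for its extension, and it compares part names exactly rather than case-insensitively.

```python
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Hand-written stand-in for a package's [Content_Types].xml part.
SAMPLE_CONTENT_TYPES = """<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

# Hand-written stand-in for the package-level relationships part (_rels/.rels).
SAMPLE_RELS = """<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>"""


def content_type_of(part_name, content_types_xml):
    """Look up a part's content type: Override by name, then Default by extension."""
    root = ET.fromstring(content_types_xml)
    for override in root.findall(f"{{{CT_NS}}}Override"):
        if override.get("PartName") == part_name:
            return override.get("ContentType")
    extension = part_name.rsplit(".", 1)[-1].lower()
    for default in root.findall(f"{{{CT_NS}}}Default"):
        if default.get("Extension", "").lower() == extension:
            return default.get("ContentType")
    return None  # No entry covers this part.


def relationship_targets(rels_xml):
    """Map each relationship Id to its Target."""
    root = ET.fromstring(rels_xml)
    return {
        rel.get("Id"): rel.get("Target")
        for rel in root.findall(f"{{{REL_NS}}}Relationship")
    }


if __name__ == "__main__":
    print(content_type_of("/word/document.xml", SAMPLE_CONTENT_TYPES))
    print(content_type_of("/word/theme/theme1.xml", SAMPLE_CONTENT_TYPES))
    print(relationship_targets(SAMPLE_RELS))
```

In this sample, the package-level relationship points at word/document.xml, which is how a consumer would locate the main document part instead of relying on a fixed file name.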
A WordprocessingML document is composed of a collection of stories where each story is one of the following:
* Main document (the only required story)
* Glossary document
* Header and footer
* Comments
* Text box
* Footnote and endnote
Presentations are described by PresentationML markup. Presentation packages can contain the following document parts:
* Slide master
* Notes master
* Handout master
* Slide layout
* Notes
Spreadsheet workbooks are described by using SpreadsheetML markup. Workbook packages can contain:
* Workbook part (required part)
* One or more worksheets
* Charts
* Tables
* Custom XML
The Open XML Format SDK 1.0
Version 1 of the Open XML Format SDK simplified the manipulation of Open XML packages. The Open XML SDK Application Programming Interface (API) encapsulates many of the common tasks that you typically perform on Open XML packages, so you can perform complex operations with just a few lines of code. Some common tasks:
* Search. With a few lines of code, you can search a collection of Excel 2007 worksheets for some arbitrary data.
* Document assembly. You can create documents by combining the document parts of existing documents programmatically. For example, you can pull slides from various PowerPoint 2007 presentations to create a single presentation.
* Validation. With a few lines of code, you can validate the document parts in a package or validate an entire package against a schema.
* Data update. With the Open XML object model, you can easily modify the data in multiple packages.
* Privacy. With a few lines of code, you can remove comments and other personal information from a document before it is distributed.
The Open XML Format SDK 2.0
The Open XML Format SDK 2.0 extends the strongly typed class support from the part classes, which are provided in version 1.0, to the XML content in each part. All functions available in version 1.0 are still supported. With version 2.0, you can also program against the XML content inside a part. The SDK supports programming in the style of LINQ to XML, which makes coding against the XML content much easier than the traditional W3C XML DOM programming model.
The SDK supports the following common tasks and scenarios:
* Strongly Typed Classes and Objects. Instead of relying on generic XML functionality to manipulate XML, which requires that you know the exact spelling of elements, attributes, and values as well as their namespaces, you can use the Open XML SDK to achieve the same result simply by manipulating objects that represent those elements, attributes, and values. All schema types are represented as strongly typed Common Language Runtime (CLR) classes and all attribute values as enumerations.
* Content Construction, Search, and Manipulation. LINQ technology is built directly into the SDK. As a result, you can use functional construction and lambda-expression queries directly on objects that represent Open XML elements. In addition, the SDK lets you easily traverse and manipulate content by providing support for collections of objects, such as tables and paragraphs.
* Validation. In future releases, the Open XML Format SDK 2.0 will provide validation functionality, enabling you to validate Open XML documents against different variations of the Open XML Format.
You can use the Open XML API in any language supported by the Microsoft .NET Framework®. The help topics presented in this SDK provide code samples in Microsoft Visual C#® and Microsoft Visual Basic® .NET.
Using the code samples in the help topics in this SDK as a starting point, you can work with the Open XML standard programmatically. The Open XML API relieves much of the tedium of working with Open Packaging Conventions documents and is well worth your time to explore.
Using the Open XML Format SDK 2.0
The Open XML Format SDK 2.0 simplifies the task of manipulating Open XML packages and the underlying Open XML schema elements within a package. The Open XML Application Programming Interface (API) encapsulates many common tasks that developers perform on Open XML packages, so you can perform complex operations with just a few lines of code.
This documentation pertains to the second Community Technical Preview (CTP) of the Open XML Format SDK 2.0, released April 2009.
Using the Open XML API
To use the Open XML API, add a reference to the DocumentFormat.OpenXml.dll assembly in your project or application. A link to the download containing the assembly can be found at the Open XML Formats Resource Center.
To add a reference to the Open XML API dynamic-link library (DLL) file, perform the following steps.
To add a reference in a Microsoft Visual Studio 2008 project
1. In Solution Explorer, right-click References and then click Add Reference. If the References node is not visible, click Project and then click Show All Files.
2. In the Add Reference dialog box, click .NET.
3. Scroll to the DocumentFormat.OpenXml option, highlight it, and then click OK.
4. Confirm that the file name is displayed under References in Solution Explorer.