Viewing Metadata in a Microsoft Word 2007 Document
It’s easy to view the metadata in a Microsoft Word 2007 (“.docx”) document without Microsoft Word. A .docx file is a collection of ten or more XML files packaged in a single Zip archive (as strange as that sounds). To a Zip utility, a Word 2007 file is an ordinary Zip archive that happens to carry a different extension. To view the metadata, save a Word 2007 document to your desktop and change its extension from “.docx” to “.zip”. Then double-click the renamed file to open it in Windows Explorer (or your default Zip application), which displays the folders and XML files that make up the document, starting at the root level of the package.
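The same inspection can be done from a script instead of by renaming the file. The following is a minimal sketch in Python that uses only the standard-library `zipfile` module; the function name `list_parts` is a hypothetical helper chosen for illustration, not part of any Microsoft tool.

```python
import sys
import zipfile


def list_parts(docx_path):
    """Return the names of every entry (part) stored in the package."""
    # zipfile identifies the archive by its contents, not its extension,
    # so the .docx file can be opened without renaming it to .zip.
    with zipfile.ZipFile(docx_path) as archive:
        return archive.namelist()


# Optional command-line use: python list_parts.py report.docx
if __name__ == "__main__" and len(sys.argv) > 1:
    for name in list_parts(sys.argv[1]):
        print(name)
```

Note that entry names inside the archive use forward slashes (for example `docProps/core.xml`), regardless of how Windows Explorer displays the folders.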
Each XML file in a Microsoft Office 2007 document is called a “part.” One of these parts is the main document part. When you inspect the main document part, you’ll recognize many elements from your document, such as styles and content. You can also view metadata such as the document properties and author information. The chart below shows some of these metadata XML parts.
Microsoft Office 2007 Document Metadata XML Parts Summary

| Part | Path | Description | Microsoft Application |
|---|---|---|---|
| app.xml | root\docProps | Defines application file properties. These properties include the number of characters, words, lines, paragraphs, and pages in the document. | Word |
| core.xml | root\docProps | Defines core file properties. Includes creator name, creation date, print date, title, and document description. | Word |
| custom.xml | root\docProps | Defines custom document properties. | Word |
| settings.xml | root\word | Defines document variables. | Word |
| document.xml | root\word | Defines tracked changes and author information. | Word |
| recipientData.xml | root\word | Contains contact data for mail merge operations. | Word |
| revisionHeaders.xml | root\xl\revisions | Defines tracked changes, author, and date information. | Excel |
| comments1.xml | root\xl | Defines comments in the workbook, comment authors, and comment dates and times. | Excel |
| notesSlide1.xml | root\ppt\notesSlides | Defines slide note information. | PowerPoint |
| commentAuthors.xml | root\ppt | Defines information about each author who has added a comment to the document. That information includes the author’s name, initials, a unique author ID, a last-comment-index-used count, and the author display color. | PowerPoint |
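
As an illustration of reading one of these parts, here is a minimal Python sketch that pulls the core file properties out of `docProps/core.xml`. The function name `read_core_properties` is hypothetical, and the sketch assumes the package actually contains that part; if it does not, `zipfile` raises a `KeyError`.

```python
import zipfile
import xml.etree.ElementTree as ET


def read_core_properties(docx_path):
    """Return the core properties as a dict of {local element name: text}."""
    with zipfile.ZipFile(docx_path) as archive:
        # Entry names inside the package use forward slashes.
        root = ET.fromstring(archive.read("docProps/core.xml"))

    properties = {}
    for child in root:
        # ElementTree reports tags as "{namespace-uri}name";
        # keep only the local name, e.g. "creator" or "title".
        local_name = child.tag.split("}", 1)[-1]
        properties[local_name] = child.text
    return properties
```

Calling `read_core_properties("report.docx")` (with a file name of your own) returns whatever properties the document stores, typically including entries such as `creator`, `title`, and `created`. Stripping the namespace keeps the sketch short; code that needs to tell identically named elements apart should compare the full qualified tag instead.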