Embedded Document MetaData

By Dick Weisinger

There was a blog here a few months back about PDF/A, a version of Adobe’s PDF file format that is intended to be used for long-term document archival.  AIIM recently updated their FAQ about PDF/A, and it provides good information.

One interesting capability of PDF/A (ISO 19005-1) is that it includes Adobe XMP metadata.  Adobe announced XMP in September 2001 and currently uses the technology in all component products of its Adobe Creative Suite.  The concept of XMP is similar to how Microsoft Office products automatically capture and allow the assignment of metadata to files being authored.  XMP is based on the W3C’s Resource Description Framework (RDF).
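Because XMP is serialized as RDF/XML, an XMP packet can be read with an ordinary XML parser.  The sketch below is only an illustration: the sample packet, its values and the `read_xmp` helper are made up for this example, while the namespace URIs are the standard RDF, Dublin Core and XMP basic ones.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs used by XMP packets.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
}

# Hypothetical packet, written by hand for illustration only.
SAMPLE_PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xmp="http://ns.adobe.com/xap/1.0/">
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">Quarterly Report</rdf:li>
        </rdf:Alt>
      </dc:title>
      <dc:creator>
        <rdf:Seq>
          <rdf:li>Jane Example</rdf:li>
        </rdf:Seq>
      </dc:creator>
      <xmp:CreateDate>2009-03-01T10:15:00Z</xmp:CreateDate>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def read_xmp(packet):
    """Pull a few common properties out of an XMP packet string."""
    root = ET.fromstring(packet)
    desc = root.find("rdf:RDF/rdf:Description", NS)
    return {
        # dc:title is a language alternative; take the first entry.
        "title": desc.findtext("dc:title/rdf:Alt/rdf:li", namespaces=NS),
        # dc:creator is an ordered list of names.
        "creators": [li.text for li in desc.findall("dc:creator/rdf:Seq/rdf:li", NS)],
        "created": desc.findtext("xmp:CreateDate", namespaces=NS),
    }


if __name__ == "__main__":
    print(read_xmp(SAMPLE_PACKET))
```

A real application would first have to locate the packet inside the host file (PDF, JPEG and so on), which this sketch does not attempt.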

Metadata embedded directly into files allows files to be ‘self-describing’, and it can help you file, locate, identify and distinguish between similarly named files.

Automatic embedding of metadata into files by capture devices like scanners and digital cameras is fairly standard.  Usually information like date/time, height, width and file format is saved.  Cameras typically embed data according to the Exchangeable Image File Format (EXIF), a format modeled after the tag structure of TIFF files.  (The TIFF format was created by Aldus, which was later acquired by Adobe Systems.)  EXIF tag data includes camera setting information like F-stop, focal length and shutter speed.  Some newer cameras embed GPS location information too.
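To make the "tag structure" concrete, here is a rough sketch of how a TIFF-style tag directory is laid out: a byte-order mark, the number 42, an offset to a directory, and then 12-byte entries of tag, type, count and value.  The helper names and the tiny hand-built sample are assumptions for illustration; a real EXIF block has many more tags, and values that do not fit in four bytes are stored elsewhere and referenced by offset, which this sketch does not follow.

```python
import struct

SHORT, LONG = 3, 4  # TIFF field type codes for 16-bit and 32-bit unsigned integers


def read_ifd(buf):
    """Return {tag_id: value} for the first tag directory in a TIFF-structured block.

    Only single SHORT/LONG values stored inline are decoded; anything else
    is returned as the raw 4-byte value-or-offset field.
    """
    order = {b"II": "<", b"MM": ">"}[buf[:2]]  # little- or big-endian
    magic, ifd_offset = struct.unpack(order + "HI", buf[2:8])
    if magic != 42:
        raise ValueError("not a TIFF-structured block")
    (count,) = struct.unpack(order + "H", buf[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i
        tag, typ, n, raw = struct.unpack(order + "HHI4s", buf[start:start + 12])
        if n == 1 and typ == SHORT:
            tags[tag] = struct.unpack(order + "H", raw[:2])[0]
        elif n == 1 and typ == LONG:
            tags[tag] = struct.unpack(order + "I", raw)[0]
        else:
            tags[tag] = raw
    return tags


def build_sample():
    """Hand-built little-endian block with two entries (hypothetical data)."""
    header = b"II" + struct.pack("<HI", 42, 8)
    entries = [
        struct.pack("<HHIHH", 0x0100, SHORT, 1, 640, 0),  # ImageWidth
        struct.pack("<HHIHH", 0x0101, SHORT, 1, 480, 0),  # ImageLength
    ]
    # Entry count, the entries, then a zero "next directory" offset.
    return header + struct.pack("<H", len(entries)) + b"".join(entries) + struct.pack("<I", 0)


if __name__ == "__main__":
    print(read_ifd(build_sample()))
```

In practice you would reach for an imaging library rather than parse this by hand; the point is only that EXIF data is a flat list of numbered tags.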

Beyond the automatically captured EXIF metadata, Adobe XMP also allows users to embed additional information like source, headline and instructions.

The upside of being able to embed metadata in files is that it allows information like keywords, version information, captions, format information, creation and modification dates, and other file information to be tagged.  For example, graphic files used on a web site might contain embedded copyright information and originating URL. 

And when used in conjunction with a workflow, the state of the document can be kept as part of the metadata that automatically flows along with the file, across networks and via email.  This is interesting because a single file could be treated as a self-contained packet of workflow information where embedded metadata describes the state of the document within the workflow.  Participation in a workflow could then be reduced to simply being able to read and update embedded file metadata.
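As a rough sketch of that idea, the code below reads and updates a workflow state stored in an XMP packet.  Everything workflow-specific here is an assumption made for illustration: the `wf` namespace URI, the `State` and `LastActor` properties, and the helper names are hypothetical and not part of XMP or any product.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
WF_NS = "http://example.com/ns/workflow/1.0/"  # hypothetical custom namespace

# Keep readable prefixes when the packet is written back out.
ET.register_namespace("x", "adobe:ns:meta/")
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("wf", WF_NS)

# Hypothetical packet carrying a workflow state.
PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:wf="http://example.com/ns/workflow/1.0/">
      <wf:State>draft</wf:State>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def _description(root):
    """Find the rdf:Description element that holds the properties."""
    return root.find(f"{{{RDF_NS}}}RDF/{{{RDF_NS}}}Description")


def get_state(packet):
    """Return the current workflow state, or None if it is not set."""
    return _description(ET.fromstring(packet)).findtext(f"{{{WF_NS}}}State")


def set_state(packet, new_state, actor):
    """Return a new packet string with the state and last actor updated."""
    root = ET.fromstring(packet)
    desc = _description(root)
    for name, value in (("State", new_state), ("LastActor", actor)):
        tag = f"{{{WF_NS}}}{name}"
        elem = desc.find(tag)
        if elem is None:
            elem = ET.SubElement(desc, tag)
        elem.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    updated = set_state(PACKET, "in-review", "alice")
    print(get_state(PACKET), "->", get_state(updated))
```

Any participant that can run something like `get_state` and `set_state` against the file could take part in the workflow, without talking to a central server.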

There are a lot of positives for using embedded metadata as a tool to assist workflow.  But while embedded file metadata can enable more flexible workflow, letting files and documents flow into and out of the control of a central workflow system can make it more difficult to track the location and state of a task and also to validate metadata changes.  The repository used with the workflow system would need to continually synchronize its metadata with the embedded file metadata.  But the biggest problem is still that few software application vendors have adopted XMP or other technologies for embedding file metadata, so the approach works best if you’ve standardized on dealing only with PDF or other Adobe file formats.

Using embedded file metadata also brings additional security issues.  Files that are exported for public consumption may need to have personal or confidential information stripped from their embedded metadata.
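A minimal sketch of such a stripping step follows, again working on an XMP packet as plain XML.  The blocked-property list is a made-up policy and the sample packet is hypothetical; which properties count as confidential would depend on your own rules.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
XMP_NS = "http://ns.adobe.com/xap/1.0/"

# Keep readable prefixes in the output.
ET.register_namespace("x", "adobe:ns:meta/")
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("xmp", XMP_NS)

# Hypothetical policy: properties to drop before publishing.
BLOCKED = {f"{{{DC_NS}}}creator", f"{{{XMP_NS}}}CreatorTool"}

# Hypothetical packet; note that XMP allows simple properties to be
# written either as child elements or as attributes of rdf:Description.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xmp="http://ns.adobe.com/xap/1.0/"
        xmp:CreatorTool="ExampleScanner 1.0">
      <dc:title>
        <rdf:Alt><rdf:li xml:lang="x-default">Quarterly Report</rdf:li></rdf:Alt>
      </dc:title>
      <dc:creator>
        <rdf:Seq><rdf:li>Jane Example</rdf:li></rdf:Seq>
      </dc:creator>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def strip_properties(packet, blocked=BLOCKED):
    """Return a copy of the packet without the blocked properties."""
    root = ET.fromstring(packet)
    for desc in list(root.iter(f"{{{RDF_NS}}}Description")):
        # Properties written as child elements.
        for child in list(desc):
            if child.tag in blocked:
                desc.remove(child)
        # Properties written as attributes.
        for attr in list(desc.attrib):
            if attr in blocked:
                del desc.attrib[attr]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(strip_properties(SAMPLE))
```

Note that this only cleans the XMP packet itself.  A file may carry the same information in other places (EXIF tags, the PDF document information dictionary), so a real export step would need to cover those too.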
