By Dick Weisinger
Metadata is often defined simply as data that describes data. In Content Management applications, metadata refers to structured data that gets attached to unstructured documents to provide methods for consistent organization and for searching the unstructured data.
Metadata is often embedded as part of unstructured file content; Microsoft Office documents are one well-known example. Embedded metadata is useful when automatically transmitting documents between systems. Metadata fields like ‘author’, ‘company’, and ‘creation date’ can be saved as part of a file and then easily extracted and reused if the file is later transmitted to a new system.
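As a rough illustration of how such fields can be pulled back out of a file, the sketch below reads them from a Word .docx file using only the Python standard library. It assumes the Office Open XML packaging, where a .docx is a ZIP archive that normally stores the author and creation date in docProps/core.xml and the company in docProps/app.xml. The function name read_embedded_metadata is made up for this example, and older binary formats such as .doc store their properties differently and are not handled.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the Office Open XML property parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}


def read_embedded_metadata(docx_file):
    """Return author, creation date, and company from a .docx path or file object.

    Hypothetical helper for illustration; a field stays absent from the
    result when the part that would hold it is missing from the archive.
    """
    found = {}
    with zipfile.ZipFile(docx_file) as archive:
        names = set(archive.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(archive.read("docProps/core.xml"))
            found["author"] = core.findtext("dc:creator", namespaces=NS)
            found["created"] = core.findtext("dcterms:created", namespaces=NS)
        if "docProps/app.xml" in names:
            app = ET.fromstring(archive.read("docProps/app.xml"))
            found["company"] = app.findtext("ep:Company", namespaces=NS)
    return found
```

Calling read_embedded_metadata("contract.docx") on a real file (the file name here is only a placeholder) would return a small dictionary that a receiving system could map onto its own metadata fields.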
Embedded file metadata is typically hidden, and users can usually find it only by drilling into infrequently used application dialogs. It seems rather innocuous, but embedded metadata has become the focus of legal battles. Scanning through the web site law.com, you’ll come across entries like “Five Hazards to Avoid While Navigating Metadata”, “Defining Metadata Ethics”, and “To MetaData or Not to MetaData”. Metadata is described as ‘dangerous’.
What are some of the legal issues or problems?
- Metadata can contain sensitive information like who drafted a document and when it was drafted.
- Document change tracking information retained in the file can include revisions and notes that may unintentionally reveal information.
- Users are often unaware that files sent as email attachments or posted to web sites include metadata, which might contain data beyond the main body of the document.
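To make the change-tracking issue above concrete, here is a minimal sketch of how retained revisions and comments could be detected in a .docx file before it is sent out. It assumes the WordprocessingML convention in which tracked insertions and deletions appear as w:ins and w:del elements in word/document.xml and comments are stored in word/comments.xml. The function name summarize_hidden_revisions is hypothetical, and the counts are approximate because the same element names also mark formatting-level revisions.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace (the "w:" prefix in document.xml).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def summarize_hidden_revisions(docx_file):
    """Count tracked-change and comment elements and collect their authors.

    Hypothetical helper for illustration; returns (counts, sorted_authors).
    """
    counts = {"insertions": 0, "deletions": 0, "comments": 0}
    authors = set()
    # (archive part, element name, key in counts)
    targets = (
        ("word/document.xml", "ins", "insertions"),
        ("word/document.xml", "del", "deletions"),
        ("word/comments.xml", "comment", "comments"),
    )
    with zipfile.ZipFile(docx_file) as archive:
        names = set(archive.namelist())
        for part, tag, key in targets:
            if part not in names:
                continue
            root = ET.fromstring(archive.read(part))
            for element in root.iter(f"{{{W}}}{tag}"):
                counts[key] += 1
                author = element.get(f"{{{W}}}author")
                if author:
                    authors.add(author)
    return counts, sorted(authors)
```

A non-zero count does not say what the revisions reveal, only that the file carries more than its visible text.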
As part of an eDiscovery process, electronic files need to be produced, and the two parties to the litigation need to agree on the format of the data. Rule 34(b)(ii) of the Federal Rules of Civil Procedure addresses eDiscovery and how electronic data should be produced and made available for investigation as part of litigation. There is a battle going on over the preservation of metadata for use in litigation.
Metadata lost via file conversions or by ‘scrubbing’ is at the heart of the controversy. Sometimes electronic files are converted to a neutral format, like PDF, to allow a standard method for search, but any kind of file conversion can cause information to be lost. PDF files, for example, don’t allow for as rich a metadata set as Microsoft Office files do. Converting Word documents to something like TIFF would be viewed as unacceptable because an image format would degrade the searchability of the document.
Scrubbing can remove all or some of the metadata tags and tracked information in the file. It’s OK to remove privileged or confidential information, but who decides what fits those criteria and what method is used to remove or redact the information are difficult questions that the courts are still trying to sort out.
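For readers curious what partial scrubbing can look like mechanically, the sketch below copies a .docx archive and blanks only two core properties, the author and the last-modified-by name. It assumes the Office Open XML layout with properties in docProps/core.xml. The function name scrub_core_properties and the choice of fields in SCRUB_TAGS are illustrative assumptions, not a recommended policy, and tracked changes, comments, and other property parts are copied through untouched.

```python
import zipfile
import xml.etree.ElementTree as ET

CORE_PART = "docProps/core.xml"
CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Keep the usual prefixes when the XML is written back out, so attribute
# values such as xsi:type="dcterms:W3CDTF" still refer to a declared prefix.
# Note that register_namespace changes a process-wide registry.
ET.register_namespace("cp", CP)
ET.register_namespace("dcterms", DCTERMS)

# Illustrative choice of fields to blank; which fields may be removed is
# exactly the kind of question the article describes as unsettled.
SCRUB_TAGS = {f"{{{DC}}}creator", f"{{{CP}}}lastModifiedBy"}


def scrub_core_properties(source, target):
    """Copy a .docx from source to target, emptying the fields in SCRUB_TAGS.

    Hypothetical helper for illustration; source and target may be paths
    or binary file objects.
    """
    with zipfile.ZipFile(source) as src, \
            zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == CORE_PART:
                root = ET.fromstring(data)
                for child in root:
                    if child.tag in SCRUB_TAGS:
                        child.text = ""
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            dst.writestr(item.filename, data)
```

Because only the listed fields are emptied, the creation date and every other part of the file survive, which shows how much the outcome depends on who chooses the list.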