Most of us have little difficulty in understanding where a file’s data is: the file system’s records for each file identify where it’s stored. But where is the information about that data, its metadata, stored? The best answer is that it all depends, and this article explains what it depends on.
When the Mac was being developed in the early 1980s, the majority of file systems were crude and simple: each file consisted of the information about that file in attributes such as its name and date of creation, and the data stored for that file. If you wanted to supplement the file’s basic attributes with metadata, then they had to be stored within the file’s data, which isn’t good practice, but worked.
From its outset, the Mac was intended to be different. Instead of its files having just a data fork, they also had a structured database of resources, the resource fork. That was particularly useful for apps, as they could keep different localisations of their menus in the resource fork, for example. Resource forks were allowed for any file, so an app could store a document’s window settings in that document’s resource fork, and many did.
When Mac OS X was being developed, there were some who wanted to remove those resource forks, and others who fought to retain them. In the end the ideal compromise was reached, whereby files could have as many forks as they wanted, as extended attributes. In Mac OS X 10.4, those traditional resource forks became extended attributes of type com.apple.ResourceFork.
Against all odds, extended attributes, xattrs for short, began to flourish. In the Mac Extended File System HFS+ they were stored separately from file data, as they are in APFS today. This makes them ideal for storing metadata: rather than having to edit and store metadata embedded within the file’s data, it could be kept apart as xattrs. The snag here, though, was that other file systems weren’t as flexible as HFS+ (or APFS later). Windows file systems in particular knew nothing of forks or xattrs.
As the major apps added more metadata to their standard file types, they had to cater for the limitations of Windows, and all found ways of storing metadata in their file data. Two good examples are images and PDFs.
Several different systems of image metadata have been defined, of which the best known is EXIF, first published in 1995. Although they’re stored in designated segments of the file data, those metadata can be spread almost anywhere within the file. As a result, EXIF metadata have commonly been corrupted or even removed by apps when they save image files, but when handled properly, they are independent of platform and file system.
PDF is another common file format with ancient origins, and was first released in 1993. Every PDF file consists of a series of objects, most of which are normally stored as ASCII text. Towards the end of every file is an index table to all the objects within it. Objects include both the content of the file, such as images and drawn text, and its metadata. The latter includes a basic set of information such as the document title and authors, and more extensive streams using XMP. Comments and markup are normally added as metadata objects, again stored within the file data.
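That layout can be sketched in Python with a toy file. The example below builds a tiny PDF in the classic text-based form, with a cross-reference table and a trailer pointing to an Info dictionary that holds the title, then reads back the table’s offset from the `startxref` entry at the end. The names `build_toy_pdf` and `xref_offset` are hypothetical, the toy file isn’t guaranteed to open in every viewer, and many modern PDFs use compressed cross-reference streams that this simple approach won’t handle.

```python
import re

def build_toy_pdf(title: str) -> bytes:
    """Assemble a minimal text-based PDF whose third object is the Info dictionary."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [] /Count 0 >>",
        b"<< /Title (" + title.encode("ascii") + b") >>",   # metadata object
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))             # byte offset where this object starts
        out += b"%d 0 obj\n%s\nendobj\n" % (num, body)
    xref_at = len(out)                       # where the index table begins
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"          # entry for the reserved object 0
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R /Info 3 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)

def xref_offset(pdf: bytes) -> int:
    """Return the offset of the cross-reference table recorded near the end of the file."""
    tail = pdf[-1024:]                       # the trailer sits in the last part of the file
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref entry found")
    return int(matches[-1])                  # the last entry is the current one

pdf = build_toy_pdf("Example")
offset = xref_offset(pdf)
print(pdf[offset:offset + 4])                        # start of the index table
print(re.search(rb"/Title \((.*?)\)", pdf).group(1)) # title held in the file data
```

Because the table records byte offsets, changing the length of a metadata object such as the title means the file has to be rewritten or appended to, so its data change.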
So most metadata that have to work across platforms must conform to the lowest common denominator, a file consisting of its attributes and the data alone. For those intended for use on Macs, though, and a few other more enlightened file systems, metadata can be stored where they should be, in xattrs. This is standard for Finder tags and quarantine information for downloads, and xattrs can be used for many other purposes such as assigning keywords, and attaching hashes to verify the integrity of files.
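As a rough illustration of that last use, the Python sketch below computes a SHA-256 digest of a file’s data and keeps it in an xattr using the macOS `xattr` command tool. The xattr name `org.example.sha256` and the function names are hypothetical choices of mine, not an established convention, and the two functions that call `xattr` will only work where that tool is installed.

```python
import hashlib
import subprocess

# Hypothetical xattr name chosen for this sketch, not a standard one.
ATTR_NAME = "org.example.sha256"

def file_digest(path: str) -> str:
    """Return the SHA-256 digest of the file's data as a hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MB chunks so large files aren't loaded into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def store_digest(path: str) -> None:
    """Write the current digest into an xattr (requires the macOS xattr tool)."""
    subprocess.run(["xattr", "-w", ATTR_NAME, file_digest(path), path], check=True)

def verify_digest(path: str) -> bool:
    """Compare the stored xattr value with a freshly computed digest."""
    result = subprocess.run(
        ["xattr", "-p", ATTR_NAME, path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == file_digest(path)
```

On a Mac, calling `store_digest` on a file and later `verify_digest` on the same file should confirm that its data are unchanged. Writing the xattr doesn’t alter the file’s data, so the stored digest remains valid after it has been attached.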
The Finder and Spotlight integrate metadata embedded within file data with those stored in xattrs. Although this is very helpful to users, for example when searching by keyword, it confuses where those metadata are stored. That’s important for those of us who want to preserve important files without changing their data. For example, I have a great many PDF files covering all sorts of topics, but to add keywords to their standard metadata would result in all their data being changed, something I’m not prepared to do.
This isn’t helped by the opaqueness of xattrs either. There are remarkably few GUI apps that provide any access to xattrs, either in terms of specialist or general editors, and configuring the display of metadata in the Finder isn’t simple. Apple has long considered that xattrs are the preserve of its own engineers, developers and a few advanced users, and provided little support or documentation. It hasn’t provided anything equivalent to its famous resource fork editor, ResEdit.
So if you want to discover where any file’s metadata are, you’ll need to arm yourself with a xattr editor like my free xattred, together with an app specialising in editing that type of file, and sometimes a format-specific metadata editor too. No wonder that metadata are woefully underused and frequently misunderstood.