Metadata is the invisible half of a PDF. Most people never look at it, which is exactly why it leaks usernames, file paths, software fingerprints, and edit history. Understanding what's stored — and how to strip it — should be part of every PDF workflow.
Two Metadata Containers
| Container | Format | Typical Fields | Notes |
|---|---|---|---|
| Info dictionary | Simple PDF object | Title, Author, Subject, Keywords, Creator, Producer, CreationDate, ModDate | Legacy; still read by older tools |
| XMP metadata stream | RDF/XML | Dublin Core, rights, custom namespaces | Modern standard; required for PDF/A |
| Embedded image EXIF | Per-image binary | Camera model, timestamps, GPS | Survives unless re-encoded |
| Font names | Per-font | Foundry name, subset prefix | Can reveal authoring software |
| JavaScript actions | Document or field scripts | Author code, URLs | Reveals intent and tools |
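The CreationDate and ModDate fields in the Info dictionary use the PDF date format, `D:YYYYMMDDHHmmSSOHH'mm'`, where everything after the year is optional and `O` is `+`, `-`, or `Z`. The following is a minimal sketch of a parser for that format, using only the Python standard library; the function name `parse_pdf_date` is an illustrative choice, not part of any PDF library.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date strings look like D:YYYYMMDDHHmmSSOHH'mm'.
# Every field after the year is optional.
_PDF_DATE = re.compile(
    r"^D:(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?:(?P<sign>[+\-Z])(?:(?P<oh>\d{2})'?(?P<om>\d{2})?'?)?)?$"
)


def parse_pdf_date(value):
    """Parse a PDF date string into a datetime (hypothetical helper)."""
    m = _PDF_DATE.match(value.strip())
    if m is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    g = m.groupdict()
    sign = g["sign"]
    if sign is None:
        tz = None  # no offset recorded: relationship to UTC is unknown
    elif sign == "Z":
        tz = timezone.utc
    else:
        offset = timedelta(hours=int(g["oh"] or 0), minutes=int(g["om"] or 0))
        tz = timezone(offset if sign == "+" else -offset)
    return datetime(
        int(g["year"]), int(g["month"] or 1), int(g["day"] or 1),
        int(g["hour"] or 0), int(g["minute"] or 0), int(g["second"] or 0),
        tzinfo=tz,
    )
```

Calling `parse_pdf_date("D:20240315142530+01'00'").utcoffset()` yields an offset of one hour ahead of UTC, which is the piece of the timestamp that hints at the author's region.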
Privacy: What Leaks by Default
- Author: usually the OS account name — often a real name.
- Creator / Producer: the application versions used; fingerprints workflows.
- CreationDate / ModDate: timestamps including the UTC offset, which can narrow down the region a person was working in.
- Local file paths: some tools embed the absolute source path in XMP `xmpMM:DerivedFrom`.
- EXIF on embedded photos: GPS coordinates, camera serial numbers, original capture times.
- Incremental updates: each incremental save appends the changed objects and leaves the earlier versions in the file, so prior values and deleted text may still be recoverable.
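To illustrate why incremental saves are a risk: each one normally appends a new section ending in its own `%%EOF` marker, so cutting the file at an earlier marker gives back an earlier state. The sketch below assumes that simple layout and uses hypothetical helper names. Linearized PDFs also contain two `%%EOF` markers without any edit history, so treat a count above one as a hint, not proof.

```python
import re


def count_eof_markers(pdf_bytes):
    """Count %%EOF markers; more than one suggests incremental updates."""
    return len(re.findall(rb"%%EOF", pdf_bytes))


def revision_prefixes(pdf_bytes):
    """Return one byte prefix per %%EOF marker.

    Each prefix ends at a marker, so it approximates the file as it
    looked after that save. Assumes plain incremental updates.
    """
    ends = [m.end() for m in re.finditer(rb"%%EOF", pdf_bytes)]
    return [pdf_bytes[:end] for end in ends]


# Stand-in bytes, not a valid PDF: an original save plus one appended update.
sample = (
    b"%PDF-1.7\n1 0 obj (Secret draft wording) endobj\n%%EOF\n"
    b"1 0 obj (Public wording) endobj\n%%EOF\n"
)
revisions = revision_prefixes(sample)
```

Here `revisions[0]` still contains the draft wording even though the final revision replaced it, which is the recovery path a rewrite of the whole file is meant to close.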
Sanitizing Metadata Safely
- Use a "Sanitize Document" feature or a scripted pipeline that rewrites the file and strips both the XMP stream and the Info dictionary.
- Re-encode embedded images if you don't trust their EXIF.
- Use Save As to write a fresh PDF; a plain Save that appends an incremental update leaves the history recoverable.
- Verify with `exiftool` or `pdfinfo` on the output before publishing.
- For PDF/A targets, write a minimal sanitized metadata set instead of leaving the streams empty.
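As a complement to the verification step above, a pipeline can run a coarse raw-byte scan on the output before publishing. The sketch below is an assumption-laden smoke test, not a replacement for `exiftool` or `pdfinfo`: the pattern list and the name `scan_for_residue` are illustrative, and anything stored inside a compressed stream will not show up, so an empty result does not prove the file is clean.

```python
import re

# Illustrative byte patterns; extend the list for your own pipeline.
RESIDUE_PATTERNS = {
    "Info /Author": rb"/Author",
    "Info /Creator": rb"/Creator",
    "Info /Producer": rb"/Producer",
    "XMP packet": rb"<x:xmpmeta",
    "XMP DerivedFrom": rb"xmpMM:DerivedFrom",
    "EXIF header": rb"Exif\x00\x00",
    "JavaScript action": rb"/JavaScript",
}


def scan_for_residue(pdf_bytes):
    """Return {label: hit_count} for every pattern found in the raw bytes."""
    hits = {}
    for label, pattern in RESIDUE_PATTERNS.items():
        count = len(re.findall(pattern, pdf_bytes))
        if count:
            hits[label] = count
    return hits
```

A typical use is to read the sanitized file with `open(path, "rb").read()`, pass the bytes to `scan_for_residue`, and fail the build if the returned dictionary is not empty.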
Useful Metadata
Not all metadata is dangerous. Title, language, and document subject improve search results, screen-reader experience, and library indexing. Set them deliberately rather than relying on tool defaults. Custom XMP namespaces let you embed structured data — for example, ZUGFeRD-compliant e-invoices embed an XML invoice block plus structured metadata that ERPs parse automatically.
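Setting these fields deliberately can be as simple as generating a small XMP packet yourself. The sketch below builds one with a Dublin Core title, language, and optional description. The function name `build_minimal_xmp` is hypothetical, and attaching the packet to the PDF is left to whichever PDF library the pipeline already uses.

```python
from xml.sax.saxutils import escape


def build_minimal_xmp(title, language, description=""):
    """Return a small XMP packet with deliberately chosen Dublin Core fields."""
    description_block = ""
    if description:
        # dc:description is a language-alternative array, like dc:title.
        description_block = (
            "   <dc:description><rdf:Alt>"
            f'<rdf:li xml:lang="x-default">{escape(description)}</rdf:li>'
            "</rdf:Alt></dc:description>\n"
        )
    return (
        '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">\n'
        ' <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\n'
        '  <rdf:Description rdf:about=""\n'
        '    xmlns:dc="http://purl.org/dc/elements/1.1/">\n'
        "   <dc:title><rdf:Alt>"
        f'<rdf:li xml:lang="x-default">{escape(title)}</rdf:li>'
        "</rdf:Alt></dc:title>\n"
        # dc:language is an unordered array (bag) of locale codes.
        f"   <dc:language><rdf:Bag><rdf:li>{escape(language)}</rdf:li>"
        "</rdf:Bag></dc:language>\n"
        f"{description_block}"
        "  </rdf:Description>\n"
        " </rdf:RDF>\n"
        "</x:xmpmeta>\n"
        '<?xpacket end="w"?>'
    )
```

Because the packet lists only the fields you pass in, nothing such as an author name or source path is inherited from tool defaults.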