PDF Metadata Guide: XMP, Properties & Privacy

Metadata is the invisible half of a PDF. Most people never look at it, which is exactly why it leaks usernames, file paths, software fingerprints, and edit history. Understanding what's stored — and how to strip it — should be part of every PDF workflow.

Two Metadata Containers

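A PDF has two dedicated metadata containers, the Info dictionary and the XMP stream. The last three rows of the table are other places where identifying data tends to accumulate.

| Container | Format | Typical Fields | Notes |
|---|---|---|---|
| Info dictionary | Simple PDF object | Title, Author, Subject, Keywords, Creator, Producer, CreationDate, ModDate | Legacy; deprecated in PDF 2.0 but still widely read |
| XMP metadata stream | RDF/XML | Dublin Core, rights, custom namespaces | Modern standard; required for PDF/A |
| Embedded image EXIF | Per-image binary | Camera model, timestamps, GPS | Survives unless the image is re-encoded |
| Font names | Per-font | Foundry name, subset prefix | Can reveal authoring software |
| JavaScript actions | Document or field scripts | Author code, URLs | Reveals intent and tools |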
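
To make the XMP row concrete, here is a minimal sketch of reading Dublin Core fields out of an XMP packet with Python's standard library. The sample packet and the function name `dublin_core_fields` are made up for illustration, and the sketch assumes you have already extracted the packet text from the PDF; real packets are usually wrapped in `<?xpacket ...?>` markers and carry many more namespaces.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the RDF and Dublin Core specifications.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample packet, not taken from any real document.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Q3 Report</rdf:li></rdf:Alt></dc:title>
      <dc:creator><rdf:Seq><rdf:li>jsmith</rdf:li></rdf:Seq></dc:creator>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def dublin_core_fields(xmp_text):
    """Return {field_name: [values]} for every dc:* property in the packet."""
    root = ET.fromstring(xmp_text)
    dc_prefix = "{" + DC_NS + "}"
    fields = {}
    for desc in root.iter("{" + RDF_NS + "}Description"):
        for prop in desc:
            if not prop.tag.startswith(dc_prefix):
                continue  # skip properties from other namespaces
            name = prop.tag[len(dc_prefix):]
            # Values normally sit in rdf:li items inside rdf:Alt, rdf:Seq, or rdf:Bag.
            values = [li.text or "" for li in prop.iter("{" + RDF_NS + "}li")]
            if not values and prop.text and prop.text.strip():
                values = [prop.text.strip()]  # simple, non-container value
            fields[name] = values
    return fields


if __name__ == "__main__":
    for name, values in dublin_core_fields(SAMPLE_XMP).items():
        print(name, values)
```

Note that `dc:creator` here holds an account-style name, which is the kind of value the next section warns about.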

Privacy: What Leaks by Default

  • Author: usually the OS account name — often a real name.
  • Creator / Producer: the application versions used; fingerprints workflows.
  • CreationDate / ModDate: timestamps that include a UTC offset, which can narrow down the author's region and working hours.
  • Local file paths: some tools embed the absolute source path in XMP, for example in the xmpMM:DerivedFrom property.
  • EXIF on embedded photos: GPS coordinates, camera serial numbers, original capture times.
  • Incremental updates: each incremental save appends new object versions and leaves the earlier ones in the file, so prior metadata values and deleted text may still be recoverable.
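
The timestamp point is easier to see with a parser. PDF date strings follow the pattern `D:YYYYMMDDHHmmSSOHH'mm'`, where `O` is `+`, `-`, or `Z` and everything after the year is optional. The sketch below, with a hypothetical helper name `parse_pdf_date` and an invented sample value, turns such a string into a timezone-aware `datetime` so the UTC offset can be inspected.

```python
import re
from datetime import datetime, timedelta, timezone

# Pattern for PDF date strings; every field after the year is optional.
PDF_DATE = re.compile(
    r"^D:(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?:(?P<sign>[+\-Z])(?:(?P<tzh>\d{2})'?(?P<tzm>\d{2})?'?)?)?$"
)


def parse_pdf_date(value):
    """Parse a PDF date string; tzinfo is None when no offset is recorded."""
    match = PDF_DATE.match(value.strip())
    if match is None:
        raise ValueError("not a PDF date string: %r" % value)
    g = match.groupdict()
    if g["sign"] is None:
        tz = None  # the producer did not record an offset
    elif g["sign"] == "Z":
        tz = timezone.utc
    else:
        offset = timedelta(hours=int(g["tzh"] or 0), minutes=int(g["tzm"] or 0))
        tz = timezone(offset if g["sign"] == "+" else -offset)
    return datetime(
        int(g["year"]), int(g["month"] or 1), int(g["day"] or 1),
        int(g["hour"] or 0), int(g["minute"] or 0), int(g["second"] or 0),
        tzinfo=tz,
    )


if __name__ == "__main__":
    stamp = parse_pdf_date("D:20240315143000+05'30'")  # invented sample value
    print(stamp.isoformat(), "offset:", stamp.utcoffset())
```

An offset such as +05:30 is shared by only a few regions, which is why a timestamp alone can hint at where a document was produced.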

Sanitizing Metadata Safely

  1. Use a "Sanitize Document" feature or a scripted pipeline that rewrites the file and strips both the XMP stream and the Info dictionary.
  2. Re-encode embedded images if you don't trust their EXIF.
  3. Use Save As to write a fresh, fully rewritten PDF; saving with incremental updates leaves the edit history recoverable.
  4. Verify with exiftool or pdfinfo on the output before publishing.
  5. For PDF/A targets, write a minimal sanitized metadata set instead of leaving the streams empty.
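
As a complement to the verification step, a rough pre-publish check can be scripted with the standard library alone. The sketch below is a heuristic, not a parser: it only scans raw bytes, so it misses anything stored inside compressed object streams, and the function name `quick_audit` and the sample bytes are invented for illustration. Treat a clean result as a hint and still confirm with exiftool or pdfinfo.

```python
# Info dictionary keys listed earlier in this guide.
INFO_KEYS = (b"/Author", b"/Creator", b"/Producer", b"/CreationDate", b"/ModDate")


def quick_audit(pdf_bytes):
    """Heuristic scan of raw PDF bytes for leftover metadata and edit history."""
    eof_count = pdf_bytes.count(b"%%EOF")
    return {
        # Each incremental save appends a new trailer ending in %%EOF.
        "eof_markers": eof_count,
        "possible_incremental_updates": eof_count > 1,
        # Keys visible in uncompressed objects only.
        "info_keys_in_plain_view": [
            key.decode("ascii") for key in INFO_KEYS if key in pdf_bytes
        ],
        "xmp_packet_present": b"<x:xmpmeta" in pdf_bytes or b"<?xpacket" in pdf_bytes,
    }


# Invented fragment that mimics a file saved twice; it is not a valid PDF.
SAMPLE = (
    b"%PDF-1.7\n1 0 obj\n<< /Author (jsmith) /Producer (ExampleWriter 1.0) >>\n"
    b"endobj\n%%EOF\n"
    b"1 0 obj\n<< /Author (anonymous) >>\nendobj\n%%EOF\n"
)

if __name__ == "__main__":
    for key, value in quick_audit(SAMPLE).items():
        print(key, "=", value)
```

For a real file, pass the result of reading it in binary mode, for example `quick_audit(open("out.pdf", "rb").read())` with your own filename. Linearized ("fast web view") PDFs legitimately contain two `%%EOF` markers, so a count of two is a prompt to look closer rather than proof of leftover history.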

Useful Metadata

Not all metadata is dangerous. Title, language, and subject improve search results, the screen-reader experience, and library indexing. Set them deliberately rather than relying on tool defaults. Custom XMP namespaces let you embed structured data. For example, ZUGFeRD-compliant e-invoices attach an XML invoice file to the PDF and describe it in structured XMP metadata, so ERPs can parse the invoice automatically.
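
Setting metadata deliberately can be as simple as generating a small XMP document yourself. The following is a minimal sketch using Python's standard library; the `acme` namespace URI, the `archiveId` property, and the function name `build_minimal_xmp` are hypothetical, and the sketch omits the `<?xpacket ...?>` wrapper and the step of attaching the result to a PDF.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
XML_NS = "http://www.w3.org/XML/1998/namespace"
X_NS = "adobe:ns:meta/"
# Hypothetical custom namespace, for illustration only.
ACME_NS = "https://example.com/ns/archive/1.0/"

# Register readable prefixes (this setting is global to ElementTree).
for prefix, uri in (("x", X_NS), ("rdf", RDF_NS), ("dc", DC_NS), ("acme", ACME_NS)):
    ET.register_namespace(prefix, uri)


def q(namespace, tag):
    """Build an ElementTree qualified name such as {uri}tag."""
    return "{%s}%s" % (namespace, tag)


def build_minimal_xmp(title, language, archive_id):
    """Return an XMP document with a title, a language, and one custom field."""
    xmpmeta = ET.Element(q(X_NS, "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q(RDF_NS, "RDF"))
    desc = ET.SubElement(rdf, q(RDF_NS, "Description"), {q(RDF_NS, "about"): ""})

    # dc:title is a language alternative: rdf:Alt with an x-default item.
    alt = ET.SubElement(ET.SubElement(desc, q(DC_NS, "title")), q(RDF_NS, "Alt"))
    item = ET.SubElement(alt, q(RDF_NS, "li"), {q(XML_NS, "lang"): "x-default"})
    item.text = title

    # dc:language is an unordered bag of language tags.
    bag = ET.SubElement(ET.SubElement(desc, q(DC_NS, "language")), q(RDF_NS, "Bag"))
    ET.SubElement(bag, q(RDF_NS, "li")).text = language

    # Custom structured field in the hypothetical namespace.
    ET.SubElement(desc, q(ACME_NS, "archiveId")).text = archive_id
    return ET.tostring(xmpmeta, encoding="unicode")


if __name__ == "__main__":
    print(build_minimal_xmp("Annual Report", "en-GB", "A-0001"))
```

Nothing here records an author, tool, or timestamp, which is the point of a deliberately minimal set. For PDF/A output, validators generally expect custom properties to be declared in an extension schema as well, which this sketch leaves out.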

Shrink Files Before Publishing

Compress oversized PDFs in your browser, and pair the compression with a metadata sanitize pass.


Frequently Asked Questions

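Where is metadata stored in a PDF?
In the Info dictionary, the XMP RDF/XML stream, embedded image EXIF, font names, and scripts.

Does PDF metadata leak private information?
Often. Usernames, file paths, software versions, and image GPS coordinates routinely leak.

How can I view a PDF's metadata?
Use the Document Properties panel, exiftool, pdfinfo, or qpdf.

Is metadata required in a PDF?
No, but PDF/A requires it, so for PDF/A rewrite sanitized values instead of deleting the metadata.

What are custom XMP namespaces used for?
Structured data: e-invoices, photo rights, archival IDs, and scientific dataset references.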