PDF Redaction Guide: True Removal vs Black Boxes

Redaction failures make the news every year: a government PDF, a corporate filing, or a court exhibit leaks because someone drew a rectangle over the text instead of removing it. Real redaction is irreversible by design, and it requires more than covering pixels.

Black Box vs True Redaction

Approach                     | What it does                    | Reversible?                       | Verdict
-----------------------------|---------------------------------|-----------------------------------|------------------------------------
Drawn rectangle / annotation | Overlays a shape on the page    | Yes: delete the annotation        | Unsafe
Highlighted with black       | Adds an annotation only         | Yes: copy text from underneath    | Unsafe
Hidden layer (OCG)           | Marks content non-visible       | Yes: toggle layer visibility      | Unsafe
True redaction               | Removes content from the stream | No                                | Safe (combine with sanitization)
Rasterize page               | Replaces page with a flat image | No, but text becomes unsearchable | Heavy-handed but safe
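
To see why an overlay is not removal, consider a toy page content stream in which text is drawn and then covered by a filled black rectangle. This is a minimal sketch: the stream fragment is hand-written for illustration, `literal_strings_shown` is a hypothetical helper, and it only handles simple unescaped literal strings. Real content streams are usually compressed and may encode text in other ways.

```python
import re

# Toy page content: draw text, then paint a filled black rectangle over it.
covered = (
    b"BT /F1 12 Tf 72 700 Td (SSN 123-45-6789) Tj ET\n"
    b"0 0 0 rg 70 695 150 16 re f\n"
)
# What true redaction aims for: the text-showing operators are gone entirely.
redacted = b"0 0 0 rg 70 695 150 16 re f\n"


def literal_strings_shown(stream: bytes) -> list[str]:
    """Return literal strings passed to the Tj operator (simple cases only)."""
    matches = re.findall(rb"\(([^()\\]*)\)\s*Tj", stream)
    return [m.decode("latin-1") for m in matches]


print(literal_strings_shown(covered))   # prints ['SSN 123-45-6789']
print(literal_strings_shown(redacted))  # prints []
```

Both streams would render the same black bar, but only the second one no longer carries the text.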

A Safe Redaction Workflow

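  1. OCR first. Run OCR on scanned pages so the redaction tool has a text layer to search and target. If a scan already carries a hidden OCR text layer, that layer must be redacted along with the image.
  2. Mark redactions. Use a redaction-aware tool to mark selected text, page regions, or pattern matches (SSN, email, phone).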
  3. Apply redactions. This step physically removes the content. Once applied, it cannot be undone.
  4. Sanitize the document. Strip metadata, embedded scripts, hidden layers, comments, form data, and obsolete object versions.
  5. Save As (not Save). Save As writes a fresh PDF; Save may append an incremental update that leaves earlier versions of objects recoverable from the file.
  6. Verify. Open the saved file, search for known redacted strings, inspect properties, and check that nothing comes back.
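
The verification step can be partly automated. The sketch below, using only the Python standard library, searches the raw file bytes and any Flate-compressed streams for known redacted strings, and counts `%%EOF` markers as a rough hint that the file contains incremental saves. The function names and the hand-built sample bytes are assumptions made for illustration; the sample mimics a single compressed stream and is not a complete PDF.

```python
import re
import zlib

# Matches the bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def searchable_bytes(pdf: bytes) -> bytes:
    """Raw file bytes plus every stream that inflates with zlib."""
    parts = [pdf]
    for match in STREAM_RE.finditer(pdf):
        try:
            # decompressobj tolerates the end-of-line bytes before "endstream".
            parts.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not Flate-encoded, or a filter this sketch does not handle
    return b"\n".join(parts)


def verify_redaction(pdf: bytes, needles: list[str]) -> dict:
    """Report which known strings are still findable, plus the %%EOF count."""
    haystack = searchable_bytes(pdf)
    leaked = [
        n for n in needles
        if n.encode("utf-8") in haystack or n.encode("utf-16-be") in haystack
    ]
    return {"leaked": leaked, "eof_markers": pdf.count(b"%%EOF")}


# Demo on a tiny hand-built byte string that mimics one compressed stream.
secret = b"BT (Jane Roe 123-45-6789) Tj ET"
sample = (
    b"%PDF-1.7\n1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + zlib.compress(secret)
    + b"\nendstream\nendobj\n%%EOF\n"
)
print(verify_redaction(sample, ["123-45-6789", "John Doe"]))
# prints {'leaked': ['123-45-6789'], 'eof_markers': 1}
```

An empty `leaked` list is a sanity check, not proof. Text can be stored as hex strings, under font-specific encodings, split across several operators, or behind other filters or encryption, so opening the file and searching in a viewer is still needed. More than one `%%EOF` marker can indicate incremental saves, although linearized files may also contain two.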

Metadata: The Quiet Leak

Even after every visible word is gone, metadata often still leaks. The XMP stream may contain the original title, author, or a description that summarizes the redacted text. Bookmarks and document outlines may still quote the old text. Subsetted embedded fonts can reveal which characters the removed text used. Form fields keep default values, and comments preserve discussion threads. A thorough sanitize pass scrubs all of these.
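
A quick inventory of what a file still carries can be taken before and after sanitizing. The following is a rough sketch using only the standard library: it counts occurrences of a few PDF dictionary keys that commonly hold residual information. The key list is a non-exhaustive assumption, `residual_features` is a hypothetical helper, and the sample bytes are hand-written for illustration rather than a complete PDF.

```python
import re
import zlib

# PDF name keys that commonly carry residual information (non-exhaustive).
LEAK_KEYS = {
    b"/Info": "document information dictionary",
    b"/Metadata": "XMP metadata stream",
    b"/Outlines": "bookmarks",
    b"/AcroForm": "form fields",
    b"/Annots": "comments and other annotations",
    b"/JavaScript": "embedded scripts",
    b"/OCProperties": "optional content (layers)",
    b"/EmbeddedFiles": "attachments",
}
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def residual_features(pdf: bytes) -> dict[str, int]:
    """Count leak-prone keys in the raw bytes and in Flate-compressed streams."""
    chunks = [pdf]
    for match in STREAM_RE.finditer(pdf):
        try:
            # Dictionaries may live inside compressed object streams.
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # stream is not Flate-encoded
    data = b"\n".join(chunks)
    found = {}
    for key, label in LEAK_KEYS.items():
        # The lookahead avoids counting longer names that merely start with key.
        count = len(re.findall(re.escape(key) + rb"(?![A-Za-z0-9])", data))
        if count:
            found[label] = count
    return found


sample = (
    b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog /Outlines 3 0 R >>\nendobj\n"
    b"trailer\n<< /Root 1 0 R /Info 2 0 R >>\n%%EOF\n"
)
print(residual_features(sample))
# prints {'document information dictionary': 1, 'bookmarks': 1}
```

A key being present does not mean it holds sensitive text, and a key being absent does not prove the file is clean. The counts only show where to look.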

When to Rasterize Instead

If the source PDF is messy, has many layers, or you simply don't trust a partial removal, rasterize the page. The page becomes a flat image: text, fonts, and vector paths are replaced with pixels. Once the redaction marks are part of that image, the content beneath them cannot be selected or recovered. The trade-off is loss of searchability and accessibility. Treat rasterization as a last-resort safeguard for high-stakes documents.

Mark Documents Before Sharing

Apply a "CONFIDENTIAL" watermark on top of sensitive PDFs entirely in-browser.

Frequently Asked Questions

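Q: Is drawing a black box over text enough to redact it?
A: No. The text underneath remains selectable and copyable. Use true redaction.

Q: What does true redaction do?
A: It removes the content from the content stream irreversibly, then sanitizes metadata.

Q: Can metadata leak redacted information?
A: Often. XMP, comments, bookmarks, and embedded scripts must all be stripped.

Q: Can a redaction be undone?
A: Not when done properly. Save As a fresh PDF after applying and sanitizing.

Q: What hidden content should be checked?
A: OCR text, alt text, link targets, comments, form defaults, and embedded font glyphs.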