Redaction failures appear in the news every year. A government PDF, a corporate filing, a court exhibit — all leaked because someone drew a rectangle instead of removing content. Real redaction is irreversible by design, and it requires more than just covering pixels.
Black Box vs True Redaction
| Approach | What it does | Reversible? | Verdict |
|---|---|---|---|
| Drawn rectangle / annotation | Overlays a shape on the page | Yes — delete the annotation | Unsafe |
| Highlighted with black | Adds an annotation only | Yes — copy text from underneath | Unsafe |
| Hidden layer (OCG) | Marks content non-visible | Yes — toggle layer visibility | Unsafe |
| True redaction | Removes content from the stream | No | Safe (combine with sanitize) |
| Rasterize page | Replaces page with a flat image | No, but text becomes unsearchable | Heavy-handed but safe |
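To see why the first rows are unsafe, it helps to look at what a page content stream holds. The following is a minimal sketch using a made-up, uncompressed stream and a deliberately simplified string extractor. The stream contents, the `extract_literal_strings` helper, and the regex-based "redaction" are illustrative assumptions, not how any particular tool works. Real PDFs usually compress streams and may encode text as hex strings or `TJ` arrays.

```python
import re

# Hypothetical page content stream: text is shown first,
# then a filled black rectangle is painted over the same area.
content_stream = (
    b"BT /F1 12 Tf 72 700 Td (SSN: 123-45-6789) Tj ET\n"  # show text
    b"0 0 0 rg 70 695 140 16 re f\n"                      # black box on top
)

def extract_literal_strings(stream: bytes) -> list[str]:
    """Return literal strings shown with the Tj operator.

    Simplified on purpose: ignores escapes, nested parentheses,
    hex strings, and TJ arrays.
    """
    return [m.decode("latin-1") for m in re.findall(rb"\(([^()\\]*)\)\s*Tj", stream)]

# The rectangle hides the text visually, but the text operator is still there.
covered = extract_literal_strings(content_stream)

# True redaction rewrites the stream so the text-showing operators are gone.
# (Here the whole BT...ET text object is dropped, for illustration only.)
redacted_stream = re.sub(rb"BT.*?ET\n?", b"", content_stream, flags=re.DOTALL)
remaining = extract_literal_strings(redacted_stream)

print(covered)    # text survives under the box
print(remaining)  # nothing left to extract
```

The point of the sketch is the difference between the two streams: the black box only adds drawing operators, while true redaction removes the operators that carry the text.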
A Safe Redaction Workflow
- OCR first. Run OCR on scanned pages so the redaction tool has a text layer to search and target.
- Mark redactions. Use a redaction-aware tool to select text, regions, or by pattern (SSN, email, phone).
- Apply redactions. This step physically removes the content. Once applied, it cannot be undone.
- Sanitize the document. Strip metadata, embedded scripts, hidden layers, comments, form data, and obsolete object versions.
- Save As (not Save). Save As writes a fresh PDF; a plain Save can append an incremental update that leaves earlier versions of objects recoverable in the file.
- Verify. Open the saved file, search for known redacted strings, inspect properties, and check that nothing comes back.
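The final verification step can be partly automated. Below is a rough sketch using only the Python standard library. The function names `decompressed_streams` and `verify_redaction` are hypothetical, and the approach is an assumption about what a useful check looks like, not a complete audit. It scans the raw bytes and any Flate-compressed streams for strings that should be gone, and counts `%%EOF` markers, since more than one usually means incremental updates were appended. It does not handle encrypted files, other stream filters, or font-specific text encodings, so a clean result is a good sign, not proof.

```python
import re
import zlib

def decompressed_streams(pdf_bytes: bytes):
    """Yield each stream's contents, inflating Flate-compressed ones when possible."""
    for match in re.finditer(rb"stream\r?\n(.*?)\r?\nendstream", pdf_bytes, re.DOTALL):
        raw = match.group(1)
        try:
            yield zlib.decompress(raw)
        except zlib.error:
            yield raw  # not Flate data (or another filter is in use); scan as-is

def verify_redaction(pdf_bytes: bytes, forbidden: list[str]) -> dict:
    """Report which forbidden strings are still present, plus the %%EOF count."""
    haystacks = [pdf_bytes, *decompressed_streams(pdf_bytes)]
    leaks = []
    for text in forbidden:
        # PDF strings are commonly stored as single-byte text or UTF-16BE.
        needles = (text.encode("utf-8"), text.encode("utf-16-be"))
        if any(needle in hay for hay in haystacks for needle in needles):
            leaks.append(text)
    return {
        "leaks": leaks,
        # More than one marker suggests incremental saves were appended.
        "eof_markers": pdf_bytes.count(b"%%EOF"),
    }
```

A typical use would be `verify_redaction(open("redacted.pdf", "rb").read(), ["Jane Roe", "123-45-6789"])`, where the file name and strings are placeholders. Any entry in `leaks`, or an `eof_markers` count above 1, is a reason to go back to the sanitize and Save As steps.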
Metadata: The Quiet Leak
Even after every visible word is gone, metadata often still leaks. The XMP stream may contain the original title, author, or even a description that summarizes the redacted text. Bookmarks and document outlines may reference the old text. Subset embedded fonts can retain glyphs for characters that appeared only in the redacted text, which hints at what was removed. Form fields keep default values. Comments preserve discussion threads. A thorough sanitize pass scrubs all of these.
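A quick way to spot these leftovers is to look for the PDF dictionary keys that usually introduce them. The sketch below is an assumption-laden first pass, not a sanitizer: the `SUSPECT_KEYS` table and the `find_residual_features` function are hypothetical names, the key list is not exhaustive, and keys stored inside compressed object streams will not show up in a raw byte scan.

```python
import re

# PDF name tokens that often signal leftover non-page content (not exhaustive).
SUSPECT_KEYS = {
    b"/Author": "author field",
    b"/Title": "title (document info or bookmark)",
    b"/Metadata": "XMP metadata stream",
    b"/Outlines": "bookmarks",
    b"/AcroForm": "form fields",
    b"/Annots": "comments and annotations",
    b"/OCProperties": "optional content layers",
    b"/JavaScript": "embedded scripts",
    b"/EmbeddedFiles": "file attachments",
}

def find_residual_features(pdf_bytes: bytes) -> dict[str, int]:
    """Count occurrences of suspect keys in the raw bytes of a PDF."""
    counts = {}
    for key, label in SUSPECT_KEYS.items():
        # Negative lookahead: skip longer names that merely start with the key.
        pattern = re.escape(key) + rb"(?![A-Za-z0-9])"
        hits = len(re.findall(pattern, pdf_bytes))
        if hits:
            counts[label] = hits
    return counts
```

An empty result from `find_residual_features` does not prove the file is clean, but a non-empty one shows exactly which categories the sanitize pass still needs to handle.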
When to Rasterize Instead
If the source PDF is messy, has many layers, or you simply don't trust a partial removal, rasterize the page. The page becomes a flat image: text, fonts, and vector paths are replaced with pixels. Once the redacted areas are burned into that image, the sensitive content cannot be selected or recovered. The trade-off is that the text is no longer searchable or readable by assistive technology. Treat rasterization as a last-resort safeguard for high-stakes documents.