PDF Compression Explained: How It Works and How to Shrink PDFs Without Losing Quality

A single-page PDF can be 50 KB or 50 MB. The difference comes down to what's inside: embedded images, fonts, metadata, and how the file was created. This guide explains what determines PDF file size and which techniques reduce it.

What's Inside a PDF File?

A PDF is essentially a container that holds multiple types of objects. Understanding these objects is the key to understanding file size:

  • Content streams — the actual page content: text operators, vector drawing commands, and references to embedded objects.
  • Images — often the largest component. A single uncompressed photo can be 10–30 MB inside a PDF.
  • Fonts — complete font files or subsets. Embedding a full font family can add 1–5 MB.
  • Metadata — author, title, creation date, XMP metadata, document structure tags.
  • Cross-reference table — an index that maps every object to its byte offset, enabling random access.
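
To make the object and cross-reference layout concrete, here is a minimal Python sketch that serializes three objects and records the byte offset of each one in a classic cross-reference table. The function name `build_minimal_pdf` and the sample objects are illustrative assumptions; this shows the layout and is not a complete, conforming PDF writer.

```python
def build_minimal_pdf(objects: list[bytes]) -> tuple[bytes, list[int]]:
    """Serialize numbered objects followed by a classic cross-reference table.

    objects[i] is the body of object number i + 1. Returns the file bytes
    and the byte offset recorded for each object.
    """
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # position where "N 0 obj" begins
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is the head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # fixed-width, 20 bytes per entry
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out), offsets


# Hypothetical sample: a catalog, a page tree, and one empty A4 page.
objects = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] >>",
]
pdf_bytes, offsets = build_minimal_pdf(objects)
```

A reader can jump straight to object 3 by seeking to `offsets[2]`, which is the random access the cross-reference table exists to provide.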

Why Are Some PDFs So Large?

Common reasons for bloated PDFs:

Cause | Typical Impact | Solution
Uncompressed or high-DPI images | 10–50 MB per image | Downsample to 150 DPI for screen, re-encode as JPEG
Full font embedding | 1–5 MB per font | Subset fonts (embed only used glyphs)
Scanned pages (image-only PDF) | 2–10 MB per page | Re-render at lower DPI, apply JPEG compression
Duplicate objects | 1–10 MB | Object deduplication
Incremental saves | 2×–5× original size | Re-save (flatten incremental updates)
Embedded multimedia | Varies widely | Remove or link externally
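
Incremental saves can be spotted with a rough heuristic: each incremental update appends new objects, a new cross-reference section, and another `%%EOF` marker to the end of the file. The sketch below counts those markers. The helper name `count_eof_markers` and the toy byte strings are assumptions for illustration, and the count is only a hint, since linearized files can also contain more than one marker and binary streams may contain the same bytes by coincidence.

```python
def count_eof_markers(pdf_bytes: bytes) -> int:
    """Count %%EOF markers; more than one hints at incremental updates."""
    return pdf_bytes.count(b"%%EOF")


# Toy byte strings that only mimic the file layout (not valid PDFs).
original = (
    b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n"
    b"trailer\n<< >>\nstartxref\n0\n%%EOF\n"
)
# An incremental save appends to the file and leaves the old bytes in place.
edited = original + (
    b"1 0 obj\n<< /Edited true >>\nendobj\n"
    b"trailer\n<< /Prev 0 >>\nstartxref\n0\n%%EOF\n"
)

original_markers = count_eof_markers(original)
edited_markers = count_eof_markers(edited)
```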

PDF Compression Techniques

There are two categories of compression: lossless (no quality reduction) and lossy (quality trade-off for smaller size).

Lossless Techniques

These reduce file size without any visual change:

  • Flate/Deflate compression — the standard compression for content streams. Uses the same algorithm as ZIP files (zlib). Most modern PDFs already use this.
  • Object streams — groups multiple small PDF objects into a single compressed stream, reducing overhead. Introduced in PDF 1.5.
  • Cross-reference stream compression — replaces the text-based cross-reference table with a compressed binary stream.
  • Font subsetting — instead of embedding a complete font with 2,000+ glyphs, embed only the 50–100 glyphs actually used in the document.
  • Duplicate object removal — if the same image appears on 10 pages, store it once and reference it 10 times.
  • Metadata stripping — remove unnecessary XMP metadata, thumbnails, and edit history.

💡 Tip: Lossless re-saving alone can reduce file size by 10–40% for PDFs exported from word processors, because many editors don't enable object streams or cross-reference compression by default.
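
Two of the lossless techniques above are easy to sketch with Python's standard library: Deflate compression of a content stream, and duplicate removal by hashing object payloads. The function names and sample data are assumptions for illustration; a real optimizer works on parsed PDF objects and rewrites their references.

```python
import hashlib
import zlib


def flate_compress(stream: bytes) -> bytes:
    """Deflate a stream in zlib format, as a /FlateDecode filter expects."""
    return zlib.compress(stream, 9)


def deduplicate(payloads: list[bytes]) -> tuple[list[bytes], list[int]]:
    """Store each distinct payload once and return one index per input."""
    unique: list[bytes] = []
    index_by_digest: dict[bytes, int] = {}
    references: list[int] = []
    for payload in payloads:
        digest = hashlib.sha256(payload).digest()
        if digest not in index_by_digest:
            index_by_digest[digest] = len(unique)
            unique.append(payload)
        references.append(index_by_digest[digest])
    return unique, references


# A repetitive content stream compresses well and round-trips exactly.
content = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET\n" * 200
compressed = flate_compress(content)

# The same "logo" used on three pages is stored once and referenced three times.
logo = bytes(range(256)) * 64
unique, references = deduplicate([logo, logo, b"another image", logo])
```

Both steps are lossless: `zlib.decompress(compressed)` returns the original bytes, and every page still points at identical image data.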

Lossy Techniques

These trade some quality for significantly smaller files:

  • Image downsampling: reduce image resolution from 300 DPI to 150 or 72 DPI. At the same physical size, a 300 DPI image has 4× the pixels of a 150 DPI image.
  • JPEG re-encoding: re-encode images that are stored losslessly inside the PDF (as PNG-sourced images usually are) as JPEG at 75–85% quality. This alone can shrink image-heavy PDFs by 60–80%.
  • Color space conversion: convert CMYK images to RGB. Dropping the fourth channel removes about 25% of the raw sample data per image; the saving after compression varies.
  • Resolution capping: enforce a maximum DPI regardless of the source image's original resolution.
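
As a rough illustration of downsampling, the sketch below halves the resolution of a small grayscale image by averaging each 2×2 block of samples (a simple box filter). This is an assumed, simplified method chosen for clarity; production tools typically use higher-quality resampling filters, and the function name is hypothetical.

```python
def downsample_half(pixels: list[list[int]]) -> list[list[int]]:
    """Halve width and height by averaging each 2x2 block of samples.

    Assumes a grayscale image with even width and height.
    """
    height, width = len(pixels), len(pixels[0])
    return [
        [
            (pixels[y][x] + pixels[y][x + 1]
             + pixels[y + 1][x] + pixels[y + 1][x + 1]) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


# A 4x4 image becomes 2x2, so one quarter of the samples remain.
image = [
    [10, 20, 200, 220],
    [30, 40, 210, 230],
    [0, 0, 100, 100],
    [0, 0, 100, 100],
]
small = downsample_half(image)
```

Halving the resolution in both directions leaves one sample for every four in the source, which is where the 4× figure for 300 DPI versus 150 DPI comes from.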

Image Resolution: What DPI Do You Actually Need?

Use Case | Recommended DPI | Reason
Screen viewing / email | 72–96 DPI | Standard monitors display at roughly 72–96 PPI; extra resolution is not visible at 100% zoom
Web download | 150 DPI | Good balance of clarity and file size; looks sharp on retina screens
Professional printing | 300 DPI | Industry standard for print; fine detail stays sharp at normal reading distance
Large-format printing | 150–200 DPI | Viewed from a distance, so lower DPI is acceptable

💡 Math: Reducing a full-page image from 300 DPI to 150 DPI cuts pixel count by 75% (from ~8.7 million to ~2.2 million pixels on an A4 page). Combined with JPEG compression, this can turn a 30 MB PDF into a 2 MB one.
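
The pixel counts quoted above can be checked with a few lines of Python. The helper names are assumptions, and the raw-size figure assumes uncompressed 8-bit RGB samples, before any Flate or JPEG encoding.

```python
MM_PER_INCH = 25.4


def page_pixels(width_mm: float, height_mm: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a full-page image at the given resolution."""
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


def raw_size_bytes(width_px: int, height_px: int, channels: int = 3) -> int:
    """Uncompressed size, assuming 8 bits per channel."""
    return width_px * height_px * channels


# A4 is 210 x 297 mm.
w300, h300 = page_pixels(210, 297, 300)  # 2480 x 3508
w150, h150 = page_pixels(210, 297, 150)  # 1240 x 1754

pixels_300 = w300 * h300
pixels_150 = w150 * h150
reduction = 1 - pixels_150 / pixels_300
```

At 300 DPI the uncompressed RGB data for one A4 page is about 26 MB (`raw_size_bytes(w300, h300)`), which is why a single full-page scan can dominate the file size.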

How Our Compress PDF Tool Works

The Compress PDF tool on SarvKit uses a dual-strategy approach:

  1. Strategy 1: Lossless re-save. Re-serializes the PDF with object streams and cross-reference compression enabled. This catches the "easy wins": bloated exports, incremental saves, uncompressed streams.
  2. Strategy 2: JPEG re-render. Renders each page as a JPEG image and rebuilds the PDF. This is more aggressive and works well for scanned documents or image-heavy PDFs. Because every page becomes an image, text on those pages is generally no longer selectable or searchable.
  3. Winner takes all. Both strategies run, and the tool picks whichever produces the smaller file. If neither is smaller than the original, you're told the PDF is already optimized.
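
The selection step can be sketched as follows. This is not the tool's actual code (which runs as JavaScript in the browser); it is a minimal Python illustration of the "run everything, keep the smallest" idea, and the strategy functions are placeholders that only mimic size changes.

```python
from typing import Callable, Optional

Strategy = Callable[[bytes], bytes]


def compress_best(
    original: bytes, strategies: dict[str, Strategy]
) -> tuple[Optional[str], bytes]:
    """Run every strategy and keep the smallest output.

    Returns (None, original) when no strategy beats the original size.
    """
    best_name, best_bytes = None, original
    for name, strategy in strategies.items():
        candidate = strategy(original)
        if len(candidate) < len(best_bytes):
            best_name, best_bytes = name, candidate
    return best_name, best_bytes


# Placeholder strategies (hypothetical): they only simulate output sizes.
def fake_lossless_resave(pdf: bytes) -> bytes:
    return pdf[: int(len(pdf) * 0.8)]


def fake_jpeg_rerender(pdf: bytes) -> bytes:
    return pdf[: int(len(pdf) * 0.4)]


strategies = {
    "lossless_resave": fake_lossless_resave,
    "jpeg_rerender": fake_jpeg_rerender,
}
winner, result = compress_best(b"x" * 1000, strategies)

# If every strategy makes the file larger, the original is kept.
no_winner, unchanged = compress_best(b"x" * 10, {"padding": lambda pdf: pdf + b"!"})
```

Returning `None` as the winner gives the caller a simple way to report that the file is already optimized.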

Everything runs in your browser using JavaScript — no files are uploaded to any server.

Best Practices for Keeping PDFs Small

  • Resize images before inserting: don't embed a 4000×3000 photo when the display area is 800×600 pixels.
  • Use vector graphics where possible: logos, charts, and diagrams stored as vectors are tiny compared to raster images.
  • Choose "Save As" over "Save": in editors that save incrementally, "Save" appends changes to the end of the file, growing it. "Save As" rewrites it cleanly.
  • Subset fonts: in your PDF export settings, always enable font subsetting.
  • Remove hidden content: hidden layers, annotations, comments, and form field data all add to file size.
  • Avoid PDF/A unless required: PDF/A requires every font to be embedded (subsets are allowed), and PDF/A-1 is based on PDF 1.4, which predates object streams and cross-reference streams. The result is often a larger file.

Common Myths About PDF Compression

  • "ZIP-ing a PDF makes it smaller" — rarely. PDF content streams are already Deflate-compressed. ZIP will achieve little additional compression (typically 1–5%).
  • "Removing metadata saves a lot of space" — metadata is usually a few KB. It's the images and fonts that matter.
  • "Re-saving always helps" — only if the original wasn't well-optimized. A PDF exported from a modern tool with compression enabled may not benefit from re-saving.
  • "All compression is lossy" — lossless techniques (object streams, deduplication, subsetting) reduce size without any quality change.
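
The first myth is easy to test. The sketch below deflates some repetitive sample data once (standing in for what a PDF writer has already done to its streams) and then deflates the result again (standing in for zipping the file). The sample data is synthetic and `zlib` is used as a stand-in for a ZIP archiver, so the exact percentages will differ for real PDFs.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the sample is reproducible
words = [b"stream", b"endobj", b"font", b"image", b"page", b"xref", b"obj", b"length"]
raw_stream = b" ".join(rng.choice(words) for _ in range(20_000))

deflated_once = zlib.compress(raw_stream, 9)      # already done inside the PDF
deflated_twice = zlib.compress(deflated_once, 9)  # what zipping adds on top

first_pass_saving = 1 - len(deflated_once) / len(raw_stream)
second_pass_saving = 1 - len(deflated_twice) / len(deflated_once)

print(f"first pass saves  {first_pass_saving:.1%}")
print(f"second pass saves {second_pass_saving:.1%}")
```

The first pass removes most of the redundancy, so the second pass has almost nothing left to work with and can even make the data slightly larger.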

Compression Results: What to Expect

PDF Type | Typical Reduction | Primary Technique
Word/PowerPoint export | 20–50% | Lossless re-save + font subsetting
Scanned documents | 50–80% | JPEG re-render at lower DPI
Image-heavy reports | 40–70% | Image downsampling + JPEG re-encoding
Text-only PDFs | 5–15% | Object stream compression
Already-optimized PDFs | 0–5% | Minimal gains possible

Frequently Asked Questions

Does compressing a PDF reduce its quality?
Not necessarily. Lossless techniques like font subsetting, object stream compression, and resource deduplication reduce size without any quality loss. Only lossy techniques like image downsampling or JPEG re-encoding affect visual quality, and even then the difference is often imperceptible at reasonable settings.

How much can a PDF be compressed?
It depends on the content. Scanned documents can shrink 50–80% through image re-rendering. Word or PowerPoint exports typically shrink 20–50%. Text-only PDFs see modest gains of 5–15%. Already-optimized PDFs may only shrink 0–5%.

Does zipping a PDF make it smaller?
Rarely. PDF content streams are already Deflate-compressed (the same algorithm ZIP uses). Zipping a PDF typically achieves only 1–5% additional compression. Proper PDF-level optimization is far more effective.

What is font subsetting?
Font subsetting means embedding only the characters (glyphs) actually used in the document rather than the entire font file. A full font might be 200–500 KB, while a subset containing just the characters used could be 15–40 KB.

Is it safe to compress PDFs online?
Yes, when you use a tool that runs entirely in the browser, such as PDFTools. Your files never leave your device: all compression happens locally in your browser, and no data is uploaded to any server.
