A single-page PDF can be 50 KB or 50 MB. The difference comes down to what's inside — embedded images, fonts, metadata, and how the file was created. This guide explains exactly what determines PDF file size and the proven techniques to compress it.
What's Inside a PDF File?
A PDF is essentially a container that holds multiple types of objects. Understanding these objects is the key to understanding file size:
- Content streams — the actual page content: text operators, vector drawing commands, and references to embedded objects.
- Images — often the largest component. A single uncompressed photo can be 10–30 MB inside a PDF.
- Fonts — complete font files or subsets. Embedding a full font family can add 1–5 MB.
- Metadata — author, title, creation date, XMP metadata, document structure tags.
- Cross-reference table — an index that maps every object to its byte offset, enabling random access.
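To make the cross-reference idea concrete, here is a minimal Python sketch that builds an object-number-to-byte-offset index by scanning for `N G obj` headers. The `SAMPLE` bytes are a hand-written fragment rather than a complete PDF, and `build_offset_index` is a hypothetical helper. Real readers normally use the file's own cross-reference data instead of scanning.

```python
import re

# Hand-written fragment for illustration only (not a complete, valid PDF).
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n"
)

# Matches "<object number> <generation> obj" at the start of a line.
OBJ_HEADER = re.compile(rb"^(\d+) (\d+) obj\b", re.MULTILINE)


def build_offset_index(data: bytes) -> dict:
    """Map each object number to the byte offset of its header."""
    return {int(m.group(1)): m.start() for m in OBJ_HEADER.finditer(data)}


index = build_offset_index(SAMPLE)
for number, offset in sorted(index.items()):
    print(f"object {number} starts at byte {offset}")
```

With an index like this, a reader can jump straight to any object without parsing everything before it, which is the random access the cross-reference table provides.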
Why Are Some PDFs So Large?
Common reasons for bloated PDFs:
| Cause | Typical Impact | Solution |
|---|---|---|
| Uncompressed or high-DPI images | 10–50 MB per image | Downsample to 150 DPI for screen, re-encode as JPEG |
| Full font embedding | 1–5 MB per font | Subset fonts (embed only used glyphs) |
| Scanned pages (image-only PDF) | 2–10 MB per page | Re-render at lower DPI, apply JPEG compression |
| Duplicate objects | 1–10 MB | Object deduplication |
| Incremental saves | 2×–5× original size | Re-save (flatten incremental updates) |
| Embedded multimedia | Varies widely | Remove or link externally |
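One of these causes, incremental saves, can often be spotted without any PDF library, because each incremental update appends a new trailer ending in `%%EOF`. The sketch below counts those markers. The helper names are hypothetical, and the check is only a heuristic: linearized ("fast web view") files and PDFs embedded inside other PDFs can also contain more than one marker.

```python
def count_eof_markers(data: bytes) -> int:
    """Count %%EOF markers in the raw bytes of a PDF."""
    return data.count(b"%%EOF")


def looks_incrementally_saved(data: bytes) -> bool:
    """Heuristic: more than one %%EOF suggests appended updates."""
    return count_eof_markers(data) > 1


# Synthetic byte strings standing in for real files.
clean = b"%PDF-1.4\n...objects...\nstartxref\n123\n%%EOF\n"
edited = clean + b"...changed objects...\nstartxref\n456\n%%EOF\n"

print(looks_incrementally_saved(clean))   # one marker
print(looks_incrementally_saved(edited))  # two markers
```

For a real file, read it with `open(path, "rb").read()` and pass the bytes in.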
PDF Compression Techniques
There are two categories of compression: lossless (no quality reduction) and lossy (quality trade-off for smaller size).
Lossless Techniques
These reduce file size without any visual change:
- Flate/Deflate compression — the standard compression for content streams. It uses Deflate, the same algorithm as ZIP files (commonly implemented with zlib). Most modern PDFs already use this.
- Object streams — groups multiple small PDF objects into a single compressed stream, reducing overhead. Introduced in PDF 1.5.
- Cross-reference stream compression — replaces the text-based cross-reference table with a compressed binary stream.
- Font subsetting — instead of embedding a complete font with 2,000+ glyphs, embed only the 50–100 glyphs actually used in the document.
- Duplicate object removal — if the same image appears on 10 pages, store it once and reference it 10 times.
- Metadata stripping — remove unnecessary XMP metadata, thumbnails, and edit history.
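Duplicate object removal from the list above can be sketched as hashing each stream and keeping one copy per distinct hash. This is a simplified illustration in Python, assuming streams are available as plain byte strings. `deduplicate` is a hypothetical helper, not any particular library's API.

```python
import hashlib
from typing import Dict, List, Tuple


def deduplicate(streams: List[bytes]) -> Tuple[List[bytes], List[int]]:
    """Return (unique_streams, refs), where refs[i] is the index in
    unique_streams that the i-th original stream now points to."""
    unique: List[bytes] = []
    seen: Dict[bytes, int] = {}
    refs: List[int] = []
    for data in streams:
        digest = hashlib.sha256(data).digest()
        if digest not in seen:
            seen[digest] = len(unique)
            unique.append(data)
        refs.append(seen[digest])
    return unique, refs


# The same 1,000-byte "logo" used on 10 pages, plus one other image.
logo = b"\x89" * 1000
streams = [logo] * 10 + [b"other"]
unique, refs = deduplicate(streams)

before = sum(len(s) for s in streams)
after = sum(len(s) for s in unique)
print(f"{before} bytes stored before, {after} bytes after")
```

The page content is unchanged because every page still references the same image data, which is why this counts as lossless.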
Lossy Techniques
These trade some quality for significantly smaller files:
- Image downsampling — reduce image resolution from 300 DPI to 150 or 72 DPI. A 300 DPI image has 4× the pixels of a 150 DPI image.
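- JPEG re-encoding — convert losslessly stored images (for example, images that came from PNG files and are Flate-compressed inside the PDF) to JPEG at a quality setting of 75–85 on the usual 0–100 scale. This alone can shrink image-heavy PDFs by 60–80%.
- Color space conversion — convert CMYK images to RGB. Going from four channels to three cuts the raw pixel data by 25%, although the saving after compression varies and colors can shift, so avoid this for files headed to a print shop.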
- Resolution capping — enforce a maximum DPI regardless of the source image's original resolution.
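The arithmetic behind downsampling and color space conversion is easy to check. The sketch below computes the uncompressed size of a full-page image, assuming a US Letter page (8.5 × 11 inches) and 8 bits per channel. `raw_image_bytes` is a hypothetical helper, and real files will be smaller once compression is applied.

```python
def raw_image_bytes(width_in: float, height_in: float, dpi: int,
                    channels: int = 3, bits_per_channel: int = 8) -> int:
    """Uncompressed size in bytes of an image covering the given area."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels * bits_per_channel // 8


rgb_300 = raw_image_bytes(8.5, 11, 300)               # 2550 x 3300 px, RGB
rgb_150 = raw_image_bytes(8.5, 11, 150)               # 1275 x 1650 px, RGB
cmyk_300 = raw_image_bytes(8.5, 11, 300, channels=4)  # same pixels, CMYK

print(rgb_300 / rgb_150)   # 4.0: halving the DPI quarters the data
print(rgb_300 / cmyk_300)  # 0.75: RGB needs 25% less raw data than CMYK
```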
Image Resolution: What DPI Do You Actually Need?
| Use Case | Recommended DPI | Reason |
|---|---|---|
| Screen viewing / email | 72–96 DPI | Standard-density monitors display at roughly 72–96 PPI; extra resolution only shows when zooming in or on high-density screens |
| Web download | 150 DPI | Good balance of clarity and file size; looks sharp on retina screens |
| Professional printing | 300 DPI | Industry standard for print; fine detail stays crisp at normal reading distance |
| Large-format printing | 150–200 DPI | Viewed from a distance, so lower DPI is acceptable |
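To apply a target from the table, compare it with the image's effective DPI, which is its pixel width divided by the width it occupies on the page. The following is a rough sketch. `effective_dpi` and `capped_size` are hypothetical helper names, and the placed width is assumed to be known in inches.

```python
from typing import Tuple


def effective_dpi(width_px: int, placed_width_in: float) -> float:
    """Pixels per inch at the size the image is actually placed."""
    return width_px / placed_width_in


def capped_size(width_px: int, height_px: int,
                placed_width_in: float, max_dpi: float) -> Tuple[int, int]:
    """Pixel dimensions after capping at max_dpi; never upscales."""
    dpi = effective_dpi(width_px, placed_width_in)
    if dpi <= max_dpi:
        return width_px, height_px
    scale = max_dpi / dpi
    return round(width_px * scale), round(height_px * scale)


# A 4000x3000 photo placed 8 inches wide is 500 DPI, far above a 150 DPI target.
print(effective_dpi(4000, 8))
print(capped_size(4000, 3000, 8, 150))
# An 800x600 image at the same placed width is already below the cap.
print(capped_size(800, 600, 8, 150))
```

Leaving images that are already below the cap untouched avoids upscaling, which would add bytes without adding detail.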
How Our Compress PDF Tool Works
The Compress PDF tool on SarvKit uses a dual-strategy approach:
- Strategy 1: Lossless re-save — re-serializes the PDF with object streams and cross-reference compression enabled. This catches the "easy wins" — bloated exports, incremental saves, uncompressed streams.
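- Strategy 2: JPEG re-render — renders each page as a JPEG image and rebuilds the PDF. This is more aggressive and works well for scanned documents or image-heavy PDFs. Because each page becomes a flat image, text on those pages generally stops being selectable or searchable unless a text layer is added back.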
- Winner takes all — both strategies run, and the tool picks whichever produces the smaller file. If neither is smaller than the original, you're told the PDF is already optimized.
Everything runs in your browser using JavaScript — no files are uploaded to any server.
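The "winner takes all" selection can be sketched in a few lines. The tool itself runs as JavaScript in the browser, so this Python version only illustrates the selection logic under stated assumptions. `pick_smallest` and the two stand-in strategies are hypothetical, not the tool's actual code.

```python
from typing import Callable, Dict, Optional, Tuple

Strategy = Callable[[bytes], bytes]


def pick_smallest(original: bytes,
                  strategies: Dict[str, Strategy]) -> Tuple[Optional[str], bytes]:
    """Run every strategy and keep the smallest output.

    Returns (None, original) when no strategy beats the original,
    which corresponds to the "already optimized" message.
    """
    best_name: Optional[str] = None
    best_bytes = original
    for name, run in strategies.items():
        try:
            candidate = run(original)
        except Exception:
            continue  # a failing strategy simply drops out of the race
        if len(candidate) < len(best_bytes):
            best_name, best_bytes = name, candidate
    return best_name, best_bytes


# Stand-in strategies that only change the byte count, for demonstration.
fake_strategies = {
    "lossless re-save": lambda data: data[: len(data) * 9 // 10],
    "jpeg re-render": lambda data: data[: len(data) // 2],
}
winner, result = pick_smallest(b"x" * 1000, fake_strategies)
print(winner, len(result))
```

Falling back to the original when nothing is smaller guarantees the output never grows.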
Best Practices for Keeping PDFs Small
- Resize images before inserting — don't embed a 4000×3000 photo when the display area is 800×600 pixels.
- Use vector graphics where possible — logos, charts, and diagrams as SVG/vector are tiny compared to raster images.
- Choose "Save As" over "Save" — in many PDF editors, "Save" appends changes incrementally, growing the file, while "Save As" rewrites it cleanly.
- Subset fonts — in your PDF export settings, always enable font subsetting.
- Remove hidden content — hidden layers, annotations, comments, and form field data all add to file size.
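- Avoid PDF/A unless required — PDF/A requires every font to be embedded (subsets are allowed) and restricts some space-saving features. PDF/A-1, for example, is based on PDF 1.4 and so cannot use object streams, and it forbids LZW compression. The result is often a larger file.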
Common Myths About PDF Compression
- "ZIP-ing a PDF makes it smaller" — rarely. PDF content streams are already Deflate-compressed. ZIP will achieve little additional compression (typically 1–5%).
- "Removing metadata saves a lot of space" — metadata is usually a few KB. It's the images and fonts that matter.
- "Re-saving always helps" — only if the original wasn't well-optimized. A PDF exported from a modern tool with compression enabled may not benefit from re-saving.
- "All compression is lossy" — lossless techniques (object streams, deduplication, subsetting) reduce size without any quality change.
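The first myth is easy to test with Python's `zlib` module, which implements the same Deflate algorithm. This sketch compresses a synthetic stand-in for a content stream once, as a PDF writer would, and then a second time, as ZIP-ing would. The sample data and variable names are made up for the demonstration, so exact numbers will differ from real PDFs.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the sample is reproducible
# Stand-in for a page content stream: 2,000 line-drawing operators.
stream = "".join(
    f"{rng.randint(0, 999)} {rng.randint(0, 999)} l\n" for _ in range(2000)
).encode("ascii")

deflated_once = zlib.compress(stream, 9)          # what the PDF writer did
deflated_twice = zlib.compress(deflated_once, 9)  # what ZIP-ing adds on top

# Deflate is lossless: decompressing gives back the exact original bytes.
assert zlib.decompress(deflated_once) == stream

print("raw:           ", len(stream))
print("deflated once: ", len(deflated_once))
print("deflated twice:", len(deflated_twice))
```

The first pass removes most of the redundancy, so the second pass has almost nothing left to work with and may even add a few bytes of overhead.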
Compression Results: What to Expect
| PDF Type | Typical Reduction | Primary Technique |
|---|---|---|
| Word/PowerPoint export | 20–50% | Lossless re-save + font subsetting |
| Scanned documents | 50–80% | JPEG re-render at lower DPI |
| Image-heavy reports | 40–70% | Image downsampling + JPEG re-encoding |
| Text-only PDFs | 5–15% | Object stream compression |
| Already-optimized PDFs | 0–5% | Minimal gains possible |