
Preparing Scanned Documents for OCR: A Practical Checklist

A fast reference checklist for getting clean OCR output — what to fix before you run the tool, and what to check after.

May 11, 2026 | 4 min read

Before you run OCR

Use this checklist to maximize accuracy before opening the Basic OCR tool.

Scan quality

  • [ ] Resolution is at least 300 DPI
  • [ ] Text is sharp — not blurry when zoomed to 100%
  • [ ] No significant shadows across the text area
  • [ ] Pages are straight — not noticeably tilted
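
The resolution item can be checked from an image's pixel dimensions when the scanner's DPI setting is unknown. The sketch below is a minimal illustration, assuming you know the physical page size in inches; the function names `effective_dpi` and `meets_minimum` are hypothetical and are not part of the Basic OCR tool.

```python
def effective_dpi(pixel_width: int, pixel_height: int,
                  page_width_in: float, page_height_in: float) -> float:
    """Return the lower of the horizontal and vertical pixels-per-inch values."""
    if page_width_in <= 0 or page_height_in <= 0:
        raise ValueError("page dimensions must be positive")
    return min(pixel_width / page_width_in, pixel_height / page_height_in)


def meets_minimum(dpi: float, minimum: float = 300.0) -> bool:
    """Compare against the 300 DPI minimum from the checklist."""
    return dpi >= minimum


# US Letter page (8.5 x 11 in) captured at 2550 x 3300 pixels.
full_res = effective_dpi(2550, 3300, 8.5, 11.0)
# The same page captured at half that pixel count per side.
half_res = effective_dpi(1275, 1650, 8.5, 11.0)

print(full_res, meets_minimum(full_res))   # 300.0 True
print(half_res, meets_minimum(half_res))   # 150.0 False
```

Taking the lower of the two axes is a conservative choice: a cropped or stretched image is judged by its weakest dimension.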

Document condition

  • [ ] No torn or folded corners covering text
  • [ ] No coffee stains or marks obscuring words
  • [ ] Handwritten annotations are separated from typed text where possible

Image preparation

  • [ ] Very large images (5+ MB) are compressed with Compress Image at quality 85%+
  • [ ] Images are compiled into a PDF using Image to PDF

Language settings

  • [ ] The correct document language is selected in the OCR tool
  • [ ] For mixed-language documents, the dominant language is chosen

After OCR runs

Verify critical data

  • [ ] Invoice numbers, totals, dates match originals
  • [ ] Names and addresses look correct
  • [ ] No obvious character swaps (O/0, l/1, rn/m)
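
Part of the character-swap check can be narrowed down with a simple heuristic before reading the text by eye. The sketch below is one possible approach, not a feature of the OCR tool: it flags tokens that mix letters and digits in a way that suggests an O/0 or l/1 swap. The confusable sets and the function name `suspicious_tokens` are assumptions chosen for illustration. It cannot detect rn/m swaps, which need a dictionary or a manual read, and it will also flag some legitimate codes, so treat the result as a review list only.

```python
import re

# Letters that OCR commonly confuses with digits, and digits commonly
# confused with letters. These sets are illustrative assumptions.
LETTERS_LIKE_DIGITS = set("OoIlSB")
DIGITS_LIKE_LETTERS = set("015")


def suspicious_tokens(text: str) -> list[str]:
    """Return alphanumeric tokens that look like they contain a swapped character."""
    flagged = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        letters = [c for c in token if c.isalpha()]
        digits = [c for c in token if c.isdigit()]
        if not letters or not digits:
            continue  # purely alphabetic or purely numeric: nothing to compare
        mostly_digits = len(digits) > len(letters)
        if mostly_digits and all(c in LETTERS_LIKE_DIGITS for c in letters):
            flagged.append(token)  # e.g. a number containing O or l
        elif not mostly_digits and all(c in DIGITS_LIKE_LETTERS for c in digits):
            flagged.append(token)  # e.g. a word containing 0 or 1
    return flagged


print(suspicious_tokens("Invoice 2O24-0l7 total 1nvoice A4"))
# ['2O24', '0l7', '1nvoice']
```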

Check structure

  • [ ] Multi-column documents: verify reading order is logical
  • [ ] Tables: confirm row/column alignment in output
  • [ ] Headers and footers: verify they did not duplicate into body text
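
For the header and footer check, it can help to list the lines that recur across pages and then search the body text for them. The following is a minimal sketch, assuming the OCR output has already been split into one string per page; `repeated_lines` and its `min_fraction` parameter are hypothetical names. Digits are replaced with `#` so that page numbers such as "Page 2 of 3" count as the same line.

```python
import math
import re
from collections import Counter


def repeated_lines(pages: list[str], min_fraction: float = 0.6) -> set[str]:
    """Return normalized lines that occur on at least min_fraction of the pages."""
    counts = Counter()
    for page in pages:
        seen = set()  # count each line at most once per page
        for line in page.splitlines():
            key = re.sub(r"\d+", "#", line.strip())
            if key:
                seen.add(key)
        counts.update(seen)
    # Require at least two pages so a one-page document flags nothing.
    threshold = max(2, math.ceil(min_fraction * len(pages)))
    return {key for key, n in counts.items() if n >= threshold}


pages = [
    "ACME Corp\nInvoice details\nPage 1 of 3",
    "ACME Corp\nLine items\nPage 2 of 3",
    "ACME Corp\nTotals\nPage 3 of 3",
]
print(sorted(repeated_lines(pages)))
# ['ACME Corp', 'Page # of #']
```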

Handle failures

  • [ ] Identify pages with poor output
  • [ ] Rescan or manually correct those pages
  • [ ] Do not re-run OCR on the full document for a few bad pages
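
Identifying poor pages can be partly automated with a rough text-quality score, so that only those pages are rescanned or corrected. The sketch below is an assumption-laden heuristic, not the tool's own confidence measure: it scores each page by the share of tokens that look like ordinary words, numbers, or amounts, and returns the page numbers that fall below a threshold. The pattern covers ASCII text only and the 0.8 threshold is arbitrary; both would need adjusting for other languages or document types.

```python
import re

# A token counts as "word-like" if it starts with a letter or digit
# (optionally preceded by $ or an opening parenthesis) and otherwise
# contains only letters, digits, and common punctuation.
WORD_LIKE = re.compile(r"[$(]?[A-Za-z0-9][A-Za-z0-9'.,:;/%)-]*")


def page_score(text: str) -> float:
    """Return the fraction of whitespace-separated tokens that look word-like."""
    tokens = text.split()
    if not tokens:
        return 0.0  # an empty page is treated as a failed page
    good = sum(1 for t in tokens if WORD_LIKE.fullmatch(t))
    return good / len(tokens)


def pages_to_redo(pages: list[str], threshold: float = 0.8) -> list[int]:
    """Return 1-based page numbers whose score is below the threshold."""
    return [i for i, text in enumerate(pages, start=1)
            if page_score(text) < threshold]


pages = [
    "Total due: $1,250.00 by 2026-05-11",
    "T~t@l d|_|e: #!! ]]1 ~~",
    "",
]
print(pages_to_redo(pages))
# [2, 3]
```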

What OCR cannot reliably handle

Even with perfect preparation, expect limitations with:

  • Handwritten text
  • Text printed over background images or watermarks
  • Very small type (below 8–9pt equivalent)
  • Decorative or script fonts
  • Complex tables with merged cells

For these cases, manual transcription is more reliable than attempting to improve OCR output further.

Common questions

What is the minimum resolution for reliable OCR?

300 DPI. Below that, character recognition accuracy drops noticeably. Phone cameras have no fixed DPI, but most modern ones capture enough pixels to exceed 300 DPI when a single page fills the frame.

Can I run OCR on a PDF that already has text?

Yes, but it is usually unnecessary. If the PDF already has selectable text, you can copy it directly without running OCR.

Why does OCR fail on some pages but not others in the same document?

Individual page quality varies. Shadows, folds, or lower contrast on specific pages cause localized failures. Rescan only the affected pages.

Is there a way to improve OCR output after running it?

Not automatically. Manual correction of the extracted text file is the most reliable path for critical accuracy.
