What resolution should scans be for OCR?

300 DPI is the standard minimum. Higher is better for small or dense text.

Should I compress images before OCR?

Only if the file is too large to work with. Over-compression blurs characters and reduces accuracy.

Can browser OCR handle handwriting?

Not reliably. Typed or printed text is recognized well; handwriting and cursive are not.

Why does OCR language selection matter so much?

OCR engines use language-specific character models. If the wrong language is selected, the engine matches the page against the wrong character set and the output degrades sharply.

How to Get Better OCR Results from Scanned PDFs

Why OCR quality depends on the input

OCR accuracy is primarily an input problem. A blurry, tilted, or low-contrast scan gives the OCR engine less usable information to recognize characters correctly.

The good news: most OCR failures are fixable before you run the tool.

Step 1 — Start with a readable scan

Before running OCR, check these basics:

Resolution: 300 DPI minimum. Most phone cameras capture enough pixels for this when the page fills the frame, but photos that were downscaled or heavily compressed may no longer hold enough detail.
Alignment: Keep text straight. A 5° tilt can noticeably reduce accuracy on dense text.
Contrast: Dark text on a white background. Avoid shadows from handheld scanning.
Lighting: Even, diffuse light. Avoid harsh direct light that creates hotspots.

If you cannot rescan on a scanner, photograph the page with your phone instead: most phone camera apps include a document mode that auto-corrects perspective and contrast.
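
To make the 300 DPI check concrete, here is a minimal sketch that estimates the effective resolution of a photographed page from its pixel dimensions. It assumes the page fills the frame; the names effectiveDpi, meetsOcrMinimum, and A4 are illustrative and not part of any tool mentioned here.

```typescript
// Physical page size in inches.
interface PageSizeInches {
  width: number;
  height: number;
}

// A4 (210 mm x 297 mm) converted to inches; used only as an example page size.
const A4: PageSizeInches = { width: 8.27, height: 11.69 };

// Assumption: the page fills the frame, so image pixels map directly to page inches.
function effectiveDpi(
  pixelWidth: number,
  pixelHeight: number,
  page: PageSizeInches
): number {
  // Pair short side with short side so portrait and landscape photos both work.
  const shortPx = Math.min(pixelWidth, pixelHeight);
  const longPx = Math.max(pixelWidth, pixelHeight);
  const shortIn = Math.min(page.width, page.height);
  const longIn = Math.max(page.width, page.height);
  // The axis with fewer pixels per inch limits the usable resolution.
  return Math.min(shortPx / shortIn, longPx / longIn);
}

function meetsOcrMinimum(dpi: number, minimumDpi: number = 300): boolean {
  return dpi >= minimumDpi;
}
```

Under these assumptions, a 4032 x 3024 photo of an A4 page works out to roughly 345 DPI and passes, while a 1600 x 1200 photo comes in under 150 DPI and does not. If the page occupies only part of the frame, the real figure is lower than this estimate.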

Step 2 — Choose the correct language

Language is usually the most impactful setting in an OCR tool. Running English OCR on a Hindi or Arabic document will produce near-random output, because the engine tries to match Devanagari or Arabic script against Latin characters.

In the Basic OCR tool, select the primary language of the document before processing. For multilingual documents, choose the dominant language.
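
As a rough illustration of "choose the dominant language", the sketch below picks the most frequent language from a per-page list and maps it to a Tesseract language code. The codes eng, hin, and ara are the ones Tesseract uses for English, Hindi, and Arabic; the function names, the per-page input, and the idea that the Basic OCR tool accepts these codes directly are assumptions made for this example.

```typescript
// Small example lookup; extend it with whatever languages your documents use.
const TESSERACT_CODES: Record<string, string> = {
  english: "eng",
  hindi: "hin",
  arabic: "ara",
};

// Hypothetical helper: returns the language that appears on the most pages.
function dominantLanguage(pageLanguages: string[]): string {
  if (pageLanguages.length === 0) {
    throw new Error("At least one page language is required");
  }
  const counts: Record<string, number> = {};
  let best = "";
  let bestCount = 0;
  for (const raw of pageLanguages) {
    const lang = raw.trim().toLowerCase();
    const count = (counts[lang] || 0) + 1;
    counts[lang] = count;
    // Strictly greater: on a tie, the language that reached the count first wins.
    if (count > bestCount) {
      best = lang;
      bestCount = count;
    }
  }
  return best;
}

// Hypothetical helper: maps a language name to its Tesseract code.
function tesseractCode(language: string): string {
  const code = TESSERACT_CODES[language.trim().toLowerCase()];
  if (code === undefined) {
    throw new Error("No code known for language: " + language);
  }
  return code;
}
```

For a three-page document labeled Hindi, English, Hindi, this selects Hindi and returns the code hin.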

Step 3 — Compress images carefully

If source images are very large (10+ MB per page), you can use Compress Image to reduce file size before converting to PDF. Use a high quality setting: aggressive compression blurs fine characters and can significantly reduce OCR accuracy.

Only compress when file size is causing problems. When in doubt, keep the original.
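
The rule above can be written down as a small decision helper. This is only a sketch: the 10 MB threshold comes from the text, while the 0.9 quality value and the names planCompression and CompressionPlan are assumptions chosen for illustration, not settings of the Compress Image tool.

```typescript
const MB = 1024 * 1024;

interface CompressionPlan {
  compress: boolean;
  // Quality on a 0..1 scale, or null when the page is left untouched.
  quality: number | null;
}

// Decide per page whether to compress, keeping originals below the threshold.
function planCompression(
  pageSizesBytes: number[],
  thresholdBytes: number = 10 * MB,
  quality: number = 0.9 // assumed "high quality" value
): CompressionPlan[] {
  return pageSizesBytes.map((size) =>
    size >= thresholdBytes
      ? { compress: true, quality: quality }
      : { compress: false, quality: null }
  );
}
```

With the defaults, a 2 MB page is kept as-is and a 14 MB page is marked for compression at the assumed high quality setting.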

Step 4 — Validate output page by page

After OCR, verify critical values:

Invoice numbers, dates, and monetary amounts
Names and addresses
Easily confused characters: O and 0, l and 1, rn and m

Re-scan and reprocess only the pages that failed rather than the entire document.
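
Part of this check can be automated. The sketch below flags tokens in which a digit sits directly next to O, o, or l, which is a common sign of an O/0 or l/1 mix-up in invoice numbers, dates, and amounts. It is a heuristic for narrowing down what to review, not a replacement for reading the output, and it does not attempt the rn/m case, since "rn" occurs in many ordinary words. The name flagSuspectTokens is made up for this example.

```typescript
// A digit directly next to a letter that OCR often confuses with a digit.
const DIGIT_LETTER_MIX = /\d[Ool]|[Ool]\d/;

// Returns the whitespace-separated tokens that deserve a manual look.
function flagSuspectTokens(text: string): string[] {
  return text
    .split(/\s+/)
    .filter((token) => token.length > 0 && DIGIT_LETTER_MIX.test(token));
}

// Example: "2O" and "1l" are both digit-letter mixes.
const suspects = flagSuspectTokens(
  "Invoice INV-2O24 dated 2024-03-1l total 450.00"
);
console.log(suspects); // [ 'INV-2O24', '2024-03-1l' ]
```

Clean values such as 450.00 or INV-2024 pass through without being flagged, so the list stays short enough to check by hand.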

Realistic limitations

Browser-based OCR using Tesseract.js works well for clear typed text. It handles these cases poorly:

Cursive or handwritten text
Dense multi-column tables
Decorative fonts and logos
Very small text (below 9pt equivalent)
Stamps or text printed over background images

For these cases, manual entry or a dedicated desktop OCR application will produce better results.
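
To judge whether text falls under the "9pt equivalent" limit, you can convert between pixel height and points using the scan resolution, since one point is 1/72 inch. The sketch below treats the measured line height in pixels as the font size, which is an approximation; the function names are illustrative.

```typescript
// One typographic point is 1/72 of an inch.
const POINTS_PER_INCH = 72;

// Height in pixels that text of a given point size occupies at a given DPI.
function pointsToPixels(points: number, dpi: number): number {
  return (points * dpi) / POINTS_PER_INCH;
}

// Point-size equivalent of text measured in pixels at a given DPI.
function pixelsToPoints(heightPx: number, dpi: number): number {
  return (heightPx * POINTS_PER_INCH) / dpi;
}

// Approximation: compares measured line height against the 9pt limit.
function isBelowSmallTextLimit(
  heightPx: number,
  dpi: number,
  limitPoints: number = 9
): boolean {
  return pixelsToPoints(heightPx, dpi) < limitPoints;
}
```

At 300 DPI, 9pt text is 37.5 pixels tall, so a line measuring 25 pixels corresponds to 6pt and falls into the range where recognition tends to struggle.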

Suggested workflow on PDFHarbor

Prepare images → Compress Image if needed
Bundle pages → Image to PDF
Extract text → Basic OCR
Assemble final document → Merge PDF