How to Extract Text from an Image-Only PDF
If you cannot select text in a PDF, it is probably an image-based file. Here is how to extract the text so you can copy, search, or edit it.
May 15, 2026 | 6 min read
How to tell if your PDF is image-only
Open the PDF and try to select a sentence with your mouse cursor. If the cursor highlights the words like normal selectable text, your PDF contains real text, and PDF to Word or copy-paste will work directly.
If you cannot select any text, or the entire page selects as one block (like an image), your PDF is image-only. The pages are pictures, not text. You need OCR to extract the words.
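The same check can be automated when you have many files to sort. The sketch below is an illustration only, not part of the Basic OCR tool. It assumes you already have the text each page yields from whatever PDF library you use, passed in as a list of strings. The function names and the 20-character threshold are hypothetical choices.

```python
from typing import List


def classify_pages(page_texts: List[str], min_chars: int = 20) -> List[str]:
    """Label each page 'text' or 'image-only' by how much text it yielded."""
    labels = []
    for text in page_texts:
        # Count only non-whitespace characters, so stray spaces or
        # newlines from an empty text layer do not count as real text.
        visible = sum(1 for ch in text if not ch.isspace())
        labels.append("text" if visible >= min_chars else "image-only")
    return labels


def is_image_only(page_texts: List[str], min_chars: int = 20) -> bool:
    """True when the document has pages and none of them yields real text."""
    labels = classify_pages(page_texts, min_chars)
    return bool(labels) and all(label == "image-only" for label in labels)
```

A document where only some pages come back as "image-only" is a mixed file: part real text, part scans.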
Why image-only PDFs exist
A few common reasons:
- The PDF was created by scanning paper documents
- It came from photos of pages taken on a phone
- It was generated by older software that saved pages as images
- The original creator chose to export the pages as images to prevent text copying
Whatever the source, the fix is the same. Run the file through OCR to extract the text into a real text format.
The extraction workflow
Use the Basic OCR tool. It examines each page, recognizes the printed characters, and outputs the extracted text in a usable format.
Step by step:
- Open the Basic OCR tool.
- Drop your PDF into the upload area.
- Select the language of the document. This is critical: the OCR engine uses language-specific character models, so the wrong language produces wrong output.
- Click the OCR button and wait for processing. Expect 5 to 30 seconds per page on a typical laptop.
- Choose your output format:
  - .txt for plain text only
  - .docx for a Word document with the extracted text
- Download the result.
The text from your image-only PDF is now in a format you can edit, search, copy, and reuse.
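For long documents it helps to know the wait in advance. This small sketch multiplies the page count by the 5 to 30 seconds per page quoted in the steps above. The function name is hypothetical, and the real time depends on your hardware and the scan.

```python
from typing import Tuple


def estimate_ocr_seconds(page_count: int, low: int = 5, high: int = 30) -> Tuple[int, int]:
    """Return the (fastest, slowest) expected processing time in seconds."""
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    # Per-page figures are the rough range for a typical laptop.
    return page_count * low, page_count * high
```

By this estimate, a 40-page scan would take roughly 200 to 1,200 seconds, or about 3 to 20 minutes.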
Picking the right output format
For most uses, .docx (Word) is the better choice. You get:
- The text in editable paragraphs
- Basic formatting where the OCR detected it (headings, lists)
- A document you can open in Word, Google Docs, or any word processor
For pure data extraction (just the words, no formatting), .txt is fine and produces the smallest file. You can paste the text into anything.
How accurate is OCR
Accuracy depends almost entirely on input quality:
- Clear printed text on a clean scan: 95 to 99 percent
- Average phone photo of a typed document: 85 to 95 percent
- Faded photocopies or poor scans: 70 to 85 percent
- Handwritten text: typically 30 to 60 percent (often not worth using)
- Decorative or unusual fonts: variable, sometimes very poor
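To check where your own scans fall in these ranges, you can compare OCR output against a passage you have typed out by hand. The sketch below uses one common definition, character accuracy based on edit distance. Treating the percentages above as character accuracy is an assumption, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character from b
                prev[j - 1] + cost,  # substitute, or keep when equal
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Share of reference characters the OCR got right, from 0.0 to 1.0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))
```

Under this definition, 95 percent accuracy on a page of 2,000 characters still leaves about 100 characters wrong, which is why a review pass matters.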
When accuracy is critical (legal text, financial data), always read through the output and verify it against the original. OCR makes a few predictable mistakes:
- Confusing the letter O with the number 0
- Confusing the lowercase letter l with the number 1
- Reading "rn" as "m"
- Dropping or duplicating punctuation
A two-minute review catches almost all of these errors.
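A quick script can point your review at the likeliest trouble spots. The sketch below flags words that mix letters with the digits 0 or 1, a frequent sign of the O/0 and l/1 swaps. It is a rough heuristic with hypothetical names. It will also flag legitimate codes such as part numbers, and it cannot catch "rn" read as "m".

```python
import re
from typing import List, Tuple

# A whole word containing at least one letter and at least one 0 or 1,
# for example "t0tal" or "ful1".
MIXED_TOKEN = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*[01])\w+\b")


def flag_suspicious_tokens(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, token) pairs worth checking against the original."""
    flagged = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in MIXED_TOKEN.finditer(line):
            flagged.append((line_number, match.group()))
    return flagged
```

Run it on the .txt output and check each flagged word against the scan.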
Improving accuracy before running OCR
If your source is low quality, fixing it before OCR is more effective than trying to fix OCR output after the fact. Tips:
- Use the document mode on your phone for new scans
- Make sure pages are straight, well lit, and free of shadows
- Crop tightly to the document edges
- If images are very large (5+ MB each), compress them at 85 percent quality or higher using Compress Image before running OCR
- Avoid going below 85 percent quality on documents that need OCR; lower quality blurs the characters
A few minutes spent improving source images pays off many times over in OCR accuracy.
What to do after extracting
Once you have the text in editable form, common next steps:
- Edit and reformat in Word
- Translate to another language (paste into Google Translate or DeepL)
- Search across multiple documents (combine extracted text from several PDFs)
- Re-export to PDF using Word to PDF if you want a searchable text PDF for archiving
For archiving, a text PDF (built from your Word file) is much more useful than the original image-only PDF because it stays searchable.
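Searching across several documents, mentioned in the list above, needs only a few lines once the text is extracted. This sketch assumes you have loaded each .txt output into a dictionary keyed by file name. The function name and the sample file names are hypothetical.

```python
from typing import Dict, List, Tuple


def search_documents(documents: Dict[str, str], query: str) -> List[Tuple[str, int, str]]:
    """Case-insensitive search returning (name, line_number, line) per hit."""
    needle = query.casefold()
    hits = []
    for name, text in documents.items():
        for line_number, line in enumerate(text.splitlines(), start=1):
            if needle in line.casefold():
                hits.append((name, line_number, line.strip()))
    return hits


documents = {
    "lease.txt": "Tenant agrees to pay rent.\nDeposit is due on signing.",
    "invoice.txt": "Total due: 450\nThank you",
}
results = search_documents(documents, "due")
```

Each result gives the file and line number, so you can jump back to the matching page in the original scan.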
When OCR is not the right answer
A few situations where you do not want to use OCR:
- The PDF already has selectable text (use copy-paste or PDF to Word instead; both are much faster)
- You only need to share the image, not edit the text (just share the PDF as is)
- The content is mostly handwriting (OCR accuracy will be poor; manual transcription is often faster)
- You need formatted output that exactly matches the original layout (OCR cannot preserve complex layouts perfectly)
For everything else, especially scans, phone photos, and old archived documents where the text matters, OCR is the standard tool.
Common questions
Why can I not just copy and paste text from my PDF?
Because the PDF is image-only. The text you see is a picture, not selectable text. You need OCR to convert the picture of text into actual text.
Will OCR perfectly recreate my original document layout?
No. OCR extracts the text but does not preserve complex layouts perfectly. Tables, multi-column layouts, and intricate formatting may need manual cleanup.
Can OCR handle a PDF that mixes text pages and scanned pages?
Yes. The tool runs OCR on every page regardless of source type. For pages that already have text, running OCR is unnecessary but not harmful.
Is my file uploaded to a server during OCR?
No. The Basic OCR tool runs entirely in your browser. Your PDF is processed locally and never sent to a server.