How to Make a Scanned PDF Searchable with OCR
OCR turns a stack of page images into a searchable document. Here's how to tell when you need it, what makes accuracy good or bad, and how to run it without surprises.
Prerequisites
- A scanned PDF or image‑based PDF (text not selectable)
- Omnvert PDF OCR
Step-by-step
- 1
Tell whether your PDF actually needs OCR
Try to select text in the PDF. If you can't select or copy a single letter, the page is image‑based and needs OCR. If text is already selectable, OCR won't help: you already have a text PDF. Other signs are that Ctrl+F finds nothing for words clearly visible on the page, the file is suspiciously large for a short document, or it came from a scanner, a fax, or a phone photo of paper. Check more than one page, because a single PDF can mix scanned pages with text pages.
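If you have many files to triage, the selection test can be scripted. Below is a minimal sketch. It assumes you have already pulled each page's text with a PDF library of your choice. The function name and the 20-character threshold are hypothetical choices, not part of Omnvert.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return 0-based indices of pages that look image-only.

    page_texts: one string per page, as returned by a text extractor.
    min_chars: hypothetical threshold; a page with fewer visible
    characters than this is treated as having no real text layer,
    so a stray page number or scanner stamp doesn't count as text.
    """
    flagged = []
    for index, text in enumerate(page_texts):
        # Count only non-whitespace characters so blank lines are ignored.
        visible = sum(1 for ch in text if not ch.isspace())
        if visible < min_chars:
            flagged.append(index)
    return flagged


# Page 0 has a real text layer; pages 1 and 2 are scans
# (page 2 only yields a stray page number).
texts = ["Quarterly report for the period ending 30 June", "", "  2  "]
print(pages_needing_ocr(texts))  # → [1, 2]
```

With pypdf, for example, `[page.extract_text() for page in PdfReader("scan.pdf").pages]` produces the input list. If every page is flagged, the whole file needs OCR. If only some pages are flagged, you have a mixed PDF.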
- 2
Understand what OCR actually does
Optical Character Recognition analyzes the pixel patterns in each page image and maps them to character codes. The output is a PDF in which the original page images stay visible and an invisible text layer has been added in register with them. The page looks identical to the scan, but its text is now selectable, copyable, and searchable. The image isn't replaced. Each recognized word sits in the invisible layer at the same position as the word in the image.
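Here is what "the same position" means in practice. An OCR engine reports each word with a bounding box in image pixels, with the origin at the top-left. PDF pages are measured in points (1/72 inch), with the origin at the bottom-left. PDF text rendering mode 3 (the `3 Tr` operator) draws text that is neither filled nor stroked, which is a common way to make the layer invisible.

The sketch below converts one word box into a content-stream fragment. Several details are assumptions for illustration: the function name, the font resource `/F1`, and the rule that font size equals box height. A real tool would also scale each word horizontally to fit its box.

```python
def invisible_word_ops(word, box_px, dpi, page_height_pt, font="/F1"):
    """Build PDF content-stream operators that place `word` invisibly.

    word: recognized text for one word.
    box_px: (left, top, right, bottom) bounding box in image pixels,
            with the origin at the image's top-left corner.
    dpi: resolution the page image was scanned at.
    page_height_pt: page height in PDF points (1/72 inch).
    font: name of a font resource assumed (hypothetically) to exist
          in the page's /Resources dictionary.
    """
    left, top, right, bottom = box_px
    scale = 72.0 / dpi  # convert pixels to points
    x = left * scale
    # PDF's y axis grows upward from the bottom edge, so flip it; text is
    # positioned at its baseline, approximated here by the box bottom.
    y = page_height_pt - bottom * scale
    # Hypothetical sizing rule: font size equals the box height in points.
    size = (bottom - top) * scale
    # Escape characters that are special inside a PDF literal string.
    escaped = word.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return (
        f"BT {font} {size:.2f} Tf 3 Tr "          # begin text, set font, mode 3 = invisible
        f"{x:.2f} {y:.2f} Td ({escaped}) Tj ET"   # move to baseline, show text, end text
    )


# A word on a US Letter page (792 pt tall) scanned at 300 dpi.
print(invisible_word_ops("Invoice", (300, 600, 700, 650), 300, 792))
# → BT /F1 12.00 Tf 3 Tr 72.00 636.00 Td (Invoice) Tj ET
```

Literal strings like this only cover simple single-byte text. Production tools embed a font and encode glyphs properly, which is how non-Latin scripts survive the round trip.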
- 3
Know the factors that hurt accuracy
Scan resolution matters most: aim for at least 300 dpi, since 200 dpi is borderline. Skewed pages, low contrast, coffee stains, and background textures all reduce accuracy. Standard serif and sans‑serif fonts OCR well, while decorative or script fonts do significantly worse. The language setting matters too. An English model on a Turkish document tanks accuracy, even though Turkish uses the Latin alphabet, because Turkish-specific letters such as ç, ğ, ı, and ş get misread. Handwriting is a separate problem: general-purpose OCR doesn't read it reliably.
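If you don't know what resolution a file was scanned at, you can work it out. Divide the page image's pixel width by the page's width in inches (its width in points divided by 72). The sketch below does this. The function names are hypothetical, and the thresholds simply restate the guidance above.

```python
def effective_dpi(pixel_width, page_width_pt):
    """Resolution of a page image stretched across a page page_width_pt wide."""
    inches = page_width_pt / 72.0  # PDF points are 1/72 inch
    return pixel_width / inches


def rate_resolution(dpi):
    """Classify a scan resolution using the 300/200 dpi guidance."""
    if dpi >= 300:
        return "good"
    if dpi >= 200:
        return "borderline"
    return "too low: re-scan if possible"


# A US Letter page is 612 pt (8.5 in) wide.
for width_px in (2550, 1700, 1275):
    dpi = effective_dpi(width_px, 612)
    print(width_px, round(dpi), rate_resolution(dpi))
# 2550 300 good
# 1700 200 borderline
# 1275 150 too low: re-scan if possible
```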
- 4
Run OCR with Omnvert
Upload your scanned PDF to Omnvert PDF OCR, pick the document's primary language, and let it process. A clean 300 dpi scan takes about a second or two per page, and dense multi‑column magazine scans take longer. Download the resulting searchable PDF and test it: open it, press Ctrl+F to search for a word you can see on the page, and try selecting text.
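For a batch of files, you can script the Ctrl+F check against the text extracted from each output PDF. This is a minimal sketch. The function name is hypothetical, and `casefold()` approximates a viewer's case-insensitive search.

```python
def pages_containing(page_texts, word):
    """Return 0-based indices of pages whose text contains `word`,
    ignoring case the way a viewer's Ctrl+F usually does."""
    needle = word.casefold()
    return [i for i, text in enumerate(page_texts) if needle in text.casefold()]


ocr_output = ["INVOICE No. 1182", "Thank you for your business", ""]
print(pages_containing(ocr_output, "invoice"))  # → [0]
```

An empty result for a word you can clearly see usually means the page wasn't recognized. A word split by a line-end hyphen can also cause a miss, so check those by eye.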
- 5
Fix garbled output
Common causes and fixes:
- The page is rotated: correct the orientation with PDF Rotate first, then run OCR again.
- The scan resolution is too low: re‑scan at 300 dpi if you can.
- The source is a phone photo of paper rather than a proper scan: use image filters to boost contrast and even out the lighting before converting to PDF.
- A multi‑column layout confuses the reading order: expect to fix the order by hand after copying the text.
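Boosting contrast before OCR usually means stretching the gray levels so the darkest ink becomes black and the lightest paper becomes white. Below is a minimal pure-Python sketch. It assumes the image is grayscale and stored as rows of values from 0 to 255; an image library would do the same with far less code. The function name and the 1% clipping default are assumptions of this sketch, not a description of Omnvert's filters.

```python
def stretch_contrast(rows, clip=0.01):
    """Linearly stretch grayscale values (0-255) to use the full range.

    rows: list of lists of ints, one inner list per pixel row.
    clip: fraction of the darkest and lightest pixels to ignore when
          picking the stretch limits, so a few specks or glare spots
          don't set the range (hypothetical default of 1%).
    """
    values = sorted(v for row in rows for v in row)
    n = len(values)
    if n == 0:  # empty image: nothing to do
        return [list(row) for row in rows]
    k = int(n * clip)
    low, high = values[k], values[n - 1 - k]
    if high <= low:  # flat image: nothing to stretch
        return [list(row) for row in rows]
    scale = 255.0 / (high - low)
    # Map low -> 0 and high -> 255, clamping the clipped outliers.
    return [
        [min(255, max(0, round((v - low) * scale))) for v in row]
        for row in rows
    ]


# A faded scan: gray "ink" (110) on gray "paper" (170).
faded = [[170, 170, 110], [110, 170, 170]]
print(stretch_contrast(faded))  # → [[255, 255, 0], [0, 255, 255]]
```

A global stretch like this won't fix uneven lighting, such as a shadow across half the page. That needs a local method such as adaptive thresholding.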
- 6
What you can do after OCR
Once the PDF is searchable, new workflows open up. You can use PDF Redact to permanently black out sensitive text (names, case numbers, account numbers), PDF → Word to pull the content into an editable document, or Split PDF to extract just the pages that matter. Finding the text to redact and converting to editable Word text both depend on the text layer. Search also makes the pages worth splitting out far easier to find.
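Because the text layer is searchable, you can also list redaction candidates before opening PDF Redact. This minimal sketch runs over the extracted text of each page. Both patterns are hypothetical examples: a made-up case-number format and a generic run of 8 to 17 digits. Adapt them to the identifiers your documents actually use.

```python
import re

# Hypothetical patterns; adjust them to the formats in your own documents.
PATTERNS = {
    "case_number": re.compile(r"\b\d{2}-[A-Z]{2}-\d{4,6}\b"),  # e.g. 23-CV-01234
    "account_number": re.compile(r"\b\d{8,17}\b"),
}


def find_redaction_candidates(page_texts):
    """Return (page_index, label, matched_text) for every pattern hit."""
    hits = []
    for page_index, text in enumerate(page_texts):
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(text):
                hits.append((page_index, label, match.group()))
    return hits


pages = ["Case 23-CV-01234 filed.", "Pay from account 004512347788."]
for hit in find_redaction_candidates(pages):
    print(hit)
# (0, 'case_number', '23-CV-01234')
# (1, 'account_number', '004512347788')
```

OCR can misread characters (an O for a 0, for instance), so treat the list as a starting point and still review the pages by eye before redacting.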
OCR works on the image as‑is. If your scan is skewed, low‑contrast, or under 200 dpi, fix those problems first: OCR can't compensate for a bad original.