Image to Text Online

Drop a screenshot, a phone photo, a scan, or a single-page PDF and copy the text in seconds. 25 languages — Latin, Cyrillic, Arabic, Greek, Chinese, Japanese, Korean, Thai. Each non-English pick auto-includes English so brand names and technical terms in mixed documents come out clean. The recognition engine runs inside your browser — your file doesn't go to our server.

JPG, PNG, WebP, HEIC, or single-page PDF. The recognition engine loads into your browser the first time, then runs locally — nothing is uploaded.

Verify it yourself: open DevTools, switch to the Network tab, then drop a file — you'll see zero outbound requests carrying your image.

HOW IT WORKS

Three steps.

Pick the language and drop your image

JPG, PNG, WebP, HEIC from iPhone, or a single-page PDF. The tool starts in the language your browser uses (English, Russian, Ukrainian, German, Japanese, and 20 more), and you can change it any time from the top bar. The file opens in the browser and stays there.
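
Picking the starting language from the browser is mostly a table lookup. Below is a minimal sketch of that lookup, assuming the standard Tesseract traineddata codes (`eng`, `ukr`, `chi_sim`, and so on) and treating `zh-TW`, `zh-HK`, `zh-MO`, and any `zh-Hant` tag as Traditional Chinese. The function names and the exact mapping are illustrative, not the tool's actual code.

```typescript
// Tesseract traineddata codes for the languages listed on this page,
// keyed by the primary BCP 47 subtag a browser reports. Illustrative mapping.
const TESSERACT_CODES: Record<string, string> = {
  en: "eng", de: "deu", fr: "fra", es: "spa", it: "ita", pt: "por",
  pl: "pol", nl: "nld", da: "dan", sv: "swe", ro: "ron", hu: "hun",
  cs: "ces", tr: "tur", id: "ind", vi: "vie", ru: "rus", uk: "ukr",
  el: "ell", ar: "ara", th: "tha", ja: "jpn", ko: "kor",
};

// Regions assumed to prefer Traditional Chinese; other zh-* tags get Simplified.
const TRADITIONAL_ZH_REGIONS = new Set(["TW", "HK", "MO"]);

function tesseractCodeFor(tag: string): string | undefined {
  const parts = tag.replace(/_/g, "-").split("-");
  const lang = parts[0].toLowerCase();
  if (lang === "zh") {
    const rest = parts.slice(1).map((p) => p.toLowerCase());
    // An explicit script subtag (zh-Hant / zh-Hans) wins over the region.
    if (rest.includes("hant")) return "chi_tra";
    if (rest.includes("hans")) return "chi_sim";
    const region = parts.slice(1).find((p) => p.length === 2)?.toUpperCase();
    return region && TRADITIONAL_ZH_REGIONS.has(region) ? "chi_tra" : "chi_sim";
  }
  return TESSERACT_CODES[lang];
}

// Walk the browser's preference list in order and return the first
// supported language, falling back to English.
function pickDefaultLanguage(preferred: readonly string[]): string {
  for (const tag of preferred) {
    const code = tesseractCodeFor(tag);
    if (code) return code;
  }
  return "eng";
}
```

In a page this would be called as `pickDefaultLanguage(navigator.languages)`, so the user's whole preference list is tried before falling back to English.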

Wait a few seconds while the engine reads

Tesseract.js, an open-source WebAssembly port of the Tesseract OCR engine (originally developed at HP, later sponsored by Google, and now maintained by its open-source community), runs entirely in your browser. The first time you use a language, its model (1–6 MB depending on the script) downloads from our server and is cached. After that, the page works offline. A clean A4 scan finishes in 3–8 seconds on a modern laptop; a phone photo of a sign or receipt usually takes under 4 seconds.
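
The recognition call itself is short. The sketch below assumes the Tesseract.js v5 API, where `createWorker(langs)` resolves to a worker with `recognize(image)` and `terminate()`, and the result's `data` carries `text` and a 0–100 `confidence`. To keep it self-contained, the library function is passed in as a parameter and typed with a local, simplified interface; `recognizeOnce` is a hypothetical helper name.

```typescript
// Simplified local types for the Tesseract.js calls used here
// (not the library's full typings).
interface RecognizeResult {
  data: { text: string; confidence: number };
}
interface OcrWorker {
  recognize(image: unknown): Promise<RecognizeResult>;
  terminate(): Promise<unknown>;
}
type CreateWorker = (langs: string) => Promise<OcrWorker>;

// Start a worker for the chosen language(s), read one image, and always
// shut the worker down afterwards so its WASM memory is released.
async function recognizeOnce(
  createWorker: CreateWorker,
  langs: string,
  image: unknown,
): Promise<{ text: string; confidence: number }> {
  const worker = await createWorker(langs);
  try {
    const { data } = await worker.recognize(image);
    return { text: data.text, confidence: data.confidence };
  } finally {
    await worker.terminate();
  }
}
```

With the real library this would look like `import { createWorker } from "tesseract.js"` followed by `await recognizeOnce(createWorker, "eng", file)`. A page that handles several files in a row would more likely keep one worker alive between them than start and stop one per image.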

Copy the text or download .txt

The result lands in a plain-text box with a confidence score. Copy it to the clipboard, or download it as a UTF-8 .txt file. Everything that ran — the WASM core, the language model, the recognition itself — happened on your device. No file, no recognized text, and no metadata reached our server.
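
Naming the downloaded file after the source image keeps results easy to match up later. A small sketch of such a naming rule, as a hypothetical helper rather than necessarily what the tool does:

```typescript
// Replace the last extension with ".txt" ("invoice-scan.jpg" -> "invoice-scan.txt").
// Names without an extension just gain ".txt"; if nothing is left
// (e.g. ".jpg"), fall back to a generic name.
function txtFileName(imageName: string): string {
  const base = imageName.replace(/\.[^./\\]+$/, "");
  return `${base || "recognized-text"}.txt`;
}
```

In the browser, the recognized string would then typically be wrapped in `new Blob([text], { type: "text/plain;charset=utf-8" })` and offered through a temporary link created with `URL.createObjectURL`.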

What OCR actually does

Optical Character Recognition (OCR) turns pixels that look like letters into machine-readable text. To a computer, a photo of a receipt or a scanned page of a book is just colored dots until a model trained on millions of typed characters maps each dot pattern back to a, b, 9, or =. The engine in this tool is Tesseract, the open-source engine also used in the Internet Archive's book-scanning pipeline and in many open-source OCR tools, compiled to WebAssembly so it runs in your browser instead of on someone else's server.

What you can drop in

JPG, PNG, WebP, HEIC (iPhone), and single-page PDF. HEIC is decoded inside the browser, so there's no separate conversion step. Only the first page of a multi-page PDF is recognized; if you need every page, split the PDF first with split-pdf and run each page through. Photos straight from a phone work, and so do screenshots, flatbed scans, and screen-captured documents. The practical upper bound is 25 MB per file: past that, the browser struggles to keep the canvas and the WASM heap in memory at the same time.
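
The input rules above translate into a small pre-flight check before any decoding starts. This is a sketch under stated assumptions: "25 MB" is read as 25 × 1024 × 1024 bytes, the MIME list is illustrative, and the helper and type names are hypothetical. It falls back to the file extension because browsers can report an empty MIME type for HEIC files.

```typescript
// Assumed interpretation of the 25 MB cap.
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024;

// Accepted MIME types and the extensions that imply them.
const ACCEPTED_TYPES = new Map<string, readonly string[]>([
  ["image/jpeg", ["jpg", "jpeg"]],
  ["image/png", ["png"]],
  ["image/webp", ["webp"]],
  ["image/heic", ["heic"]],
  ["image/heif", ["heif"]],
  ["application/pdf", ["pdf"]],
]);

interface DroppedFile {
  name: string;
  type: string; // MIME type reported by the browser; may be ""
  size: number; // bytes
}

type UploadCheck =
  | { ok: true; kind: "image" | "heic" | "pdf" }
  | { ok: false; reason: string };

function checkUpload(file: DroppedFile): UploadCheck {
  if (file.size > MAX_UPLOAD_BYTES) return { ok: false, reason: "larger than 25 MB" };
  const ext = (file.name.split(".").pop() ?? "").toLowerCase();
  // Infer the type from the extension when the browser left it empty.
  const inferred = Array.from(ACCEPTED_TYPES).find(([, exts]) => exts.includes(ext));
  const mime = file.type || (inferred ? inferred[0] : "");
  if (!ACCEPTED_TYPES.has(mime)) {
    return { ok: false, reason: `unsupported type: ${mime || ext}` };
  }
  if (mime === "application/pdf") return { ok: true, kind: "pdf" }; // first page only
  if (mime === "image/heic" || mime === "image/heif") return { ok: true, kind: "heic" }; // decode first
  return { ok: true, kind: "image" };
}
```

Returning a `kind` lets the caller route PDFs to a page renderer and HEIC files to a decoder before handing plain pixels to the OCR engine.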

What good input looks like

OCR quality depends far more on the image than on the engine. A 300 DPI flatbed scan of a typed page reaches 98–99% accuracy with no tuning. A sharp phone photo of a receipt in decent light: 95%+. A blurry photo of a contract taken at an angle in dim light: 60–80%, and the result will need proofreading. Three things move accuracy the most: sharpness (focus the camera before pressing the shutter), contrast (white text on a dark background is harder than black on white), and orientation (the engine handles small skews, but a sideways image needs to be rotated first). If you're scanning IDs or contracts and accuracy matters, run them through a scanner app such as Adobe Scan, Notes (iOS), or Google Drive on your phone first; these apps correct perspective and contrast before saving.
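
To relate "300 DPI" to the pixel counts a phone photo or screenshot actually has, convert the page size to inches and multiply by the DPI: pixels = (millimetres ÷ 25.4) × DPI. A tiny helper (hypothetical, for illustration only):

```typescript
const MM_PER_INCH = 25.4;

// Pixel dimensions of a page scanned at `dpi`: (millimetres / 25.4) × dpi, rounded.
function scanPixels(
  widthMm: number,
  heightMm: number,
  dpi: number,
): { width: number; height: number } {
  return {
    width: Math.round((widthMm / MM_PER_INCH) * dpi),
    height: Math.round((heightMm / MM_PER_INCH) * dpi),
  };
}

// An A4 page (210 × 297 mm) at 300 DPI is about 2480 × 3508 pixels.
const a4At300 = scanPixels(210, 297, 300);
```

By the same arithmetic, an image 1500 pixels wide covering the width of an A4 page works out to roughly 180 DPI.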

Languages

25 languages are supported: one per Vastiko UI locale, plus Simplified Chinese for mainland users. Latin: English, German, French, Spanish, Italian, Portuguese (works for Brazilian too), Polish, Dutch, Danish, Swedish, Romanian, Hungarian, Czech, Turkish, Indonesian, Vietnamese. Cyrillic: Russian, Ukrainian. Greek. Arabic (right-to-left). Thai. CJK: Japanese, Korean, Chinese Simplified (mainland), Chinese Traditional (Taiwan / Hong Kong).

The tool guesses your language from the browser locale on first load: a Ukrainian browser opens with Ukrainian pre-selected, a Japanese browser with Japanese, and so on. Change it any time from the top bar; your last choice persists across sessions. Each language pack downloads on first use (1–6 MB depending on the script; Latin-script packs are smaller than CJK ones) and is cached in your browser, so each language is a one-time download and later uses start immediately.
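
"Your last choice persists" can be as simple as one key in `localStorage`. The sketch below assumes that storage mechanism and a hypothetical key name `ocr-language`; storage is accessed through a minimal interface so it can be swapped out, and reads and writes are wrapped in `try` because browsers can block storage entirely.

```typescript
// Anything with localStorage's getItem/setItem shape.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const LANGUAGE_KEY = "ocr-language"; // hypothetical storage key

const SUPPORTED_LANGUAGES = new Set([
  "eng", "deu", "fra", "spa", "ita", "por", "pol", "nld", "dan", "swe",
  "ron", "hun", "ces", "tur", "ind", "vie", "rus", "ukr", "ell", "ara",
  "tha", "jpn", "kor", "chi_sim", "chi_tra",
]);

// A saved, still-supported choice wins; otherwise use the locale-based guess.
function loadLanguage(store: KeyValueStore, localeGuess: string): string {
  let saved: string | null = null;
  try {
    saved = store.getItem(LANGUAGE_KEY);
  } catch {
    // Storage blocked (e.g. strict privacy settings): fall back to the guess.
  }
  return saved !== null && SUPPORTED_LANGUAGES.has(saved) ? saved : localeGuess;
}

// Remember the user's pick; silently skip if storage is unavailable.
function saveLanguage(store: KeyValueStore, code: string): void {
  if (!SUPPORTED_LANGUAGES.has(code)) return;
  try {
    store.setItem(LANGUAGE_KEY, code);
  } catch {
    // Not persisting is acceptable; the picker still works for this session.
  }
}
```

In a page, `window.localStorage` satisfies `KeyValueStore` directly.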

Most real documents in a non-English-speaking country mix the local script with English: a Russian salary slip mentions Excel and PDF, a German contract uses domain names and tech terms, a Japanese receipt has English brand names. To handle that without making you think about it, picking any non-English primary language automatically loads English as a secondary language. The recognition pass then considers both scripts and picks the higher-confidence reading for each word. The cost is roughly 2× memory and about 30% slower recognition than a single-language pass, which is worth paying because the alternative is garbled brand names.
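
Tesseract's own convention for combining models is to join language codes with `+` (for example `ukr+eng`); Tesseract.js v5 also accepts the codes as an array such as `['ukr', 'eng']`. A minimal sketch of the "always add English" rule, with hypothetical names; `showEnBadge` stands in for whatever indicator the page uses to show that mixed mode is active.

```typescript
interface LanguagePlan {
  langs: string;        // string handed to the OCR engine, e.g. "ukr+eng"
  showEnBadge: boolean; // whether the UI should signal mixed mode
}

// Any non-English primary gets English as a second model, primary listed first.
function planLanguages(primary: string): LanguagePlan {
  const codes = primary === "eng" ? ["eng"] : [primary, "eng"];
  return { langs: codes.join("+"), showEnBadge: codes.length > 1 };
}
```

The resulting `langs` string is what would be passed when creating the recognition worker; the English-only case stays a single-model pass and avoids the extra memory and time.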

What this still won't do: handle three or more scripts in one pass (a German document with both Russian and Japanese quotes), or do well on a document where the languages are 50/50 by word count rather than primary + occasional English. For the 50/50 case, run the file twice — once with each language as primary — and compare.
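
If you script the two passes, one simple way to compare them is the engine's own overall confidence score. A sketch under that assumption, with hypothetical names; confidence is a heuristic, so reading both outputs remains the reliable check.

```typescript
interface OcrRun {
  primary: string;    // primary language of this pass, e.g. "deu"
  text: string;
  confidence: number; // engine's overall 0–100 confidence for the pass
}

// Keep the pass the engine is more confident about; ties keep the earlier run.
function pickBetterRun(runs: readonly OcrRun[]): OcrRun {
  if (runs.length === 0) throw new Error("need at least one run");
  return runs.reduce((best, run) => (run.confidence > best.confidence ? run : best));
}
```

For a German/Russian page, the two runs would use `deu+eng` and `rus+eng` as their language settings, and the function returns whichever pass scored higher.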

What this tool won't do well

Handwriting. Tesseract is trained on printed text. Cursive notes, doctor's prescriptions, and handwritten cards need a different class of model; Microsoft's Read API and Google Cloud Vision do better, but both require uploading the image.

Heavy table layouts. A complex spreadsheet PDF comes out as flat text with the columns merged together: this tool preserves reading order, not table structure. For tables, use pdf-to-excel.

Decorative or stylized fonts. Logos, display headline fonts, and rendered text effects often fail or come back as gibberish, because the model expects body-text shapes.

Very low resolution. On a 320-pixel-wide screenshot of a paragraph the engine is largely guessing; a 1500-pixel-wide version reads cleanly.

Why we keep this in your browser

People OCR documents that are personal: a passport scan to fill in a visa form, a medical bill from urgent care, a salary slip for a mortgage application, a contract someone sent as a JPG instead of a Word file. The shortest path from those photos to readable text in your clipboard usually runs through someone's free OCR website that quietly stores the upload for “quality improvement”. When we tested the popular ones for our privacy audit of PDF editors, the same pattern showed up in OCR: server upload, retention policies that say “a few hours” but reach back to the same disk, third-party analytics that get a hash of the file shape. We built this OCR to run entirely on-device because the files people point at it are exactly the kind they'd rather not hand to a server.

What happens to your file

Open DevTools, switch to the Network tab, then drop your file. On the first run you'll see a one-time download of the WASM core and the language model (about 10 MB combined for English), then zero outbound requests carrying any part of your file. On later runs, recognition makes no requests at all: the engine is cached and the page works offline. The recognized text lives in a browser textarea you can copy from or download as .txt. None of it ever touches a server log of ours.
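
The Network tab is the authoritative check. For a scripted complement, the browser's Resource Timing API lists the requests a page has made; the sketch below (a hypothetical helper, typed with a local subset of `PerformanceResourceTiming` fields) groups every request that started after a given timestamp. Resource Timing does not record WebSocket traffic and its buffer is finite, so an empty result is supporting evidence rather than proof.

```typescript
// The PerformanceResourceTiming fields this sketch reads.
interface ResourceEntry {
  name: string;          // request URL
  initiatorType: string; // e.g. "fetch", "xmlhttprequest", "script", "beacon"
  startTime: number;     // ms, same clock as performance.now()
}

// URLs requested at or after `since`, grouped by initiator type.
// An empty object means no request started in that window.
function requestsSince(
  entries: readonly ResourceEntry[],
  since: number,
): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const e of entries) {
    if (e.startTime < since) continue;
    const list = groups[e.initiatorType] ?? (groups[e.initiatorType] = []);
    list.push(e.name);
  }
  return groups;
}
```

In a page you would record `const t0 = performance.now()` just before dropping a file, then, once the text appears, call `requestsSince(performance.getEntriesByType("resource") as PerformanceResourceTiming[], t0)`.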

When a server-based OCR makes more sense

If you're processing 10,000 invoices a night, an in-browser tool is the wrong shape — you want a queue and a fleet of GPU workers. If you need handwriting OCR with high accuracy, the cloud APIs are still better than open-source. If your documents are public — historical archives, public-record contracts, your own blog screenshots — there's no privacy upside and the server can be faster. The point isn't that local OCR beats every other approach; it's that for the photos that sit on a single person's camera roll, the trade-off flips and shouldn't require a leap of faith about retention policies.

If you need text from a PDF that already has a text layer, use pdf-to-txt instead. It's faster and lossless, because it reads the underlying text rather than running OCR on the rendered page.

FAQ

Common questions

Does my image go to your server?

No. The image opens in your browser and is processed there. The recognition engine — Tesseract.js — downloads once (~10 MB), caches in your browser, and runs locally. To verify, open DevTools, switch to the Network tab, and drop a file; you'll see no outbound requests carrying your image.

What file types are supported?

JPG, PNG, WebP, HEIC (iPhone), and single-page PDF. HEIC decodes in the browser — no separate conversion. Multi-page PDFs: only the first page is processed; split the PDF first with split-pdf if you need every page.

What languages does it recognize?

25 languages: English, German, French, Spanish, Italian, Portuguese (covers Brazilian), Polish, Dutch, Danish, Swedish, Romanian, Hungarian, Czech, Turkish, Indonesian, Vietnamese, Russian, Ukrainian, Greek, Arabic, Thai, Japanese, Korean, Chinese Simplified, Chinese Traditional. The picker is in the top bar; the default is auto-detected from your browser locale and persists across sessions.

What about documents that mix Cyrillic / Arabic / CJK with English?

Handled automatically. Pick the primary language (e.g. Ukrainian) and the engine loads English as a secondary alphabet in the same recognition pass — so brand names, URLs, and English technical terms come out clean alongside the Ukrainian body text. A small + EN badge next to the picker shows when this mixed mode is active. The trade-off is ~30% slower recognition and ~2× memory; we ship it on by default because the alternative — single-language pass on a mixed document — is what causes garbled brand names.

How accurate is the recognition?

On a clean 300 DPI scan of typed text, 98–99%. On a sharp phone photo of a printed page or receipt, 95%+. On a blurry, low-light, or skewed image, 60–80%, and the result will need proofreading. Each result shows a confidence score so you know which side of that range you landed on.
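
One way to turn the confidence score into a quick verdict is a few thresholds. The cut-offs below are hypothetical, loosely mirroring the ranges above, and the score is the engine's own 0–100 estimate rather than a measured accuracy.

```typescript
type ConfidenceBand = "likely clean" | "spot-check" | "proofread";

// Hypothetical thresholds; tune them against your own documents.
function confidenceBand(confidence: number): ConfidenceBand {
  if (confidence >= 95) return "likely clean";
  if (confidence >= 80) return "spot-check";
  return "proofread";
}
```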

Does it work on handwriting?

Not well. Tesseract is trained on printed text — cursive, prescriptions, and handwritten notes confuse it. For handwriting, Microsoft's Read API and Google Cloud Vision do meaningfully better, both of which require uploading the image.

Why is the first run slow?

Because on first use the recognition engine (the WebAssembly core plus a language model, about 10 MB combined) downloads to your browser. After that, both stay cached and recognition starts right away. The page also works offline after the first load.

Can I use it on a phone?

Yes. The page is a regular web page that runs in any modern mobile browser. On older phones the first run takes longer because of the engine download; subsequent runs are fast.

What about tables, multi-column layouts, or complex documents?

The text comes out in reading order as a flat paragraph stream — table structure isn't preserved. For PDFs where you need rows and columns back as data, use pdf-to-excel. For a PDF that already has a text layer (most digitally-created PDFs do), use pdf-to-txt — it's faster, lossless, and doesn't need OCR.

Is it really free? Any limits?

Yes — no account, no watermark, no per-export limit. The processing runs on your device so there's no server cost to recover. Practical file-size cap is 25 MB per image so the browser doesn't run out of memory.
