PDF to Excel Online

Extract tables from PDFs into editable Excel workbooks. Detection and parsing happen entirely in your browser.

Drop your PDF
We detect tables and export them as a clean XLSX file.

Verify it yourself: open DevTools → Network tab → drop a file. Watch zero uploads happen.

Tables auto-detected
Works offline after first load
Free
No Sign-Up
No Upload
Tables Preserved
HOW IT WORKS

Three steps. Zero uploads.

  1. Drop your PDF. It loads into browser memory.
  2. Detect tables. We find table boundaries and parse cells client-side.
  3. Download XLSX. Open it in Excel, Numbers, or Google Sheets.

When the table is in a PDF and needs to be a table

The reasons this comes up are usually downstream of someone else's choice of format. Your bank statement arrives as a PDF and you want to drop the transactions into a spreadsheet to total a category. The price list a supplier emailed is in a PDF, and you want to compare it to last quarter's. A regulator publishes data tables only as PDF and you need them as cells to sort and filter. A scraped report is in PDF form and you want each row as a row. In all of these cases, the numbers exist; they just don't behave like numbers because they're locked into a layout.

The job here is to recover the rows and columns: take the PDF, give back an .xlsx with each page as its own sheet, each row as a row, each column as a column. From there you can sort, filter, sum, pivot — the things spreadsheets do.

Why this is a guess, and a fairly good one

PDFs don't store tables. Internally a page is a flat stream of text fragments with x/y positions on the canvas. Nothing says "this is a row" or "this column starts here" — those structures only exist in your eye, when you read. Recovering them means inferring the grid from where the text actually sits.

The tool does that in a few stages. First, it groups text fragments into rows by y-position: if two fragments are within roughly half a line-height of each other vertically, they belong in the same row. Then within each row, fragments close together horizontally are merged into a single cell (normal letter spacing) and wider gaps become cell boundaries (column gutters). Finally, it looks at where cells start across the whole page, finds the dominant x-positions, and treats those as the column centers — every cell gets assigned to the column it's closest to. That gives you the rectangular grid the spreadsheet needs.
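
The three stages can be sketched in a few lines of Python. This is an illustration of the heuristic as described, not the tool's actual code: the Fragment fields, the function names, the thresholds (line_height, gutter, tol, min_share) and the sample page are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    text: str
    x: float      # left edge on the page (assumed field)
    y: float      # vertical position of the line (assumed field)
    width: float  # rendered width (assumed field)

def group_rows(frags, line_height=12.0):
    """Stage 1: fragments within half a line-height vertically share a row."""
    rows = []
    for f in sorted(frags, key=lambda f: f.y):
        if rows and abs(f.y - rows[-1][0].y) <= line_height / 2:
            rows[-1].append(f)
        else:
            rows.append([f])
    return rows

def merge_cells(row, gutter=8.0):
    """Stage 2: small horizontal gaps merge into one cell, wide gaps split cells."""
    cells, end = [], None  # cells holds (start_x, text) pairs
    for f in sorted(row, key=lambda f: f.x):
        if cells and f.x - end <= gutter:
            x0, text = cells[-1]
            sep = " " if f.x - end > 1.0 else ""
            cells[-1] = (x0, text + sep + f.text)
        else:
            cells.append((f.x, f.text))
        end = f.x + f.width
    return cells

def find_columns(rows_of_cells, tol=6.0, min_share=0.5):
    """Stage 3a: cluster cell start positions and keep the dominant ones."""
    starts = sorted(x for cells in rows_of_cells for x, _ in cells)
    clusters = []
    for x in starts:
        if clusters and x - clusters[-1][-1] <= tol:
            clusters[-1].append(x)
        else:
            clusters.append([x])
    need = max(1, min_share * len(rows_of_cells))
    anchors = [sum(c) / len(c) for c in clusters if len(c) >= need]
    # Fall back to every cluster if none is dominant enough.
    return anchors or [sum(c) / len(c) for c in clusters]

def build_grid(frags):
    """Stage 3b: assign every cell to the nearest column position."""
    rows = [merge_cells(r) for r in group_rows(frags)]
    cols = find_columns(rows)
    grid = []
    for cells in rows:
        out = [""] * len(cols)
        for x, text in cells:
            i = min(range(len(cols)), key=lambda i: abs(cols[i] - x))
            out[i] = (out[i] + " " + text).strip()
        grid.append(out)
    return grid

# Hypothetical page: a header and two transactions, in arbitrary stream order.
page = [
    Fragment("Amount", 300, 100, 40), Fragment("Date", 50, 100, 24),
    Fragment("Description", 120, 100, 60),
    Fragment("2024-01-03", 50, 114, 55), Fragment("Coffee", 120, 114, 34),
    Fragment("beans", 157, 114, 30), Fragment("12.50", 300, 114, 28),
    Fragment("2024-01-04", 50, 128.4, 55), Fragment("Train", 120, 128, 28),
    Fragment("ticket", 151, 128, 30), Fragment("48.00", 300, 128, 28),
]

for row in build_grid(page):
    print(row)
```

In this sketch "Coffee" and "beans" sit 3 units apart and merge into one cell, while the 15-unit gap after the date starts a new one. Nudge a threshold and the grid changes, which is why the result is a guess rather than a certainty.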

What the algorithm gets right

  • Plain financial tables — bank statements, invoices, price lists, expense reports. Single line per row, clear column gutters, numbers right-aligned: this is the case the heuristic was tuned for, and it usually comes through correctly.
  • Single-row entries. When each record fits on one line, the row detection is reliable.
  • Tables that fill the whole page width. Column detection works best when columns are well-spaced and consistent across the page.
  • Right- and left-aligned columns. The tool handles both — what matters is that there's a recognisable gap between columns.

Where it has trouble

  • Wrapped cells. A cell whose value runs to a second line — a long product description, a multi-line address — usually gets split into two rows. The fix is light cleanup in the spreadsheet (or running pdf-to-txt instead and rebuilding the table by hand if cleanup isn't worth it).
  • Merged cells. A header that visually spans two columns lands in only one of them: the column it sits closest to, usually the leftmost. You'll see the merge as a value in column A and an empty B.
  • Two tables side by side. The column detector sees both at once and can flatten them into one wide table. Splitting the PDF into halves first usually helps.
  • Body text on the same page as a table. A paragraph above the table contributes its own "rows" of text — you'll get a few rows that aren't really rows, easy to delete.
  • Multi-page tables. Each page becomes a separate sheet. To stitch them back into one continuous table, copy and paste the data sections after extraction.
  • Scanned PDFs. Same caveat as for any text extraction here: if the page is an image (a scanned bank statement, a photographed receipt), there's no text to read. Run it through OCR in another tool first.
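
Two of the cleanups above, wrapped cells and multi-page tables, can also be scripted once the rows are out of the workbook. The sketch below is not part of the tool. The helper names merge_wrapped_rows and stitch_pages are hypothetical, rows are assumed to be equal-length lists of strings, and the rule that a continuation line has an empty first column is an assumption that will not hold for every table.

```python
def stitch_pages(pages, header_rows=1):
    """Concatenate per-page rows, dropping a header that repeats on later pages."""
    out = []
    for i, rows in enumerate(pages):
        if i > 0 and rows[:header_rows] == pages[0][:header_rows]:
            rows = rows[header_rows:]
        out.extend(rows)
    return out

def merge_wrapped_rows(rows, key_col=0):
    """Fold a row into the one above it when its key column is empty."""
    merged = []
    for row in rows:
        is_continuation = (
            merged
            and not row[key_col].strip()
            and any(cell.strip() for cell in row)
        )
        if is_continuation:
            prev = merged[-1]
            merged[-1] = [
                (a + " " + b).strip() if b.strip() else a
                for a, b in zip(prev, row)
            ]
        else:
            merged.append(list(row))
    return merged

# Hypothetical extraction result: one sheet per page, one wrapped description.
page1 = [
    ["Date", "Description", "Amount"],
    ["2024-01-03", "Espresso machine", "310.00"],
    ["", "descaling kit", ""],
]
page2 = [
    ["Date", "Description", "Amount"],
    ["2024-01-04", "Train ticket", "48.00"],
]

table = merge_wrapped_rows(stitch_pages([page1, page2]))
for row in table:
    print(row)
```

Stitching first and merging second lets a cell that wraps across a page break fold into the row that started it, provided the repeated header was recognised and removed.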

What you get at the end

One .xlsx file. Each PDF page becomes its own sheet, named Page 1, Page 2, etc. Open it in Excel, Numbers, Google Sheets, or LibreOffice — any of them. Cells are plain values; there's no formatting, no formulas, no styles. The point is that the numbers are now numbers and the columns are now columns.

If you'd rather have CSV than xlsx, save the sheet as CSV from your spreadsheet app — it's one menu away.

Practical notes

  1. If the PDF has a password, open it through unlock-pdf first. Encrypted PDFs can't be opened for cell extraction.
  2. For very wide tables, rotating the source PDF to landscape before extraction helps the column detector — fewer columns, better gaps. Use edit-pdf to rotate.
  3. If you only need the text and not the table structure, pdf-to-txt is faster and the result is easier to reflow.
  4. The reverse operation — putting an Excel sheet into a PDF — is excel-to-pdf.

What happens to your file

Extraction runs in your browser. Open DevTools and watch the Network tab during the operation — there are no outbound requests carrying the file content. The PDF stays on your disk; the .xlsx is a new download alongside it.

FAQ

Frequently asked

Is my PDF uploaded?

No. Detection and parsing happen entirely in your browser. The file never leaves your device.

Does it work on scanned PDFs (OCR)?

Not directly. A scanned page is an image with no text to read, so run it through OCR in another tool first. Text-based PDFs (with selectable text) extract directly.

How accurate is table detection?

Tables with one line per row and clear gaps between columns usually extract cleanly. Wrapped cells, merged cells, and tables set side by side may need manual review.

Can I pick specific pages?

Yes. Select a page range to extract tables from only the pages you need.