Free PDF to Markdown Converter Online
Extract text from PDF to Markdown via PDF.js. The free tier extracts text only (no complex tables/math recognition). Best for text-heavy PDFs — articles, novels, plain reports.
Benefits
Extracts text from every page. Optionally split each page into its own MD section.
Mozilla's PDF.js runs entirely in the browser. Your file is never uploaded.
Text extraction only: no complex table layout recognition, no OCR for scanned images, no math equation recognition. For higher-quality extraction, a paid tier (Datalab API) is coming soon.
How to use
1. Drop a .pdf file into the dropzone.
2. PDF.js parses each page and extracts its text content.
3. Raw Markdown appears immediately; edit it to fix the layout if needed.
4. Copy the result, or use 'Open in ChatGPT' to pass it to an AI assistant.
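Steps 1 and 2 might look roughly like this in code. This is a minimal sketch, not the converter's actual source: the interfaces `PdfDocLike`, `PdfPageLike`, and `TextItemLike` are hypothetical stand-ins that mirror the `numPages`, `getPage()`, and `getTextContent()` members of PDF.js, and joining items with a single space is a simplification.

```typescript
// Hypothetical structural types mirroring the parts of PDF.js used here.
interface TextItemLike { str?: string; hasEOL?: boolean }
interface PdfPageLike { getTextContent(): Promise<{ items: TextItemLike[] }> }
interface PdfDocLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PdfPageLike>;
}

// Turn one page's text items into a plain string.
function itemsToText(items: TextItemLike[]): string {
  let text = "";
  for (const item of items) {
    // Entries without a string payload (e.g. marked-content markers) are skipped.
    if (typeof item.str !== "string") continue;
    // Assumption: a space between items, a newline where the item ends a line.
    text += item.str + (item.hasEOL ? "\n" : " ");
  }
  return text.trim();
}

// Walk every page of an already-loaded document and collect its text.
async function extractPages(doc: PdfDocLike): Promise<string[]> {
  const pages: string[] = [];
  // PDF.js page numbers start at 1, not 0.
  for (let n = 1; n <= doc.numPages; n++) {
    const page = await doc.getPage(n);
    const content = await page.getTextContent();
    pages.push(itemsToText(content.items));
  }
  return pages;
}
```

In the browser, the `doc` argument would typically be the document object obtained by loading the dropped file with PDF.js; the file's bytes stay in memory on the client.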
PDF to Markdown — why it matters for AI users
PDF is the universal document format — research papers, textbooks, reports, contracts. The catch: PDF was designed for display, not data extraction. PDFs store text as glyphs + positions rather than semantic structure. Feeding raw PDF into ChatGPT/Claude (drag-drop) does work, but results are unstable — especially for multi-column documents, tables, or scanned images.
The free tier uses Mozilla's PDF.js, Firefox's PDF rendering engine, which also offers text extraction. Output is plain text in the reading order PDF.js guesses. That works well for simple text-only PDFs: blog posts saved as PDF, reports exported from Word, novels. It is not ideal for scanned (image-based) PDFs, complex tables, LaTeX equations, or footnote-heavy or multi-column research papers.
For high-quality extraction of research papers / technical books / table-rich reports, you need Marker (high-quality OSS engine) or Datalab API. We're planning a paid tier via Datalab API ($0.005/page) for users who need OCR + layout-aware extraction.
- ✓ Extract text from text-based PDFs (not scans)
- ✓ Multi-page, with an option to split pages into separate MD sections
- ✓ Basic heading detection (by font size, somewhat unreliable)
- ✓ Basic list detection (bulleted, numbered)
- ✓ 100% client-side via PDF.js (JavaScript running in your browser)
- ✓ Token count for the resulting Markdown
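To make a few of these features concrete, here is a small sketch of page splitting, list detection, and token counting. Everything in it is an assumption for illustration: the `## Page N` heading format, the set of bullet characters, and the rule of thumb of roughly 4 characters per token are not taken from the converter itself, and real tokenizers will give different counts.

```typescript
// Join per-page text, optionally giving each page its own Markdown section.
// The "## Page N" heading format is an assumption for this sketch.
function pagesToMarkdown(pages: string[], splitPages: boolean): string {
  if (!splitPages) return pages.join("\n\n");
  return pages.map((text, i) => `## Page ${i + 1}\n\n${text}`).join("\n\n");
}

// Rewrite common bullet and numbering prefixes as Markdown list markers.
// Lines that do not look like list items are returned unchanged.
function normalizeListLine(line: string): string {
  const bullet = line.match(/^\s*[•◦▪‣·]\s+(.*)$/);
  if (bullet) return `- ${bullet[1]}`;
  const numbered = line.match(/^\s*(\d+)[.)]\s+(.*)$/);
  if (numbered) return `${numbered[1]}. ${numbered[2]}`;
  return line;
}

// Very rough token estimate: about 4 characters per token (assumption).
function estimateTokens(markdown: string): number {
  return Math.ceil(markdown.length / 4);
}
```

A character-based estimate is only good enough for deciding whether a document is obviously too large for a model's context window; it is not a substitute for the model's own tokenizer.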
When the free tier (PDF.js) is enough
Blog posts saved with Save as PDF: simple text, so the free tier is fine.
Documents that started as text and were exported to PDF: the free tier extracts them well because they have a real text layer rather than a scan.
Standard PDFs (not scans) with few tables: the free tier is sufficient.
PDF attachments: quick extraction to feed an AI for a summary.
Research papers with equations or complex tables, scanned PDFs, and multi-column scientific documents: these need Marker/Datalab (paid).
How it works
PDF.js is Mozilla's open-source project and the default PDF rendering engine in Firefox. It is written in JavaScript, so it runs directly in the browser with no plugin or server round trip. It supports most of the PDF specification, including encryption (excluding DRM), font subsetting, and image extraction.
Text extraction goes through getTextContent(), which returns text items, each with a position (x, y), a font, and a font size. We sort them by y descending (top to bottom, since the PDF y axis points up) and then by x ascending (left to right) to approximate reading order. Headings are guessed by font size: text larger than 1.5× the average is treated as a heading. This is a heuristic, and it breaks down on PDFs with inconsistent font usage.
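The ordering and heading heuristic can be sketched as follows. This is an illustration under stated assumptions, not the converter's actual code: `PositionedItem` is a hypothetical flattened shape (with PDF.js, x and y are commonly read from `transform[4]` and `transform[5]` of each text item), the 2-unit `yTolerance` for treating items as one line is invented for the sketch, and emitting every heading as `##` is a simplification.

```typescript
// Hypothetical flattened text item: one string with its position and size.
interface PositionedItem { str: string; x: number; y: number; fontSize: number }

// Group items into lines: sort by y descending (top of page first), start a
// new line whenever y drops by more than the tolerance, then sort each line
// by x ascending (left to right).
function groupIntoLines(items: PositionedItem[], yTolerance = 2): PositionedItem[][] {
  const byY = [...items].sort((a, b) => b.y - a.y);
  const lines: PositionedItem[][] = [];
  for (const item of byY) {
    const current = lines.length > 0 ? lines[lines.length - 1] : undefined;
    if (current !== undefined && Math.abs(current[0].y - item.y) <= yTolerance) {
      current.push(item);
    } else {
      lines.push([item]);
    }
  }
  for (const line of lines) line.sort((a, b) => a.x - b.x);
  return lines;
}

// Emit one Markdown line per text line. A line becomes a heading when its
// largest font size exceeds headingRatio times the average over all items.
function linesToMarkdown(lines: PositionedItem[][], headingRatio = 1.5): string {
  const all = lines.reduce<PositionedItem[]>((acc, line) => acc.concat(line), []);
  if (all.length === 0) return "";
  const average = all.reduce((sum, it) => sum + it.fontSize, 0) / all.length;
  return lines
    .map((line) => {
      const text = line.map((it) => it.str).join(" ").trim();
      const largest = Math.max(...line.map((it) => it.fontSize));
      return largest > headingRatio * average ? `## ${text}` : text;
    })
    .join("\n");
}
```

Grouping into lines before sorting by x avoids a common pitfall of a single combined sort: two items on the same visual line whose y values differ by a fraction of a unit would otherwise be ordered by that tiny difference instead of left to right. Note that this sketch still reads straight across the page, which is why multi-column layouts come out interleaved.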
The free tier doesn't perform OCR. If your PDF is a scanned image, it has no text layer, so there is nothing to extract. Use a dedicated OCR tool (like imgtools.phanmemtonghop.com/en/ocr with Tesseract) or wait for the paid tier with Marker (built-in OCR).
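A simple way to notice a scan is to check how much text extraction produced. The sketch below is one possible check, not the converter's own logic: the function name `hasTextLayer` and the threshold of 20 non-whitespace characters per page on average are assumptions chosen for illustration.

```typescript
// Guess whether a PDF has a usable text layer from its extracted page text.
// Assumption: fewer than minCharsPerPage non-whitespace characters per page,
// on average, suggests a scanned (image-only) document.
function hasTextLayer(pages: string[], minCharsPerPage = 20): boolean {
  if (pages.length === 0) return false;
  const totalChars = pages.reduce(
    (sum, page) => sum + page.replace(/\s+/g, "").length,
    0,
  );
  return totalChars / pages.length >= minCharsPerPage;
}
```

A check like this lets a tool show a "this looks like a scan, try OCR" message instead of returning an empty Markdown document.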
PDF → Markdown FAQ
Does it work with scanned (image) PDFs?
Not on the free tier. PDF.js only extracts the text layer; without one, there's nothing to extract. Use OCR instead: try imgtools.phanmemtonghop.com/en/ocr for images, or wait for the paid tier with Marker (built-in OCR).
Are tables preserved?
Only in a very limited way on the free tier. Simple 2-3 column text tables come through acceptably; complex tables (merged cells, multi-row headers) flatten to plain lines. The paid tier with Marker/Datalab preserves tables as GFM tables.
What about math equations (LaTeX)?
Free tier extracts raw text, not LaTeX. Paid tier with Marker can recognise equations and convert to inline $...$ LaTeX.
What about very large PDFs (500+ pages)?
Browsers can handle them, but processing may be slow. The total token count helps you decide whether to split the Markdown into chunks before feeding it to an AI.
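If the token count is too high for your model, one option is to chunk the output page by page. The sketch below is a hypothetical helper, not a feature of the tool: it packs whole pages into chunks under a token budget, using an assumed estimate of about 4 characters per token.

```typescript
// Pack per-page text into chunks that stay under a token budget.
// Assumption: tokens are estimated as ceil(characters / 4).
// A single page larger than the budget becomes its own oversized chunk.
function chunkByTokenBudget(pages: string[], maxTokens: number): string[] {
  const estimate = (s: string): number => Math.ceil(s.length / 4);
  const chunks: string[] = [];
  let current: string[] = [];
  let used = 0;
  for (const page of pages) {
    const cost = estimate(page);
    // Close the current chunk if this page would push it over the budget.
    if (current.length > 0 && used + cost > maxTokens) {
      chunks.push(current.join("\n\n"));
      current = [];
      used = 0;
    }
    current.push(page);
    used += cost;
  }
  if (current.length > 0) chunks.push(current.join("\n\n"));
  return chunks;
}
```

Splitting on page boundaries keeps each chunk readable on its own, at the cost of occasionally cutting a paragraph that runs across two pages.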
When will the Marker-powered paid tier launch?
It's on the roadmap for Phase 2. Estimated pricing is $5/mo unlimited or $0.005/page. Contact phanmemtonghop.com to be notified at launch.