Free PDF to Markdown Converter Online

Extract text from PDF to Markdown via PDF.js. The free tier extracts text only (no complex tables/math recognition). Best for text-heavy PDFs — articles, novels, plain reports.

PDF.js · Multi-page · Text only · Free · In-browser
Accepted input: .pdf (text-based, not scans)

Benefits

📄
Multi-page support

Extracts text from every page. Optionally split each page into its own MD section.

🔒
100% in-browser

Mozilla's PDF.js runs entirely in the browser. Your file is never uploaded.

⚠️
Free tier limits

Text extraction only: no complex table layout recognition, no OCR for image scans, no math equations. For higher-quality output, use the paid tier (Datalab API, coming soon).

How to use

  1. Drop a .pdf file into the dropzone.
  2. PDF.js parses each page and extracts its text content.
  3. Raw Markdown appears immediately; edit it to fix the layout if needed.
  4. Copy the result, or use 'Open in ChatGPT' to pass it to an AI assistant.

PDF to Markdown — why it matters for AI users

PDF is the universal document format — research papers, textbooks, reports, contracts. The catch: PDF was designed for display, not data extraction. PDFs store text as glyphs + positions rather than semantic structure. Feeding raw PDF into ChatGPT/Claude (drag-drop) does work, but results are unstable — especially for multi-column documents, tables, or scanned images.

The free tier uses Mozilla's PDF.js, the PDF rendering engine in Firefox, which also exposes a text-extraction API. Output is plain text in a guessed reading order. That is fine for simple text-only PDFs: blog posts saved as PDF, reports exported from Word, novels. It is not ideal for scanned (image-based) PDFs, complex tables, LaTeX equations, or footnote-heavy and multi-column research papers.

For high-quality extraction of research papers / technical books / table-rich reports, you need Marker (high-quality OSS engine) or Datalab API. We're planning a paid tier via Datalab API ($0.005/page) for users who need OCR + layout-aware extraction.

  • Extract text from text-based PDFs (not scans)
  • Multi-page with option to split pages into separate MD sections
  • Basic heading detection (by font size — somewhat unreliable)
  • Basic list detection (bullet, numbered)
  • 100% client-side via PDF.js (JavaScript running in your browser)
  • Token count for the resulting Markdown
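
As an illustration of the basic list detection listed above, here is a minimal TypeScript sketch. It assumes each extracted line is already a single string. The function name normalizeListLine and the set of bullet glyphs are hypothetical choices for this example, not the converter's actual code.

```typescript
// Hypothetical patterns: common bullet glyphs, and "1." or "1)" style numbering.
const BULLET = /^\s*[•◦▪‣·*-]\s+(.*)$/;
const NUMBERED = /^\s*(\d+)[.)]\s+(.*)$/;

// Rewrite one extracted line as a Markdown list item when it looks like one.
// Lines that match neither pattern are returned unchanged.
function normalizeListLine(line: string): string {
  const bullet = BULLET.exec(line);
  if (bullet) return `- ${bullet[1]}`;

  const numbered = NUMBERED.exec(line);
  if (numbered) return `${numbered[1]}. ${numbered[2]}`;

  return line;
}
```

Because both patterns require whitespace after the marker, a line such as "3.14 is pi" is left alone rather than turned into a numbered item.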

When the free tier (PDF.js) is enough

Blog-saved-as-PDF

PDFs from Save-as-PDF of blogs → simple text → free tier OK.

Word/Google Docs reports

Originally text → PDF → free tier extracts well since it's not a scan.

Text-heavy novels / non-fiction

Standard PDFs (not scans), few tables → free tier is sufficient.

Email PDF attachments

PDF attachments → quick extract to feed AI for summary.

When to use paid tier?

Research papers with equations/complex tables, scanned PDFs, multi-column scientific docs → need Marker/Datalab (paid).

How it works

PDF.js is Mozilla's open-source PDF renderer and the engine behind Firefox's built-in PDF viewer. It is written in JavaScript (it is not a port of Poppler), so it runs directly in the browser. It supports PDF spec 1.0-2.0, encryption (excluding DRM), font subsetting, and image extraction.

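Text extraction goes through getTextContent(), which resolves to a list of text items. Each item carries its string, a transform matrix that encodes position (x, y) and scale, a height, and a font name. We reorder items by y descending (top to bottom, since PDF coordinates normally start at the bottom-left) and then by x ascending (left to right) to approximate reading order. Headings are guessed by font size: anything larger than 1.5× the average is treated as a heading. This is a heuristic, and it breaks down on PDFs with inconsistent font usage.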
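
The reordering and heading heuristic described above can be sketched in TypeScript as follows. This is a simplified illustration under stated assumptions, not the converter's actual code: the TextItem interface lists only the PDF.js item fields used here, item height stands in for font size, and LINE_TOLERANCE, groupIntoLines and toMarkdown are hypothetical names and values. In real use the items would come from a page's getTextContent() result.

```typescript
// Subset of the fields on a PDF.js text item that this sketch relies on.
interface TextItem {
  str: string;
  transform: number[]; // [a, b, c, d, x, y]; origin is normally bottom-left
  height: number;      // used here as a font-size proxy (assumption)
}

const LINE_TOLERANCE = 2;  // hypothetical: items within 2 units of y share a line
const HEADING_RATIO = 1.5; // from the text: larger than 1.5x average = heading

// Sort by y descending (top to bottom), bucket items into lines,
// then sort each line by x ascending (left to right).
function groupIntoLines(items: TextItem[]): TextItem[][] {
  const byY = [...items].sort((a, b) => b.transform[5] - a.transform[5]);
  const lines: TextItem[][] = [];
  for (const item of byY) {
    const last = lines[lines.length - 1];
    if (last && Math.abs(last[0].transform[5] - item.transform[5]) <= LINE_TOLERANCE) {
      last.push(item);
    } else {
      lines.push([item]);
    }
  }
  for (const line of lines) {
    line.sort((a, b) => a.transform[4] - b.transform[4]);
  }
  return lines;
}

// Emit one Markdown line per text line, marking oversized lines as headings.
function toMarkdown(items: TextItem[]): string {
  const visible = items.filter((it) => it.str.trim() !== "");
  if (visible.length === 0) return "";

  // Unweighted average over items; a real implementation might weight by length.
  const average = visible.reduce((sum, it) => sum + it.height, 0) / visible.length;

  return groupIntoLines(visible)
    .map((line) => {
      const text = line.map((it) => it.str.trim()).join(" ");
      const size = Math.max(...line.map((it) => it.height));
      return size > HEADING_RATIO * average ? `## ${text}` : text;
    })
    .join("\n");
}
```

Grouping into lines before sorting by x avoids a comparator that mixes both axes, which can order items inconsistently when baselines differ by a fraction of a unit. The sketch still reads straight across the page, so multi-column layouts will interleave.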

Free tier doesn't OCR. If your PDF is a scanned image, no text layer = nothing to extract. Use a dedicated OCR tool (like imgtools.phanmemtonghop.com/en/ocr with Tesseract) or wait for the paid tier with Marker (built-in OCR).
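
One way to spot a scan before converting is to check how much text each page actually yields. The TypeScript sketch below assumes each page's extracted items are available as objects with a str field (as in a PDF.js text content result). The PageText type, the pagesWithoutText function and the MIN_CHARS_PER_PAGE threshold are hypothetical and would need tuning.

```typescript
// Minimal stand-in for one page's extracted text content.
interface PageText {
  items: { str: string }[];
}

// Hypothetical threshold: fewer non-whitespace characters than this
// suggests the page is an image with no usable text layer.
const MIN_CHARS_PER_PAGE = 20;

// Return the 1-based numbers of pages that look like scans.
function pagesWithoutText(pages: PageText[]): number[] {
  const missing: number[] = [];
  pages.forEach((page, index) => {
    const chars = page.items.reduce(
      (count, item) => count + item.str.replace(/\s/g, "").length,
      0,
    );
    if (chars < MIN_CHARS_PER_PAGE) missing.push(index + 1);
  });
  return missing;
}
```

If every page is reported, the file is most likely a pure scan and needs OCR; if only a few are, they may simply be blank pages or full-page figures.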

PDF → Markdown FAQ

Does it work with scanned (image) PDFs?

Not on the free tier. PDF.js only extracts the text layer; without one, there's nothing to extract. Use OCR: try imgtools.phanmemtonghop.com/en/ocr for images, or wait for the paid tier with Marker (built-in OCR).

Are tables preserved?

Free tier is very limited — simple 2-3 column text tables OK, complex tables (merged cells, multi-row headers) flatten to plain lines. Paid tier with Marker/Datalab preserves GFM tables.

What about math equations (LaTeX)?

Free tier extracts raw text, not LaTeX. Paid tier with Marker can recognise equations and convert to inline $...$ LaTeX.

Very large PDFs (500+ pages)?

Browsers can handle them, but extraction may be slow. The total token count shown for the output helps you decide whether to chunk the Markdown before feeding it to an AI.
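
If the token count is too high for your model, chunking on paragraph boundaries is one simple option. The TypeScript sketch below is an illustration only: it estimates tokens at roughly 4 characters each, which is a rough rule of thumb for English text and not the tokenizer of any particular model, and the names estimateTokens and chunkMarkdown are hypothetical.

```typescript
// Assumption: about 4 characters per token. Real tokenizers will differ.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Pack paragraphs (separated by blank lines) into chunks that stay under
// maxTokens where possible. A single paragraph larger than the budget is
// kept whole as its own oversized chunk rather than cut mid-sentence.
function chunkMarkdown(markdown: string, maxTokens: number): string[] {
  const paragraphs = markdown.split(/\n{2,}/).filter((p) => p.trim() !== "");
  const chunks: string[] = [];
  let current = "";

  for (const paragraph of paragraphs) {
    const candidate = current === "" ? paragraph : `${current}\n\n${paragraph}`;
    if (current !== "" && estimateTokens(candidate) > maxTokens) {
      chunks.push(current);
      current = paragraph;
    } else {
      current = candidate;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```

Splitting on blank lines keeps headings and paragraphs intact, which tends to matter more for summarization quality than hitting the budget exactly.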

When will the Marker-powered paid tier launch?

On the roadmap — Phase 2. Estimated $5/mo unlimited or $0.005/page. Contact phanmemtonghop.com to be notified at launch.