
Free HTML to Markdown Converter Online

Paste HTML from any webpage, blog, or document — get clean Markdown ready for ChatGPT/Claude/Gemini. Live token counter, one-click open in AI chat.

Turndown engine · GFM tables · Code blocks · Token counter · 100% private
Powered by Jina Reader

Benefits

🧹
Strip HTML noise

Auto-removes <script>, <style>, comments, and excess attributes. Keeps only semantic content + structure.

📊
Save 70-80% of tokens

HTML is full of redundant tags. Markdown is much more compact — feed the same content to AI for far fewer tokens.

Full GFM support

GitHub-flavoured tables, strikethrough, task lists, fenced code. Works perfectly in Claude, Notion, GitHub.

How to use

  1. Open the webpage you want to convert, copy the HTML (View Source or Inspect → Copy outerHTML).
  2. Paste the HTML into the input box. You can paste full <html> or just a fragment.
  3. Markdown appears below. View token count for each AI model.
  4. Click 'Open in ChatGPT/Claude' for auto-copy + open chat, or download the .md file.

What is HTML to Markdown?

HTML to Markdown converts HTML code (full document or fragment) into Markdown — plain text with lightweight syntax. HTML uses many tags (<div>, <span>, class/id/style attributes) that bloat the file; Markdown uses only special characters (#, *, -, []) for structure, making it dramatically smaller.

Our engine is Turndown, one of the most popular JavaScript libraries for HTML→MD with 9k+ GitHub stars. Turndown parses the DOM, recognises headings/lists/links/tables/code blocks, and emits the corresponding Markdown syntax. Everything runs in the browser as plain JavaScript, so your file never leaves your machine.

Perfect for devs feeding READMEs/blog posts into Claude for refactoring; content creators cleaning HTML from Word/Google Docs before publishing; AI engineers building RAG pipelines who need clean Markdown chunks before vector embedding.

  • Full HTML5 tag support: heading, paragraph, list, table, link, image, code, blockquote, hr
  • GitHub Flavored Markdown — tables, strikethrough, task lists, fenced code blocks
  • Auto-removes <script>, <style>, comments, and ad-related divs
  • Exact token counts for GPT-4o, estimates for Claude and Gemini
  • One-click open in ChatGPT/Claude/Gemini
  • Works offline after the first load — files never leave your browser

When to use it

Feed blog posts to AI

Copy a blog post from Medium/Substack/WordPress, convert to Markdown, paste into Claude for summary or refactor.

Pre-process content for RAG

When building a vector DB from web pages, clean Markdown chunks generally embed better than raw HTML because the tag noise is gone.

Convert HTML email

HTML emails have inline styles and table layouts. Markdown helps AI understand the content.

Migrate to a static site

Take posts from WordPress/Drupal to Markdown for Hugo, Jekyll, Astro, or Next.js content.

Build AI training corpus

Fine-tuning text models — Markdown is cleaner than HTML for the training corpus.

How it works

Turndown is one of the most-used HTML→MD libraries in the JavaScript ecosystem with 9k+ GitHub stars, used by Notion, Obsidian, Bear, and many major editors. It parses HTML to a DOM (via its bundled domino parser on Node.js, the native DOMParser in the browser), traverses the tree, and applies a rule per tag to produce the matching Markdown syntax.
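The traverse-and-apply-rules idea can be sketched in a few lines. This is a simplified illustration, not Turndown's source: the HtmlNode type, the rules table, and toMarkdown are hypothetical names, and the tree is built by hand instead of coming from a real DOM parser.

```typescript
// Hypothetical, hand-built stand-in for a parsed DOM tree.
type HtmlNode =
  | { kind: "text"; value: string }
  | { kind: "element"; tag: string; attrs: Record<string, string>; children: HtmlNode[] };

// A rule receives the already-converted children and the element's attributes.
type Rule = (content: string, attrs: Record<string, string>) => string;

// One rule per tag; tags without a rule fall through to their plain content.
const rules: Record<string, Rule> = {
  h1: (c) => `# ${c}\n\n`,
  h2: (c) => `## ${c}\n\n`,
  p: (c) => `${c}\n\n`,
  strong: (c) => `**${c}**`,
  em: (c) => `_${c}_`,
  a: (c, attrs) => `[${c}](${attrs.href || ""})`,
  li: (c) => `- ${c}\n`,
  ul: (c) => `${c}\n`,
};

// Depth-first walk: convert the children first, then wrap them with the tag's rule.
function toMarkdown(node: HtmlNode): string {
  if (node.kind === "text") return node.value;
  const content = node.children.map(toMarkdown).join("");
  const rule = rules[node.tag];
  return rule ? rule(content, node.attrs) : content;
}

const sample: HtmlNode = {
  kind: "element",
  tag: "p",
  attrs: {},
  children: [
    { kind: "text", value: "Read the " },
    {
      kind: "element",
      tag: "a",
      attrs: { href: "https://example.com" },
      children: [{ kind: "text", value: "docs" }],
    },
  ],
};
console.log(toMarkdown(sample));
```

Converting children before applying the parent's rule is what lets nested markup (a link inside a paragraph, bold inside a list item) compose without special cases.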

We configure Turndown with the GFM plugin (GitHub Flavored Markdown) for tables, strikethrough, and task lists — important since Claude/ChatGPT both render GFM. Headings use ATX style (# H1), code blocks use fenced (```) instead of indented, and lists use '-' uniformly.
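The settings described above map onto Turndown options roughly as follows. This is a sketch under assumptions, not this site's actual source: it assumes the turndown and turndown-plugin-gfm npm packages are installed, and the sample HTML string is made up for illustration.

```typescript
import TurndownService from "turndown";
// Assumption: if the plugin has no type declarations in your setup, a local
// `declare module "turndown-plugin-gfm";` file may be needed.
import { gfm } from "turndown-plugin-gfm";

const turndown = new TurndownService({
  headingStyle: "atx",       // "# H1" instead of underlined (setext) headings
  codeBlockStyle: "fenced",  // fenced code blocks instead of indented ones
  bulletListMarker: "-",     // use "-" for every unordered list item
});

// The GFM plugin adds tables, strikethrough, and task list items.
turndown.use(gfm);

const markdown: string = turndown.turndown(
  "<h1>Title</h1><ul><li>One</li><li><del>Two</del></li></ul>"
);
console.log(markdown);
```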

Token counting uses gpt-tokenizer, a pure-JS port of OpenAI's tiktoken (BPE encoder). Counts are exact for GPT-4o and GPT-4 as long as the matching encoding is used (o200k_base for GPT-4o, cl100k_base for GPT-4). For Claude/Gemini we apply approximation factors (×1.05 for Claude, ×0.95 for Gemini), which gives under 5% error for English text and slightly higher error for Vietnamese, whose accented characters are multi-byte in UTF-8.
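The approximation step is simple arithmetic on top of the GPT count. A minimal sketch, assuming the GPT token count has already been computed elsewhere; APPROX_FACTOR and estimateTokens are hypothetical names, and the multipliers are the ones quoted above:

```typescript
// Multipliers quoted in the text; they are approximations, not tokenizer output.
const APPROX_FACTOR = { claude: 1.05, gemini: 0.95 } as const;
type EstimatedModel = keyof typeof APPROX_FACTOR;

// Scale an exact GPT token count into a rough estimate for another model family.
function estimateTokens(gptTokenCount: number, model: EstimatedModel): number {
  return Math.round(gptTokenCount * APPROX_FACTOR[model]);
}

console.log(estimateTokens(1000, "claude"));
console.log(estimateTokens(1000, "gemini"));
```

For a 1,000-token GPT count this yields 1,050 for Claude and 950 for Gemini; treat both as estimates for budgeting, not billing.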

HTML → Markdown FAQ

Is there a size limit?

Up to 10 MB of HTML per conversion. Enough for most blog posts, documents, emails. Larger files should be split.

How are images handled?

Converted to Markdown ![alt](url) syntax. URLs are preserved — make sure they're publicly accessible if AI needs to fetch them.
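For illustration, the mapping from an image's attributes to that syntax might look like the sketch below. The imageToMarkdown helper is hypothetical and handles only alt and src; returning an empty string when src is missing is an assumption of this sketch.

```typescript
// Hypothetical helper: builds ![alt](url) from an <img>'s alt and src attributes.
function imageToMarkdown(alt: string, src: string): string {
  // Without a source there is nothing to link to, so emit nothing.
  if (src === "") return "";
  // Escape "]" so it cannot close the alt text early.
  const safeAlt = alt.replace(/\]/g, "\\]");
  return `![${safeAlt}](${src})`;
}

console.log(imageToMarkdown("Logo", "https://example.com/logo.png"));
```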

Are ads and tracking scripts stripped?

Yes. <script>, <iframe>, <noscript>, and HTML comments are removed. Common ad classes (ads, banner, promo) are also filtered.
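As a rough picture of the stripping step, the sketch below removes comments and the listed tags with regular expressions. It is an illustration only: stripNoise is a hypothetical helper, regex-based HTML processing is fragile, and a real converter would normally remove these nodes from the parsed DOM. Class-based ad filtering is left out because it needs a proper DOM.

```typescript
// Hypothetical helper: not the converter's actual implementation.
function stripNoise(html: string): string {
  return html
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Drop <script>, <style>, <iframe>, and <noscript> together with their contents.
    .replace(/<(script|style|iframe|noscript)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "");
}

const cleaned = stripNoise(
  "<p>Hello</p><script>track()</script><!-- ad slot --><iframe src=\"ad.html\"></iframe>"
);
console.log(cleaned);
```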

Does it support inline HTML in Markdown output?

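By default the converter keeps inline HTML for tags without a Markdown equivalent (like <video>, <audio>). Turndown itself only preserves tags registered through its keep option, so these are configured explicitly. Toggle 'Strict Markdown' to drop everything non-MD.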

Is the token count accurate for non-English text?

GPT-4o and GPT-4: 100% accurate. Languages other than English (Vietnamese, Chinese, Arabic) often use 1.5-2× the tokens of English text of equivalent length, because their characters are multi-byte in UTF-8 and less well covered by the BPE vocabulary. The counter reflects the actual cost.