Free HTML to Markdown Converter Online
Paste HTML from any webpage, blog, or document — get clean Markdown ready for ChatGPT/Claude/Gemini. Live token counter, one-click open in AI chat.
Benefits
Auto-removes <script>, <style>, comments, and excess attributes. Keeps only semantic content + structure.
HTML is full of redundant tags. Markdown is much more compact — feed the same content to AI for far fewer tokens.
GitHub-flavoured tables, strikethrough, task lists, fenced code. Works perfectly in Claude, Notion, GitHub.
How to use
1. Open the webpage you want to convert, copy the HTML (View Source or Inspect → Copy outerHTML).
2. Paste the HTML into the input box. You can paste full <html> or just a fragment.
3. Markdown appears below. View token count for each AI model.
4. Click 'Open in ChatGPT/Claude' for auto-copy + open chat, or download the .md file.
What is HTML to Markdown?
HTML to Markdown converts HTML code (full document or fragment) into Markdown — plain text with lightweight syntax. HTML uses many tags (<div>, <span>, class/id/style attributes) that bloat the file; Markdown uses only special characters (#, *, -, []) for structure, making it dramatically smaller.
Our engine is Turndown, the most popular JavaScript library for HTML→MD, with 9k+ GitHub stars. Turndown parses the DOM, recognises headings/lists/links/tables/code blocks, and emits the corresponding Markdown syntax. Everything runs as JavaScript in your browser, so your file never leaves your machine.
Perfect for devs feeding READMEs/blog posts into Claude for refactoring; content creators cleaning HTML from Word/Google Docs before publishing; AI engineers building RAG pipelines who need clean Markdown chunks before vector embedding.
- ✓ Supports common HTML5 elements: headings, paragraphs, lists, tables, links, images, code, blockquotes, horizontal rules
- ✓ GitHub Flavored Markdown: tables, strikethrough, task lists, fenced code blocks
- ✓ Auto-removes <script>, <style>, comments, and ad-related divs
- ✓ Exact token counts for GPT-4o, estimates for Claude and Gemini
- ✓ One-click open in ChatGPT/Claude/Gemini
- ✓ Works offline after the first load, and files never leave your browser
When to use it
Copy a blog post from Medium/Substack/WordPress, convert to Markdown, paste into Claude for summary or refactor.
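Building a vector DB from web pages: clean Markdown chunks generally embed better than raw HTML, because the text is not diluted by tags and attributes.
HTML emails are full of inline styles and table layouts. Converting to Markdown strips that layout noise so the AI sees only the content.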
Take posts from WordPress/Drupal to Markdown for Hugo, Jekyll, Astro, or Next.js content.
Fine-tuning text models — Markdown is cleaner than HTML for the training corpus.
How it works
Turndown is the most-used HTML→MD library in the JavaScript ecosystem with 9k+ GitHub stars, used by Notion, Obsidian, Bear, and many major editors. It parses HTML into a DOM (the native DOMParser in the browser, the domino library in Node.js), traverses the tree, and applies a rule per tag to emit the matching Markdown syntax.
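The "traverse the tree, apply a rule per tag" idea can be illustrated with a toy version. This is a minimal sketch, not Turndown's actual code: the node shape ({ tag, attrs, children }), the rules table, and the render function are hypothetical names chosen for illustration, and the tree is built by hand instead of being parsed from HTML.

```javascript
// Hypothetical rule table: tag name -> function producing Markdown.
// Each rule receives the already-converted content of the node's children.
const rules = new Map([
  ["h1", (content) => `# ${content}\n\n`],
  ["h2", (content) => `## ${content}\n\n`],
  ["p", (content) => `${content}\n\n`],
  ["strong", (content) => `**${content}**`],
  ["em", (content) => `_${content}_`],
  ["a", (content, node) => `[${content}](${node.attrs.href})`],
  ["li", (content) => `- ${content}\n`],
  ["ul", (content) => `${content}\n`],
]);

// Depth-first traversal: convert the children first, then apply the
// rule for the current tag. Tags without a rule pass their content through.
function render(node) {
  if (typeof node === "string") return node; // text node
  const content = (node.children || []).map((child) => render(child)).join("");
  const rule = rules.get(node.tag);
  return rule ? rule(content, node) : content;
}

// Hand-built stand-in for a parsed DOM.
const tree = {
  tag: "div",
  children: [
    { tag: "h1", children: ["Title"] },
    {
      tag: "p",
      children: [
        "See ",
        { tag: "a", attrs: { href: "https://example.com" }, children: ["docs"] },
        ".",
      ],
    },
  ],
};

const markdown = render(tree);
```

The wrapper div has no rule, so it disappears from the output while its children are still converted. That is the mechanism by which layout-only tags stop costing tokens.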
We configure Turndown with the GFM plugin (GitHub Flavored Markdown) for tables, strikethrough, and task lists — important since Claude/ChatGPT both render GFM. Headings use ATX style (# H1), code blocks use fenced (```) instead of indented, and lists use '-' uniformly.
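A configuration along these lines might look roughly like the sketch below. The option names (headingStyle, codeBlockStyle, bulletListMarker) and the use, remove, and keep methods come from Turndown's documented API. The function name createConverter, the exact lists of removed and kept tags, and passing the constructor and plugin in as arguments (so the snippet does not depend on how the libraries are loaded) are assumptions for illustration, not this tool's actual source.

```javascript
// TurndownService: the constructor exported by the "turndown" package.
// gfmPlugin: the `gfm` export of the "turndown-plugin-gfm" package.
function createConverter(TurndownService, gfmPlugin) {
  const service = new TurndownService({
    headingStyle: "atx",      // "# H1" instead of underlined headings
    codeBlockStyle: "fenced", // triple-backtick fences instead of indentation
    bulletListMarker: "-",    // "-" for every unordered list item
  });

  // GFM plugin: tables, strikethrough, task lists.
  service.use(gfmPlugin);

  // Assumed tag lists: drop these elements and their content entirely...
  service.remove(["script", "style", "noscript", "iframe"]);
  // ...and keep these as raw HTML, since Markdown has no equivalent.
  service.keep(["video", "audio"]);

  return service;
}

// Usage (assumes both packages are installed and imported):
//   const converter = createConverter(TurndownService, gfm);
//   const markdown = converter.turndown("<h1>Hello</h1>");
```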
Token counting uses gpt-tokenizer — a pure-JS port of OpenAI's tiktoken (BPE encoder), 100% accurate for GPT-4o and GPT-4. For Claude/Gemini we apply approximation factors (×1.05 for Claude, ×0.95 for Gemini) — under 5% error for English text, slightly higher for Vietnamese due to multi-byte UTF-8.
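The approximation step can be sketched as follows. The function assumes the exact GPT count has already been computed elsewhere (for example, as the length of the array returned by gpt-tokenizer's encode function) and only applies the two multipliers quoted above. The names APPROXIMATION_FACTORS and estimateTokenCounts are hypothetical, and the results for Claude and Gemini are estimates, not real tokenizer output.

```javascript
// Multipliers quoted in the text; rough approximations only.
const APPROXIMATION_FACTORS = { claude: 1.05, gemini: 0.95 };

// gptTokenCount: exact count from a GPT tokenizer, supplied by the caller.
function estimateTokenCounts(gptTokenCount) {
  return {
    gpt: gptTokenCount,
    claude: Math.round(gptTokenCount * APPROXIMATION_FACTORS.claude),
    gemini: Math.round(gptTokenCount * APPROXIMATION_FACTORS.gemini),
  };
}

const estimates = estimateTokenCounts(1000);
```

For a 1,000-token GPT count this gives 1,050 for Claude and 950 for Gemini.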
HTML → Markdown FAQ
Is there a size limit?
Up to 10 MB of HTML per conversion. Enough for most blog posts, documents, emails. Larger files should be split.
How are images handled?
Converted to Markdown ![alt](url) syntax. URLs are preserved, so make sure they're publicly accessible if AI needs to fetch them.
Are ads and tracking scripts stripped?
Yes. <script>, <iframe>, <noscript>, and HTML comments are removed. Common ad classes (ads, banner, promo) are also filtered.
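As a rough illustration of this clean-up pass, the sketch below strips comments and the listed elements from an HTML string. It is a simplification under stated assumptions: the function name stripNoise is hypothetical, regular expressions are used only to keep the example self-contained (a real implementation would more likely operate on the parsed DOM), and the class-based ad filtering is not shown.

```javascript
// Removes HTML comments and whole <script>, <style>, <noscript>, <iframe>
// elements (tags plus content). Regex-based, so it is a sketch only and
// will not handle every malformed or nested edge case.
function stripNoise(html) {
  return html
    .replace(/<!--[\s\S]*?-->/g, "")
    .replace(/<(script|style|noscript|iframe)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "");
}

const cleaned = stripNoise(
  "<p>Hi</p><script>alert(1)</script><!-- tracking --><style>p{}</style>"
);
```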
Does it support inline HTML in Markdown output?
By default, the converter keeps inline HTML for tags without a Markdown equivalent (like <video>, <audio>). Toggle 'Strict Markdown' to drop everything non-MD.
Is the token count accurate for non-English text?
GPT-4o and GPT-4: 100% accurate. Non-Latin scripts (Vietnamese, Chinese, Arabic) often use 1.5-2× the tokens vs English of equivalent length, due to multi-byte UTF-8. The counter reflects the actual cost.
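To see the multi-byte effect mentioned above, the sketch below compares character count with UTF-8 byte count using the standard TextEncoder API. The helper name utf8Stats is hypothetical, and byte counts are not token counts: they only indicate why the same number of characters tends to cost more tokens in some scripts.

```javascript
// Compares the number of Unicode code points with the number of UTF-8 bytes.
function utf8Stats(text) {
  const characters = Array.from(text).length;          // code points
  const bytes = new TextEncoder().encode(text).length; // UTF-8 bytes
  return {
    characters,
    bytes,
    bytesPerCharacter: characters === 0 ? 0 : bytes / characters,
  };
}

const english = utf8Stats("hello");          // ASCII only
const vietnamese = utf8Stats("ch\u00e0o");   // "chào": one accented letter
const chinese = utf8Stats("\u4f60\u597d");   // "你好": two CJK characters
```

ASCII text uses one byte per character, the accented Vietnamese letter takes two bytes, and each CJK character takes three.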