Download the PHP package nickcernis/html-to-markdown without Composer
On this page you can find all versions of the php package nickcernis/html-to-markdown. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Package html-to-markdown
Short Description An HTML-to-markdown conversion helper for PHP
License MIT
Homepage https://github.com/thephpleague/html-to-markdown
Information about the package html-to-markdown
HTML To Markdown for PHP
Library which converts HTML to Markdown for your sanity and convenience.
Requires: PHP 7.2+
Lead Developer: @colinodell
Original Author: @nickcernis
Why convert HTML to Markdown?
"What alchemy is this?" you mutter. "I can see why you'd convert Markdown to HTML," you continue, already labouring the question somewhat, "but why go the other way?"
Typically you would convert HTML to Markdown if:
- You have an existing HTML document that needs to be edited by people with good taste.
- You want to store new content in HTML format but edit it as Markdown.
- You want to convert HTML email to plain text email.
- You know a guy who's been converting HTML to Markdown for years, and now he can speak Elvish. You'd quite like to be able to speak Elvish.
- You just really like Markdown.
How to use it
Require the library by issuing this command:
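A likely form of that command, assuming Composer is installed and the package is pulled in under the Packagist name league/html-to-markdown (an assumption; this page lists the package as nickcernis/html-to-markdown):

```shell
# Assumed package name; adjust it if you need the legacy nickcernis/html-to-markdown name.
composer require league/html-to-markdown
```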
Add require 'vendor/autoload.php'; to the top of your script.
Next, create a new HtmlConverter instance and pass your valid HTML code to its convert() function:
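A minimal sketch of that call, assuming the library was installed through Composer (so vendor/autoload.php exists) and that the class lives in the League\HTMLToMarkdown namespace:

```php
<?php
require 'vendor/autoload.php';

use League\HTMLToMarkdown\HtmlConverter;

// Create a converter with the default options.
$converter = new HtmlConverter();

// Any valid HTML fragment can be passed in; this one is just sample input.
$html = '<h3>Quick, to the Batcave!</h3>';
$markdown = $converter->convert($html);

echo $markdown; // ### Quick, to the Batcave!
```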
The $markdown variable now contains the Markdown version of your HTML as a string.
The included demo directory contains an HTML->Markdown conversion form to try out.
Conversion options
[!CAUTION]
By default, this library preserves HTML tags without Markdown equivalents, like <span>, <div>, <iframe>, <script>, etc. If you will be parsing untrusted input from users, please consider setting the strip_tags and/or remove_nodes options documented below, and also using a library (like HTML Purifier) to provide additional HTML filtering.
To strip HTML tags that don't have a Markdown equivalent while preserving the content inside them, set strip_tags to true, like this:
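For instance, the option can be passed in the array given to the constructor. The sample HTML is illustrative, and the sketch assumes Composer's autoloader is available:

```php
<?php
require 'vendor/autoload.php';

use League\HTMLToMarkdown\HtmlConverter;

// Options are passed as an associative array to the constructor.
$converter = new HtmlConverter(array('strip_tags' => true));

$html = '<span>Turnips!</span>';
$markdown = $converter->convert($html); // "Turnips!" (the <span> wrapper is dropped)
```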
Or more explicitly, like this:
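The explicit form goes through the converter's configuration object after construction. A short sketch, again assuming a Composer install:

```php
<?php
require 'vendor/autoload.php';

use League\HTMLToMarkdown\HtmlConverter;

$converter = new HtmlConverter();

// Set the option on the existing converter instead of in the constructor.
$converter->getConfig()->setOption('strip_tags', true);

$html = '<span>Turnips!</span>';
$markdown = $converter->convert($html); // "Turnips!"
```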
Note that only the tags themselves are stripped, not the content they hold.
To strip tags and their content, pass a space-separated list of tags in remove_nodes, like this:
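As a sketch with made-up sample HTML (Composer autoloading assumed), removing both span and div nodes leaves nothing behind:

```php
<?php
require 'vendor/autoload.php';

use League\HTMLToMarkdown\HtmlConverter;

// Both the tags and everything inside them are removed.
$converter = new HtmlConverter(array('remove_nodes' => 'span div'));

$html = '<span>Turnips!</span><div>Monkeys!</div>';
$markdown = $converter->convert($html); // "" (an empty string)
```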
By default, all comments are stripped from the content. To preserve them, use the preserve_comments option, like this:
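One possible illustration, assuming a Composer install; strip_tags is enabled here only so the sample span does not clutter the result:

```php
<?php
require 'vendor/autoload.php';

use League\HTMLToMarkdown\HtmlConverter;

$converter = new HtmlConverter(array(
    'strip_tags' => true,        // drop the <span> wrapper, keep its text
    'preserve_comments' => true, // keep every HTML comment
));

$html = '<span>Turnips!</span><!-- Monkeys! -->';
$markdown = $converter->convert($html); // the comment survives the conversion
```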
To preserve only specific comments, set preserve_comments to an array of strings, like this:
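A sketch with two sample comments, of which only the listed one should be kept (Composer autoloading assumed; the comment texts are illustrative):

```php
<?php
require 'vendor/autoload.php';

use League\HTMLToMarkdown\HtmlConverter;

// Only comments whose text matches an entry in the array are preserved.
$converter = new HtmlConverter(array(
    'strip_tags' => true,
    'preserve_comments' => array('Eyes!'),
));

$html = '<span>Turnips!</span><!-- Monkeys! --><!-- Eyes! -->';
$markdown = $converter->convert($html); // keeps "Eyes!", drops "Monkeys!"
```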
By default, placeholder links are preserved. To strip the placeholder links, use the strip_placeholder_links option, like this:
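Here a placeholder link means an a tag without an href. A small sketch, assuming a Composer install:

```php
<?php
require 'vendor/autoload.php';

use League\HTMLToMarkdown\HtmlConverter;

$converter = new HtmlConverter(array('strip_placeholder_links' => true));

// An <a> tag with no href is treated as a placeholder link.
$html = '<a>Github</a>';
$markdown = $converter->convert($html); // only the link text remains
```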
Style options
By default, bold tags are converted using the asterisk syntax, and italic tags are converted using the underlined syntax. Change these by using the bold_style and italic_style options.
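To make the effect visible regardless of the defaults, this sketch sets both options explicitly (Composer autoloading assumed; the sample HTML is illustrative):

```php
<?php
require 'vendor/autoload.php';

use League\HTMLToMarkdown\HtmlConverter;

$converter = new HtmlConverter();

// Use underscores for both emphasis styles.
$converter->getConfig()->setOption('italic_style', '_');
$converter->getConfig()->setOption('bold_style', '__');

$html = '<em>Italic</em> and a <strong>bold</strong>';
$markdown = $converter->convert($html); // "_Italic_ and a __bold__"
```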
Line break options
By default, <br> tags are converted to two spaces followed by a newline character as per traditional Markdown. Set hard_break to true to omit the two spaces, as per GitHub Flavored Markdown (GFM).
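The two behaviours can be compared on the same input. A sketch, assuming a Composer install:

```php
<?php
require 'vendor/autoload.php';

use League\HTMLToMarkdown\HtmlConverter;

$converter = new HtmlConverter();
$html = '<p>test<br>line break</p>';

// GFM style: a bare newline.
$converter->getConfig()->setOption('hard_break', true);
$gfm = $converter->convert($html); // "test\nline break"

// Traditional style (the default): two spaces, then a newline.
$converter->getConfig()->setOption('hard_break', false);
$traditional = $converter->convert($html); // "test  \nline break"
```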
Autolinking options
By default, <a> tags are converted to the easiest possible link syntax, i.e. if no text or title is available, then the <url> syntax will be used rather than the full [url](url) syntax. Set use_autolinks to false to change this behavior to always use the full link syntax.
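A side-by-side sketch on a link whose text equals its URL (Composer autoloading assumed):

```php
<?php
require 'vendor/autoload.php';

use League\HTMLToMarkdown\HtmlConverter;

$converter = new HtmlConverter();
$html = '<p><a href="https://thephpleague.com">https://thephpleague.com</a></p>';

// Autolinks enabled (the default): the short <url> form is used.
$converter->getConfig()->setOption('use_autolinks', true);
$short = $converter->convert($html); // "<https://thephpleague.com>"

// Autolinks disabled: always the full [text](url) form.
$converter->getConfig()->setOption('use_autolinks', false);
$full = $converter->convert($html); // "[https://thephpleague.com](https://thephpleague.com)"
```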
Passing a custom Environment object
You can pass your own Environment object to customize, for example, which converters should be used.
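A sketch of an Environment that registers only the header converter, assuming a Composer install. With no image converter registered, the img tag should pass through as HTML:

```php
<?php
require 'vendor/autoload.php';

use League\HTMLToMarkdown\Converter\HeaderConverter;
use League\HTMLToMarkdown\Environment;
use League\HTMLToMarkdown\HtmlConverter;

// Start from an empty environment; the array holds configuration options.
$environment = new Environment(array());

// Register only the converters you want.
$environment->addConverter(new HeaderConverter());

$converter = new HtmlConverter($environment);

$html = '<h3>Header</h3>
<img src="" />
';
$markdown = $converter->convert($html); // header becomes "### Header", <img> stays HTML
```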
Table support
Support for Markdown tables is not enabled by default because it is not part of the original Markdown syntax. To use tables add the converter explicitly:
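One way to do that is to register the table converter on the converter's environment. The sample table is illustrative and the sketch assumes a Composer install:

```php
<?php
require 'vendor/autoload.php';

use League\HTMLToMarkdown\Converter\TableConverter;
use League\HTMLToMarkdown\HtmlConverter;

$converter = new HtmlConverter();

// Table support is opt-in: add the converter to the default environment.
$converter->getEnvironment()->addConverter(new TableConverter());

$html = '<table><tr><th>A</th></tr><tr><td>a</td></tr></table>';
$markdown = $converter->convert($html); // a pipe-delimited Markdown table
```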
Limitations
- Markdown Extra, MultiMarkdown and other variants aren't supported – just Markdown.
Style notes
- Setext (underlined) headers are the default for H1 and H2. If you prefer the ATX style for H1 and H2 (# Header 1 and ## Header 2), set header_style to 'atx' in the options array when you instantiate the object: $converter = new HtmlConverter(array('header_style'=>'atx')); Headers of H3 priority and lower always use atx style.
- Links and images are referenced inline. Footnote references (where image src and anchor href attributes are listed in the footnotes) are not used.
- Blockquotes aren't line wrapped – it makes the converted Markdown easier to edit.
Dependencies
HTML To Markdown requires PHP's xml, libxml, and dom extensions, all of which are enabled by default on most distributions.
Errors such as "Fatal error: Class 'DOMDocument' not found" on distributions such as CentOS that disable PHP's xml extension can be resolved by installing php-xml.
Contributors
Many thanks to all contributors so far. Further improvements and feature suggestions are very welcome.
How it works
HTML To Markdown creates a DOMDocument from the supplied HTML, walks through the tree, and converts each node to a text node containing the equivalent Markdown, starting from the most deeply nested node and working outwards towards the root node.
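The deepest-first idea can be sketched with PHP's DOM extension alone. This is a simplified illustration, not the library's code: the function name convertChildrenFirst is hypothetical, it handles only three tags, and it builds strings recursively instead of replacing nodes in the tree.

```php
<?php
// Hypothetical helper for illustration only; not part of the library.
function convertChildrenFirst(DOMNode $node): string
{
    // Text nodes are leaves: return their content unchanged.
    if ($node instanceof DOMText) {
        return $node->nodeValue;
    }

    // Convert the children first, so the deepest nodes are handled
    // before the element that contains them.
    $inner = '';
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            $inner .= convertChildrenFirst($child);
        }
    }

    // Wrap the already-converted content according to the tag name.
    switch ($node->nodeName) {
        case 'strong':
            return '**' . $inner . '**';
        case 'em':
            return '*' . $inner . '*';
        case 'h3':
            return '### ' . $inner;
        default:
            return $inner; // unknown tags (html, body, ...) keep only their content
    }
}

$document = new DOMDocument();
$document->loadHTML('<h3>Quick, to the <strong>Batcave</strong>!</h3>');

$markdown = convertChildrenFirst($document->documentElement);
echo $markdown; // ### Quick, to the **Batcave**!
```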
To-do
- Support for nested lists and lists inside blockquotes.
- Offer an option to preserve tags as HTML if they contain attributes that can't be represented with Markdown (e.g. style).
Trying to convert Markdown to HTML?
Use one of these great libraries:
- league/commonmark (recommended)
- cebe/markdown
- PHP Markdown
- Parsedown
No guarantees about the Elvish, though.
All versions of html-to-markdown with dependencies
ext-dom Version *
ext-xml Version *