Download the PHP package jolicode/jolimarkdown without Composer
On this page you can find all versions of the php package jolicode/jolimarkdown. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Download jolicode/jolimarkdown
More information about jolicode/jolimarkdown
Files in jolicode/jolimarkdown
Package jolimarkdown
Short Description A markdown fixer, for enforcing a consistent style in your markdown files.
License MIT
Homepage https://github.com/jolicode/JoliMarkdown
Informations about the package jolimarkdown
JoliMarkdown
Usage
The code above will return a "markdownized" version of the input string:
If you are using Symfony, you may want to read the documentation for the associated bundle.
Installation
Configuration
Several configuration options are available as League CommonMark environment configuration options, to customize the behavior of the Markdown fixer:
internal_domains
: an array of domains that are considered internal to the website. Whenever an image or a link URL is found, that sits under one of the listed domains, the URL will be converted to a relative one. Defaults to[]
.prefer_asterisk_over_underscore
: a boolean to indicate whether to prefer*
over_
for emphasis. Defaults totrue
.unordered_list_marker
: a string to use as the marker for unordered lists. Defaults to-
.
Tests
Tests are located under the tests
directory and are written using PHPUnit:
Context
Markdown is a simple text syntax for writing structured documents. Since its creation in 2004, this syntax has aimed to offer an alternative, faster and simpler way of writing HTML documents for Web publishing. Over the ensuing years, Markdown syntax has evolved iteratively, without any formal, perfectly standardized specification. Various variants have emerged, but none has become a de facto standard.
One of the most robust alternatives, however, is CommonMark, a Markdown variant that was formally specified in 2014 and has been evolving ever since.
Markdown / Commonmark are frequently used in the development world (documentation in the form of a markdown README file, adoption by many publishing platforms) and is often also employed for web publishing. It was, for example, the syntax chosen when the JoliCode website was created in 2012, and is still used today to structure the various bodies of content (blog posts, customer references, technologies, team sheets, etc.).
However, over the last 12 years, our way of transforming Markdown content into HTML has changed: writing a few articles in pure HTML, then using a client-side javascript Markdown pre-processor (in the Web browser), then finally, over the last few years, migrating to the league/commonmark
library, which allows you to transform Markdown into HTML on the server side, in PHP. This library was chosen because it is particularly complete, well-maintained, extensible and robust.
During the development of league/commonmark
, extension mechanisms were added, to support different Markdown "extensions", i.e. to support syntax elements that are not part of the CommonMark standard, but bring syntactic flexibility to writers. For example, the tables extension makes it possible to write tables in Markdown, with a lighter, more readable syntax, which is not possible in "standard" CommonMark.
One of the founding features of Markdown is its compatibility with HTML: in Markdown, it's perfectly valid to insert HTML tags into text, and these will simply be passed on as they are in the final HTML document. For example, you can write:
Such a document will be rendered, in HTML, as follows:
CommonMark's extension mechanism is therefore interesting, as it allows syntactic elements to be added that the extension will be able to interpret to generate rich, complex HTML output, without the end user (the editor) having to write HTML. This notion of extension is provided for in CommonMark (the CommonMark specification is itself written in CommonMark and uses an extension to generate side-by-side rendering of Markdown syntax and the corresponding HTML output, as can be seen, for example, in the tabs section).
On the JoliCode site, we've taken advantage of the flexibility of league/commonmark
to enrich HTML rendering, over the years, so that we can write richer, more expressive, more visual Markdown documents. For example, we've added an extension to write footnotes, HTML tables, strikethrough text, add HTML attributes to external links, automatically add attributes to <img>
tags, and so on.
In spite of this, over the past 12 years we have frequently written HTML code within Markdown articles, in order to meet certain needs:
- add CSS classes to HTML elements, to be able to style them differently (centering an image on the page, for example) ;
- insert code with CSS classes, to use a syntax highlighting library;
- create the HTML structure to position two images side by side ;
- etc.
Sometimes HTML code has been added because the author of an article was uncomfortable with certain arcana of markdown, and chose the most direct approach to be able to publish his content. The use of HTML may have been appropriate at the time, but as the possibilities offered by HTML change, so do its limits: whereas for elements written in markdown, we can now make the program in charge of HTML rendering evolve to take on board new HTML functionalities, we can't do this for elements written directly in HTML, which will remain frozen in time in the form their author has chosen.
For example, we'd like to be able to offer images in modern, higher-performance formats (such as webp, which is both smaller and of better quality) than those used just a few years ago. For these images, we also want to move away from the use of the <img>
tag, and take advantage of <picture>
, <source>
tags, and attributes like srcset
. For images that have been inserted into articles using Markdown syntax, we can upgrade the HTML rendering program to support these new formats and tags. For images that have been inserted in HTML, we can't do this, and so have to replace them manually - or leave them as they are, with the inconvenience of having to accept that the articles concerned use dated, less efficient technologies, which have an impact on both speed and the comfort offered to site users.
So we're looking for an approach to correct existing Markdown articles, replacing the HTML elements they contain with equivalent Markdown elements wherever possible without distorting the final HTML rendering.
An extension, available in league/commonmark
for a few years now, can specifically help us with this task: it's the Attributes extension, which lets you add HTML attributes to Markdown elements. For example, you can write:
which will be rendered in HTML as follows:
With the help of this extension, we'd like to be able to write a program which, for each Markdown article on the site, will:
- analyze the Markdown content of the article;
- identify HTML elements that can be replaced by equivalent Markdown elements;
- replace these HTML elements with Markdown elements, adding the necessary HTML attributes so that the final HTML rendering is identical to that of the original article.
This repository proposes a tool to achieve this goal, using the following overall approach:
- from an existing string (the initial Markdown content of the article), an abstract syntax tree (AST) is generated using the
league/commonmark
Markdown parser. This parser is specifically configured, with few extensions enabled, to be as close as possible to standard CommonMark syntax and to obtain an AST that contains (almost) only the basic syntactic elements of CommonMark ; - the
league/commonmark
parser returns aDocument
, which is a hierarchy ofNodes
(aNode
being an element of the AST). Each node is typed (for example, aNode
of typeParagraph
represents a paragraph, aNode
of typeImage
represents an image, etc.), and the HTML code parts are parsed in the form ofHtmlBlock
orHtmlInline
nodes; - this Document is then corrected via a set of correction classes. For example:
- if a Node of type
FencedCode
has a CSS class attributelanguage-php
, then this class is removed and instead the Node'sInfo
attribute is updated with the valuephp
; - if a node of type
Image
has an absolute URL while the image is served by the JoliCode site, then this URL is replaced by a relative URL. Numerous tests can be used to check special cases and different situations. - ditto for HTML links;
- for
HTMLBlock
andHTMLInline
nodes, a special treatment is applied: - HTML content is loaded into a DOM tree
- this tree is then recursively traversed in an attempt to reconstitute equivalent pure Markdown nodes. For example, if we find a
<p>
element in the DOM, we'll try to replace it with a Markdown Node of typeParagraph
. Each time, the HTML attributes are transformed into attributes as proposed by theleague/commonmark
"Attributes" extension; - as a last resort, if at a given level of recursion we are unable to reconstitute a Markdown Node :
- we use the
league/html-to-markdown
library to try and convert HTML into Markdown. This step is necessary to transform HTML elements into Markdown that are not supported by the correction classes we've implemented (for example, HTML tables: we don't offer a "Fixer" for the DOM element<table>
). - the resulting string is returned as a new
HTMLBlock
orHTMLInline
node, depending on the type of first-level node;
- we use the
- if a Node of type
- finally, as a last step, the new
Document
thus corrected is "rendered" as a string, which is the corrected Markdown content of the original article. For this purpose, a set of Renderer classes have been written, heavily inspired by the wnx/commonmark-markdown-renderer library.
License
This library is under MIT License. See the LICENSE file.
All versions of jolimarkdown with dependencies
ext-mbstring Version *
lib-libxml Version *
league/commonmark Version ^2.4
league/html-to-markdown Version ^5.1
psr/log Version ^3.0