Markdent: Event-Driven Markdown Parsing
Dave Rolsky
What is Markdown?
- Wikitext-ish
- Matches existing plaintext "markup"
- Standard
Why is Markdown Important?
- Standard!
- Easy to use
- (Please do not invent your own wikitext)
An Example
This is **bold**, this is *italic*.
* A list
* of items
> A blockquote which contains
> [a link](http://example.com)
Why a New Parser?
- Existing tools just converted to HTML
- No way to parse interesting data
- HTML conversion can be lossy
With Markdent You Can ...
- Convert to HTML (duh)
- Extract all links
- Extract all tables
- Extract all text (for full text search)
- Filter out some markup (raw HTML)
But Wait, There's More!
- Extend the markup syntax
- Cache the parsed events instead of the rendered result
- [Insert your invention here]
- Markdent is a parsing toolkit
What is "Event-Driven"?
- Just like SAX
- Parser generates a sequence of events
  - start document
  - start paragraph
  - text
  - start emphasis
  - text
  - end emphasis
  - end paragraph
  - end document
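To make the event sequence concrete, here is a minimal sketch in plain Perl. It is a toy, not Markdent's parser: the function name `toy_events` and the string-based events are assumptions made for illustration, and it only understands one paragraph with `*emphasis*` spans.

```perl
use strict;
use warnings;

# Toy illustration only: this is NOT Markdent's parser or its event classes.
# It turns one paragraph containing *emphasis* into a flat list of event names.
sub toy_events {
    my ($paragraph) = @_;

    my @events = ( 'start document', 'start paragraph' );

    # Split on *...* spans; the capturing group keeps the delimited pieces.
    for my $piece ( split /(\*[^*]+\*)/, $paragraph ) {
        next if $piece eq q{};

        if ( $piece =~ /\A\*([^*]+)\*\z/ ) {
            push @events, 'start emphasis', "text: $1", 'end emphasis';
        }
        else {
            push @events, "text: $piece";
        }
    }

    push @events, 'end paragraph', 'end document';
    return @events;
}

print "$_\n" for toy_events('Some *emphasized*');
```

The point is the shape of the output: a flat, ordered stream in which every start event is later closed by a matching end event.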
What is "Event-Driven"?
- Parser passes events to a "handler"
- Handler can do anything
  - Cache events
  - Generate HTML
  - Filter events
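A handler is just an object that receives events one at a time. The sketch below is a toy in plain Perl, assuming events are simple hash refs; the package `Toy::Handler::TextOnly` and the event type names are hypothetical and not part of Markdent. It keeps only the text, which is roughly what a full-text-search extractor would do.

```perl
use strict;
use warnings;

# Hypothetical handler, not part of Markdent: it keeps only text events.
package Toy::Handler::TextOnly;

sub new { return bless { text => [] }, shift }

# Each event here is a plain hash ref with a "type" and an optional "text".
sub handle_event {
    my ( $self, $event ) = @_;
    push @{ $self->{text} }, $event->{text}
        if $event->{type} eq 'text';
    return;
}

sub text { return join q{}, @{ $_[0]{text} } }

package main;

# A hand-written event stream standing in for parser output.
my @events = (
    { type => 'start_paragraph' },
    { type => 'text', text => 'This is ' },
    { type => 'start_strong' },
    { type => 'text', text => 'bold' },
    { type => 'end_strong' },
    { type => 'end_paragraph' },
);

my $handler = Toy::Handler::TextOnly->new;
$handler->handle_event($_) for @events;

print $handler->text, "\n";
```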
Handlers Can Chain
         Parser
           |
           |
         Filter
           |
           |
      Multiplexer
        /      \
       /        \
Event Cache   HTML Output
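The chain in the diagram can be sketched without Markdent at all. Everything below is a stand-in: the `Toy::*` packages and the string events are assumptions for illustration, and the two collectors play the roles of the event cache and the HTML output.

```perl
use strict;
use warnings;

# Hypothetical stand-ins; none of these packages are Markdent classes.
package Toy::Collector;
sub new { return bless { seen => [] }, shift }
sub handle_event { my ( $self, $event ) = @_; push @{ $self->{seen} }, $event; return }
sub seen { return @{ $_[0]{seen} } }

# Fans each event out to every downstream handler.
package Toy::Multiplexer;
sub new { my ( $class, @handlers ) = @_; return bless { handlers => \@handlers }, $class }
sub handle_event {
    my ( $self, $event ) = @_;
    $_->handle_event($event) for @{ $self->{handlers} };
    return;
}

# Drops one kind of event and forwards everything else.
package Toy::Filter;
sub new { my ( $class, %p ) = @_; return bless {%p}, $class }
sub handle_event {
    my ( $self, $event ) = @_;
    return if $event eq $self->{drop};
    $self->{handler}->handle_event($event);
    return;
}

package main;

my $cache  = Toy::Collector->new;
my $output = Toy::Collector->new;
my $multi  = Toy::Multiplexer->new( $cache, $output );
my $filter = Toy::Filter->new( drop => 'html_block', handler => $multi );

# The "parser" here is just a loop feeding events into the head of the chain.
$filter->handle_event($_) for qw( start_paragraph text html_block end_paragraph );

print join( ', ', $cache->seen ), "\n";
```

Each link only knows about the next handler, so filters and multiplexers can be stacked in any order.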
Chained Handlers
my $buffer = q{};
open my $fh, '>', \$buffer;
# Event Cache
my $capture = Markdent::Handler::CaptureEvents->new;
# HTML Output
my $html = Markdent::Handler::HTMLStream->new( output => $fh );
# Multiplexer
my $multi = Markdent::Handler::Multiplexer->new(
    handlers => [ $capture, $html ],
);
# Filter
my $filter =
    Markdent::Handler::HTMLFilter->new( handler => $multi );
# Parser
my $parser = Markdent::Parser->new( handler => $filter );
$parser->parse( markdown => ... );
Custom Dialect
package MyWiki::Dialect::SpanParser;
use Moose;
use List::AllUtils qw( insert_after_string );
extends 'Markdent::Dialect::Standard::SpanParser';
override _possible_span_matches => sub {
    my $self = shift;
    my @look_for = super();
    # inside code span
    return @look_for if @look_for == 1;
    # try wiki links right after the code_start match
    insert_after_string
        'code_start', 'wiki_link', @look_for;
    return @look_for;
};
Custom Dialect
package MyWiki::Dialect::SpanParser;
sub _match_wiki_link {
    my $self = shift;
    my $text = shift;
    return unless ${$text} =~ / ... /xmgc;
    my %p = ( link_text => $1 );
    $p{display_text} = $2
        if defined $2;
    my $event = $self->_make_event(
        'MyWiki::Event::WikiLink' => %p );
    $self->_markup_event($event);
    return 1;
}
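The `/gc` flags on a string reference are what let a span matcher consume input in place. The sketch below shows that mechanism on its own. The `[[Page|Display text]]` syntax and the pattern are assumptions for illustration, not the regex elided on the slide, and `match_wiki_link` is a standalone function rather than a Markdent method.

```perl
use strict;
use warnings;

# Hypothetical wiki-link syntax: [[Page]] or [[Page|Display text]].
# The pattern is an assumption for illustration, not the one elided above.
sub match_wiki_link {
    my ($text) = @_;    # reference to the string being parsed

    # \G anchors at the current pos(); /g advances pos() on success and
    # /c leaves pos() untouched on failure so another matcher can try.
    return unless ${$text} =~ / \G \[\[ ([^\]|]+) (?: \| ([^\]]+) )? \]\] /xmgc;

    my %p = ( link_text => $1 );
    $p{display_text} = $2 if defined $2;
    return \%p;
}

my $input = '[[HomePage|the home page]] and more';
my $link  = match_wiki_link( \$input );

print "$link->{link_text} => $link->{display_text}\n";
print 'parser resumes at offset ', pos($input), "\n";
```

Because the match position lives on the string itself, the next matcher can pick up exactly where this one stopped.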
In the Box - Standard Dialect
- Original Markdown
- Passes mdtest suite
In the Box - GitHub Dialect
- Adds various GitHub extensions ...
  - Fenced code blocks (```\n...\n```)
  - Doesn't match underscores in words, so foo_bar_baz does not italicize "bar"
  - Linkifies bare links in text
In the Box - Theory Dialect
- Support for tables
  - Header rows, cell alignment, colspan > 1, multi-line cells, multiple <tbody>
+------+-------------+-----------------------+--------+
| id | name | description | price |
+------+-------------+-----------------------+--------+
| 1 | gizmo | Frabbles the blatzer | 1.99 |
| 2 | doodad | Collects *gizmos* | 23.80 |
| 10 | dojigger | Foo | 102.98 |
| 1024 | thingamabob | Self-explanatory, no? | 0.99 |
+------+-------------+-----------------------+--------+
In the Box - Handlers
- Event capture (for caching)
- HTML output
- Raw HTML removal filter
- Multiplexer
- Minimal tree (mostly for testing)
In the Box - Simple HTML Output
use Markdent::Simple::Document;
my $msd =
    Markdent::Simple::Document->new;
my $html = $msd->markdown_to_html(
    title    => 'My Document',
    markdown => $markdown,
);