This Space for Rent

Fun with edge cases

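If I feed the string “foo  ” (two trailing spaces, no newline) to Markdown.pl, it becomes <p>foo </p>. If I feed it to discount, it becomes <p>foo<br/></p>. This is because I’m following the part of the spec that says “end the line with two or more spaces, then type return”, and I treat the two trailing spaces as a break even when no return follows them.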

So I’m not following the specification exactly here. My fault (and it’s not easily catchable because of the way I parse blocks: I cut the input up into a linked list of lines, and the last line ends up as a single line whether or not there’s a newline at the end of it).
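To make the problem concrete, here is a minimal sketch of that kind of line splitter. It is not discount's actual code: the Line type and the split_lines() and free_lines() functions are names made up for illustration, and the only point is to show how stripping the newline while building the list throws away the difference between “ends with return” and “ends at EOF”.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical line node (an assumption, not discount's real structure). */
typedef struct line {
    char *text;          /* line contents with the '\n' stripped */
    struct line *next;   /* next line, or NULL for the last one  */
} Line;

/* Split input into a linked list of lines, dropping each '\n'.
 * Returns NULL for empty input; on allocation failure it returns
 * whatever lines were built so far. */
Line *split_lines(const char *input)
{
    Line *head = NULL, **tail = &head;

    assert(input != NULL);
    while (*input) {
        size_t len = strcspn(input, "\n");   /* length up to '\n' or end */
        Line *node = malloc(sizeof *node);

        if (!node)
            break;
        node->text = malloc(len + 1);
        if (!node->text) {
            free(node);
            break;
        }
        memcpy(node->text, input, len);
        node->text[len] = '\0';
        node->next = NULL;

        *tail = node;                        /* append to the list */
        tail = &node->next;

        input += len;
        if (*input == '\n')                  /* the newline is discarded here */
            input++;
    }
    return head;
}

/* Release a list built by split_lines(). */
void free_lines(Line *p)
{
    while (p) {
        Line *next = p->next;
        free(p->text);
        free(p);
        p = next;
    }
}
```

With this sketch, split_lines("foo  \n") and split_lines("foo  ") both give back a single node holding “foo  ”, so nothing later in the pipeline can tell whether a return was actually typed.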

I could fix that by not trimming out the newlines when building the initial list of lines (and then silently dropping them when I’m doing the htmlification pass(es) through the compiled blocklist). However, it turns out that it’s not just lines that don’t end with newlines; if I have input like this (pipes added to show end of lines; see that there are two or more spaces at the end of the “foo” and “bar” paragraphs):

foo  |
|
bar  |
|

Markdown.pl spits out

<p>foo   </p>

<p>bar   </p>

So even though the spec says “two or more spaces, then type a return”, the reference implementation overrules it and says “two or more spaces, unless you’re at the end of a paragraph.”

Ugh.

This is a one-line fix (in generate.c, function printblock(), add && t->next to the check for two or more trailing spaces) but it’s kind of ugly. It breaks two of my test cases, but (obviously) neither of the official Daring Fireball test suites I test against (1.0 and 1.0.3).
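Roughly, the shape of that check looks like the sketch below. This is an illustration under assumptions, not the real printblock(): the ParaLine type and the trailing_spaces() and wants_hard_break() helpers are hypothetical names, and I'm assuming the list being walked holds the lines of a single paragraph, so a NULL next pointer means “last line of the paragraph”.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node for one line inside a paragraph (an assumption). */
typedef struct paraline {
    const char *text;          /* line contents, newline already stripped */
    struct paraline *next;     /* next line in this paragraph, or NULL    */
} ParaLine;

/* Count the spaces at the end of text. */
static int trailing_spaces(const char *text)
{
    size_t n = strlen(text);
    int count = 0;

    while (n > 0 && text[n - 1] == ' ') {
        count++;
        n--;
    }
    return count;
}

/* Return 1 if this line should end in <br/>, 0 otherwise.
 * The "&& t->next != NULL" clause is the Markdown.pl-compatible part:
 * no hard break on the last line of a paragraph. */
int wants_hard_break(const ParaLine *t)
{
    assert(t != NULL);
    return trailing_spaces(t->text) >= 2 && t->next != NULL;
}
```

For a two-line paragraph where both lines end in two spaces, wants_hard_break() is true for the first line and false for the second. Dropping the t->next test gives the old behavior, where the last line gets a <br/> too.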

So should I

  1. Modify my code to follow the reference implementation, or
  2. Modify my code to more closely follow the spec (the edge case of two+ spaces followed by EOF), or
  3. Open another beer and just forget about the whole thing?

My “I want this code to be PERFECT, PERFECT!” instinct says to choose #1 (which I’ve already written the patch for), but my “but I’ve got a userbase now and they might be unhappy!” sense likes #3.