Markdown is a Perl script that converts plain text into web-ready HTML. Markdown is also a shorthand syntax for writing HTML tags without needing to write the actual HTML. It has been around for a decade now but hasn't seen an update in all that time, which is nearly unheard of for a piece of software. That Markdown continues to work at all is somewhat amazing.

Regrettably, working and working well are not the same thing. Markdown, despite its longevity, has bugs. Luckily, Markdown is free software, licensed under a BSD-style license, so anyone can fork it and fix those bugs.

Recently, a group of developers who rely on Markdown set out to fix some of those bugs and created what they call a "standard" version. From a pure code standpoint the results are great, but instead of the developer gratitude you might expect, the group found themselves at the center of a much larger and very contentious debate that's ultimately about who we want in control of the tools we use.

## HTML is for Browsers

The web turned the whole world into writers. Never in the history of the human race have so many people produced so much text. The web has not, however, turned the whole world into writers of HTML. If writing HTML were a requirement for writing on the web, very few people would be writing on the web.

Not that it's particularly *hard* to write HTML. Only a small subset of the more than one hundred HTML tags actually end up in the average bit of text. Most of the time you can get by with paragraph tags, em, strong, and anchor tags for links. And, of course, list tags; where would the modern web be without list tags?

In other words, it's not that hard to write HTML. But it is a pain.

Typing out all those tags creates an extra wall between you and your thoughts.
No one wants to put `<p>` at the start of every paragraph, or `</p>` at the end. We just want to hit return and keep typing, which is what I did at the end of the previous paragraph. In fact, although you're reading this article as a rendered bit of HTML in a web page, I have not typed a single HTML tag while writing it.

Chances are you posted something on Twitter today, chatted with your friends on Facebook, wrote something on your WordPress blog, posted something to Tumblr, committed a bit of code to GitHub, answered a question on Stack Overflow or did a hundred other things that ended up rendered in HTML. You most likely did all that without ever actually typing any HTML tags.

Most of the time HTML is hidden by a "rich" text editor, which takes care of creating all the necessary HTML tags for you. WordPress, Tumblr and other sites not aimed at developers tend to use rich text editors.

Developers, and the sites they interact with, often use Markdown instead.

## Markdown, a Tool for Web Writers

Markdown began life as a little Perl script written by John Gruber and Aaron Swartz back in 2004. Gruber was running daringfireball.net at the time and had realized that the article-as-a-fragment-of-HTML model most publishing systems used was lacking. Like most of us, Gruber wanted to edit and preview his writing in the text editor of his choice before pasting that text into the publishing system.

HTML is great at many things, but reading raw HTML is terrible. HTML is a markup language; it's a second-stage presentation format. When you want to get words on the web, the first stage is typing those words. The second is adding HTML so they look the way you intended in a web browser.

No one wants to read, let alone try to edit, text when it's littered with HTML tags.

Gruber and Swartz wanted to write first and convert to HTML later, which is what Markdown allows you to do.
Gruber and Swartz came up with a shorthand syntax for common HTML elements; Markdown then parses your text, finds those shorthand markers and replaces them with HTML tags. It also automatically wraps your paragraphs in `<p>` tags (you just need to leave a blank line between them).

Markdown is not an all-or-nothing syntax. You can pick and choose what you want to use. For example, in ten years of writing in Markdown I have never used its image syntax. For me, Markdown's image syntax is no easier to read or simpler to type than an HTML `<img>` tag, so I just use the HTML tag.

Markdown is something you can make your own, which is one of its great strengths. Don't like the inline link syntax? Use the reference syntax, or just write your links in HTML. Markdown is very flexible. Perhaps too flexible, as it turns out.

Markdown was not the first text-to-HTML converter, but it was simple and took most of its shorthand syntax from the real world. It mimicked informal styles that emerged when people tried to overcome the limitations of plain text: writing styles that grew into conventions in email, IRC and Usenet.

For example, in Markdown if you surround a word with asterisks, it gets wrapped in HTML `<em>` tags, which means it's (usually) italicized. Surround a word with double asterisks and it gets wrapped in `<strong>` tags and displayed in a bold font.

Dig through old mailing lists, IRC logs or Usenet postings and you'll find this style of writing everywhere. Markdown's syntax might have been formalized and its parser written by Gruber and Swartz, but much of its language evolved collectively and informally over many years of countless people figuring out how to convey meaning effectively in plain text.

Markdown turned out to be wildly successful, particularly among writers who used text editors rather than word processors and were devoted to the idea that your documents, no matter where they end up, should begin life as a text file.
In other words, programmers.

In the last ten years Markdown has been forked many times, ported to more than a dozen programming languages and rolled out on some big, often developer-oriented websites, such as GitHub and Stack Overflow. Markdown isn't just popular with developers, though; there are also plugins for every major blogging platform, including WordPress.com.

All that is nice for those of us who have come to depend on Markdown, since it means we can use the familiar syntax all over the web.

## What is the What

The problem with Markdown is that it isn't always clear. There are bugs, but worse, there are ambiguities and edge cases where it's unclear what should happen. Consider Markdown's list syntax. To create an unordered HTML list in Markdown you write something like this:

~~~
* item one
* item two
* item three
~~~

Markdown then turns that into this HTML:

~~~
<ul>
  <li>item one</li>
  <li>item two</li>
  <li>item three</li>
</ul>
~~~

So far so good. But remember when I said Markdown automatically wraps paragraphs in HTML `<p>` tags? So what happens if we do this:

~~~
Here's a list of stuff:
* item one
* item two
* item three
~~~

There's no blank line before the list, but any human reader familiar with Markdown would look at this and know there's supposed to be a list there. That means the parser should close the paragraph tag and start creating a list. Or at least that's one way to look at it. The parser might also think, well, there's no blank line, so it's all still part of the paragraph; and since there are asterisks on either side of "item one," maybe those words should be wrapped in `<em>` tags.

In fact, depending on which fork of Markdown you use, this snippet can be rendered in [15 different ways][1].

This is not an isolated example, either; there are quite a few cases where Markdown is ambiguous.
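The first two readings can be made concrete with a toy parser. The Python sketch below is hypothetical: it is not Markdown.pl or any real fork, it handles only paragraphs, flat bullet lists and asterisk emphasis, and its made-up `list_interrupts_paragraph` switch decides whether a bullet line may end a paragraph that has no blank line before it.

```python
import re

# A bullet line: "*", "+" or "-" followed by at least one space.
BULLET = re.compile(r"^[*+-] +(.*)$")


def inline(text: str) -> str:
    """Replace **x** with <strong> and *x* with <em>.

    In this sketch a span may not begin or end with whitespace, so a
    bare "* " bullet marker is never read as an emphasis delimiter.
    """
    text = re.sub(r"\*\*(?=\S)(.+?)(?<=\S)\*\*", r"<strong>\1</strong>", text)
    return re.sub(r"\*(?=\S)(.+?)(?<=\S)\*", r"<em>\1</em>", text)


def render(source: str, list_interrupts_paragraph: bool) -> str:
    """Toy block parser: paragraphs and flat bullet lists only."""
    out: list[str] = []
    para: list[str] = []   # lines of the paragraph being built
    items: list[str] = []  # items of the list being built

    def flush() -> None:
        # Close whichever block is open; at most one of them is non-empty.
        if para:
            out.append("<p>" + inline("\n".join(para)) + "</p>")
            para.clear()
        if items:
            lis = "".join(f"  <li>{inline(t)}</li>\n" for t in items)
            out.append("<ul>\n" + lis + "</ul>")
            items.clear()

    for line in source.strip().split("\n"):
        bullet = BULLET.match(line)
        if not line.strip():
            flush()  # a blank line always ends the current block
        elif bullet and (items or not para or list_interrupts_paragraph):
            if para:
                flush()  # the list interrupts the open paragraph
            items.append(bullet.group(1))
        else:
            if items:
                flush()  # a non-bullet line ends the list (no lazy lines)
            para.append(line)
    flush()
    return "\n".join(out)


snippet = "Here's a list of stuff:\n* item one\n* item two\n* item three"
print(render(snippet, list_interrupts_paragraph=True))
print()
print(render(snippet, list_interrupts_paragraph=False))
```

With the switch on, the paragraph closes and the bullets become a three-item `<ul>`. With it off, all four lines stay inside a single `<p>`, asterisks and all, because this sketch never treats an asterisk followed by a space as emphasis. Nested lists, ordered lists, lazy continuation lines and HTML escaping are all left out, and each is another place where real implementations can disagree.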
To be clear, there is no objectively "right" answer; someone just needs to decide which of those 15 possibilities counts as "right."

There are also plain old bugs in Markdown. Between the bugs and the ambiguities, authors who port Markdown to other languages end up creating something slightly different, and you end up with a snippet that can be rendered 15 different ways.

That's not just annoying for programmers trying to roll Markdown into their projects; it's a huge problem for web writers like you and me, who never really know what's going to happen when we put some Markdown in a text field.

In a perfect world, Gruber would release an update for Markdown. Perhaps even Markdown 2.0. He might, as Dave Winer has suggested, also move Markdown to some sort of version control system and publicly host the code in such a way that other developers can contribute and improve it. That is, after all, the point of a FOSS license: allowing others to freely use and modify the code. The easier you make it to contribute, the more people will do so.

Regrettably, we don't live in that perfect world. Markdown, while widely adopted and widely used, hasn't seen so much as a bug fix since 2004. There's nothing wrong with that; it's certainly Gruber's right to let Markdown stand as is. But it's not surprising that other people want to fix the problems and make Markdown better.

Recently, a group of developers made an effort to do just that. They created a fork of Markdown that resolves the inconsistencies and edge cases and fixes the bugs. They also offered up two reference implementations and plenty of documentation, and they hosted the project on GitHub, which, although not ideal, at least makes it easier for other developers to contribute.

This fork might even be able to resolve the ambiguities discussed earlier, perhaps even by consensus.
For example, it resolves the earlier is-it-a-list-or-not dilemma by requiring a blank line before a list, a decision made in large part because that's what the majority of existing parsers do and therefore what most users will expect.

That all sounds really nice, so why did the project rankle so many developers? Two reasons. First, there was the name: Standard Markdown.

Had the project not used the Markdown name and simply positioned itself as an entirely new thing, it would quite possibly have been welcomed by the entire Markdown community. But names have power; names give control. When you use a name, you're telling the world you don't just want to improve a thing, you want to control it. Standard Markdown very much wants to be the future of Markdown.

Gruber, understandably, did not like the name. He asked the developers to change it, and they did. Standard Markdown became [CommonMark][6]. That was pretty much the end of the name controversy (though CommonMark could really use a new logo to further distance itself from Markdown).

The far more interesting reason Standard Markdown, now CommonMark, created such a fuss is who was behind it. Not the individuals, but the companies they represent.

## The Once and Future Web of People

Exploitation of the user is the dominant business model on today's web. Whether that's in the form of data being gathered about you, onerous terms of service you need to abide by or privacy policies that treat you like a commodity, it's hard not to feel like everything is designed to turn you into a device for making someone else massive amounts of money.

Today's web is short on humanity, and that's something we need to fix. The problem is deep and systemic. Fixing it will not happen overnight, and may well not happen at all.

In the meantime, there seems to be a deep sense among developers that what we don't need is more big companies trying to take over small projects like Markdown.

Despite the disappointing state of the web these days, there remain pockets of the internet that still feel untainted. We jealously guard these spaces, our personal little Fugazis of the web that we can point to and say, see, Pinboard.in isn't taking venture capital, MetaFilter isn't manipulating me for an exit, and Markdown is still a little script some guy wrote.

CommonMark, on the other hand, was announced by Jeff Atwood, co-founder of Stack Overflow. Its contributors include developers from GitHub and Reddit. It's unclear to what extent the companies these people represent are involved, but it certainly appears that CommonMark is a project coming out of the very big companies many have learned to distrust.

One of the common arguments leveled at Gruber when he objected to the name Standard Markdown was that dozens of other projects use the name Markdown without his objecting, publicly anyway. Why this one? That is to say, why the apparent hypocrisy?

Gruber initially agreed to talk to me about this story but never responded to my questions, so I can't answer that in his words. But John MacFarlane, creator of the tool Pandoc and the only CommonMark contributor not associated with a Big Internet Corp., told me that he first posted the spec to the Markdown mailing list in August, several weeks before making it more widely known. He used the name Standard Markdown, and Gruber did not raise any objections at the time.

It was only later, when Atwood announced the project and presented it as an effort backed by some of the biggest industry users of Markdown, that Gruber protested the use of the name.

Gruber was not alone. Plenty of developers balked, ostensibly at the name, but more likely at the name combined with the backers. Developer Dave Winer captured the sentiment nicely when he [wrote][2], "we all use Markdown, not just you and your pals. It isn't yours to do with as you please."
## It Is Yours to Do With As You Please

Winer is right in one sense: Markdown belongs to everyone who uses it. In a way, this is true precisely because Markdown's license says that anyone may do with it as they please. *So long as they don't use the name Markdown to endorse or promote what they build from it.*

And doing as you please includes forking the project to move in a different direction. In fact, forking <em>is</em> open source. Names are something else, though.

When Oracle purchased Sun, a group of developers concerned about the future of the MySQL project under Oracle's leadership forked the code and started a new project. They did not call it Standard MySQL, though. If they had, the MariaDB project most likely would have disappeared under an avalanche of trademark infringement lawsuits. Luckily, the MariaDB developers did the right thing and gave the project a new name from the start.

The open source world abounds with successful forks. LibreOffice supplanted OpenOffice, Blink is on its way to being used by more projects than WebKit, and WebKit itself completely overshadows the KHTML project. While not all forks are successful, only about <a href="http://thesai.org/Downloads/Volume3No2/Paper%2021%20-%20Forks%20impacts%20and%20motivations%20in%20free%20and%20open%20source%20projects.pdf">12 percent of them</a> devolve into trademark fights.

Markdown and CommonMark are slightly different, since technically CommonMark did not fork the Markdown code but the Markdown syntax, which is much murkier legal territory. Whether Markdown's copyright notice (which applies to derivative works) legally applies to CommonMark is something a judge would have to decide. But legal or not, the name "Standard Markdown" certainly violates the spirit and historical precedent of forking a project.

Changing the name to CommonMark solves the technical problem, then, but it leaves the bigger question unanswered: who should be allowed to control our tools?
In other words, should you go with Markdown or CommonMark?

The free software movement (from which the license governing Markdown is derived) says the answer is any one of them, or rather, anyone who can write code. The license allows anyone to fork the code and build their own version; it's all decentralized and open. But as Markdown illustrates, the result is not always ideal.

MacFarlane [likens][3] the current state of Markdown to an untended garden, adding that "it is a predictable result that a garden so tended will become untidy, that people will begin to trip on the weeds, and that there will be a call for a cleanup." In other words, people want to know that their list will be a list.

The answer to the question of who controls our tools will come in part from us and in part from the services we choose to use. CommonMark may offer some benefits to developers and web writers, but it still has to gain acceptance in the wider world if it has any hope of success.

As Winer [writes][5], "Programmers always underestimate deployment, and think they can wave a magic wand and get everyone to upgrade. It's actually nothing like that. Once the investment is made, and years have gone by, no one wants to go back and dig out old infrastructure and replace it with something else."

Stack Overflow, Reddit and GitHub will presumably be moving to CommonMark, which will make it the more familiar version for many users, but unless the CommonMark developers can bring others over to their cause, CommonMark will remain [Yet Another Standard][4].

In the meantime, CommonMark is very much a work in progress. If you have ideas or want to contribute to the project, head on over to [GitHub][7]. It might not be Markdown, but it could end up becoming something better.

[1]: http://johnmacfarlane.net/babelmark2/?text=Here%27s+a+list+of+stuff%3A%0A*+item+one%0A*+item+two%0A*+item+three%0A
[2]: http://discourse.codinghorror.com/t/standard-flavored-markdown/2382/19
[3]: http://talk.commonmark.org/t/please-lets-tone-down-the-rhetoric/707/2
[4]: https://xkcd.com/927/
[5]: http://scripting.com/2014/09/08/soImSidingWithGruber.html
[6]: http://commonmark.org/
[7]: https://github.com/jgm/stmd