# Preliminaries

# 1. Characters and lines

Any sequence of characters is a valid CommonMark document.

A character is a Unicode code point. Although some code points (for example, combining accents) do not correspond to characters in an intuitive sense, all code points count as characters for purposes of this spec.

This spec does not specify an encoding; it thinks of lines as composed of characters rather than bytes. A conforming parser may be limited to a certain encoding.

A line is a sequence of zero or more characters other than newline (U+000A) or carriage return (U+000D), followed by a line ending or by the end of file.

A line ending is a newline (U+000A), a carriage return (U+000D) not followed by a newline, or a carriage return and a following newline.

A line containing no characters, or a line containing only spaces (U+0020) or tabs (U+0009), is called a blank line.
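The definitions of line, line ending, and blank line can be illustrated with a minimal Python sketch. The helper names `split_lines` and `is_blank_line` are hypothetical and not part of this spec, and the sketch assumes that a final line ending terminates the last line rather than starting an additional empty one.

```python
import re

# CRLF is listed first so that a carriage return followed by a newline
# counts as a single line ending rather than two.
LINE_ENDING = re.compile(r"\r\n|\r|\n")


def split_lines(text: str) -> list[str]:
    """Split text into lines, dropping the line endings (hypothetical helper)."""
    lines = LINE_ENDING.split(text)
    # Assumption: a final line ending terminates the last line and does not
    # start an extra empty line, so the trailing empty string is dropped.
    if lines[-1] == "":
        lines.pop()
    return lines


def is_blank_line(line: str) -> bool:
    """True for an empty line or a line made only of spaces and tabs."""
    return all(ch in (" ", "\t") for ch in line)


print(split_lines("a\r\nb\rc\nd"))  # ['a', 'b', 'c', 'd']
print(is_blank_line(" \t "))        # True
```

Python's built-in `str.splitlines` is avoided here because it also splits on characters such as form feed and U+2028, which are not line endings under the definition above.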

The following definitions of character classes will be used in this spec:

A whitespace character is a space (U+0020), tab (U+0009), newline (U+000A), line tabulation (U+000B), form feed (U+000C), or carriage return (U+000D).

Whitespace is a sequence of one or more whitespace characters.

A Unicode whitespace character is any code point in the Unicode Zs general category, or a tab (U+0009), carriage return (U+000D), newline (U+000A), or form feed (U+000C).

Unicode whitespace is a sequence of one or more Unicode whitespace characters.

A space is U+0020.

A non-whitespace character is any character that is not a whitespace character.
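The two whitespace classes differ in subtle ways: line tabulation (U+000B) is a whitespace character but not a Unicode whitespace character, while a no-break space (U+00A0) is the reverse. A minimal Python sketch of both predicates follows; the function names are hypothetical, and each function assumes it is given a single character.

```python
import unicodedata

# The six whitespace characters listed above.
WHITESPACE_CHARS = frozenset(" \t\n\x0b\x0c\r")

# Characters that count as Unicode whitespace in addition to category Zs.
EXTRA_UNICODE_WHITESPACE = frozenset("\t\r\n\x0c")


def is_whitespace_char(ch: str) -> bool:
    """True if ch is one of the six whitespace characters."""
    return ch in WHITESPACE_CHARS


def is_unicode_whitespace_char(ch: str) -> bool:
    """True if ch is in general category Zs, or is tab, CR, LF, or form feed."""
    return ch in EXTRA_UNICODE_WHITESPACE or unicodedata.category(ch) == "Zs"


def is_non_whitespace_char(ch: str) -> bool:
    """True if ch is any character that is not a whitespace character."""
    return not is_whitespace_char(ch)
```

Note that the result of `unicodedata.category` depends on the version of the Unicode database bundled with the Python interpreter.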

An ASCII punctuation character is !, ", #, $, %, &, ', (, ), *, +, ,, -, ., / (U+0021–002F), :, ;, <, =, >, ?, @ (U+003A–0040), [, \, ], ^, _, ` (U+005B–0060), {, |, }, or ~ (U+007B–007E).

A punctuation character is an ASCII punctuation character or anything in the general Unicode categories Pc, Pd, Pe, Pf, Pi, Po, or Ps.
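The punctuation classes can be sketched in Python as follows. This is an illustration under assumptions rather than a normative definition: the function names are hypothetical, and the sketch relies on the fact that Python's `string.punctuation` contains exactly the 32 ASCII punctuation characters listed above.

```python
import string
import unicodedata

# string.punctuation is !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ (32 characters).
ASCII_PUNCTUATION = frozenset(string.punctuation)

# The general Unicode categories named in the definition above.
PUNCTUATION_CATEGORIES = frozenset({"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"})


def is_ascii_punctuation(ch: str) -> bool:
    """True if ch is one of the 32 ASCII punctuation characters."""
    return ch in ASCII_PUNCTUATION


def is_punctuation(ch: str) -> bool:
    """True if ch is ASCII punctuation or in one of the P* categories."""
    return (
        ch in ASCII_PUNCTUATION
        or unicodedata.category(ch) in PUNCTUATION_CATEGORIES
    )
```

The explicit ASCII set matters: characters such as `$`, `+`, and `^` belong to Unicode symbol categories rather than punctuation categories, so a check based on categories alone would miss them.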

# 2. Tabs

Tabs in lines are not expanded to spaces. However, in contexts where whitespace helps to define block structure, tabs behave as if they were replaced by spaces with a tab stop of 4 characters.

Thus, for example, a tab can be used instead of four spaces in an indented code block. (Note, however, that internal tabs are passed through as literal tabs, not expanded to spaces.)

Example 1

→foo→baz→→bim

<pre><code>foo→baz→→bim
</code></pre>

Example 2

  →foo→baz→→bim

<pre><code>foo→baz→→bim
</code></pre>

Example 3

    a→a
    ὐ→a

<pre><code>a→a
ὐ→a
</code></pre>
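The tab stop rule used in Examples 1 to 3 can be sketched as a column computation. The following Python function is a hypothetical helper, not part of this spec: it measures the leading indentation of a line in columns, with each tab advancing to the next multiple of 4, and leaves the line itself untouched, so internal tabs remain literal tabs.

```python
TAB_STOP = 4


def indent_width(line: str) -> int:
    """Columns of leading indentation, treating tabs as 4-column tab stops."""
    col = 0
    for ch in line:
        if ch == " ":
            col += 1
        elif ch == "\t":
            # A tab advances to the next multiple of TAB_STOP.
            col += TAB_STOP - (col % TAB_STOP)
        else:
            break
    return col


print(indent_width("\tfoo"))    # 4
print(indent_width("  \tfoo"))  # 4: the tab only fills the remaining 2 columns
```

Because two spaces followed by a tab reach the same column as a single tab, both forms meet the four-column indentation that an indented code block requires.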

In the following example, a continuation paragraph of a list item is indented with a tab; this has exactly the same effect as indentation with four spaces would:

Example 4

  - foo

→bar

<ul>
<li>
<p>foo</p>
<p>bar</p>
</li>
</ul>

Example 5

- foo

→→bar

<ul>
<li>
<p>foo</p>
<pre><code>  bar
</code></pre>
</li>
</ul>

Normally the > that begins a block quote may be followed optionally by a space, which is not considered part of the content. In the following case > is followed by a tab, which is treated as if it were expanded into three spaces. Since one of these spaces is considered part of the delimiter, foo is considered to be indented six spaces inside the block quote context, so we get an indented code block starting with two spaces.

Example 6

>→→foo

<blockquote>
<pre><code>  foo
</code></pre>
</blockquote>
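The arithmetic behind Example 6 can be checked with a short Python sketch. The function name is hypothetical, and the sketch assumes the line starts with the > marker in column 0, with no indentation before it.

```python
TAB_STOP = 4


def block_quote_content_indent(line: str) -> int:
    """Columns of indentation inside the block quote (hypothetical helper)."""
    if not line.startswith(">"):
        raise ValueError("expected a line starting with '>'")
    col = 1  # the '>' marker occupies column 0
    for ch in line[1:]:
        if ch == " ":
            col += 1
        elif ch == "\t":
            col += TAB_STOP - (col % TAB_STOP)
        else:
            break
    width = col - 1  # columns of whitespace after the marker
    # One column of whitespace, if present, belongs to the delimiter.
    return width - 1 if width > 0 else 0


indent = block_quote_content_indent(">\t\tfoo")
print(indent)      # 6
print(indent - 4)  # 2 spaces remain in the code block content
```

The first tab starts at column 1 and therefore spans only three columns, and the second spans four, giving seven columns of whitespace. One belongs to the delimiter, four are consumed by the indented code block, and two remain as leading spaces in the output.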

Example 7

-→→foo

<ul>
<li>
<pre><code>  foo
</code></pre>
</li>
</ul>

Example 8

    foo
→bar

<pre><code>foo
bar
</code></pre>

Example 9

 - foo
   - bar
→ - baz

<ul>
<li>foo
<ul>
<li>bar
<ul>
<li>baz</li>
</ul>
</li>
</ul>
</li>
</ul>

Example 10

#→Foo

<h1>Foo</h1>

Example 11

*→*→*→

<hr />

# 3. Insecure characters

For security reasons, the Unicode character U+0000 must be replaced with the REPLACEMENT CHARACTER (U+FFFD).
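As a minimal Python sketch, this replacement could be applied to the input text before any parsing takes place. The function name is hypothetical, and applying it as a separate pass is an assumption of this example rather than a requirement stated above.

```python
def replace_insecure_characters(text: str) -> str:
    """Replace every U+0000 with the REPLACEMENT CHARACTER (U+FFFD)."""
    return text.replace("\x00", "\ufffd")


cleaned = replace_insecure_characters("foo\x00bar")
print(cleaned == "foo\ufffdbar")  # True
```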