MiniMark Specification 2026-03-31

Introduction

MiniMark is an opinionated, small subset of [CommonMark 0.31.2] syntax, targeted primarily at agentic use cases where AI agents are the leading producers and consumers of CommonMark documents.

The minimalistic design aims to eliminate optionality, ambiguity, and complexity from the syntax and the resulting document structure, helping AI agents to generate and interpret CommonMark documents correctly.

This specification is meant to be self-contained and hence reproduces certain definitions originating from [CommonMark 0.31.2] or [Unicode 17.0.0] where needed.

Definitions

MiniMark selects and thus limits

  • the structural elements (blocks, inlines),
  • the syntax for describing these elements,
  • the character encodings

available to an author or editor of a CommonMark document that complies with this specification.

Characters

The Unicode codespace is a sequence of integers in the range from 0 to 1,114,111.

A Unicode code point is an integer from the Unicode codespace, denoted as a hexadecimal number with at least four digits and prefixed with “U+” (e.g. U+0030).

A Unicode scalar value is any value in the Unicode codespace (U+0000 through U+10FFFF) excluding the surrogate range (U+D800 through U+DFFF).

Character is a synonym for a Unicode scalar value.
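
The following non-normative sketch illustrates the definitions above. The function names `is_scalar_value` and `format_code_point` are illustrative assumptions and are not part of this specification.

```python
def is_scalar_value(code_point: int) -> bool:
    """Return True if code_point is a Unicode scalar value."""
    # The codespace runs from 0 to 1,114,111 (U+0000 through U+10FFFF).
    in_codespace = 0x0000 <= code_point <= 0x10FFFF
    # Surrogates (U+D800 through U+DFFF) are code points but not scalar values.
    is_surrogate = 0xD800 <= code_point <= 0xDFFF
    return in_codespace and not is_surrogate


def format_code_point(code_point: int) -> str:
    """Format a code point as 'U+' plus at least four hexadecimal digits."""
    return f"U+{code_point:04X}"
```

For instance, `format_code_point(0x30)` yields the notation used throughout this section for the digit zero.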

The space character is U+0020.

The tab character is U+0009.

The hyphen character is U+002D.

The low line (also called underscore) character is U+005F.

The period character is U+002E.

The exclamation mark character is U+0021.

The asterisk character is U+002A.

The backtick character is U+0060.

The tilde character is U+007E.

The left parenthesis character is U+0028.

The right parenthesis character is U+0029.

The less-than sign (also called opening angle bracket) character is U+003C.

The greater-than sign (also called closing angle bracket) character is U+003E.

The line feed character is U+000A.

The carriage return character is U+000D.

The uppercase Latin alphabet consists of the characters A through Z (U+0041 through U+005A).

The lowercase Latin alphabet consists of the characters a through z (U+0061 through U+007A).

The Latin alphabet is the union of the uppercase Latin alphabet and the lowercase Latin alphabet.

The ASCII digits are the characters 0 through 9 (U+0030 through U+0039).

The ASCII uppercase alphanumerical characters are the union of the uppercase Latin alphabet and the ASCII digits.

The ASCII lowercase alphanumerical characters are the union of the lowercase Latin alphabet and the ASCII digits.

The ASCII alphanumerical characters are the union of the Latin alphabet and the ASCII digits.
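
As a non-normative illustration, the character sets defined above could be expressed as set unions. The constant names below are assumptions chosen for this sketch.

```python
import string

# Base sets, taken from the Python standard library's ASCII constants.
UPPERCASE_LATIN = frozenset(string.ascii_uppercase)  # U+0041 through U+005A
LOWERCASE_LATIN = frozenset(string.ascii_lowercase)  # U+0061 through U+007A
ASCII_DIGITS = frozenset(string.digits)              # U+0030 through U+0039

# Derived sets, each a union exactly as defined in the text.
LATIN = UPPERCASE_LATIN | LOWERCASE_LATIN
ASCII_UPPERCASE_ALPHANUMERICAL = UPPERCASE_LATIN | ASCII_DIGITS
ASCII_LOWERCASE_ALPHANUMERICAL = LOWERCASE_LATIN | ASCII_DIGITS
ASCII_ALPHANUMERICAL = LATIN | ASCII_DIGITS
```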

A character encoding is a mapping from characters to bit sequences.

Lines

A line ending is a line feed, a carriage return not followed by a line feed, or a carriage return and a following line feed.

EOF (end-of-file) is the condition that exists when no next character of a character sequence can be consumed.

A line in a sequence of characters is a subsequence of zero or more characters other than line feed or carriage return, which

  • either starts at the beginning of the character sequence or is preceded by a line ending, and
  • is either followed by a line ending or by EOF.

Hence a sequence of characters can be uniquely partitioned into an alternating sequence of lines and line endings such that every character of the sequence is contained in exactly one line or one line ending.

Hence every sequence of characters contains at least one line.

A blank line is a line containing either no characters, or only spaces or tabs.
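
The partition into lines and line endings can be sketched as follows. This is a non-normative illustration; the names `partition` and `is_blank` are assumptions of the sketch.

```python
import re

# A line ending is CR LF, a lone CR, or a lone LF.
# CR LF is listed first so that it is consumed as a single line ending.
LINE_ENDING = re.compile(r"\r\n|\r|\n")


def partition(text: str) -> tuple[list[str], list[str]]:
    """Split text into its lines and the line endings between them."""
    lines = LINE_ENDING.split(text)      # always contains at least one line
    endings = LINE_ENDING.findall(text)  # exactly one fewer than the lines
    return lines, endings


def is_blank(line: str) -> bool:
    """A blank line has no characters, or only spaces or tabs."""
    return all(ch in " \t" for ch in line)
```

Note that the empty character sequence yields one (empty) line and no line endings, and that a sequence ending in a line ending yields a final empty line.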

Documents

A CommonMark document is a sequence of characters.

MiniMark Document

A MiniMark document is a CommonMark document which meets all of the following criteria.

Encoding

The document is encoded in UTF-8.

Tabs

The document does not contain any tab characters.

Line Endings

The document must use one form of line ending consistently throughout.
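
A minimal, non-normative sketch of the three document-level criteria above (encoding, tabs, line endings), assuming the document is available as raw bytes. The function name `check_document_level` and the wording of the messages are assumptions of this sketch.

```python
import re


def check_document_level(data: bytes) -> list[str]:
    """Return a list of violated document-level criteria (empty if none)."""
    try:
        # Strict decoding rejects ill-formed UTF-8, including encoded surrogates.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return ["not valid UTF-8"]

    problems = []
    if "\t" in text:
        problems.append("contains a tab character")
    # Collect the distinct line ending forms; more than one means mixing.
    endings = set(re.findall(r"\r\n|\r|\n", text))
    if len(endings) > 1:
        problems.append("mixes line ending forms")
    return problems
```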

Block Elements

After parsing, the document’s contained block elements and the markdown syntax used to indicate each contained block element meet all the following criteria.

Thematic breaks (aka horizontal rules)

A thematic break is indicated by a line consisting of exactly three hyphens.

Clarifications & Explanations

No other characters are permitted on the line, neither before the hyphens (indentation), nor after them, nor between them.

A --- line must not immediately follow a paragraph line, since this would be interpreted as a Setext heading, which is not permitted in MiniMark. In such a case, insert a blank line before the --- line.

Examples
---

ATX headings (aka headings)

An ATX heading is indicated by a single line beginning with 1 to 6 # characters, followed by exactly one space, followed by at least one non-space character such that the last character is not space.

The ATX heading’s inline content does not parse to the empty string.

Clarifications & Explanations

No indentation is permitted; the # sequence must start at the beginning of the line.

No closing sequence of any number of unescaped # characters is permitted.
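
The line-level rules for ATX headings can be approximated with two regular expressions. This is a non-normative sketch: the name `parse_atx_heading` is an assumption, the input is assumed to be a single line without its line ending, and the requirement that the inline content does not parse to the empty string is not checked here because it needs a full inline parser.

```python
import re

# 1 to 6 '#' characters, exactly one space, then content that starts and
# ends with a non-space character.
ATX_HEADING = re.compile(r"(#{1,6}) (\S(?:.*\S)?)")

# A closing sequence is a run of unescaped '#' at the end of the content
# that is preceded by a space or makes up the whole content.
CLOSING_SEQUENCE = re.compile(r"(?:^| )#+$")


def parse_atx_heading(line: str):
    """Return (level, content) for a compliant heading line, else None."""
    match = ATX_HEADING.fullmatch(line)
    if match is None:
        return None
    level, content = len(match.group(1)), match.group(2)
    if CLOSING_SEQUENCE.search(content):
        return None
    return level, content
```

A trailing # that directly follows a non-space character, as in a heading ending in C#, is not a closing sequence and is accepted by this sketch.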

Examples
# Heading level 1
## Heading level 2
### Heading level 3
#### Heading level 4
##### Heading level 5
###### Heading level 6

Setext headings

The document does not contain any setext headings.

Indented code blocks

The document does not contain any indented code blocks.

Fenced Code Blocks

A fenced code block is indicated by an opening fence line, zero or more content lines, and a closing fence line, where these terms are defined as follows.

A code fence is the character sequence of 3 backtick characters.

An opening fence line is a line consisting of a code fence followed by a space followed by a non-empty sequence of ASCII lowercase alphanumerical characters (called the language identifier).

A closing fence line is a line consisting exactly of a code fence.

The language identifier is an advisory tag for syntax highlighting and semantic identification. If no matching language identifier can be found (e.g. because the content is not source code in a specific language), use plaintext.

Exception

In the rare case that a content line equals a closing fence line (as in the example fenced code block below), the code fence is redefined to be the character sequence of 3 tilde characters. This redefinition applies only to the affected fenced code blocks.

Clarifications & Explanations

The exception was added to allow MiniMark-compliant fenced code blocks to contain fenced code block markup (as required by this specification).
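
A producer could select the code fence as sketched below. This is a non-normative illustration: the name `render_fenced_block` is an assumption, and the sketch follows the rule literally by comparing content lines against the 3-character fence only. The case where the content contains both a 3-backtick line and a 3-tilde line is not covered by the rules above, so the sketch rejects it.

```python
import re

BACKTICK_FENCE = "`" * 3
TILDE_FENCE = "~" * 3
# Non-empty sequence of ASCII lowercase alphanumerical characters.
LANGUAGE_ID = re.compile(r"[a-z0-9]+")


def render_fenced_block(language: str, content_lines: list[str]) -> list[str]:
    """Return the lines of a fenced code block, choosing the fence per the exception."""
    if LANGUAGE_ID.fullmatch(language) is None:
        raise ValueError("language identifier must be ASCII lowercase alphanumerical")
    if BACKTICK_FENCE not in content_lines:
        fence = BACKTICK_FENCE  # the regular case
    elif TILDE_FENCE not in content_lines:
        fence = TILDE_FENCE     # the exception: a content line equals the fence
    else:
        raise ValueError("content contains both fence forms")
    return [f"{fence} {language}", *content_lines, fence]
```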

Examples
``` python
def greet(name):
    return "Hello, " + name
```
Example language identifiers:
bash         shell commands and scripts
html         HTML markup
java         Java source code
javascript   JavaScript source code
json         JSON data
markdown     Markdown syntax (CommonMark or variants)
python       Python source code
sql          SQL queries
typescript   TypeScript source code
xml          XML markup
yaml         YAML data

HTML blocks

The document does not contain any HTML blocks.

Link reference definitions

The document does not contain any link reference definitions.

Paragraphs

A paragraph consists of a single line only.

A paragraph line starts with a non-space character and ends with a non-space character.

Clarifications & Explanations

Two paragraphs are separated by exactly one blank line (not more).

Examples
This is a paragraph.

This is another paragraph.

Blank lines

The document does not start or end with a blank line.

There is at most one blank line between two consecutive block elements.

Block quotes

The document does not contain any block quotes.

List items

The bullet list marker is the asterisk character.

The list marker of a list item is followed by exactly one space character (not more).

The start number for an ordered list item is 1.

The list marker of a list item which is not nested in another list item has no preceding space characters (called a level 1 list item).

The ordered list marker of a level 1 list item uses the period character as delimiter.

The list marker of a list item which is nested in a level 1 list item has the minimum number of preceding space characters to make it a nested block element (called a level 2 list item).

The ordered list marker of a level 2 list item uses the right parenthesis character as delimiter.

A list item contains only some or all of the following block elements:

  • Fenced code block
  • Paragraph
  • Blank line
  • Level 2 list item (if and only if the containing item is a level 1 list item)

A list item is not empty.

A list item does not start with a blank line.

A level 1 list item does not start with a level 2 list item.

Lists

There are no blank lines between list items.

For an ordered list the numbers of consecutive list markers must be consecutive positive integers starting at 1 and incrementing by 1.

Clarifications and Explanations

MiniMark permits only top-level lists with at most one level of list nesting.

Both ordered and unordered lists can be nested inside either an ordered or unordered list. All four combinations are permitted.
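
The numbering and delimiter rules for ordered lists can be checked together, as in the non-normative sketch below. It assumes the list markers of the sibling items of one list have already been extracted as strings; the names `DELIMITER_BY_LEVEL` and `check_ordered_markers` are assumptions of the sketch.

```python
# Level 1 items use the period, level 2 items the right parenthesis.
DELIMITER_BY_LEVEL = {1: ".", 2: ")"}


def check_ordered_markers(markers: list[str], level: int) -> bool:
    """Return True if the markers count up from 1 with the level's delimiter."""
    delimiter = DELIMITER_BY_LEVEL[level]
    # Build the only marker sequence the rules allow for this many items.
    expected = [f"{number}{delimiter}" for number in range(1, len(markers) + 1)]
    return markers == expected
```

Because the start number is 1 and the increment is 1, a list with n items has exactly one permitted marker sequence per level, which is why a plain equality test suffices.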

Examples
Unordered list
* First item
* Second item
* Third item
Ordered list
1. First item
2. Second item
3. Third item
Unordered list with nested unordered list
* First item
* Second item
  * Nested item A
  * Nested item B
* Third item
Unordered list with nested ordered list
* First item
* Second item
  1) Nested item A
  2) Nested item B
* Third item
Ordered list with nested unordered list
1. First item
2. Second item
   * Nested item A
   * Nested item B
3. Third item
Ordered list with nested ordered list
1. First item
2. Second item
   1) Nested item A
   2) Nested item B
3. Third item

Inlines

After parsing, the document’s inline content and the markdown syntax used to indicate each inline element meet all the following criteria.

Code spans

A code span begins and ends with a single backtick character.

A code span does not contain line endings.

Examples
`code span`

Emphasis and strong emphasis

Emphasis begins and ends with a single asterisk.

Strong emphasis begins and ends with a double asterisk.

Emphasis or strong emphasis character sequences are not nested and do not overlap.

Examples
*Emphasis aka italic*

**Strong emphasis aka bold**

Links

An inline link

  1. has a link text which does not parse to the empty string,
  2. has a link destination which is not enclosed in angle brackets, starts and ends on a non-space character, and conforms to the URI syntax,
  3. does not have a link title.

The document does not contain reference links.
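
The inline link criteria above can be sketched as a small formatter. This is a non-normative and deliberately conservative illustration: the name `format_inline_link` is an assumption, "URI syntax" is assumed to mean a URI with a scheme in the sense of RFC 3986 and only the scheme is checked, and the sketch rejects square brackets in the link text and parentheses in the destination, which is stricter than the rules require.

```python
import re

# Assumption: a URI starts with a scheme such as "https:" or "mailto:".
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def format_inline_link(text: str, destination: str) -> str:
    """Return '[text](destination)' or raise ValueError for unsupported input."""
    if not text.strip() or "[" in text or "]" in text:
        raise ValueError("unsupported or empty link text")
    if SCHEME.match(destination) is None:
        raise ValueError("destination does not start with a URI scheme")
    if any(ch.isspace() or ch in "<>()" for ch in destination):
        raise ValueError("destination contains a character this sketch rejects")
    # No link title is ever emitted.
    return f"[{text}]({destination})"
```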

Clarifications and Explanations

Neither link reference definitions nor reference links are permitted in MiniMark documents.

Examples

Inline link

[link](uri)

Images

An image is indicated by an exclamation mark followed by an inline link syntax as defined above.

Examples

Image

![image](uri)

Textual content

There are no criteria for textual content (i.e. all textual content is permitted).