MeatballWiki

WikixParser

A generic WikiSyntax parser that is configurable at a data level. Ultimately it allows translation from one syntax to another and, more importantly, provides a DocumentObjectModel that would make it more feasible to build a WysiwygWiki.

The primary goals of the Wikix parser are to

1. Describe WikiSyntaxes using a common WikixStyleSheet language

2. Emit valid XHTML

The secondary goals of the Wikix parser are to

1. Parse WikiSyntax into a DocumentObjectModel that can be re-emitted based on another Wikix stylesheet, thus allowing WikiSyntax-to-WikiSyntax translation on the fly, and giving editors the option to pick whichever WikiSyntax they prefer.

2. Store documents in a DocumentObjectModel format that could lend itself to a WysiwygWiki editor. The biggest benefit and problem of WikiSyntax is that it is enmeshed directly within the text, so it is both easy to create by keyboard and difficult to manage by GraphicalUserInterface.

In this way, the Wikix parser may be an important step towards migrating to a RichTextEditor.

Initially created by SunirShah for BibWiki and abandoned; may see the light of day while reviving MeatballWiki. Code will be appropriately OpenSource licensed (GPLv3).


Core concepts

The Wikix parser is based on understanding the most common patterns in the design of WikiSyntax, while still constraining the behaviour to be rationalized and consistent.

The core design goals are

  • Separate specification of the WikiSyntax from the implementation; thus
    • Flexibly add and change WikiSyntax without rewriting entire engines, and ideally dynamically user-configurable
    • Allow XHTML->wiki syntax translation
  • Emit valid XHTML

The system should be a blackbox. It takes

  • Input: a JSON WikixSheet that specifies the syntax rules for the engine
  • Input: A function to determine if a page exists or not
  • (optional) Input: an InterWiki IntermapTxt file
  • Input: The text to transform
  • Output: the transformed XHTML
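
The black-box contract above could be sketched as a single entry point. Everything below is illustrative: the function name `wikix_transform` and its parameters are assumptions, not the actual Wikix API.

```python
import json

# Hypothetical signature for the black box described above; all names
# here are illustrative, not the real Wikix API.
def wikix_transform(wikix_sheet, page_exists, text, intermap=None):
    """wikix_sheet: JSON WikixSheet string of syntax rules
       page_exists: callable(page_name) -> bool
       intermap:    optional InterWiki IntermapTxt contents
       returns:     transformed XHTML string"""
    rules = json.loads(wikix_sheet)
    # A real implementation would parse `text` against `rules`,
    # consulting page_exists for WikiLinks and intermap for InterWiki links.
    return "<p>{0}</p>".format(text)  # placeholder: wrap input in a paragraph

html = wikix_transform("{}", lambda name: False, "HelloWorld")
```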

Overall the parser algorithm is a RecursiveDescentParser over a stream of lines. Syntax rules are arranged in a hierarchy (technically a DirectedAcyclicGraph). Each syntax rule captures a portion of the input, then recursively runs its children's syntax rules against that captured input until the entire text is transformed.
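
The capture-then-recurse idea can be illustrated with a toy two-rule hierarchy. This is not the Wikix code; the rule table and field names are hypothetical.

```python
import re

# Toy illustration of recursive descent over a rule hierarchy:
# each rule captures a span, then runs its children's rules on the capture.
# Rule names and fields here are hypothetical, not the Wikix schema.
RULES = {
    "bold":   {"pattern": r"'''(.*?)'''", "tag": "strong", "children": ["italic"]},
    "italic": {"pattern": r"''(.*?)''",   "tag": "em",     "children": []},
}

def apply_rule(name, text):
    rule = RULES[name]
    def emit(match):
        inner = match.group(1)
        for child in rule["children"]:   # recurse into the captured span
            inner = apply_rule(child, inner)
        return "<{0}>{1}</{0}>".format(rule["tag"], inner)
    return re.sub(rule["pattern"], emit, text)

print(apply_rule("bold", "a '''b ''c'' d''' e"))
# -> a <strong>b <em>c</em> d</strong> e
```

Because italic is a child of bold, the italic rule only ever sees text already captured by bold, which is the constraint the hierarchy enforces.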

Syntax rules are grouped by type. Almost all known WikiSyntax rules belong to one of these types. These types are designed to operate within a stream of text lines. This allows individual syntax rules to focus on what they look like, rather than to be designed to handle such problems as end-of-lines, or line wrapping, or inline modification of the input stream.

  • Block
  • Cell
  • Inline
  • Line
  • Link
  • Multiline
  • Paragraph
  • Root
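
For illustration, a fragment of a hypothetical JSON WikixSheet grouping rules by these types might look like the following. All field names (`type`, `starts`, `ends`, `emit`, `children`) are assumptions, not the actual schema.

```json
{
  "root":      { "children": ["heading", "bullet", "paragraph"] },
  "heading":   { "type": "line", "starts": "== ", "ends": " ==",
                 "emit": "h2", "children": ["bold", "link"] },
  "bullet":    { "type": "multiline", "starts": "* ",
                 "emit": "ul/li", "children": ["link"] },
  "paragraph": { "type": "paragraph", "emit": "p",
                 "children": ["bold", "link"] },
  "bold":      { "type": "inline", "starts": "'''", "ends": "'''",
                 "emit": "strong" }
}
```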

Because the input is modified in place, recursive rules could collide by matching on previously emitted output. To avoid this, the system allows rules to move a portion of the emitted output stream into a store, which is restored before the final output is returned.
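
A minimal sketch of that store/restore mechanism, with hypothetical names: finished output is swapped for a placeholder token so later rules cannot re-match it, and every placeholder is replaced at the end.

```python
# Sketch only; class and token format are assumptions, not the Wikix code.
class OutputStore:
    def __init__(self):
        self._store = []

    def stash(self, html):
        token = "\x00STORE{0}\x00".format(len(self._store))
        self._store.append(html)
        return token            # placeholder left in the stream

    def restore(self, text):
        for i, html in enumerate(self._store):
            text = text.replace("\x00STORE{0}\x00".format(i), html)
        return text

store = OutputStore()
line = store.stash("<code>'''not bold'''</code>") + " and '''bold'''"
# ... later rules can safely run over `line` without touching the stash ...
print(store.restore(line))
# -> <code>'''not bold'''</code> and '''bold'''
```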

Along the way, the system will also collect links if requested.


Constraints

Compiler rules

  • Only paragraphs and lines can contain inline_styles
  • Children are listed in descending order of priority
  • Multilines can only have starts, not equals, ends, or optionallyEnds
  • Blocks MUST have starts AND ends; never equals or optionallyEnds
  • Compiler generates regexy things for starts and ends
  • links require href and text
  • equals cannot have children
  • links cannot have children
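
Several of these compiler rules lend themselves to a compile-time check. The sketch below assumes the rule fields (`type`, `starts`, `ends`, `equals`, `optionallyEnds`, `href`, `text`, `children`); it is not the actual compiler.

```python
# Hypothetical compile-time validation for some of the constraints above.
def validate_rule(rule):
    errors = []
    t = rule.get("type")
    if t == "multiline" and any(k in rule for k in ("equals", "ends", "optionallyEnds")):
        errors.append("multilines can only have starts")
    if t == "block" and not ("starts" in rule and "ends" in rule):
        errors.append("blocks MUST have starts AND ends")
    if t == "link":
        if not ("href" in rule and "text" in rule):
            errors.append("links require href and text")
        if rule.get("children"):
            errors.append("links cannot have children")
    if "equals" in rule and rule.get("children"):
        errors.append("equals cannot have children")
    return errors

print(validate_rule({"type": "block", "starts": "{{{"}))
# -> ['blocks MUST have starts AND ends']
```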


Issues with Python code

  • GPLed for now. Could be changed to MIT
  • The parser now handles almost all of MeatballWiki's TextFormattingRules
    • except ISBN links
    • except unicode in the WikiLinks. Will likely need to switch to PyPi regex module
  • requirements.txt is a trainwreck
  • code needs to be SelfDocumentingCode
  • explanation of the model
  • XHTML -> Wiki transformer needs to be ported and upgraded
  • I dislike the regex hack for CamelCase""s. It would be better to have a more BNF-like rule set. I suppose one could match CamelCase""\S+ with children CamelCase -> link, "" -> ''
  • The introduction of collections.deque because Python lists are non-shiftable is a bit of a hack; could be cleaner
  • The String(str) class is totally the wrong architecture
  • No table of contents (<toc> + == # heading ==)
  • No numbered bracketed links like this https://www.appbind.com
  • Many of these require some kind of lambdas to execute based on the syntax rule

BarnRaising request

I would greatly appreciate if someone with fresh eyes compared http://meatballsociety.org/wikix/TextFormattingRules.html and TextFormattingRules and identified any differences to suss out any bugs.

I know about the one with mixed lists; MeatballWiki is actually incorrect and I don't consider this normal behaviour i.e.

  • Bulleted
    1. Numbered
Term
definition
