MeatballWiki

WikixParser

A generic WikiSyntax parser that is configurable at a data level. Ultimately it allows translation from one syntax to another and, more importantly, provides a DocumentObjectModel that would make it more feasible to build a WysiwygWiki.

The primary goals of the Wikix parser are to

1. Describe WikiSyntaxes using a common WikixStyleSheet language

2. Emit valid XHTML

The secondary goals of the Wikix parser are to

1. Parse WikiSyntax into a DocumentObjectModel that can be re-emitted based on another Wikix stylesheet, thus allowing WikiSyntax-to-WikiSyntax translation on the fly, and giving editors the option to pick whichever WikiSyntax they prefer.

2. Store documents in a DocumentObjectModel format that could lend itself to a WysiwygWiki editor. The biggest benefit and problem of WikiSyntax is that it is enmeshed directly within the text, so it is both easy to create by keyboard and difficult to manage by GraphicalUserInterface.

In this way, the Wikix parser may be an important step towards migrating to a RichTextEditor.

Initially created by SunirShah for BibWiki and abandoned; may see the light of day while reviving MeatballWiki. Code will be appropriately OpenSource licensed (GPLv3).


Core concepts

The Wikix parser is based on understanding the most common patterns in the design of WikiSyntax, while still constraining the behaviour to be rationalized and consistent.

The core design goals are

  • Separate specification of the WikiSyntax from the implementation; thus
    • Flexibly add and change WikiSyntax without rewriting entire engines, and ideally dynamically user-configurable
    • Allow XHTML->wiki syntax translation
  • Emit valid XHTML

The system should be a blackbox. It takes

  • Input: a JSON WikixSheet that specifies the syntax rules for the engine
  • Input: A function to determine if a page exists or not
  • (optional) Input: an InterWiki IntermapTxt file
  • Input: The text to transform
  • Output: the transformed XHTML
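
The black-box contract above could be sketched as a single entry point. Everything below is illustrative: the function name `wikix_transform` and its parameters are assumptions, not the actual Wikix API.

```python
import json

# Hypothetical signature for the black box described above; all names
# here are illustrative, not the real Wikix API.
def wikix_transform(wikix_sheet, page_exists, text, intermap=None):
    """wikix_sheet: JSON WikixSheet string of syntax rules
       page_exists: callable(page_name) -> bool
       intermap:    optional InterWiki IntermapTxt contents
       returns:     transformed XHTML string"""
    rules = json.loads(wikix_sheet)
    # A real implementation would parse `text` against `rules`,
    # consulting page_exists for WikiLinks and intermap for InterWiki links.
    return "<p>{0}</p>".format(text)  # placeholder: wrap input in a paragraph

html = wikix_transform("{}", lambda name: False, "HelloWorld")
```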

Overall the parser algorithm is a RecursiveDescentParser over a stream of lines. Syntax rules are arranged in a hierarchy (technically a DirectedAcyclicGraph). Each syntax rule captures a portion of the input, then recursively runs its children's syntax rules against that captured input until the entire text is transformed.
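
The capture-then-recurse idea can be illustrated with a toy two-rule hierarchy. This is not the Wikix code; the rule table and field names are hypothetical.

```python
import re

# Toy illustration of recursive descent over a rule hierarchy:
# each rule captures a span, then runs its children's rules on the capture.
# Rule names and fields here are hypothetical, not the Wikix schema.
RULES = {
    "bold":   {"pattern": r"'''(.*?)'''", "tag": "strong", "children": ["italic"]},
    "italic": {"pattern": r"''(.*?)''",   "tag": "em",     "children": []},
}

def apply_rule(name, text):
    rule = RULES[name]
    def emit(match):
        inner = match.group(1)
        for child in rule["children"]:   # recurse into the captured span
            inner = apply_rule(child, inner)
        return "<{0}>{1}</{0}>".format(rule["tag"], inner)
    return re.sub(rule["pattern"], emit, text)

print(apply_rule("bold", "a '''b ''c'' d''' e"))
# -> a <strong>b <em>c</em> d</strong> e
```

Because italic is a child of bold, the italic rule only ever sees text already captured by bold, which is the constraint the hierarchy enforces.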

Syntax rules are grouped by type. Almost all known WikiSyntax rules belong to one of these types. These types are designed to operate within a stream of text lines. This allows individual syntax rules to focus on what they look like, rather than to be designed to handle such problems as end-of-lines, or line wrapping, or inline modification of the input stream.

  • Block
  • Cell
  • Inline
  • Line
  • Link
  • Multiline
  • Paragraph
  • Root
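
For illustration, a fragment of a hypothetical JSON WikixSheet grouping rules by these types might look like the following. All field names (`type`, `starts`, `ends`, `emit`, `children`) are assumptions, not the actual schema.

```json
{
  "root":      { "children": ["heading", "bullet", "paragraph"] },
  "heading":   { "type": "line", "starts": "== ", "ends": " ==",
                 "emit": "h2", "children": ["bold", "link"] },
  "bullet":    { "type": "multiline", "starts": "* ",
                 "emit": "ul/li", "children": ["link"] },
  "paragraph": { "type": "paragraph", "emit": "p",
                 "children": ["bold", "link"] },
  "bold":      { "type": "inline", "starts": "'''", "ends": "'''",
                 "emit": "strong" }
}
```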

Because the input is modified in place, recursive rules could collide by matching on previously emitted output. To avoid this, the system allows rules to move a portion of the emitted output stream into a store, which is restored before the final output is returned.
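
A minimal sketch of that store/restore mechanism, with hypothetical names: finished output is swapped for a placeholder token so later rules cannot re-match it, and every placeholder is replaced at the end.

```python
# Sketch only; class and token format are assumptions, not the Wikix code.
class OutputStore:
    def __init__(self):
        self._store = []

    def stash(self, html):
        token = "\x00STORE{0}\x00".format(len(self._store))
        self._store.append(html)
        return token            # placeholder left in the stream

    def restore(self, text):
        for i, html in enumerate(self._store):
            text = text.replace("\x00STORE{0}\x00".format(i), html)
        return text

store = OutputStore()
line = store.stash("<code>'''not bold'''</code>") + " and '''bold'''"
# ... later rules can safely run over `line` without touching the stash ...
print(store.restore(line))
# -> <code>'''not bold'''</code> and '''bold'''
```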

Along the way, the system will also collect links if requested.


Constraints

Compiler rules

  • Only paragraphs and lines can contain inline_styles
  • Children are listed in descending order of priority
  • Multilines can only have starts, not equals, ends, or optionallyEnds
  • Blocks MUST have starts AND ends; never equals or optionallyEnds
  • Compiler generates regexy things for starts and ends
  • links require href and text
  • equals cannot have children
  • links cannot have children
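
Several of these compiler rules lend themselves to a compile-time check. The sketch below assumes the rule fields (`type`, `starts`, `ends`, `equals`, `optionallyEnds`, `href`, `text`, `children`); it is not the actual compiler.

```python
# Hypothetical compile-time validation for some of the constraints above.
def validate_rule(rule):
    errors = []
    t = rule.get("type")
    if t == "multiline" and any(k in rule for k in ("equals", "ends", "optionallyEnds")):
        errors.append("multilines can only have starts")
    if t == "block" and not ("starts" in rule and "ends" in rule):
        errors.append("blocks MUST have starts AND ends")
    if t == "link":
        if not ("href" in rule and "text" in rule):
            errors.append("links require href and text")
        if rule.get("children"):
            errors.append("links cannot have children")
    if "equals" in rule and rule.get("children"):
        errors.append("equals cannot have children")
    return errors

print(validate_rule({"type": "block", "starts": "{{{"}))
# -> ['blocks MUST have starts AND ends']
```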


Issues with Python code

  • GPLed for now. Could be changed to MIT
  • The parser now handles almost all of MeatballWiki's TextFormattingRules
    • except ISBN links
    • except unicode in the WikiLinks. Will likely need to switch to PyPi regex module
  • requirements.txt is a trainwreck
  • code needs to be SelfDocumentingCode
  • explanation of the model
  • XHTML -> Wiki transformer needs to be ported and upgraded
  • I dislike the regex hack for CamelCase""s. It would be better to have a more BNF-like rule set. I suppose one could match CamelCase""\S+ with children CamelCase -> link, "" -> ''
  • The introduction of collections.deque because Python lists are non-shiftable is a bit of a hack; could be cleaner
  • The String(str) class is totally the wrong architecture
  • No table of contents (<toc> + == # heading ==)
  • No numbered bracketed links like this https://www.appbind.com
  • Many of these require some kind of lambdas to execute based on the syntax rule

BarnRaising request

I would greatly appreciate if someone with fresh eyes compared http://meatballsociety.org/wikix/TextFormattingRules.html and TextFormattingRules and identified any differences to suss out any bugs.

I know about the one with mixed lists; MeatballWiki is actually incorrect and I don't consider this normal behaviour i.e.

  • Bulleted
    1. Numbered
Term
definition
