Parsewiki's manual

Jaime E. Villate
University of Porto
villate@gnu.org

version 0.5

June 8, 2002

Parsewiki is a program that can transform a text file with a very minimal Wiki style syntax into various other formats, including HTML, XHTML, Docbook and LaTeX. This manual is also a good example of the type of syntax understood by parsewiki.


Copyright (C) 2002 Jaime E. Villate. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; without any Invariant Sections. A copy of the license is included in the file GFDL.

Introduction

In this manual we introduce a method to create documents, aimed at ease of writing for the author and at keeping the content to the minimum required, without mixing it with commands for the presentation of the text. You just have to know a few basic rules to begin writing documents that can then be transformed into various documentation formats with the parsewiki program.

The main advantage of wiki systems is that a final version of the text can be easily written down, almost as simply as we write an e-mail message, with a minimum of formatting tags. The source document can be easily read or sent by e-mail without scaring people away with strange tags. Another big advantage of this system is that no matter what the author writes in the source file, there will always be an output file when it is processed by parsewiki. There won't be any syntax errors that prevent the output file from being produced; what might happen is that the result is not what we expected, but there will be no formatting errors. This way of writing documents has proven to be very useful in websites created by public contributions, such as the Wikipedia encyclopedia.

To begin learning this system, it is recommended that you process this manual's source file (manual-en.txt) with parsewiki to produce an HTML version:

   parsewiki manual-en.txt > manual-en.html

You should then read the HTML version side by side with the source file, to understand how the formatting rules work. Those who know how to get a printable version from a Docbook/XML or LaTeX file should also use the following commands to create Docbook/XML or LaTeX versions of the manual:

   parsewiki -f docbook manual-en.txt > manual-en.xml
   parsewiki -f latex   manual-en.txt > manual-en.tex

Basic rules

The first rule you must remember is that the text in each line has to start at the first column. If you leave any space at the beginning of the line, the whole line will be interpreted as a program listing which will be presented verbatim in the output file. The lines in this paragraph I am writing have no space at the beginning, with the result that they are appended together to form a nicely formatted paragraph; we would not want the same thing to happen when we present a computer program, in which case we should add space at the beginning of each line. Consider for instance the following subroutine, which is part of parsewiki:

 sub WikiHeading
 {
   my ($depth, $text) = @_;
   $depth = length($depth);
   $depth = 5  if ($depth > 5);
   return $OpenItem{'h'.$depth} . $text . $CloseItem{'h'.$depth} . "\n";
 }
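The first-column rule can be illustrated with a short Python sketch. This is not parsewiki's own code (parsewiki is written in Perl); the function name group_lines is hypothetical, and the sketch assumes that a space or tab in the first column marks a verbatim line and that an empty line ends the current block.

```python
def group_lines(lines):
    """Group source lines into ('para', text) and ('verbatim', [lines]) blocks."""
    blocks = []
    kind, buf = None, []

    def flush():
        # Store the block collected so far and start a fresh one.
        nonlocal kind, buf
        if buf:
            if kind == "para":
                blocks.append(("para", " ".join(buf)))
            else:
                blocks.append(("verbatim", buf))
        kind, buf = None, []

    for line in lines:
        if not line.strip():
            flush()  # assumption: an empty line closes any block
            continue
        new_kind = "verbatim" if line[0] in " \t" else "para"
        if new_kind != kind:
            flush()
            kind = new_kind
        # Verbatim lines are kept untouched; paragraph lines are joined later.
        buf.append(line if new_kind == "verbatim" else line.strip())
    flush()
    return blocks
```

Lines starting at the first column are joined into one paragraph, while indented lines are kept exactly as written.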

To mark the end of a paragraph, you should leave at least one empty line. The title of a section should be written in between two = symbols, without any space at the beginning of the line, but there should be space between the equal signs and the text; for example:

    = Section 1 =

Subsections of various levels are obtained by increasing the number of equal signs on each side, as in:

    === A third level section ===
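As a rough illustration of the heading rule, the following Python sketch recognizes a heading line and computes its level. It is not the program's actual code; the name parse_heading and the regular expression are assumptions, and the limit of five levels is taken from the WikiHeading subroutine shown earlier.

```python
import re

# Equal signs in the first column, space, the title, space, equal signs.
HEADING = re.compile(r"^(=+)\s+(.*?)\s+=+\s*$")

def parse_heading(line, max_depth=5):
    """Return (depth, title) for a heading line, or None for any other line."""
    m = HEADING.match(line)
    if m is None:
        return None
    # Levels deeper than max_depth are clamped, as in WikiHeading.
    depth = min(len(m.group(1)), max_depth)
    return depth, m.group(2)
```

A line with leading space, or without space between the equal signs and the text, is not treated as a heading by this sketch.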

Lists

There are three types of lists: ordered lists, unnumbered lists and descriptive lists (glossaries). Each item in a list must go into a single line and there should not be any empty lines in between two items of the same list, unless one of them is the last item and the other one is the first item in a new list. When the contents of an item are too long, you can use the line continuation character (backslash) to split them into shorter lines.
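The effect of the line continuation character can be sketched in Python as follows. This is only an illustration under the assumption that the backslash is the very last character of the line and that the next line is appended as it is; the function name join_continuations is hypothetical.

```python
def join_continuations(lines):
    """Merge every line ending in a backslash with the line that follows it."""
    joined = []
    pending = ""
    for line in lines:
        if line.endswith("\\"):
            # Drop the backslash and wait for the rest of the item.
            pending += line[:-1]
        else:
            joined.append(pending + line)
            pending = ""
    if pending:
        # A trailing backslash on the last line has nothing to join with.
        joined.append(pending)
    return joined
```

After this step each list item occupies a single line again, as the list rules require.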

Unnumbered lists

Each item in an unnumbered list must begin with an asterisk (in the first column always); consider for instance:

This sentence is no longer part of the list and it marks the beginning of a new paragraph, even though no (optional) empty lines were left on top of it in the source file.

Ordered Lists

  1. They work in a similar way to the previous ones.

  2. The difference is that you must use # rather than *.

  3. We will cover nested lists later on.
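A minimal Python sketch of how first-level list items could be told apart from normal text is given next. It only covers the two markers mentioned above, it ignores nested lists, and the name parse_list_item is hypothetical rather than part of parsewiki.

```python
def parse_list_item(line):
    """Return (list_type, text) for a first-level list item, else None."""
    markers = {"*": "unnumbered", "#": "ordered"}
    # The marker must be in the first column.
    if line and line[0] in markers:
        return markers[line[0]], line[1:].strip()
    return None
```

A line that does not start with one of the markers ends the list and begins a new paragraph.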

Descriptive lists

These are lists of terms, followed by their descriptions, as in a dictionary or glossary. Each item starts with a semicolon (;) in the first column, followed by the term, followed by a colon (:) and finally the description, all in a single line. For instance:

HTML

Used to publish contents on the web or to view it locally with a web browser.

XHTML

The language proposed as a future replacement for HTML, with all the advantages associated with XML.

DocBook

A type of XML document very much in fashion in the editorial world.

LaTeX

Without a doubt, the best format to produce scientific texts of high quality.
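The structure of a descriptive item can be sketched in Python like this. The sketch assumes that the first colon separates the term from its description, which may differ from what parsewiki does when the term itself contains a colon; parse_definition is a hypothetical name.

```python
def parse_definition(line):
    """Split ';term: description' into (term, description), else return None."""
    if not line.startswith(";"):
        return None
    # Assumption: the first colon ends the term.
    term, sep, description = line[1:].partition(":")
    if not sep:
        return None
    return term.strip(), description.strip()
```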

Nested lists

To include a list inside another one, we must increase the level of the list (or lists) which should be placed inside another; for instance:

  1. This is a first level list

  2. Within this second item there is a second level list:

  3. We continue here with our initial list. If we wanted this item to start a new list, we would have left a blank line.

  1. For instance, we have just started a new list here.

  2. Which will include a descriptive list:

    option -t

    A file containing a template to be used instead of the default one.

    option -f

    Output format. It can be:

    • html

    • xhtml

    • docbook

    • latex

  3. The list ends here.

Hyperlinks

Any time we write a URL, for example http://www.usemod.com/cgi-bin/wiki.pl, it will be recognized by parsewiki and a link to the URL will appear in the output format. If you want to associate some text with the link, you should write the URL, followed by the text, inside square brackets and without leaving any spaces after [. Example: I am a proud member of The GNU Project.

Internal links, for which a URL without the leading http:// is given, have to be inside double brackets, [[ and ]], to be recognized as such. For instance, if the source file for this manual is in the same directory where the HTML version has been generated, the HTML page will have a link to the source file here: manual.txt. Or if you want to use some text: Source file for this manual.
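The three kinds of links described in this section can be sketched in Python as follows. This is an illustration only: the regular expression, the function name link_to_html and the choice to leave trailing punctuation out of a bare URL are assumptions, not a description of parsewiki's internals, and http://www.gnu.org is used here merely as a sample address.

```python
import re
from html import escape

LINK = re.compile(
    r"\[\[(?P<local>[^\s\]]+)(?:\s+(?P<ltext>[^\]]+))?\]\]"      # [[target text]]
    r"|\[(?P<url>https?://[^\s\]]+)(?:\s+(?P<utext>[^\]]+))?\]"  # [URL text]
    r"|(?P<bare>https?://[^\s<>\]]*[^\s<>\].,;:!?)])"            # bare URL
)

def _replace(m):
    # Exactly one of the three alternatives matched; the others are None.
    target = m.group("local") or m.group("url") or m.group("bare")
    label = m.group("ltext") or m.group("utext") or target
    return '<a href="{}">{}</a>'.format(escape(target), escape(label, quote=False))

def link_to_html(line):
    """Replace wiki-style links in one line of text with HTML anchors."""
    return LINK.sub(_replace, line)
```

For example, link_to_html("[http://www.gnu.org The GNU Project]") gives an anchor whose visible text is "The GNU Project", while a bare URL is used as its own link text.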

Figures

If a URL ends with a name with an extension recognized as a graphic file viewable by web browsers, the URL will be replaced by the image (when the selected output file format is HTML or XHTML). For example http://savannah.gnu.org/icons/back.png

If you want the figure to be detached from the text, you should place it into a separate paragraph:

http://savannah.gnu.org/images/floating.jpg

(This figure will only appear in the HTML and XHTML versions, because the other formats do not support the display of figures on the web.) If we associate some text with the URL of the figure, a hyperlink to the figure will be given rather than the figure being displayed on the page:

GNU's Logo

If the figure is inside a local directory, its complete path and filename should be given inside [[ and ]]. It is necessary that the filename end with one of the extensions recognized by parsewiki:

  jpg jpeg png bmp gif

(which can go in upper case too). If that's not the case, it will be necessary to create a version in one of those formats. In the case of a LaTeX file obtained with parsewiki, even if we use one of those formats, a file with the same name but with extension .ps or .eps will be required by dvips; if pdflatex is used instead, it will expect to find files ending in .jpg, .jpeg, .png or .pdf.
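The check on the file extension can be sketched in Python as shown next. The list of extensions is the one given above; the function name is_image and the use of os.path.splitext are choices made for this illustration only.

```python
import os

# Extensions listed in this manual; upper case is accepted too.
IMAGE_EXTENSIONS = {"jpg", "jpeg", "png", "bmp", "gif"}

def is_image(target):
    """Tell whether a URL or filename ends with a recognized image extension."""
    ext = os.path.splitext(target)[1].lstrip(".").lower()
    return ext in IMAGE_EXTENSIONS
```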

This manual comes with a vector figure, barra.ps, which is also provided in PNG format (barra.png). It can be seen in all the output formats if we use:

barra.png

The PostScript version produced by latex and dvips will use the file barra.ps. The PDF version obtained with pdflatex will use the file barra.png, since the file barra.pdf does not exist.

Other fonts

You can get italics by using two consecutive apostrophes, or using the HTML tag <em>: like this or <em>like this</em>. Bold face is obtained with three apostrophes or the <strong> tag: 3 apostrophes or <strong>strong</strong>. To obtain a fixed-width font, as in a typewriter, you should use two consecutive commas, or the tag <tt>; for instance "ls --color". In all three cases the text in a different font should be inside a single line; looking at the source file of this manual, you will see that I have just used that feature to typeset the tag <em> without a companion </em> (I've done it again :-). If the text is too long, use backslashes to split it.
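The apostrophe and comma markers can be illustrated with the following Python sketch, applied to one line at a time. It is an assumption about how such markers might be translated into HTML, not the code used by parsewiki, and the name inline_fonts is hypothetical.

```python
import re

def inline_fonts(line):
    """Translate the font markers found in a single line into HTML tags."""
    # Three apostrophes are handled first, so they are not read as two plus one.
    line = re.sub(r"'''(.+?)'''", r"<strong>\1</strong>", line)
    line = re.sub(r"''(.+?)''", r"<em>\1</em>", line)
    line = re.sub(r",,(.+?),,", r"<tt>\1</tt>", line)
    return line
```

A marker without a companion in the same line is left untouched, so no input can cause an error.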

Meta information

Some optional information about the document can be included within the front matter, by using the syntax: {name: contents}. If name is one of the following:

  title author date organization address version abstract copyright

the companion contents will be displayed in the document's front matter. You can use any other name not included in the list, but its content will be silently ignored, unless you redefine the output template used. All meta information should be together, at the beginning of the document, and with no spaces at the beginning of the lines. Otherwise it might be interpreted as normal text.
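A short Python sketch of how a line of meta information could be recognized is given below. It assumes that each entry fits in one line, which may not hold for a long abstract; the names parse_meta and KNOWN_NAMES are hypothetical.

```python
import re

# Names that are displayed in the front matter, as listed above.
KNOWN_NAMES = {"title", "author", "date", "organization",
               "address", "version", "abstract", "copyright"}

# The opening brace must be in the first column.
META = re.compile(r"^\{(\w+):\s*(.*)\}\s*$")

def parse_meta(line):
    """Return (name, contents, is_known) for '{name: contents}', else None."""
    m = META.match(line)
    if m is None:
        return None
    name, contents = m.group(1), m.group(2).strip()
    return name, contents, name in KNOWN_NAMES
```

The third value tells whether the default templates would display the entry or silently ignore it.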

Templates

In any of the four output formats that parsewiki can produce, the output file is created from default templates. In the directory templates distributed with this package you can find copies of the default templates. You can use them as a model to create custom templates. For example, if we have created a custom template in ~/template.tex, it can be used giving a -t option to parsewiki:

   parsewiki -f latex -t ~/template.tex file.txt > file.tex

Conclusions

The simple system we have introduced in this manual allows us to create simple documents in an easy and fast way. Due to its simplicity, you should not expect it to work for more complex documents; however, this method can be used to create an initial version which can then be further developed working on the LaTeX or DocBook file created with this method.

This is a preliminary beta release, and it is probably filled with bugs. Among the future development plans are the implementation of tables, bibliographies and figure captions.