R2Net Custom Ref

About this Guide

The Logictran RTF Converter is a package that converts word processing documents to HTML and XML. This guide is a detailed treatment of customizing your output. It describes all of the translation files and how they can be modified for your conversion needs. A higher-level introduction to basic conversion options can be found in the Users Guide. The online version of this guide will always contain the most up-to-date documentation as well as information about available upgrades.

Translation files (.trn)

Translation files define the rules and markup for translation, and all end with the extension ".trn". All of the ".trn" files have the same syntax, but each translation file plays a different role in the translation. The standard set of translation files is:


NAME
Description
html.trn
This translation file is used when converting documents to HTML or XHTML format.
docbook.trn
This translation file is used when converting documents to DocBook format.
base.trn
This translation file is used by all conversions. It contains strings and functions that are common across all output types.
trnflag.trn
This translation file is for user modifications to strings and functions, and it is used by all conversions. Any settings made here will override the default settings of the converter, as well as settings in html.trn, docbook.trn and base.trn. Making your modifications here allows you to make changes that affect all conversions. If you make a copy of this file and put it into a directory containing RTF files, that copy of trnflag.trn will be used instead of the one in the application directory. This allows you to create special settings for a collection of documents.
*.trn
You can create a translation file with the same name as your RTF document, but having the extension ".trn" and it will be used in addition to all of the other translation files. All settings made in this file will override settings of the previous three. This allows you to create specific settings for a particular RTF file. This file should be located in the same directory as the RTF source.
cmd.trn
This settings file is created by the Windows GUI in the application directory, and it contains the settings chosen in the GUI application. These settings override all of the previously described settings files. This settings file is used by all translations – even when you are using the command line, ActiveX or library interfaces. It is also used by the Word Addin, String2Net and Doc2Net applications. This allows you to make settings using the GUI and then run conversions using any of the other interfaces to the RTF converter.
Each translation file is searched for first in the directory containing the RTF input file, then in the directory named in the environment variable RTFLIBDIR, and finally in the directory containing the Logictran RTF Converter application.
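The search order described above can be sketched in Python. This is only an illustration of the lookup sequence, not the converter's own code; the function name find_translation_file and its parameters are hypothetical.

```python
import os
from typing import Mapping, Optional


def find_translation_file(name: str, input_dir: str, app_dir: str,
                          env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first existing path for `name`, or None if it is not found.

    Hypothetical helper: mirrors the documented order (input directory,
    then RTFLIBDIR, then the application directory).
    """
    candidates = [input_dir]
    lib_dir = env.get("RTFLIBDIR")
    if lib_dir:
        candidates.append(lib_dir)
    candidates.append(app_dir)
    for directory in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None
```

Under this reading, a copy of trnflag.trn placed next to the RTF input is found before the copy in the application directory, which is why a per-directory copy takes effect.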

Note that the command line interface and the Macintosh GUI allow strings to be defined on the command line. These definitions will override definitions made in all of the ".trn" files.

Translation files (.trn) Syntax

Translation files define all of the markup tags used by the filter, and all of the translation options. These are defined in a series of tables which will be described below. In addition, the translation files can contain processing directives that allow you to include settings from other files, and conditionally use or ignore settings.
A translation file can contain eight table types. They are labeled .Strings, .PTag, .TTag, .TMatch, .PMatch, .Colors, .Functions, and .Charsets. These tables begin with the name (in column one). When a table directive is encountered, all subsequent lines are processed as entries for that table. This process continues until a new table is encountered. Note that included files may have their own table directives, so it is recommended that you explicitly place a table directive after including a file. Multiple tables of the same type may appear in the same translation file.



The tables themselves are composed of records containing a fixed number of fields, which are separated by commas. The fields are either strings (which should be quoted), integers, or bitmasks.
A translation file is processed as follows:
Blank lines are ignored.
Lines beginning with a ‘#’ in column 1 are either comments, which are ignored, or conditional processing directives or include directives.
A comment (beginning with ‘#’) may also appear to the right of the last field on a line.
Lines ending with a ‘\’ are joined to the following line (no whitespace is removed).
Where strings are required, either single or double quotes can surround the string. For example: ‘hello world’ and “hello world” have the same meaning.
To include a quote mark in a string, surround the string with the other form of quotes: “isn’t it grand” or ‘say “hello”’.
You may also escape the meaning of the quote with a backslash: ‘isn\’t it grand’ or “say \”hello\””.
All of the directives listed below have case-insensitive names.
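As a rough illustration of the line-level rules above, the following Python sketch joins continuation lines, drops blank lines and comments, and separates directives from table entries. It is not the converter's parser. It assumes that the trailing backslash itself is dropped when lines are joined, that joining happens before comment detection, and that a directive keyword follows the '#' with no space. Trailing comments and quote handling are left out.

```python
import re

# Directive names taken from this guide; matching is case-insensitive.
DIRECTIVES = {"include", "cinclude", "ifdef", "ifndef", "ifmatch", "else", "endif"}


def logical_lines(text):
    """Return (kind, line) pairs, where kind is 'directive' or 'entry'."""
    result = []
    pending = ""
    # The extra empty line flushes a continuation left open at end of file.
    for raw in text.splitlines() + [""]:
        if raw.endswith("\\"):
            # Continuation: drop the backslash and keep all whitespace.
            pending += raw[:-1]
            continue
        line, pending = pending + raw, ""
        if not line.strip():
            continue  # blank line
        if line.startswith("#"):
            word = re.match(r"#(\w*)", line).group(1).lower()
            if word in DIRECTIVES:
                result.append(("directive", line))
            continue  # any other '#' line is treated as a comment
        result.append(("entry", line))
    return result
```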

Include File Processing

A translation file can include directives from other files. This allows translation directives to be grouped for easier maintenance. This technique is used in html.trn to include 'base.trn', 'trnflag.trn', and 'cmd.trn'.
Syntax:
#include “filename”
or
#cinclude “filename”

where “filename” is a string containing a relative or absolute file name. The file is searched for first in the directory containing the RTF input file, then in the directory named in the environment variable RTFLIBDIR, and finally in the directory containing the Logictran RTF Converter application. The ‘#cinclude’ directive conditionally includes a file: it does not generate an error if the file does not exist. This can be used as:
#cinclude "$BaseName(InputFile).trn"
which will attempt to include a file with the same filename as the input file, and a “.trn” file extension. This could contain directives specific to a given input file.

Turning on the log file feature (by setting CreateLogFile to 1) allows you to track which files are being used and in what order. See Log and Error Files for more information.

Log and Error Files

By default, an error file is created if any warnings or errors are generated during the conversion of your documents. The error file is also where any debugging output is generated. The log file contains a record of files read and written by the converter, and is a useful diagnostic aid for conversions. There are several strings that you can set to control the error file processing as well as the log file. They are:
String
Default Value
Description
CreateLogFile 0
If set to 1, a log file is created, or appended to. The log file is an html format file containing a list of files read and written as part of the current translation process. By default, an existing file is appended to, leaving a running record of all files created during r2net translations.
NewLogFile 0
If a log file is created (see CreateLogFile), any existing log file is appended to by default. If NewLogFile is set to 1, any existing log file is deleted and a new one started.
LogFileName r2net_log.html
If a log file is written (see CreateLogFile), this filename will be used. Note: set LogFileName before CreateLogFile in the .trn file, otherwise the default r2net_log.html file will be used.
LogFileUsed
(read only) If a log file is written, LogFileUsed will contain the name of the log file.
DisableErrorFile 0
If set to 1, no error file will be created.
ErrorFileName filename_err.html
If an error file is written as a result of the translation, it will be written to this file.



Conditional Processing

A translation file can contain directives that allow conditional processing. They are:
#ifdef varname
where varname can be the name of any variable. If it is defined (value=1), all of the subsequent lines are used; if not, they are ignored up until an #else or #endif.
#else
The else directive reverses the previous ifdef, ifndef or ifmatch directive.
#ifndef varname
where varname can be the name of any variable. If it is NOT defined, all of the subsequent lines are used; if it is, they are ignored up until an #else or #endif.
#ifmatch varname string
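where varname is the name of any variable, and string is any string (enclosed in quotation marks). If the value of varname matches string, all of the subsequent lines are used; if not, they are ignored up until an #else or #endif.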
#endif
This directive ends the conditional section

Conditional sections can be nested. For example:
#ifdef SomeFlag
#ifmatch SomeFlag “v1”
...xxx..
#else
...yyy...
#endif
#else
...zzz...
#endif
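The nesting behaviour in the example above can be modelled with a small stack. The Python sketch below is an illustration only. It assumes that "defined" means the variable is present in a dictionary, and that #ifmatch compares the variable's value to the string for exact equality (the converter may use pattern matching instead). The function name active_lines is hypothetical.

```python
def active_lines(lines, variables):
    """Return the non-directive lines that survive conditional processing."""
    stack = []  # one flag per open conditional: is its current branch taken?
    kept = []
    for line in lines:
        parts = line.split(None, 2)
        word = parts[0].lower() if parts else ""
        if word in ("#ifdef", "#ifndef", "#ifmatch"):
            name = parts[1]
            if word == "#ifdef":
                taken = name in variables
            elif word == "#ifndef":
                taken = name not in variables
            else:
                # Assumption: exact comparison against the quoted string.
                wanted = parts[2].strip().strip("\"'\u201c\u201d")
                taken = variables.get(name) == wanted
            stack.append(taken)
        elif word == "#else":
            stack[-1] = not stack[-1]
        elif word == "#endif":
            stack.pop()
        elif all(stack):
            # A line is kept only if every enclosing branch is taken.
            kept.append(line)
    return kept
```

With the nested example above, a SomeFlag value of 'v1' keeps only the xxx lines, any other defined value keeps the yyy lines, and an undefined SomeFlag keeps the zzz lines.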

.TMatch Table

Generating character markup to handle characteristics like bold, italic, font size, or the character style name is a two part process. First the properties of each character are checked against the .TMatch table. The TMatch table entries are scanned for rules that match the properties of that character. For each match, the TMatch table links to an entry in the TTag table containing the markup to be used for that character. The TMatch rules can match on any combination of font face, font size, formatting (bold, italic, underline, ...), color, and the character style name.

A single character can have multiple matches in the .TMatch table. Each match will specify additional markup to be used.
The format is:
'Font',FontSize,MatchStyle,'color','CharStyle','TextStyleName'
Font
The font face name, like 'Courier', or 'New York', if left blank, any font face is a match
FontSize
The font size in points. If 0, any point size is a match
MatchStyle
The MatchStyle string contains any combination of these attribute names, each optionally preceded by 'Not': Bold, Italic, StrikeThrough, Outline, Shadow, SmallCaps, AllCaps, Hidden, Underline, WordUnderline, DottedUnderline, DoubleUnderline, SuperScript, SubScript
Color
The color name ('blue') or RGB Value (#RRGGBB). If left empty, any color is a match.
CharStyle
The character style name, like 'keyword' . If empty, then any character style is a match.
TextStyleName
The identifier corresponding to an entry in the TTag table that contains the markup to use for this match rule.

Examples:
'',0,'SuperScript','','','sup'
'',0,'','','codesnip','tt'
'',0,'Bold Not Italic','blue','','tt'

A plain superscripted character with no character style would only match the first rule.
A blue, superscripted, bold character with a character style of 'codesnip' would match all three rules. Note that rules 2 and 3 both specified 'tt' markup – the converter will only apply the markup once – no matter how many matches are found for it.
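The matching behaviour described above can be sketched in Python using the three example rules. This is a simplified model, not the converter's implementation: colors are compared by exact name here (the converter matches within a tolerance), and the dictionary layout for a character's properties is hypothetical.

```python
# The three example rules: font, size, match style, color, char style, TTag name.
RULES = [
    ("", 0, "SuperScript", "", "", "sup"),
    ("", 0, "", "", "codesnip", "tt"),
    ("", 0, "Bold Not Italic", "blue", "", "tt"),
]


def rule_matches(rule, char):
    """Return True if every non-empty field of the rule matches the character."""
    font, size, style, color, char_style, _ = rule
    if font and font != char["font"]:
        return False
    if size and size != char["size"]:
        return False
    if color and color != char["color"]:
        return False
    if char_style and char_style != char["char_style"]:
        return False
    negate = False
    for word in style.split():
        if word == "Not":
            negate = True  # applies to the next attribute name only
            continue
        present = word in char["attrs"]
        if present == negate:
            return False
        negate = False
    return True


def text_styles(char):
    """Collect the TTag names of all matching rules, each name only once."""
    names = []
    for rule in RULES:
        if rule_matches(rule, char) and rule[5] not in names:
            names.append(rule[5])
    return names
```

Because each name is collected only once, the blue, superscripted, bold 'codesnip' character yields 'sup' and a single 'tt'.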

In addition to specifying markup to apply to characters, there are some special textStyleNames that can be used. They are:
_Discard - text will not be output
_Name - text will be used for a named anchor
_Link - text will be used as a link to a heading, http: or mailto:
_HRef - text is an href
_Hot - end of this text marks the end of hot text
_Literal - text is treated as literal HTML markup
_Capture - text is saved into $CaptureText; and then $_Capture; is expanded

.TTag Table

Each entry in the .TTag table describes an HTML text markup. The format is:
.TTag
"name","starttag","endtag"
TextStyleName
A unique name for this entry. These names are referenced in the .TMatch table.
Starttag
This string will be output once at the beginning of any text for this markup.
Endtag
This string will be output once at the end of any text for this markup.

The start and end tags may be simple text, but they can also contain strings or functions.
Examples:
'b','<b>','</b>'
'i','<i>','</i>'
'font','$CheckFont()','$EndFont;'

The definitions for 'b' and 'i' are simple markup tags. The definition of 'font' calls a function $CheckFont() at the start of a run of characters, and expands the string 'EndFont' at the end.

Tags defined in the TTag table are optimized to eliminate extra markup. For example, rather than generating:
<b>H</b><b>e</b><b>l</b><b>l</b><b>o</b>
the markup is optimized to produce:
<b>Hello</b>.
Nested markup is handled as well, so if the 'H' in our last example was changed to italic as well as bold, and the 'l's were only italic, you would get:
<b><i>H</i>e</b><i>ll</i><b>o</b>.
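One way to obtain this kind of optimized output is to keep a stack of open tags and only close or open tags when the set wanted by the next character changes. The Python sketch below illustrates that idea and reproduces both examples above. It is not the converter's actual algorithm, and the choice to open new tags in alphabetical order is an assumption made for the sketch.

```python
def optimize(chars):
    """chars is a list of (character, set_of_tag_names) pairs."""
    out = []
    open_tags = []  # stack of currently open tags, outermost first
    for ch, wanted in chars:
        # Find the outermost open tag that this character does not want;
        # it and everything nested inside it must be closed.
        keep = len(open_tags)
        for i, tag in enumerate(open_tags):
            if tag not in wanted:
                keep = i
                break
        while len(open_tags) > keep:
            out.append("</%s>" % open_tags.pop())
        # Open whatever is still missing (alphabetical order: an assumption).
        for tag in sorted(wanted):
            if tag not in open_tags:
                open_tags.append(tag)
                out.append("<%s>" % tag)
        out.append(ch)
    while open_tags:
        out.append("</%s>" % open_tags.pop())
    return "".join(out)
```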

.PMatch Table

Each paragraph style in your document can be mapped to a different HTML or XML markup. This is done in the PMatch table. The style name of each paragraph is looked up in the PMatch table. Corresponding to that entry is the identifier of an entry in the PTag table that contains the markup to be used for that paragraph. In addition, there is also a level number. A level number of 0 means that this paragraph style should not be nested in any lists. A level of 1 means that this paragraph style should be nested within a list. A level of 2 means that this paragraph style should appear within two nested lists, and so on.

The format of the PMatch table is:
.PMatch
"Paragraph Style",nesting_level,"PTagName"
Paragraph Style
Your paragraph style name. If your paragraph style names have commas in them, use everything up to but not including the comma as a style name.
Nesting_level
The nesting level. This should be zero except for nested list entries
PTagName
The name of the .PTag entry that should be used for paragraphs with this paragraph style.

Examples:
'Normal',0,'Normal'
'heading 1',0,'h1'
In the examples above, paragraphs with a style of 'Normal' will use the 'Normal' entry in the PTag table for markup tags. Paragraphs with a style of 'heading 1' will use the 'h1' entry in the PTag table for markup tags.

To understand nested lists, we really need an example.
Assume that your document contains:
  1. A numbered list item (paragraph style: 'numbered list')
  • A bullet list item (paragraph style 'my bullet list')
The bullet list item could be handled as the start of a new list, or as a sub-list within the numbered list. Paragraph styles that are to be treated as sublists are given a level number of 1 or higher which indicates how deeply nested they are to be within other lists. To treat the bullet list paragraph as a sub-list of the numbered list we would use the following PMatch entries.
'numbered list',0,'ol-d'
'my bullet list',1,'ul-d'

If we wanted the bullet list to appear at the same nesting level as the numbered list, we would change the PMatch entry for 'my bullet list' to:
'my bullet list',0,'ul-d'

In addition to specifying PTag entries to use for paragraph markup, there are a few special names that can be used instead.
  • Using '_Literal' in the PTagName causes those paragraphs to be inserted as literal HTML (or XML) text.
  • Using '_Discard' in the PTagName causes those paragraphs to be discarded.
  • Using '_Capture' in the PTagName causes the entire text of the paragraph to be saved into a string ($CaptureText) instead of being output. At the end of a captured paragraph, function _Capture is expanded to perform some processing on the captured text.
The PMatch mechanism can be overridden using EndPar processing. EndPar allows you to decide which paragraph markup to use based on the text of the paragraph, or where it occurs in the document. For more information, see End of Paragraph Processing.

.PTag Table

Each entry in the .PTag table describes an HTML paragraph markup. The format is:
'name','starttag','endtag','col2mark','tabmark','parsep','linetag',allowtext,cannest,DeleteCol1,fold,TocStyl,generate_emptypar
PtagName
A unique name for this entry. These names are referenced in the .PMatch table.
starttag
This string will be output once at the beginning of any text for this markup.
endtag
This string will be output once at the end of any text for this markup.
col2mark
This string will be output in place of the first tab in every paragraph (used for lists)
tabmark
This string will be output for each tab found (after the first for lists).
parsep
This string will be output BETWEEN any two paragraphs - that have this style.
linetag
This string will be output for linefeeds. (manual line breaks)
allowtext
If 0, no text markup will be allowed within this markup (for example, <pre> or <h1> don't format well if they contain additional markup).
cannest
If 1, other paragraph markup will be allowed to nest within this markup. (used for nesting lists)
DeleteCol1
If 1, all text up to the first tab in a paragraph will be deleted (used to strip out bullets when going to unordered lists (<ul>)).
fold
If 1, the filter will add newlines to the HTML to keep the number of characters in a line to less than 80. For <pre> or <listing> elements, this should be set to 0.
TocStyl
The TOC level. If greater than 0, this is a heading, and the filter will create a Table of contents entry for every paragraph using this markup
generate_emptypar
If 0, empty paragraphs will be ignored for this style. This should be set to 0 for lists so that you don't get spurious bullets or list item numbers.

For example assume that we have this PTag entry:
'h2','<h2>\n','</h2>\n','\t','\t','<br />','<br />',1,0,0,1,2,0

The markup generated will be
<h2>...</h2>, and tab characters will be preserved (browsers will treat as whitespace). The TocStyle is 2, so this will be included as a level two table of contents entry.

Example 2:
'ol-d','<ol>\n<li>','</li>\n</ol>','','\t','</li>\n<li>','<br />\n',1,1,1,1,0,0

Given these paragraphs as input:
1.<tab>First Item
2.<tab>Second Item<newline>
Another line within the second item

The start tag '<ol>\n<li>' will be output prior to the first paragraph, and the end tag '</li>\n</ol>' will be output after the last paragraph. In between the two paragraphs, the parsep tag '</li>\n<li>' will be used.
Because DeleteCol1 is set, all text up to the tab will be deleted. The newline will be replaced with linetag: '<br />\n'
The result will be:

<ol>
<li>First Item</li>
<li>Second Item<br />
Another line within the second item</li>
</ol>
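The way the start tag, parsep, linetag, end tag and DeleteCol1 combine in this example can be sketched in Python. This models only the fields used in the walkthrough, with an end tag of '</li>\n</ol>'; it is not the converter's implementation, and the function name render_paragraphs is hypothetical.

```python
def render_paragraphs(paragraphs, starttag, endtag, parsep, linetag, delete_col1):
    """Render consecutive paragraphs that share one PTag entry."""
    rendered = []
    for text in paragraphs:
        if delete_col1 and "\t" in text:
            # DeleteCol1: drop everything up to and including the first tab.
            text = text.split("\t", 1)[1]
        # Manual line breaks inside a paragraph become the linetag.
        rendered.append(text.replace("\n", linetag))
    # starttag once, parsep between paragraphs, endtag once.
    return starttag + parsep.join(rendered) + endtag


html = render_paragraphs(
    ["1.\tFirst Item", "2.\tSecond Item\nAnother line within the second item"],
    starttag="<ol>\n<li>",
    endtag="</li>\n</ol>",
    parsep="</li>\n<li>",
    linetag="<br />\n",
    delete_col1=True,
)
print(html)
```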


.Strings Table

The .Strings table sets values to be used by the translator. The format is
.Strings
name,'value'
Name
The string name. Names can use the characters a..z, 0-9 and underscore. Names are not case-sensitive.
value
The value of the string. String values can contain simple text, newlines (\n), tabs (\t) and carriage returns (\r). Single quotes should be escaped by putting a backslash before them (\'). Strings can contain references to other strings ($StringName;) and functions ($FunctionName()).

The quotes may be omitted for numbers, so the following two lines are equivalent:
TheValue,'-1'
TheValue,-1

If a string is set more than once in a set of translation files, the first value encountered will be used for strings defined in cmd.trn, trnflag.trn and file.trn (where file is the name of the input file). This is done to allow values set on the command line, or by the GUI to override any other settings. If a string is set in base.trn or html.trn, the value can be changed throughout the conversion process, and the string always keeps the last value set. Variables can also be set by the 'Save' function statement.

Strings are used to specify translation options and markup that will be used in the output file.

.Colors Table

The .Colors table defines symbolic names for colors and assigns them to RGB values. The symbolic names are used in the .TMatch table when you are matching on color, and in the HTML output when you want to use a symbolic color name instead of RGB values. Since colors vary somewhat on different platforms, associating a color with an RGB value is done by finding the closest match with the RGB value for something in your RTF file. In the .TMatch table, a color is matched if the RGB values are close to the RGB value from this table. The tolerance for matching is set by the variable MaxColorDiff and is compared by summing the differences of the red, green and blue values.

The syntax of the .Colors table is
'colorname',r,g,b
where 'colorname' is the name of the color being defined and r,g and b are the integer values of the red, green and blue values respectively (in the range 0..255.)
Example:
.Colors
'Black',0,0,0
'Blue',0,0,212
'Cyan',2,171,234
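The tolerance test can be illustrated in Python with the three example colors. This sketch assumes the difference is the sum of the absolute differences of the red, green and blue values, and that a color matches when that sum does not exceed MaxColorDiff. The tolerance of 30 used below is an arbitrary illustrative value, not the converter's default.

```python
# Example entries from the .Colors table above.
COLORS = {"Black": (0, 0, 0), "Blue": (0, 0, 212), "Cyan": (2, 171, 234)}


def color_diff(a, b):
    """Sum of the per-channel differences (absolute values: an assumption)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match_color(rgb, max_color_diff):
    """Return the closest named color, or None if it is outside the tolerance."""
    name, reference = min(COLORS.items(), key=lambda item: color_diff(rgb, item[1]))
    return name if color_diff(rgb, reference) <= max_color_diff else None


print(match_color((0, 0, 200), 30))      # differs from 'Blue' by 12
print(match_color((128, 128, 128), 30))  # no table color is close enough
```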

.Charsets Table

The .Charsets table specifies the filenames for input character set translation. These files contain a mapping of each of the characters (0..255 for single-byte characters, 0.. 65534 for double-byte) for a particular font and character set to a unique identifier. A second file (specified by the string OutMap) associates that identifier with one or more characters that will be output. For an overview of character set translation, see Character Sets and Translation.

The Charsets table is searched for the best match of platform, number of bytes, character set, and font-name. An empty string for font means that that entry matches any font. –1 for platform or fontcharset always matches.

The syntax of the .Charsets table is:
platform,font,filename[,nbytes,fontcharset]
Where platform is:
0 - Ansi (used on Windows)
1 - Mac (used on Macintosh)
2 - pc (DOS)
3 - pca (not used....)
Nbytes is the number of bytes (1 or 2) of the characters being mapped. The default is 1.
Fontcharset is the character set that is associated with each font in the font table. The defined values for fontcharset are:
0 - ANSI
1 - Default
2 - Symbol
77 - Mac
128 - Shift Jis
129 - Hangul
130 - Johab
134 - GB2312
136 - Big5
161 - Greek
162 - Turkish
163 - Vietnamese
177 - Hebrew
178 - Arabic
179 - Arabic Traditional
180 - Arabic user
181 - Hebrew user
186 - Baltic
204 - Russian
222 - Thai
238 - Eastern European
254 - PC 437
255 - OEM

Examples:
0,'Symbol','ansi-sym.txt' # matches symbol font on an ANSI platform (Windows)
1,'Symbol','mac-sym.txt' # matches symbol font on a Macintosh Platform
-1,'','greek-gen.txt',1,161 # matches fonts marked as Greek regardless of platform
-1,'','chinese.txt',2,134 # matches chinese double-byte fonts

.Functions Table

The functions table defines functions that can be used in string expansion. Anywhere you can use a string, you can also use a function. Function references will appear as $name(arg1,arg2,...) where name is the function name, and arg1...argn are optional string names (without $ or ';') that are passed into the function as arguments. Functions are very powerful because they can expand to different values depending on the context they are used in. There are many functions pre-defined for your use in html.trn and base.trn, but you can add your own as well. If you define a function in trnflag.trn (or file.trn where file is the filename of your RTF input) that has the same name as an existing function, your definition will be used instead.

The format of the function table is
.Functions
Function name
step1
step2
...
Function name2
...

Step1, step2... are directives that define what steps the function will perform. The allowable directives are listed below.

Function

Syntax: Function Name
This starts a new function and assigns it a name. All functions must begin with this entry.
Within functions, the strings $0; through $9; are defined. $1; through $9; contain the parameters passed into the function. A special string $0; holds the result of the previous step (for the first step it is initialized to the value of the first parameter). Strings &1; through &9; contain the names of the parameters passed into the function (this allows functions to update their parameters and is only supported in the Save function). At the end of the function, it returns the current value of $0;.

In the following example, assume that we have the function:
Function LowCat
Set '$2;, $1;'
Lower

If we have a string $first; containing 'Bob' , and a string $last; containing 'Smyth' then expanding the function call $LowCat(first,last) would be processed as follows:
At the start of the function the string $1; would contain the value 'Bob' (the first argument to the function) and $2; would contain 'Smyth'. $0; would be initialized with 'Bob'.
The Set directive would expand the string '$2;, $1;' into 'Smyth, Bob' and that would be the new value of $0;.
Lower translates the current value of $0; to lower case, which yields 'smyth, bob', and this is the new value of $0;.
Since this is the last step, the function $LowCat(first,last) expands to the string 'smyth, bob'.

The steps in a function are defined using the following directives:

Anchorfile

Syntax: Anchorfile BookMark
Takes the value as a bookmark or anchor in the current document and returns the output file name that contains that bookmark. If the value is not a valid bookmark, returns an empty string.

DBPrint

Syntax: DBPrint text
Prints the text in the error output. Used for debugging purposes only. DBPrint does not change $0;.
Example:
DBPrint 'The current input file is $Infile;'

Decode

Syntax: Decode
Translates the current expression ($0;) converting all occurrences of % and two hex digits to the corresponding ascii character.
Example:
Set 'Hello%20World'
Decode - evaluates to 'Hello World'

Encode

Syntax: Encode BadCharacters
Translates the current expression ($0;), converting all characters that are in the range 1..31 or 126..255, or that appear in BadCharacters, to hex notation of the form: % and two hex digits. This is useful in generating URLs.
Example:
Set 'Hello World'
Encode " " - evaluates to 'Hello%20World'
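The Decode and Encode steps can be approximated in Python as follows. This is a sketch of the documented behaviour, not the converter's code. It assumes upper-case hex digits in the output and leaves characters above 255 untouched unless they are listed in the bad characters.

```python
import re


def encode(text, bad_characters=""):
    """Replace characters in 1..31, 126..255, or bad_characters with %XX."""
    out = []
    for ch in text:
        code = ord(ch)
        if 1 <= code <= 31 or 126 <= code <= 255 or ch in bad_characters:
            out.append("%%%02X" % code)  # upper-case hex: an assumption
        else:
            out.append(ch)
    return "".join(out)


def decode(text):
    """Replace each % followed by two hex digits with the matching character."""
    return re.sub(r"%([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), text)


print(encode("Hello World", " "))
print(decode("Hello%20World"))
```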

Gsub

Syntax: Gsub RegularExpression ReplacementExpression
Performs a regular expression style global substitution on the current expression ($0;), where RegularExpression is a regular expression search pattern and ReplacementExpression is a substitution pattern (i.e. the match and replacement is made on every occurrence of the RegularExpression in $0;).
Example:
Set 'mississippi'
Gsub 'i.','_' - evaluates to 'm_s_s_pi'

Ifelse

Syntax: Ifelse String Pattern TrueString FalseString
If the String is matched by the Pattern, then return the TrueString otherwise return FalseString . An empty Pattern is treated as “^$” which matches only a null string.
Example:
Ifelse 'Mississippi','pp','Yes!','No' - evaluates to 'Yes!'

Lower

Syntax: Lower
Converts the current expression ($0;) to lower case

RMatch

Syntax: Rmatch Expression Pass1String Pass2String
RMatch is intended to be used in pass1 at the end of processing a paragraph. The expression is tested against the current paragraph and, if a match is found, it sets the variable $_0 to the matched text, and $_1..$_9 to the 9 subexpressions in the matched text. The Pass1String is then immediately expanded. In pass2, the matched text will not be processed using the .TMatch table. Instead, the Pass2String will be expanded with $MatchRepl being set to the entire text, as well as $_0 .. $_9 as described above.
Example:
RMatch,'http://[^ ]+','','<a href="$_0;">$_0;</a>'

In the above example, the pattern is http:// followed by a string of characters and terminated by a space (or the end of the paragraph.) If this is matched, then in pass1 nothing is done, and in pass2 a link is generated for that URL.

RPExp

Syntax: RPExp ‘tokenString’
Calculates the result of an algebraic expression written in Reverse Polish order. The tokens are either values or operators. A value is pushed onto a stack. An operator takes one or two values off of the stack, operates on them, and pushes a result.
Operators are:
+
Addition
RPExp 5 6 + # returns 11
-
Subtraction
RPExp 10 1 - # returns 9
*
Multiplication
RPExp 10 2 * # returns 20
/
Division
RPExp 5 2 / # returns 2.5
%
Modulus operator
RPExp 25 7 % # returns 4
<
Less than
RPExp 10 1 < # returns 0 (false)
>
Greater than
RPExp 10 1 > # returns 1 (true)
=
Equality
RPExp 10 10 = # returns 1 (true)
I
Integer truncation
RPExp 9.5 I # returns 9

Example:
RPExp ‘10 1 + 12 25 - *’ # is the same as (10 + 1) * (12 – 25)
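The stack evaluation described above can be sketched in Python. This is an illustrative evaluator, not the converter's own. It assumes tokens are separated by whitespace and that all values are handled as floating-point numbers.

```python
def rpexp(token_string):
    """Evaluate a Reverse Polish expression given as whitespace-separated tokens."""
    binary = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
        "%": lambda a, b: a % b,
        "<": lambda a, b: float(a < b),
        ">": lambda a, b: float(a > b),
        "=": lambda a, b: float(a == b),
    }
    stack = []
    for token in token_string.split():
        if token in binary:
            b = stack.pop()  # the right-hand operand was pushed last
            a = stack.pop()
            stack.append(binary[token](a, b))
        elif token == "I":
            stack.append(float(int(stack.pop())))  # integer truncation
        else:
            stack.append(float(token))
    return stack.pop()


print(rpexp("10 1 + 12 25 - *"))  # (10 + 1) * (12 - 25)
```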

Save

Syntax: Save Name Value
saves the value to the variable Name
Syntax: Save &n; Value
saves the value to the nth parameter of the current function
Syntax: Save $varname; Value
saves the value to the variable whose name is contained in $varname;
Note that Save does not change the value of $0;.

Script

Syntax: Script String
Executes the string as a script. The implementation of the Script command is system-dependent.

Macintosh

The string is interpreted as AppleScript commands. The result of the AppleScript execution is returned as the Script result.

Unix

The string is written to a temporary file, and the command string defined in the variable ScriptExecCmd is executed and the standard output from that command returned as a result. The name of the temporary file is available in the variable ScriptFile.

Windows

Windows script processing is somewhat different. Rather than use the Script command directly, the function $WinSCmd(singleCmd) should be used which executes a single command. The output of the command is not available.

Sub

Syntax: Sub RegularExpression ReplacementExpression
Performs a regular expression style substitution on the first match in the current expression ($0;), where RegularExpression is a regular expression search pattern and ReplacementExpression is a substitution pattern.
Example:
Set 'mississippi'
Sub 'i.','_' - evaluates to 'm_sissippi'
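The difference between Sub and Gsub can be reproduced with Python's re module. This is only an analogy: the converter's regular expression dialect may differ from Python's, and the function names below are hypothetical.

```python
import re


def sub(value, pattern, replacement):
    """Replace only the first match, as in the Sub example."""
    return re.sub(pattern, replacement, value, count=1)


def gsub(value, pattern, replacement):
    """Replace every non-overlapping match, as in the Gsub example."""
    return re.sub(pattern, replacement, value)


print(sub("mississippi", "i.", "_"))   # m_sissippi
print(gsub("mississippi", "i.", "_"))  # m_s_s_pi
```

The final 'i' in 'mississippi' is not replaced by either form because the pattern 'i.' needs a character after the 'i'.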

Switch...Case...Default...end

Syntax: Switch Value
Case Expression
...
Case Expression
...
Default
End
Takes the value and compares it against one or more regular expressions, each specified on a separate Case directive. The first matching case will be selected, and all the directives between the selected Case directive and the next Case, Default or End will be executed. If no matching case is found, the Default directive is selected and all the directives between the Default directive and the End will be executed. Switch...Case...Default...End directives may be nested.
Example:
Switch '$text;'
Case 'bl..'
Set 'Is a color'
Case '12'
Set 'Is a number'
Default
Set 'I do not know'
End

If $text; contains 'blue', the first case would be selected and 'Is a color' would be set.


Upper

Syntax: Upper
Converts the current expression ($0;) to upper case

VSet

Syntax: VSet Name
Sets the current expression ($0;) to the contents of the string Name.
Example:
Save Age '12'
Set 'Age' - evaluates to 'Age'
VSet 'Age' - evaluates to '12'

While

Syntax:

While Expression,end-condition
...
End

Executes the statements between While...End while the Expression expands to match the end-condition. On each iteration of the loop, $0; has the value of Expression.

For example (from GenList in html.trn):

Set '$ResetListPointer(theList)'
While '$GetListElement(theList,ListCurrent,listKeep)','.'
Save LoopVariable $0;
Set '$AppendTable(LoopVariable)'
End


Character Sets and Translation

Character sets are handled using UNICODE mapping. Each character set requires an input mapping file; these files are specified in the .Charsets table. Input mapping files identify the UNICODE number for each character in the input character set. Each font can have its own character set, which allows a single RTF file to contain multiple character sets. For output, each UNICODE character is assigned a string. These strings can be assigned in the file Uhtml-map.txt. By default, all UNICODE characters less than 0x7F (the Latin character set) are output directly, and higher numbers are output as &#nnn; where nnn is the decimal UNICODE number. You can override the default by adding an entry to Uhtml-map.txt.
Input map files are in the following format.
Format: Three tab-separated columns
Column #1 is the code (in hex)
Column #2 is the Unicode (in hex as 0xXXXX)
Column #3 is the Unicode name (follows a comment sign, '#')
For example, in the Cyrillic character set, the hex value 0xB8 represents the cyrillic small letter IO (UNICODE number 0x0451). This is specified as:
0xB8 0x0451 #CYRILLIC SMALL LETTER IO
This format matches the character translation files found at: http://www.unicode.org/Public/MAPPINGS/VENDORS/
Output map files (Uhtml-map.txt) are in the following format:
Format: Two tab-separated columns
Column #1 is the Unicode (in hex as 0xXXXX)
Column #2 is the string to output for that UNICODE character.
Single or double quotes may be used to quote strings containing whitespace or quotes (e.g., use single quotes to quote a double-quote).
Lines with a “#” in column one are taken as comments. Comments and blank lines are ignored.

For example, we want the UNICODE character 0x00AE (the registered mark) to be output as '&reg;'. This is specified as:
0x00AE &reg;
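
The two map formats can be modelled in a few lines of Python. This is a minimal sketch for illustration only, not the converter's own code: the function names are hypothetical, and the decimal numeric reference used as the default output is an assumption about the exact fallback form.

```python
def parse_input_map(text):
    """Parse an input map: code, Unicode number and '#name', separated by tabs."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        columns = line.split("\t")
        mapping[int(columns[0], 16)] = int(columns[1], 16)
    return mapping


def parse_output_map(text):
    """Parse an output map: Unicode number and output string, separated by a tab."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        number, _, string = line.partition("\t")
        string = string.strip()
        # Strip one pair of matching single or double quotes, if present.
        if len(string) >= 2 and string[0] == string[-1] and string[0] in "'\"":
            string = string[1:-1]
        mapping[int(number, 16)] = string
    return mapping


def emit(unicode_number, output_map):
    """Return the output string for one character (assumed default rules)."""
    if unicode_number in output_map:
        return output_map[unicode_number]      # explicit override
    if unicode_number < 0x7F:
        return chr(unicode_number)             # output directly
    return "&#%d;" % unicode_number            # assumed decimal fallback


cyrillic = parse_input_map("0xB8\t0x0451\t#CYRILLIC SMALL LETTER IO\n")
overrides = parse_output_map("# registered mark\n0x00AE\t&reg;\n")
print(emit(cyrillic[0xB8], overrides))
print(emit(0x00AE, overrides))
```

The lookup order in emit() mirrors the text above: an entry in the output map wins, otherwise the default rule applies.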

Pre-Defined Functions

Following is a table of the functions that are built in to the filter, pre-defined in base.trn, or called by the filter during the conversion process.
Function names are not case-sensitive.
Function
Source
Description
$AddAnchor(anchor)
built-in
Adds ‘anchor’ to the list of hypertext anchors within the file being processed.
$AtPass2Start()
html.trn
Called at the start of the second pass.
$BaseName(FileName)
base.trn
Strips the directory path and extension from a filename
$Callback(CBWhy,errCode)
built-in
Provides a mechanism for ActiveX, DLL and library versions to give control back to the calling software during a translation. This mechanism is used for processing error messages and for setting up progress bars.
See Callbacks for more details.
$CallTrap()
built-in
Not currently used.
$CExpand(TestString,NotEmpty,IsEmpty)
base.trn
If the 1st argument is non-empty returns the 2nd argument, otherwise the 3rd is returned
$ClearList(listname)
built-in
Clears one or more of the user defined or internal lists. See Creating and Accessing Lists for details.
$CloseFile(filehandle)
built-in
Closes a previously opened (with CreateFile) file. See File Creation for details.
$Compare(string1,string2)
built-in
Does a lexicographical comparison on two strings and returns 0 if they are equal, -1 if string1 is less than string2 and 1 if string1 is greater than string2.
$Context()
built-in
Expands to the current location in the RTF input. Used for error messages.
$CopyStr(Destination,Source)
base.trn
Copies the source string to the destination string.
$CreateFile(filename,mode)
built-in
Creates a file, returning a handle to the created file. See File Creation for details.
$CvtGraphic()
base.trn
Called to convert graphics from WMF, BMP or PICT format to JPG, GIF or PNG.
$CvtPict(infile,outfile)
built-in
A Macintosh specific conversion feature.
$Delete(filename)
built-in
Deletes the specified file.
$EndResult()
html.trn
Called after an RTF field result is processed. (This function is called by the filter)
$Exit(rc)
built-in
Terminates the conversion software. This only applies to the command line version of the converter.
$GenFileSuffix(Heading)
base.trn
Used in the process of generating file names. Takes a heading and trims it to 8 characters – replacing characters that would be bad to use in a filename. (This function is called by the filter)
$GenFrames()
built-in
Creates frame files
$GenFileName()
base.trn
Used in the process of generating filenames when SplitDepth > 1. Called for each filename, with $Filename; set to the current filename and $FileNum; set to the current file number. Return the new filename or ‘’ to use the existing filename.
$GenImageFileName()
base.trn
Used to generate image filenames. Called for each image, with $Filename; set to the current filename and $FileNum; set to the image file number. Return the new filename or ‘’ to use the existing filename.
$GenIndex(framed)
built-in
Generates the Index file. If framed is true, generates the framed version of the index.
$GenList(listname)
html.trn
Returns an unordered list containing the elements of the user defined or built-in list. See Creating and Accessing Lists for details.
$GetListElement(listname,where,pop)
built-in
Returns an element from a user defined or built-in list. See Creating and Accessing Lists for details.
built-in
Generates a table of contents for the current file (containing just the headings for the current file and its children).
$GenPeerTOC()
built-in
Generates a table of contents for the peers of the current file.
$GenTOC(Framed)
built-in
Generates a table of contents for the entire RTF input file. If Framed is true, generates a framed version of the table of contents.
$GetCurrentFileHeading()
built-in
Returns the header of the current file, and sets the string $HeaderLink to the reference to the file.

$GetParentFileHeading()
built-in
Returns the header of the parent file, and sets the string $HeaderLink to the reference to the file.
$GetElement(String,separator,n)
built-in
Scans String for the n'th element. The elements in the string are delimited by a single character, contained in the string '$separator;'.
$GraphicFound()

This function is expanded for every image found in the RTF document.
$GraphicWritten()
base.trn
This function is expanded for every image that is saved.
$HeadingSplitCheck(HCLevel,HCText,HCParNum,HCContentPars)
base.trn

$HrefFix(URL)
base.trn
Processes URLs, encoding invalid characters and trimming excess whitespace. (This function is called by the filter)
$ImageRef(ImageName)

Translates an image filename (the 1st argument is the local filename on the system that generated the image) to its proper location on your web server.
$IndexEntry(entry,reference)
built-in
Adds an index entry for 'entry' and reference.
$IsZero(TestVar,IfZero,IfNonZero)

If the 1st argument is 0 returns the 2nd argument, otherwise the 3rd is returned
$Link(bookmark)
built-in
Expects text to be a link. If it matches a heading, will generate a link to that head. If it matches a bookmark, will generate a link to that bookmark. If it matches a URL or email address will emit that as a URL.
$NCompare(string1,string2)
built-in
Does a numeric comparison on two strings and returns 0 if they are equal, -1 if string1 is less than string2 and 1 if string1 is greater than string2.
$NewDest(filename)
built-in
Creates a virtual file to contain marked-up content. See $SetDest()
$PathToFile()
html.trn
Builds the complete list of ancestors of this file, using their headings and creating links to each parent file.
$PNOverrideFunction(PNText)
html.trn
Called at the end of an automatic list paragraph number. Note that this ONLY occurs for automatically generated numbered and bullet lists. Returns the style that should be used for that list. This function allows the converter to recognize bullet and numbered lists even if the style names were not entered in the .PMatch table.
$PopTags(TagType,Mode)
built-in
Pops tags from the tag stack looking for a tag matching TagType. If no such tag is found, nothing happens.
Mode can be:
cPTInclusive – pops down to and including the first tag matching TagType.
cPTExclusive – pops down to but does not include the tag matching TagType. Will generate a warning if no matching tag is found.
cPTExclusiveNW – pops down to but does not include the tag matching TagType. Will not generate a warning if no matching tag is found.
cPTAll – pops all tags matching TagType.
$ProcessField(FieldInst)
html.trn
Called to process an RTF field instruction. (This function is called by the filter)
$PushList(listname,listvalue,where)
built-in
Push the ‘listvalue’ string onto user defined list ‘listname’. See Creating and Accessing Lists for details.
$PushTag(TagType,ObjectID,StartTag,EndTag)
built-in
This function pushes the StartTag/EndTag pair onto the current tag stack. StartTag will be output immediately; EndTag will be output when this tag is explicitly popped from the stack using $PopTags() or when a tag below this one in the stack gets popped. The ObjectID is currently unused.
$RelName(FileName)
base.trn
Converts a full filename into a relative one. (Strips off leading directory path.)
$ResetListPointer(ListName)
base.trn
Resets the user defined or built-in list back to the beginning. See Creating and Accessing Lists for details.
$ScriptExecCmd
base.trn
Contains the string used to execute a system command (to support the Script directive.)
$SetDest(filename)
built-in
Redirect all future output to the specified destination. The destination must have been created with $NewDest(). Returns the current destination.
$StripPN(heading)
base.trn
Strips leading paragraph numbers from a heading.
$TitlePageComplete
html.trn
Called by the converter when the title page has been generated.
$TranslationEnd

Expanded by the converter when translation is complete.
$EncodeDirPath(PATH)
base.trn
Encodes any special characters in a directory path name
$WriteFile(filehandle,text)
built-in
Writes the string ‘text’ to the previously created file (with CreateFile). See File Creation for details.
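
The return conventions of the comparison and conditional functions in the table can be modelled as follows. This is only a sketch in Python, not converter code: the function names are stand-ins, and parsing numbers as floating point is an assumption about what the converter accepts.

```python
def compare(string1, string2):
    """Model of $Compare: lexicographical comparison returning -1, 0 or 1."""
    return (string1 > string2) - (string1 < string2)


def ncompare(string1, string2):
    """Model of $NCompare: numeric comparison returning -1, 0 or 1."""
    a, b = float(string1), float(string2)  # assumption: values parse as numbers
    return (a > b) - (a < b)


def cexpand(test_string, not_empty, is_empty):
    """Model of $CExpand: 2nd argument if the 1st is non-empty, else the 3rd."""
    return not_empty if test_string else is_empty


def is_zero(test_var, if_zero, if_non_zero):
    """Model of $IsZero: 2nd argument if the 1st is 0, else the 3rd."""
    return if_zero if float(test_var) == 0 else if_non_zero


# The two comparisons disagree on strings such as '10' and '9'.
print(compare("10", "9"), ncompare("10", "9"))
```

The final line shows why the choice between the two comparisons matters: '10' sorts before '9' as text but after it as a number.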

End of Paragraph Processing

$EndPar; is expanded in pass1 at the end of each paragraph. By setting EndPar to expand a function call, you can make additional decisions about how to process the current paragraph. At the end of the paragraph, $__Partext; will contain the entire contents of the current paragraph as unformatted text. Using this mechanism, you can change both paragraph and text markup and processing based on pattern matching. You may also start a new html file at any paragraph using the EndPar mechanism. This processing is central to the RMatch directive described earlier. To understand how to use this feature, we will introduce several examples.

EndPar Example 1 – Selecting Paragraph Markup using Pattern Matching

In this example, we will find all paragraphs that begin with the word "Chapter" followed by a number and mark them as level one headings.
To do this, we will create a function that looks for 'Chapter n', where n is the chapter number, and overrides the normal PMatch processing.

The string $MatchName contains the current matched paragraph style. If we change this to the name of another style, that style's PMatch table entry will be used instead.
We will be using the $CopyStr() function to change the value of $MatchName.

.Strings
EndPar,'$FindChapters()' # define $FindChapters() as our end of paragraph function
head1,'heading 1' # the paragraph style that we will use for chapters

.Functions
Function FindChapters
# The following line will change MatchName to 'heading 1' if the current paragraph
# begins with Chapter n
Ifelse '$__Partext;','^Chapter [0-9]+','$CopyStr(MatchName,head1)',''


$FindChapters() is called at the end of each paragraph, and $__Partext contains the text of the paragraph. If we get a match and override $MatchName, then our paragraph will get 'heading 1' processing.
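
To see which paragraphs the pattern selects, the same test can be tried outside the converter. The Python sketch below assumes that the converter's pattern syntax behaves like Python's re module for this simple expression; find_chapters is a hypothetical stand-in for the FindChapters function.

```python
import re

CHAPTER = re.compile(r"^Chapter [0-9]+")


def find_chapters(par_text, match_name):
    """Return the style to use: 'heading 1' on a match, else the current one."""
    if CHAPTER.search(par_text):
        return "heading 1"
    return match_name


print(find_chapters("Chapter 12 Installing", "Normal"))
print(find_chapters("See Chapter 3 for details", "Normal"))
```

Because the pattern is anchored with ^, a paragraph that merely mentions a chapter keeps its original style.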


EndPar Example 2 – Auto detection of URLs

In this example, we will examine each paragraph, looking for URLs. The URLs will be automatically converted to hyperlinks. This example uses the RMatch directive.
We begin by setting $FindURLs() as our end of paragraph routine.
.Strings
EndPar,'$FindURLs()' # define $FindURLs() as our end of paragraph function
.Functions
Function FindURLs

Next we use an RMatch directive to search the current paragraph for URLs.
RMatch,'http://[^> ]+','','<a href="$_0;">$_0;</a>'

This directive looks for all occurrences of an 'http://...' style URL in the current paragraph. For each occurrence of a URL, we will output a hypertext link. The value of $_0 each time will be the entire matched text.

Putting it all together (and adding patterns for mailto: and file: references), we have:

.Strings
EndPar,'$FindURLs()' # define $FindURLs() as our end of paragraph function
.Functions
Function FindURLs
RMatch,'http://[^> ]+','','<a href="$_0;">$_0;</a>'
RMatch,'mailto:[^> ]+','','<a href="$_0;">$_0;</a>'
RMatch,'file://[^> ]+','','<a href="$_0;">$_0;</a>'
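
The effect of these three directives on a paragraph can be previewed with an equivalent substitution in Python. This is a sketch under assumptions: the three patterns are combined into one pass here, which may differ from how the converter applies successive RMatch directives, and find_urls is a hypothetical name.

```python
import re

# One alternation covering the three patterns used in FindURLs.
URL_PATTERN = re.compile(r"(?:http://|mailto:|file://)[^> ]+")


def find_urls(par_text):
    """Wrap every matched URL in a hypertext link, like the RMatch output string."""
    def link(match):
        url = match.group(0)  # plays the role of $_0;
        return '<a href="%s">%s</a>' % (url, url)
    return URL_PATTERN.sub(link, par_text)


print(find_urls("Write to mailto:info@example.com or see http://example.com today"))
```

Note that the pattern stops only at a space or '>', so punctuation that directly follows a URL becomes part of the link.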

EndPar Example 3 – Splitting files at Page Marks

New output files may be started at any paragraph boundary. To indicate that a new file should be created, the EndPar function should save the string ParaHeadingLevel to be the virtual heading level of the current paragraph. If this is greater than or equal to the SplitDepth value, a new file will be started at that paragraph. Set the string ParaTOCEntry to be the string used in the TOC reference to this file, and return from the EndPar function the filename suffix to be used for this new file. If neither of these is set, default values will be provided. The example below uses the fact that the string PageMark is expanded every time a hard page break is found in the document. The paragraph immediately after that starts a new file.
.Strings
splitdepth,'1'
EndPar,'$split()'
PageMark,'$newPage()'
Pass1Start,'$ClearList(ListAll)$SetIndexValues()'
.Functions
Function SetIndexValues
Save fileInd,'0'
Save throwPage,'0'
Function NewFile
save throwPage,'0'
save ParaHeadingLevel,'1'
set 'para$IncrString(fileInd)'
save ParaTOCEntry, 'Paragraph $fileInd;'
Function Split
ifelse '$ParaHeadingLevel;$throwPage;$__ParText;','^01..*$','$NewFile()',''
Function NewPage
save throwPage,'1'
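
The pattern in the Split function is compact, so it may help to see the test spelled out. The Python sketch below assumes the pattern behaves as it would in Python's re module; should_split is a hypothetical name. The three strings are concatenated, and the pattern requires a heading level of 0, a throwPage flag of 1 and at least one character of paragraph text.

```python
import re


def should_split(para_heading_level, throw_page, par_text):
    """Model of the ifelse test in Split: True when NewFile would be called."""
    key = "%s%s%s" % (para_heading_level, throw_page, par_text)
    return re.search(r"^01..*$", key) is not None


print(should_split("0", "1", "First paragraph after the break"))
print(should_split("0", "0", "An ordinary paragraph"))
print(should_split("0", "1", ""))
```

Under this reading, an empty paragraph that follows a page break does not start the new file; the flag stays set until a paragraph with text arrives.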

File Creation

User defined files may be created during the translation process using CreateFile, WriteFile and CloseFile built-in functions. They may appear in any translation file. The syntax is as follows:
  • CreateFile(filename,mode) will open a file with the specified file access mode (‘w’ to write a new file, ‘a’ to append to an existing file) and return a ‘filehandle’ used to write to and close the file.
  • WriteFile(filehandle,text) writes the (expanded) text string to file filehandle.
  • CloseFile(filehandle) closes the file filehandle.

For example, in trnflag.trn:

.Strings
TranslationEnd,'$genFile()'
.Functions
Function genFile
Save filename,'$DirName(InputFile)/$Basename(InputFile)_test.txt'
save mode,'w'
save text,'Filename $InputFile; translated to $OutFileName;'
save fileHandle,'$CreateFile(filename,mode)'
ifelse '$fileHandle;','','',\
'$WriteFile(fileHandle,text)$CloseFile(fileHandle)'

Working with DTDs

Document Type Definitions (DTDs) define the structure of an XML (or SGML) document. They are used by browsers and other XML tools to validate your document, or even just to display it. (Even if you don't want to validate the document, a browser needs to use the DTD because it defines how entities (like &copy;) should be evaluated.) Your browser can use either a local copy of the DTD or fetch one from the internet. The declaration of the document type at the beginning of your document is therefore important, because it tells your browser where to find the DTD:
<!DOCTYPE book PUBLIC $DocBookPublic; $DocBookSystem;>

To use local copies of the DTDs:
1) Create a folder called DTD in the Logictran folder.
2) Download the DTDs from our web site and place them into the DTD folder. The folder names must match the naming convention that we have set up:

LogictranFolder/
  DTD/
    xhtml1/
      xhtml1-frameset.dtd
      xhtml1-transitional.dtd
    Docbookx/
      docbookx.dtd
    oeb1/
      oebpkg1.dtd

Set LocalDTD to 1 when generating output for use on the user's local machine. The default is to use the http: addresses of the DTDs.

You can also customize the following strings:
DTDBase - Directory or folder containing all DTDs (no trailing /). Defaults to a folder called DTD within the Logictran folder.
HTMLSystem - Setting this overrides all other settings and lets you directly set the "SYSTEM" path to your DTD. Can be a URL or a file on your local system.

URLs used when LocalDTD is not set:
XHTML - No Frames
HTMLSystem,'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"'
XHTML - Frames
HTMLSystem,'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd"'
OEB
OPFSystem,'"http://openebook.org/dtds/oeb-1.0/oebpkg1.dtd"'
DocBook
DocBookSystem,'"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"'

Creating and Accessing Lists

User defined lists may be created and accessed during the translation process, for example to keep track of elements of the RTF that need to be written to an external file at the end of a translation. Internal lists are also exposed using the same mechanism. The syntax is as follows:
  • PushList(listname,listvalue,where) pushes the expanded string listvalue onto the list named listname. If the list doesn’t exist yet it will be created. If where is listStart, the new element will be added to the head of the list, otherwise it will be added to the end of the list. The default (if where is not specified) is to add to the end.
  • GetListElement(listname,where,pop) returns a list element from the specified list listname. where can have the value listStart – return from the front of the list; listEnd – return from the end of the list; or, listCurrent – return from the current position in the list and increment the current pointer to the next element. pop can have the values listKeep – do not remove the extracted element from the list; or, listRemove – remove the extracted element from the list. Using both where = listCurrent and pop = listKeep allows lists to be accessed repeatedly.
  • ResetListPointer(listname) sets the named list listname to the beginning of the list. This should be used before accessing internal lists and also when using the where = listCurrent and pop = listKeep combination for user defined lists.
  • ClearList(listname) clears one or more of the internal or user defined lists. That is, they are removed as if they had never existed. listname can have the value:
  1. <user defined list name> only that list is cleared.
  2. ListAll all user defined lists and the table and figure lists are cleared
  3. ListUser all user defined lists are cleared
ClearList(ListAll) is called at the start of a translation in html.trn with

'Pass1Start','$ClearList(ListAll)'

Modifying this allows lists to be preserved between translations of different input files, for example to create a list of tables across multiple input files.


Various internally generated lists may be accessed through the same mechanism. You cannot push anything onto an internal list. To access the internal lists, use the following listnames:
  • ListFile – list of all the generated files by the translation process.
  • ListSplitFile – if SplitDepth > 0, this is a list of the generated files as a result of the split.
  • ListAnchor – a list of all Anchors found (or added with $AddAnchor) in the translation process.
  • ListTable – a list of all tables.
  • ListFigure – a list of all figures.

The function $GenList(listname), defined in html.trn, returns an unordered list containing all of the elements from the list listname.
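
The list functions can be pictured with a small Python model. This is a sketch of the documented behaviour only, not the converter's implementation; the class and method names are hypothetical, and returning an empty string when no element is available is an assumption (it is consistent with the While loop in GenList ending when nothing matches).

```python
class TrnLists:
    """Toy model of user defined lists and their current pointers."""

    def __init__(self):
        self.lists = {}
        self.pointers = {}

    def push_list(self, name, value, where="listEnd"):
        items = self.lists.setdefault(name, [])  # created on first push
        if where.lower() == "liststart":
            items.insert(0, value)
        else:
            items.append(value)

    def reset_list_pointer(self, name):
        self.pointers[name] = 0

    def get_list_element(self, name, where, pop):
        items = self.lists.get(name, [])
        where = where.lower()
        if where == "liststart":
            index = 0
        elif where == "listend":
            index = len(items) - 1
        else:  # listCurrent
            index = self.pointers.get(name, 0)
        if not 0 <= index < len(items):
            return ""  # assumption: nothing left expands to an empty string
        value = items[index]
        if pop.lower() == "listremove":
            del items[index]
        elif where == "listcurrent":
            self.pointers[name] = index + 1  # advance to the next element
        return value

    def clear_list(self, name):
        self.lists.pop(name, None)
        self.pointers.pop(name, None)


def walk(lists, name):
    """Visit every element without removing it (listCurrent with listKeep)."""
    lists.reset_list_pointer(name)
    seen = []
    while True:
        element = lists.get_list_element(name, "listCurrent", "listKeep")
        if element == "":
            break
        seen.append(element)
    return seen


lists = TrnLists()
lists.push_list("tables", "Table 1")
lists.push_list("tables", "Table 2")
print(walk(lists, "tables"))
```

Because walk() resets the pointer first and keeps every element, it can be called on the same list any number of times.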

Callbacks

The ActiveX, DLL and shared library versions of the converter (which require a developer's license) allow the R2Net converter to be called from an application. The API for calling the R2Net converter is documented in ActiveXReadMe.html, DllReadMe.html and libUnixReadMe.html respectively.
One of the most powerful features of this interface is the ability of the R2Net converter to call a function from the developers' application during the translation process. The developers' function can retrieve the current value of any string, (including text that is being translated), modify strings and call any translation functions. This allows a developer to write functions in Visual Basic, or C instead of using translation file code.

Let's say that you wanted to perform a database lookup whenever you encountered an ISBN number in the text of a document. In your documents, you have ISBNs appearing as (ISBN:09912992121) and you want to add the title of the ISBN which is stored in your database. We need to do the following:
  1. Pattern match each paragraph in your document, looking for ISBN references
  2. On each match, call a routine in your application and pass it the ISBN number
  3. Return a string containing the ISBN number with the title appended to it

Step 1 is done using End of Paragraph Processing, so we will add the following code to our trnflag.trn file:
.Strings
EndPar,'$ScanInputPars()' # Set ScanInputPars() to be called for each paragraph.
cCBFoundISBN,100 # Define a unique integer to be passed to our callback
# routine to identify what kind of callback this is.
.Functions
Function ScanInputPars
RMatch,'\\(ISBN:[0-9]+\\)','','$CallBack(cCBFoundISBN,_0)'

This code sets up ScanInputPars() so that it is called at the end of each paragraph. Then on each paragraph we can do a pattern match (RMatch) looking for our ISBN numbers.
The first parameter to RMatch is the pattern, which will match (ISBN:xxxx), where xxxx is an ISBN number made up entirely of digits.
The second parameter for RMatch is unused, and the third parameter is a string to output whenever this pattern occurs.
We use $CallBack() to call a function in our application. The first parameter is an integer value that tells our routine what kind of callback this is. A callback function must decide what processing is to be done based on the value of this first parameter. In Visual Basic, you would typically code a Select Case statement to handle this.
The second parameter to $CallBack() (_0) is the entire matched pattern, so it will contain the parentheses, the ISBN: prefix and the actual ISBN number. We can pass as many arguments to our callback routine as we want. The function in our application will pick just the ISBN number out of the string we passed to it and return "(ISBN:xxxxxxxx) – the title of the book", which we will output directly.

On the application side, we write a single function to handle all callbacks from the converter. (See ActiveXReadMe.html, DllReadMe.html or libUnixReadMe.html for more details on coding the callback.) For our example, we will show a snippet of code from a Visual Basic callback function.
Our VB routine will take the second parameter (stored in args(0)) and pass it to a function that scans the database and returns a title. We then build a string that we want to appear in place of the ISBN reference.

Private Sub R2net_callback(ByVal why As Integer, ByVal nargs As Integer, args As Variant)
...
Select Case why

Case 1 'Error callback
...
Case 100 'Lookup an ISBN (cCBFoundISBN)
' args(0) holds the matched text, for example "(ISBN:09912992121)"
StringToReturn = args(0) & " – " & LookupISBN(args(0))
' Hand the replacement text back to the converter
R2net.SetString "CallBackReturn", StringToReturn
End Select
End Sub
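
The whole round trip (pattern match, callback, returned string) can be simulated without the converter. The Python sketch below is illustrative only: the TITLES dictionary stands in for the database, the title it contains is invented for the example, and every function name is hypothetical. The real callback signature is the one described in the ReadMe files.

```python
import re

CB_ERROR = 1         # error callback, as in the Select Case above
CB_FOUND_ISBN = 100  # same value as the cCBFoundISBN string

# Hypothetical stand-in for the application's database.
TITLES = {"09912992121": "An Example Title"}


def lookup_isbn(matched_text):
    """Pick the digits out of '(ISBN:nnnn)' and look up the title."""
    digits = re.search(r"[0-9]+", matched_text).group(0)
    return TITLES.get(digits, "unknown title")


def callback(why, args):
    """Single entry point that dispatches on the kind of callback."""
    if why == CB_FOUND_ISBN:
        return "%s - %s" % (args[0], lookup_isbn(args[0]))
    return ""


def scan_input_pars(par_text):
    """Model of ScanInputPars: replace each ISBN reference with the callback result."""
    return re.sub(r"\(ISBN:[0-9]+\)",
                  lambda m: callback(CB_FOUND_ISBN, [m.group(0)]),
                  par_text)


print(scan_input_pars("Recommended reading (ISBN:09912992121)."))
```

Keeping the dispatch in one function mirrors the single callback routine on the application side, with the integer code selecting the branch.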




Annotations

Annotations in MSWord have two components that are supported by the converter. The Annotation Reference contains the initials of the reviewer and the Annotation Text contains the text of the annotation. When an annotation is encountered, the string $Annotation is expanded at the location in the source where the annotation occurred. The default value for $Annotation is an empty string, so annotations are discarded. To enable annotations, define the string $Annotation as follows:
.Strings
Annotation,'$PopTags(cListTag,cPTExclusive)$Destination(catnid)\
$destination(cannotation)'

The $PopTags() call ensures that any character markup (<b>, <span>, <emphasis>) is closed prior to the annotation. The calls to $Destination() return the contents of the annotation ID and Annotation as marked up text.
The markup for Annotation Text is controlled by the paragraph style 'Annotation Text', which is mapped by default to the .Ptag entry 'Annotation Text'. Likewise, the markup for the Annotation Reference is controlled by the paragraph style 'Annotation Reference', which is mapped by default to the .Ptag entry 'Annotation Reference'.

Using the Tag Stack

The PushTag and PopTags functions provide you access to the tag stack. The tag stack contains the ending tags (for example, </body>) for each start tag emitted from the R2Net converter. Associated with each entry on the stack is a tagtype, which tells how this tag is used. For example, all tags for text markup (<b></b>) would have a tagtype of cTextTag. The tagtype is used to find all of the tags that should be popped when some event (end of paragraph, end of file) occurs. An example tag stack is shown below. In this example, we would be outputting marked up text within a nested list inside a table:
TagType     Value
cHeadTag    </html>
cBodyTag    </body>
cTableTag   </table>
cRowTag     </tr>
cCellTag    </td>
cListTag    </ol>
cListTag    </ul>
cTextTag    </b>
cTextTag    </i>

There are two functions that allow you to access the tag stack. They are PushTag and PopTags:
$PushTag(TagType,ObjectID,StartTag,EndTag)
This function pushes the StartTag/EndTag pair onto the current tag stack. StartTag will be output immediately; EndTag will be output when this tag is explicitly popped from the stack using $PopTags() or when a tag below this one in the stack gets popped.
ObjectID is currently unused.

$PopTags(TagType,Mode)
Pops tags from the tag stack looking for a tag matching TagType. If no such tag is found, nothing happens.
Mode can be:
cPTInclusive – pops down to and including the first tag matching TagType.
cPTExclusive – pops down to but does not include the tag matching TagType.
Will generate a warning if no matching tag is found.
cPTExclusiveNW – pops down to but does not include the tag matching TagType
Will not generate a warning if no matching tag is found.
cPTAll – Pops all tags matching TagType.


The key to working with the tag stack is understanding when tags are pushed and popped. The following table lists the major events for the tag stack.
Event                        Actions
Start of an output file      $PushTag(cHeadTag,empty,TitlePageHead, TitlePageEnd)
                             $PushTag(cBodyTag,empty,TitlePageBody, TitlePageBodyEnd)
Start of table               $PushTag(cTableTag,_table_start_, _table_end_)
Start of row                 $PushTag(cRowTag,_row_start_, _row_end_)
Start of cell                $PushTag(cCellTag,_data_cell_start_, _data_cell_end_)
Start of a Paragraph List    $PushTag(cListTag, empty,StartTag,EndTag) where StartTag and EndTag are taken from the .Ptag table entries for this paragraph
Start of a text run          $PushTag(cTextTag, empty,StartTag,EndTag) where StartTag and EndTag are taken from the .Ttag table entries for this selection of text.
End of a File                $PopTags(cBodyTag,cPTInclusive)
                             $PopTags(cHeadTag,cPTInclusive)
End of a row                 $PopTags(cRowTag,cPTInclusive)
End of a cell                $PopTags(cCellTag,cPTInclusive)
End of a table               $PopTags(cTableTag,cPTInclusive)
End of a paragraph           $PopTags(cListTag,cPTExclusive)
                             If the next paragraph has the same paragraph style, output the separator tag from the .Ptag table.
                             Otherwise:
                             $PopTags(cListTag,cPTInclusive)
End of a text run            $PopTags(cTextTag,cPTInclusive)

There are 15 pre-defined tag types used by the converter, but you can define your own tag types (using numbers greater than 15). Define your tag types by creating a new string for each tag type.
The 15 pre-defined tag types are:
cHeadTag,1
cBodyTag,2
cTableTag,3
cRowGroupTag,4
cColGroupTag,5
cRowTag,6
cCellTag,7
cHeadListTag,8
cIPanelTag,9
cSITag,10
cSIRTag,11
cListTag,12
cTextTag,13
cFootnoteTag,14
cSpecialTag,15
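
The push and pop behaviour described in this section can be modelled with a short Python class. This is a sketch of the documented semantics, not the converter's implementation: only the cPTInclusive and cPTExclusive modes are modelled, warnings are omitted, and the class name is hypothetical.

```python
class TagStack:
    """Toy model of the tag stack: start tags go out at once, end tags wait."""

    def __init__(self):
        self.stack = []   # (tag_type, end_tag) pairs; the last item is the top
        self.output = []  # everything emitted so far

    def push_tag(self, tag_type, object_id, start_tag, end_tag):
        self.output.append(start_tag)            # StartTag is output immediately
        self.stack.append((tag_type, end_tag))   # EndTag waits on the stack

    def pop_tags(self, tag_type, mode):
        # Search from the top for the first entry of the requested type.
        for index in range(len(self.stack) - 1, -1, -1):
            if self.stack[index][0] == tag_type:
                break
        else:
            return  # no such tag: nothing happens
        keep = index if mode == "cPTInclusive" else index + 1
        while len(self.stack) > keep:
            self.output.append(self.stack.pop()[1])  # emit EndTag as it is popped


tags = TagStack()
tags.push_tag("cListTag", "", "<ol>", "</ol>")
tags.push_tag("cTextTag", "", "<b>", "</b>")
tags.push_tag("cTextTag", "", "<i>", "</i>")
tags.pop_tags("cListTag", "cPTExclusive")  # closes the text tags, keeps the list
print("".join(tags.output))
```

The exclusive pop at the end corresponds to the end-of-paragraph event: all text markup is closed, but the list stays open in case the next paragraph continues it.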


Tuning the Table of Contents Generation

GenTOC

$GenTOC(framed) is a built-in function, called automatically when a Table of Contents is required. It cannot be redefined.

Setting the string
'genContents','1'

will result in $GenTOC() being called. If frames are being generated in the output html, then the argument to GenTOC will be TRUE, and a frame based TOC will be created. This action is defined in html.trn and can be found by searching for GenTOC.

By default, the TOC has the following form:

This is a Level1 heading
  This is a Level2 heading
    This is a Level3 heading
      This is a Level4 heading
        This is a Level5 heading
          This is a Level6 heading

$GenTOC() may also be called directly, allowing a large degree of control over the generated TOC. For example, to invoke GenTOC at the start of Pass2 processing (that is, when the translation process has determined the list of output files to be created, heading levels, etc.), add the following into, for example, trnflag.trn.

.Strings
'Pass2Start','$GenTOC(FALSE)'

This will create a non-framed TOC, by default in the same format as the one produced by the 'genContents' setting above.

GenTOC will expand the following strings as part of the TOC creation. Changing the string values will alter the output file generated for the TOC.

1. ContentsPageHead at the beginning of the TOC file.
2. ContentsPageBody before the TOC starts.
3. _head_list_start_ starts the heading list and each sub-heading list. For example, in the above TOC, _head_list_start_ will be expanded before the <H1>... line, then again at the start of the <H2>... line, etc. By default its value is '<ul>\n', but could be changed to an ordered list, starting a table, etc.
4. _head_list_entry_ is expanded for every entry in the TOC list. At this point, the following read-only strings will have the correct values for the current entry, and may be used in the _head_list_entry_ expansion:

  • _LINK_ is the hyperlink
  • _HOTTEXT_ is the text to be placed in the link

where _head_list_entry_ has the default value

'_head_list_entry_','<li><a href="$_LINK_;">$StripPN(_HOTTEXT_)</a></li>\n'

$StripPN() removes any paragraph numbers attached to the heading text.

5. _head_list_end_ ends the heading list and each sub-heading list. By default its value is '</ul>\n'
6. ContentsPageEnd at the end of the TOC file.

The following string is set automatically during table of contents processing:
1. _StartHeadingLevel_ is set to the starting heading level in the table of contents. This could be used to, for example, affect the indent level of the first entry in the contents list.



The default values for each of these strings can be found in html.trn. Other strings are expanded as part of these strings, for example $Title; for the document title.

Overriding any of these strings should be done carefully, and only after reviewing the default values, so that well-formed html/xml is created. For example, _head_list_end_ should close any tags opened in _head_list_start_ (e.g., <ul> and </ul> pairs); otherwise mismatched tags will result in the generated html.
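
The order in which the list strings are expanded can be sketched in Python. This models only the expansion order described in this section, not the exact output of the converter. The heading data is invented for the example, the {link} and {text} placeholders stand in for $_LINK_; and $StripPN(_HOTTEXT_), and the paragraph-number pattern in strip_pn is an assumption.

```python
import re

STRINGS = {
    "_head_list_start_": "<ul>\n",
    "_head_list_entry_": '<li><a href="{link}">{text}</a></li>\n',
    "_head_list_end_": "</ul>\n",
}


def strip_pn(heading):
    """Assumed model of $StripPN: drop a leading number such as '1.2 '."""
    return re.sub(r"^[0-9]+(\.[0-9]+)*\.?\s+", "", heading)


def gen_toc(headings, strings=STRINGS):
    """Expand start, entry and end strings for (level, link, text) tuples."""
    if not headings:
        return ""
    start_level = min(level for level, _, _ in headings)
    current = start_level - 1
    out = []
    for level, link, text in headings:
        while current < level:   # entering a heading list or sub-heading list
            out.append(strings["_head_list_start_"])
            current += 1
        while current > level:   # leaving a sub-heading list
            out.append(strings["_head_list_end_"])
            current -= 1
        out.append(strings["_head_list_entry_"].format(link=link, text=strip_pn(text)))
    while current >= start_level:  # close every list that is still open
        out.append(strings["_head_list_end_"])
        current -= 1
    return "".join(out)


toc = gen_toc([(1, "a.html", "1 Intro"), (2, "b.html", "1.1 Scope"), (1, "c.html", "2 Usage")])
print(toc)
```

Every start expansion is matched by exactly one end expansion, which is why an override of one string generally needs a matching override of the other.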


GenPeerTOC

$GenPeerTOC() is a built-in function that creates a Peer Table of Contents. It cannot be redefined. A Peer TOC is one which links a file to all of the files that are at the same heading level as the current file. For example, if the RTF has the following form

  • Heading 1
    • Heading 1.1
    • Heading 1.2
  • Heading 2
    • Heading 2.1
    • Heading 2.2
  • Heading 3
    • ...

then using $GenPeerTOC(), and a SplitDepth of 1, will create a TOC listing for Headings 1, 2 and 3. This is useful for creating links between separate chapters in a document, or sections in a chapter, etc.

GenPeerTOC() is not called automatically. It should be called at the point in the translation where the Peer TOC is required. For example, as part of the Navigation Panel expansion:

'GenNavPanel','$GenPeerTOC();\n\'

From this, an unordered list of the Peer TOC would be generated and placed in the output file being created.

Default actions for creating navigation panels can be found in html.trn. See Translation files (.trn) for details on the purpose of html.trn and the associated translation (.trn) files, and for recommendations on where customizations should be made.

GenPeerTOC will expand the following strings as part of the TOC creation. Changing the string values will alter the output generated for the Peer TOC.


1. _head_list_start_ starts the heading list.

2.1 _head_list_entry_ is expanded for every non-current file entry in the TOC list.
2.2 _head_list_entry_current_ is expanded for the current file entry. For example, when processing the "Heading 2" output file in the above example, _head_list_entry_current_ will be expanded for "Heading 2", and _head_list_entry_ will be expanded for Headings 1 and 3. This allows different output to be generated for the current file (eg to not add a hyperlink to itself).

When _head_list_entry_ or _head_list_entry_current_ are expanded, the following read-only strings will have the correct values for the current entry, and may be used in the _head_list_entry_/_head_list_entry_current_ expansion:

  • _LINK_ is the hyperlink
  • _HOTTEXT_ is the text to be placed in the link

where _head_list_entry_ and _head_list_entry_current_ have the default values

'_head_list_entry_','<li><a href="$_LINK_;">$StripPN(_HOTTEXT_)</a></li>\n'

'_head_list_entry_current_','<li>$StripPN(_HOTTEXT_)</li>\n'

$StripPN() removes any paragraph numbers attached to the heading text.

3. _head_list_end_ ends the heading list. By default its value is '</ul>\n'

The default values for each of these strings can be found in html.trn. Other strings are expanded as part of these strings, for example $Title; for the document title.

Overriding any of these strings should be done carefully, and only after reviewing the default values, so that well-formed html/xml is created. For example, _head_list_end_ should close any tags opened in _head_list_start_ (e.g., <ul> and </ul> pairs); otherwise mismatched tags will result in the generated html.
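
The difference between the two entry strings is easiest to see on the three-chapter example. The Python sketch below is illustrative only: the file names are invented, gen_peer_toc is a hypothetical stand-in for the built-in, and $StripPN() is left out for brevity.

```python
def gen_peer_toc(peers, current_link):
    """Expand the peer TOC strings for a list of (link, text) peers."""
    out = ["<ul>\n"]                                   # _head_list_start_
    for link, text in peers:
        if link == current_link:
            out.append("<li>%s</li>\n" % text)         # _head_list_entry_current_
        else:
            out.append('<li><a href="%s">%s</a></li>\n' % (link, text))  # _head_list_entry_
    out.append("</ul>\n")                              # _head_list_end_
    return "".join(out)


peers = [("h1.html", "Heading 1"), ("h2.html", "Heading 2"), ("h3.html", "Heading 3")]
print(gen_peer_toc(peers, "h2.html"))
```

When this is generated for the "Heading 2" file, only Headings 1 and 3 become hyperlinks.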

Appendix A AppleEvents

r2net supports the required suite of Apple Events (Open, Close (ignored), Print (ignored) and Quit). Note that the Open AppleEvent converts the file being opened. The Open event places the document name onto a queue and returns immediately, so you must test whether r2net is still running to determine when your conversion is finished. Alternatively, you could have r2net send an AppleEvent back to the caller upon document completion.

There is an experimental event («event R2HcStOp») that allows you to send translation file lines. The lines are treated as if they appeared at the top of the translation file (to support setting options and strings). The following sample code illustrates this:

This is a sample of calling r2net using an experimental AppleEvent

tell application "r2net"
«event R2HcStOp» ¬
".Strings
'Pass1Start','$OverrideSettings()'
'Simple','1'
.Functions
Function OverrideSettings
DBPrint 'SplitDepth is $SplitDepth;'
Save 'SplitDepth',3
Save 'RefsOnTop',1
Save 'SkipNavPanel','0'
Save 'SkipLeadingToc','0'
Save 'SkipTrailingToc','1'
Save 'GenContents','1'
Save 'GenFrames','1'
Save 'GenIndex','1'
"
open alias "ChrisHD:RTF:TestCase:Version4.RTF"
activate
end tell

Note that because these settings are processed prior to the rest of the translation files, using the .Strings mechanism to set variables may result in the variables being reset by subsequent translation files. To avoid this, the sample defines a function, OverrideSettings, that is called after html.trn has been processed and forces the variables to the desired values.

Notes on Apple Event R2HcStOp

The R2HcStOp event takes one argument: a string containing translation file commands. This string is treated as if it were prepended to the start of the html.trn file. The result of the R2HcStOp event is the previous string (if any).

If r2net is started with an Open event, it will quit after the document(s) have been processed. If you activate r2net first, and then send documents to process, it will remain running after all documents have been processed.



Appendix B Regular Expressions

Regular expressions are used in translation files for the Ifelse, Sub and Gsub directives. The variation of regular expressions used here is documented below and was developed by Henry Spencer at the University of Toronto.

REGULAR EXPRESSION SYNTAX

A regular expression is zero or more branches, separated by `|'. It matches anything that matches one of the branches.

A branch is zero or more pieces, concatenated. It matches a match for the first, followed by a match for the second, etc.

A piece is an atom possibly followed by `*', `+', or `?'.
An atom followed by `*' matches a sequence of 0 or more matches of the atom.
An atom followed by `+' matches a sequence of 1 or more matches of the atom.
An atom followed by `?' matches a match of the atom, or the null string.

An atom is a regular expression in parentheses (matching a match for the regular expression), a range (see below), `.' (matching any single character), `^' (matching the null string at the beginning of the input string), `$' (matching the null string at the end of the input string), a `\' followed by a single character (matching that character), or a single character with no other significance (matching that character).

A range is a sequence of characters enclosed in `[]'. It normally matches any single character from the sequence. If the sequence begins with `^', it matches any single character not from the rest of the sequence. If two characters in the sequence are separated by `-', this is shorthand for the full list of ASCII characters between them (e.g. `[0-9]' matches any decimal digit). To include a literal `]' in the sequence, make it the first character (following a possible `^'). To include a literal `-', make it the first or last character.

AMBIGUITY

If a regular expression could match two different parts of the input string, it will match the one which begins earliest. If both begin in the same place but match different lengths, or match the same length in different ways, life gets messier, as follows.

In general, the possibilities in a list of branches are considered in left-to-right order, the possibilities for `*', `+', and `?' are considered longest-first, nested constructs are considered from the outermost in, and concatenated constructs are considered leftmost-first. The match that will be chosen is the one that uses the earliest possibility in the first choice that has to be made. If there is more than one choice, the next will be made in the same manner (earliest possibility) subject to the decision on the first choice. And so forth.

For example, `(ab|a)b*c' could match `abc' in one of two ways. The first choice is between `ab' and `a'; since `ab' is earlier, and does lead to a successful overall match, it is chosen. Since the `b' is already spoken for, the `b*' must match its last possibility (the empty string), since it must respect the earlier choice.

In the particular case where no `|'s are present and there is only one `*', `+', or `?', the net effect is that the longest possible match will be chosen. So `ab*', presented with `xabbbby', will match `abbbb'. Note that if `ab*' is tried against `xabyabbbz', it will match `ab' just after `x', due to the begins-earliest rule. (In effect, the decision on where to start the match is the first choice to be made, hence subsequent choices must respect it even if this leads them to less-preferred alternatives.)
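
The three examples above can be tried interactively. The sketch below uses Python's re module, which is a different engine from the Henry Spencer library used by the converter; it is assumed here only that both resolve these particular cases the same way (earliest start, leftmost branch first, longest-first repetition).

```python
import re

# (ab|a)b*c against "abc": the first branch "ab" is tried first and wins,
# so b* is left to match the empty string.
m = re.search(r"(ab|a)b*c", "abc")
whole_match, first_group = m.group(0), m.group(1)
print(whole_match, first_group)

# ab* against "xabbbby": a single "*" takes the longest possible match.
longest = re.search(r"ab*", "xabbbby").group(0)
print(longest)

# ab* against "xabyabbbz": the match that begins earliest wins,
# even though a longer match exists later in the string.
m2 = re.search(r"ab*", "xabyabbbz")
earliest, earliest_pos = m2.group(0), m2.start()
print(earliest, earliest_pos)
```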

Replacement Patterns

Each instance of `&' in the replacement pattern is replaced by the entire matched expression. Each instance of `\\i', where i is a digit, is replaced by the i'th subpattern. To get a literal `&' or `\i' into the output, prefix it with `\\'.
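
The substitution rules can be sketched in a few lines of Python. This is a simplified model, not the converter's code: expand_replacement is a hypothetical helper, it uses a single backslash as the escape character, and it ignores any extra backslash doubling that the translation-file string syntax itself may require.

```python
import re


def expand_replacement(match, pattern):
    # "&" inserts the whole match, "\N" (N a digit) inserts subpattern N,
    # and a backslash before "&" or another backslash yields that
    # character literally. Anything else is copied through unchanged.
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        nxt = pattern[i + 1] if i + 1 < len(pattern) else ""
        if ch == "&":
            out.append(match.group(0))
            i += 1
        elif ch == "\\" and nxt and nxt in "0123456789":
            n = int(nxt)
            # Missing or unmatched subpatterns expand to nothing.
            if n <= match.re.groups:
                out.append(match.group(n) or "")
            i += 2
        elif ch == "\\" and nxt in ("&", "\\") and nxt:
            out.append(nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


m = re.search(r"(\w+)@(\w+)", "user@host")
print(expand_replacement(m, r"\2 at \1 [&]"))
print(expand_replacement(m, r"A \& B"))
```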

Appendix C Language Identifier Values

The following table gives the decimal values for the $Language variable associated with various languages.

Language                Value
NoLang                  1024
Albanian                1052
Arabic                  1025
Bahasa                  1057
BelgianDutch            2067
BelgianFrench           2060
BrazilianPortuguese     1046
Bulgarian               1026
Catalan                 1027
LatinCroatoSerbian      1050
Czech                   1029
Danish                  1030
Dutch                   1043
AustralianEnglish       3081
UKEnglish               2057
USEnglish               1033
Finnish                 1035
French                  1036
CanadianFrench          3084
German                  1031
Greek                   1032
Hebrew                  1037
Hungarian               1038
Icelandic               1039
Italian                 1040
Japanese                1041
Korean                  1042
BokmalNorwegian         1044
NynorskNorwegian        2068
Polish                  1045
Portuguese              2070
RhaetoRomanic           1047
Romanian                1048
Russian                 1049
CyrillicSerboCroatian   2074
SimplifiedChinese       2052
Slovak                  1051
CastilianSpanish        1034
MexicanSpanish          2058
Swedish                 1053
SwissFrench             4108
SwissGerman             2055
SwissItalian            2064
Thai                    1054
TraditionalChinese      1028
Turkish                 1055
Urdu                    1056
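
The values in this table look like Windows language identifiers, which is what the RTF \lang control word carries. Assuming that is the case, the low 10 bits of each value identify the primary language and the remaining bits identify the regional variant. The following Python sketch (split_language_value and is_english are hypothetical helper names) shows how regional variants of one language could be grouped on that assumption.

```python
def split_language_value(value):
    # Assumes a Windows-style language identifier:
    # low 10 bits = primary language, higher bits = sublanguage.
    return value & 0x3FF, value >> 10


ENGLISH_PRIMARY = 9  # primary language id shared by all English variants


def is_english(value):
    return split_language_value(value)[0] == ENGLISH_PRIMARY


samples = [
    ("USEnglish", 1033),
    ("UKEnglish", 2057),
    ("AustralianEnglish", 3081),
    ("French", 1036),
]
for name, value in samples:
    print(name, split_language_value(value), is_english(value))
```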


What's New in Version 5?

If you are upgrading from version 4, note that some changes have been made to the translation files. This section will help you carry any changes you made in version 4 forward to version 5.

New File Names

All of the files distributed with the filter now have extensions. This makes it easier to work with them on Windows platforms. The new file names are listed below.

Old Name      New Name         Notes
all-sym       (none)           Not used in 5.0
ansi-gen      ansi-gen.txt     Unchanged
ansi-sym      ansi-sym.txt     Unchanged
(none)        base.trn         New file in version 5. It is included from html.trn and contains string and function settings that are used for HTML and XML translations.
html-map      html-map.txt     Unchanged
html-trn      html.trn         Many definitions moved to base.trn. All the markup has been changed to conform to XHTML specifications: tags are lower case, standalone tags such as '<br>' are now generated as '<br />', and elements like '<li>' now have closing tags. html.trn is only processed in pass1. At the start of pass2, the function $AtPass2Start() is called instead of re-reading html.trn.
mac-gen       mac-gen.txt      Unchanged
mac-sym       mac-sym.txt      Unchanged
pc-gen        pc-gen.txt       Unchanged
pc-sym        pc-sym.txt       Unchanged
pca-gen       pca-gen.txt      Unchanged
pca-sym       pca-sym.txt      Unchanged
rtf-ctrl      rtf-ctrl.txt     All Word2000 tokens have been added.
ct4.ct        ct4.ct           Unchanged
ct8.ct        ct8.ct           Unchanged
trnflag.trn   trnflag.trn      Commented out sample lines. Changed default image extension to 'jpg'.
html-map.v3   html-mapv3.txt   Unchanged
html-map.v4   html-mapv4.txt   Unchanged
html-map.v4p  html-mapv4p.txt  Unchanged
html-map.v4s  html-mapv4s.txt  Unchanged
License       ltkeys.txt       You need a 5.0 license. Demo licenses are available.

Strategy for Moving up to Version 5

  1. Your trnflag.trn files can be used without change in version 5.
  2. Identify the differences between your html-trn files and the base 4.0 release. This can be done by using a file comparison utility. On Unix platforms, use the diff command. On Windows platforms you can get a free file comparison utility from ComponentSoftware at: http://www.componentsoftware.com/csdiff/ .
  3. Put any changes to the html-trn file into your trnflag.trn file. If you have modified PTag or TTag entries, you may want to make sure that your versions are XHTML compliant.
  4. If you have changes that you would like to share with other users of the Logictran RTF converter, contact us! We are putting together a user-interchange area on our web site. All submissions should contain a description of your changes and how to use them. All submissions should be in the public domain.


Last Update:02/25/2003


© Copyright 2003 Logictran, Inc. All rights reserved.