Class HyphenationTree
- Namespace
- iText.Layout.Hyphenation
- Assembly
- itext.layout.dll
This tree structure stores the hyphenation patterns in an efficient way for fast lookup.
public class HyphenationTree : TernaryTree, IPatternConsumer
- Inheritance
- object
- TernaryTree
- HyphenationTree
- Implements
- IPatternConsumer
Remarks
This tree structure stores the hyphenation patterns in an efficient way for fast lookup. It provides the method to hyphenate a word.
This work was authored by Carlos Villegas (cav@uniscope.co.jp).
Constructors
HyphenationTree()
Default constructor.
public HyphenationTree()
Fields
classmap
This map stores the character classes.
protected TernaryTree classmap
Field Value
- TernaryTree
stoplist
This map stores the hyphenation exceptions.
protected IDictionary<string, IList> stoplist
Field Value
- IDictionary<string, IList>
vspace
Value space: stores the interletter values.
protected ByteVector vspace
Field Value
- ByteVector
Methods
AddClass(string)
Add a character class to the tree.
public virtual void AddClass(string chargroup)
Parameters
chargroup
- string
a character class (group)
Remarks
Add a character class to the tree. It is used by PatternParser as a callback to add character classes. Character classes define the valid word characters for hyphenation: if a word contains a character that is not defined in any of the classes, it is not hyphenated. Character classes also define how characters are normalized before they are compared with the stored patterns. Pattern files usually use only lowercase characters; in that case, the class for the letter 'a', for example, should be defined as "aA", with the first character being the normalization character.
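The normalization rule described above can be sketched as follows. This is an illustrative sketch, not the iText implementation: it assumes a plain `Dictionary<char, char>` in place of the `classmap` ternary tree, and `CharClassSketch`, `AddClass` and `Normalize` are hypothetical helpers.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for the classmap field: a dictionary instead of a TernaryTree.
public static class CharClassSketch
{
    // Registers a class such as "aA": every character in the group maps to the first one.
    public static void AddClass(Dictionary<char, char> classmap, string chargroup)
    {
        if (chargroup.Length == 0) return;
        char equivChar = chargroup[0];
        foreach (char c in chargroup)
        {
            classmap[c] = equivChar;
        }
    }

    // Normalizes a word; returns null when a character belongs to no class,
    // mirroring the rule that such words are not hyphenated.
    public static string? Normalize(Dictionary<char, char> classmap, string word)
    {
        var sb = new StringBuilder(word.Length);
        foreach (char c in word)
        {
            if (!classmap.TryGetValue(c, out char normalized)) return null;
            sb.Append(normalized);
        }
        return sb.ToString();
    }
}
```

After registering "aA" and "bB", `Normalize(map, "BaA")` yields "baa", while a word containing any unregistered character yields null and would be left unhyphenated.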
AddException(string, IList)
Add an exception to the tree.
public virtual void AddException(string word, IList hyphenatedword)
Parameters
word
- string
normalized word
hyphenatedword
- IList
a list of alternating strings and hyphen objects
Remarks
Add an exception to the tree. It is used by the PatternParser class as a callback to store the hyphenation exceptions.
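A simplified sketch of how an exception list can bypass pattern lookup for listed words. It assumes, hypothetically, that each exception is stored as plain string fragments split at '-' rather than the alternating strings and hyphen objects that `AddException` receives, and that keys are already normalized; `ExceptionSketch` and its methods are illustrative names only.

```csharp
using System.Collections.Generic;

// Simplified stand-in for the stoplist: plain string fragments instead of
// alternating strings and hyphen objects.
public static class ExceptionSketch
{
    // Records an exception such as "ta-ble" under its unhyphenated key "table".
    public static void AddException(Dictionary<string, List<string>> stoplist, string hyphenated)
    {
        var fragments = new List<string>(hyphenated.Split('-'));
        stoplist[hyphenated.Replace("-", "")] = fragments;
    }

    // Returns the break positions for a listed word, or null if the word is not an exception.
    public static int[]? Lookup(Dictionary<string, List<string>> stoplist, string word)
    {
        if (!stoplist.TryGetValue(word, out var fragments)) return null;
        var points = new int[fragments.Count - 1];
        int pos = 0;
        for (int i = 0; i < points.Length; i++)
        {
            pos += fragments[i].Length; // break falls after the accumulated fragment lengths
            points[i] = pos;
        }
        return points;
    }
}
```

Storing "hy-phen-ation" makes `Lookup(stoplist, "hyphenation")` return the positions 2 and 6, with no pattern search needed.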
AddPattern(string, string)
Add a pattern to the tree.
public virtual void AddPattern(string pattern, string ivalue)
Parameters
pattern
- string
the hyphenation pattern
ivalue
- string
interletter weight values indicating the desirability and priority of hyphenating at a given point within the pattern. It should contain only digit characters ('0' to '9').
Remarks
Add a pattern to the tree. It is mainly used by the PatternParser class as a callback to add a pattern to the tree.
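`AddPattern` takes the pattern and its interletter values as separate strings. Assuming TeX-style pattern notation, where digits between letters carry the weights (as in `hy3ph`), and assuming `pattern` holds only the letters, a hypothetical helper like the one below could produce those two arguments. Gaps without a digit get an implicit '0', so the values string is one character longer than the letters.

```csharp
using System.Text;

// Hypothetical helper: splits a TeX-style pattern into letters and interletter values.
public static class PatternSplitSketch
{
    // "hy3ph" -> letters "hyph", values "00300" (one digit per gap, including both ends).
    public static (string Letters, string Values) Split(string texPattern)
    {
        var letters = new StringBuilder();
        var values = new StringBuilder();
        char pending = '0'; // digit seen before the next letter, '0' if none
        foreach (char c in texPattern)
        {
            if (c >= '0' && c <= '9')
            {
                pending = c;
            }
            else
            {
                values.Append(pending); // value of the gap before this letter
                letters.Append(c);
                pending = '0';
            }
        }
        values.Append(pending); // value of the gap after the last letter
        return (letters.ToString(), values.ToString());
    }
}
```

For example, `.ach4` splits into the letters ".ach" and the values "00004".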
FindPattern(string)
Find pattern.
public virtual string FindPattern(string pat)
Parameters
pat
- string
a pattern
Returns
- string
a string
GetValues(int)
Get values.
protected virtual byte[] GetValues(int k)
Parameters
k
- int
an integer
Returns
- byte[]
a byte array
Hstrcmp(char[], int, char[], int)
String compare: compares two null-terminated character arrays from the given indices and returns 0 if they are equal or if t is a prefix of s.
protected virtual int Hstrcmp(char[] s, int si, char[] t, int ti)
Parameters
s
- char[]
first character array
si
- int
starting index into the first array
t
- char[]
second character array
ti
- int
starting index into the second array
Returns
- int
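0 if the strings match (equal, or t is a prefix of s); otherwise the difference between the first pair of mismatched characters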
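A sketch of the comparison contract described above, assuming both arrays are terminated by a '\0' character, as the null-terminated word passed to `SearchPatterns` is. It illustrates the documented behavior rather than reproducing the library source; `HstrcmpSketch` is a hypothetical name.

```csharp
// Sketch of a prefix-tolerant comparison on '\0'-terminated char arrays.
public static class HstrcmpSketch
{
    public static int Hstrcmp(char[] s, int si, char[] t, int ti)
    {
        // Advance while the characters agree; reaching both terminators means equal strings.
        for (; s[si] == t[ti]; si++, ti++)
        {
            if (s[si] == '\0') return 0;
        }
        // t ended first: t is a prefix of s, which also counts as a match.
        if (t[ti] == '\0') return 0;
        // Otherwise behave like strcmp: sign of the first mismatch.
        return s[si] - t[ti];
    }
}
```

Comparing "hyphen" with "hyp" returns 0 because the second string ends first, which is the prefix test a pattern lookup needs; "abc" against "abd" returns a negative value.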
Hyphenate(char[], int, int, int, int)
Hyphenate a word and return a Hyphenation object holding its hyphenation points.
public virtual Hyphenation Hyphenate(char[] w, int offset, int len, int remainCharCount, int pushCharCount)
Parameters
w
- char[]
char array that contains the word
offset
- int
offset of the first character of the word
len
- int
length of the word
remainCharCount
- int
Minimum number of characters allowed before the hyphenation point.
pushCharCount
- int
Minimum number of characters allowed after the hyphenation point.
Returns
- Hyphenation
a Hyphenation object representing the hyphenated word, or null if the word is not hyphenated.
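The effect of `remainCharCount` and `pushCharCount` can be illustrated with a small filter over candidate break positions. This is a hypothetical sketch, not the library's code: it assumes a candidate position `i` means the break falls after the first `i` characters of a word of length `len`.

```csharp
using System.Collections.Generic;

// Hypothetical helper: keeps only break positions that respect the minimum character counts.
public static class BreakFilterSketch
{
    public static List<int> Filter(IEnumerable<int> candidates, int len, int remainCharCount, int pushCharCount)
    {
        var result = new List<int>();
        foreach (int i in candidates)
        {
            // i characters stay before the break, len - i move after it.
            if (i >= remainCharCount && len - i >= pushCharCount)
            {
                result.Add(i);
            }
        }
        return result;
    }
}
```

For an 11-letter word such as "hyphenation" with `remainCharCount = 2` and `pushCharCount = 3`, candidates 1 and 9 are discarded (too few characters before and after the break, respectively), while 2 and 6 survive.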
Hyphenate(string, int, int)
Hyphenate word and return a Hyphenation object.
public virtual Hyphenation Hyphenate(string word, int remainCharCount, int pushCharCount)
Parameters
word
- string
the word to be hyphenated
remainCharCount
- int
Minimum number of characters allowed before the hyphenation point.
pushCharCount
- int
Minimum number of characters allowed after the hyphenation point.
Returns
- Hyphenation
a Hyphenation object representing the hyphenated word, or null if the word is not hyphenated.
LoadPatterns(Stream, string)
Read hyphenation patterns from an XML file.
public virtual void LoadPatterns(Stream stream, string name)
Parameters
stream
- Stream
the stream containing the XML pattern file
name
- string
unique key representing the country-language combination
LoadPatterns(string)
Read hyphenation patterns from an XML file.
public virtual void LoadPatterns(string filename)
Parameters
filename
- string
the filename
PackValues(string)
Packs the values by storing them in 4 bits, two values into a byte. Values range from 0 to 9.
protected virtual int PackValues(string values)
Parameters
values
- string
a string of digits from '0' to '9' representing the interletter values.
Returns
- int
the index into the vspace array where the packed values are stored.
Remarks
Packs the values by storing them in 4 bits, two values into a byte. Values range from 0 to 9. Zero is used as the terminator, so 1 is added to each value before it is stored.
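The packing scheme can be sketched with a `List<byte>` standing in for the `vspace` `ByteVector`. `PackedValuesSketch` is a hypothetical name, and the exact layout (first value in the high nibble, then a zero terminator byte) is an assumption consistent with the description above, not a copy of the library code.

```csharp
using System.Collections.Generic;
using System.Text;

// Sketch of 4-bit packing with a List<byte> in place of vspace.
public static class PackedValuesSketch
{
    // Stores each digit as (digit + 1) in one nibble, high nibble first.
    // Returns the index where the packed values start.
    public static int Pack(List<byte> vspace, string values)
    {
        int offset = vspace.Count;
        for (int i = 0; i < values.Length; i += 2)
        {
            int hi = values[i] - '0' + 1;
            int lo = i + 1 < values.Length ? values[i + 1] - '0' + 1 : 0; // 0 = terminator nibble
            vspace.Add((byte)((hi << 4) | lo));
        }
        vspace.Add(0); // terminator byte
        return offset;
    }

    // Reads nibbles from index k until a zero nibble or zero byte is reached.
    public static string Unpack(List<byte> vspace, int k)
    {
        var sb = new StringBuilder();
        byte v = vspace[k++];
        while (v != 0)
        {
            sb.Append((char)((v >> 4) - 1 + '0'));
            int lo = v & 0x0F;
            if (lo == 0) break; // odd count: the low nibble is the terminator
            sb.Append((char)(lo - 1 + '0'));
            v = vspace[k++];
        }
        return sb.ToString();
    }
}
```

Adding 1 before storing is what lets a zero nibble act as the terminator: the digit '0' is stored as 1, so a stored 0 can only mean "end of values". Packing "00300" produces the bytes 0x11, 0x41, 0x10, 0x00.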
SearchPatterns(char[], int, byte[])
Search for all possible partial matches of the word starting at index and update the interletter values.
protected virtual void SearchPatterns(char[] word, int index, byte[] il)
Parameters
word
- char[]
null-terminated word to match
index
- int
start index in the word
il
- byte[]
interletter values array to update
Remarks
Search for all possible partial matches of the word starting at index and update the interletter values. In other words, it does something like:
// naive equivalent: test every pattern against the rest of the word
for (int i = 0; i < patterns.Length; i++) {
    if (word.Substring(index).StartsWith(patterns[i]))
        UpdateInterletterValues(patterns[i]);
}
But it is done efficiently, because the patterns are stored in a ternary tree. In fact, this is the whole purpose of having the tree: doing this search without having to test every single pattern. The number of patterns for languages such as English ranges from 4000 to 10000, so doing thousands of string comparisons for each word to hyphenate would be really slow without the tree. The tradeoff is memory, but using a ternary tree instead of a trie almost halves the memory used by Lout or TeX. It is also faster than using a hash table.
UnpackValues(int)
Unpack values.
protected virtual string UnpackValues(int k)
Parameters
k
- int
the index into the vspace array where the packed values are stored
Returns
- string
the interletter values as a string of digits from '0' to '9'