Class HyphenationTree
- Namespace
- iTextSharp.text.pdf.hyphenation
- Assembly
- iTextSharp.LGPLv2.Core.dll
This tree structure stores the hyphenation patterns in an efficient way for fast lookup. It provides the method to hyphenate a word. Author: Carlos Villegas
public class HyphenationTree : TernaryTree, ICloneable, IPatternConsumer
- Inheritance
- TernaryTree
HyphenationTree
- Implements
- ICloneable, IPatternConsumer
Constructors
HyphenationTree()
public HyphenationTree()
Fields
Classmap
This map stores the character classes
protected TernaryTree Classmap
Field Value
Stoplist
This map stores hyphenation exceptions
protected INullValueDictionary<string, List<object>> Stoplist
Field Value
Vspace
Value space: stores the interletter values
protected ByteVector Vspace
Field Value
Methods
AddClass(string)
Adds a character class to the tree. SimplePatternParser uses it as a callback to add character classes. Character classes define the valid word characters for hyphenation: if a word contains a character not defined in any class, it is not hyphenated. A class also defines how to normalize characters so that they can be compared with the stored patterns. Pattern files usually use only lowercase characters; in that case a class for the letter 'a', for example, should be defined as "aA", the first character being the normalization character.
public void AddClass(string chargroup)
Parameters
chargroup
string
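The normalization rule described for AddClass can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the class CharClassSketch and its Normalize method are hypothetical names, and a plain dictionary stands in for the Classmap tree.

```csharp
using System.Collections.Generic;

// Hypothetical illustration only; not the iTextSharp implementation.
public class CharClassSketch
{
    // Maps every character of a class to the first character of that class.
    private readonly Dictionary<char, char> _normalized = new Dictionary<char, char>();

    // Registers a class such as "aA": 'a' and 'A' both normalize to 'a'.
    public void AddClass(string chargroup)
    {
        if (string.IsNullOrEmpty(chargroup)) return;
        char key = chargroup[0];
        foreach (char c in chargroup) _normalized[c] = key;
    }

    // Returns the normalized word, or null when a character belongs to no
    // class (the case in which the word would not be hyphenated).
    public string Normalize(string word)
    {
        var result = new char[word.Length];
        for (int i = 0; i < word.Length; i++)
        {
            char n;
            if (!_normalized.TryGetValue(word[i], out n)) return null;
            result[i] = n;
        }
        return new string(result);
    }
}
```

With the classes "aA" and "bB" registered, Normalize("AbBa") gives "abba", while Normalize("a1") gives null because '1' is not in any class.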
AddException(string, List<object>)
Adds an exception to the tree. SimplePatternParser uses it as a callback to store the hyphenation exceptions. The hyphenatedword list holds the hyphenated form of the word as alternating strings and Hyphen objects.
public void AddException(string word, List<object> hyphenatedword)
Parameters
AddPattern(string, string)
Adds a pattern to the tree. Mainly intended to be used by SimplePatternParser as a callback to add a pattern to the tree. The values string holds the interletter weights, which indicate the desirability and priority of hyphenating at a given point within the pattern. It should contain only digit characters ('0' to '9').
public void AddPattern(string pattern, string values)
Parameters
FindPattern(string)
public string FindPattern(string pat)
Parameters
pat
string
Returns
GetValues(int)
protected byte[] GetValues(int k)
Parameters
k
int
Returns
- byte[]
Hstrcmp(char[], int, char[], int)
String compare, returns 0 if equal or t is a substring of s
protected static int Hstrcmp(char[] s, int si, char[] t, int ti)
Parameters
Returns
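The comparison described above can be sketched as follows. This is a hedged reading rather than the library's code: HstrcmpSketch and Compare are hypothetical names, both arrays are assumed to be NUL-terminated, and "t is a substring of s" is interpreted as "t matches a prefix of s starting at si".

```csharp
// Hypothetical illustration only; not the iTextSharp implementation.
public static class HstrcmpSketch
{
    // Compares s from si with t from ti. Assumes both arrays end with '\0'.
    public static int Compare(char[] s, int si, char[] t, int ti)
    {
        for (; s[si] == t[ti]; si++, ti++)
        {
            // Both strings reached their terminator together: equal.
            if (s[si] == '\0') return 0;
        }
        // t ended first, so t matches the start of s from si onward.
        if (t[ti] == '\0') return 0;
        // Otherwise report the difference at the first mismatch.
        return s[si] - t[ti];
    }
}
```

Under these assumptions, comparing "hyphen\0" with "hyp\0" returns 0, while comparing "abc\0" with "abd\0" returns a negative value.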
Hyphenate(char[], int, int, int, int)
The input has the form w = "nnllllllnnn*", where n is a non-letter and l is a letter. All n may be absent; the first n is at offset, and the first l is at offset + iIgnoreAtBeginning.
The working copy has the form word = ".llllll.'\0'***", where all l in w are copied into word.
In the first part of the routine len = w.length; in the second part of the routine len = word.length.
Three indices are used:
- Index(w), the index in w
- Index(word), the index in word
- Letterindex(word), the index in the letter part of word
The following relations exist (see first loop):
- Index(w) = offset + i - 1
- Index(word) = i - iIgnoreAtBeginning
- Letterindex(word) = Index(word) - 1
It follows that:
- Index(w) - Index(word) = offset - 1 + iIgnoreAtBeginning
- Index(w) = Letterindex(word) + offset + iIgnoreAtBeginning
public Hyphenation Hyphenate(char[] w, int offset, int len, int remainCharCount, int pushCharCount)
Parameters
w
char[] Char array that contains the word
offset
int Offset to first character in word
len
int Length of word
remainCharCount
int Minimum number of characters allowed before the hyphenation point
pushCharCount
int Minimum number of characters allowed after the hyphenation point
Returns
- Hyphenation
A Hyphenation object representing the hyphenated word, or null if the word is not hyphenated
Hyphenate(string, int, int)
Hyphenates a word and returns a Hyphenation object.
public Hyphenation Hyphenate(string word, int remainCharCount, int pushCharCount)
Parameters
word
string The word to be hyphenated
remainCharCount
int Minimum number of characters allowed before the hyphenation point
pushCharCount
int Minimum number of characters allowed after the hyphenation point
Returns
- Hyphenation
A Hyphenation object representing the hyphenated word, or null if the word is not hyphenated
LoadSimplePatterns(Stream)
public void LoadSimplePatterns(Stream stream)
Parameters
stream
Stream
PackValues(string)
Packs the values by storing them in 4 bits each, two values to a byte. Values range from 0 to 9. Zero is used as the terminator, so 1 is added to each value before it is stored.
protected int PackValues(string values)
Parameters
values
string A string of digits from '0' to '9' representing the interletter values
Returns
- int
The index into the vspace array where the packed values are stored
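The packing scheme described for PackValues can be sketched as a standalone round trip. This is an illustration under stated assumptions, not the library's code: NibblePackingSketch, Pack and Unpack are hypothetical names, the result is a fresh byte array instead of an index into Vspace, and placing the first digit of each pair in the high nibble is an assumption.

```csharp
using System.Text;

// Hypothetical illustration only; not the iTextSharp implementation.
public static class NibblePackingSketch
{
    // Packs digits '0'..'9' two per byte, storing digit + 1 in each nibble
    // so that a zero nibble (or zero byte) can act as the terminator.
    public static byte[] Pack(string values)
    {
        int n = values.Length;
        // One byte per pair of digits, plus room for a zero terminator byte.
        int m = (n & 1) == 1 ? (n >> 1) + 2 : (n >> 1) + 1;
        var packed = new byte[m];
        for (int i = 0; i < n; i++)
        {
            int v = (values[i] - '0' + 1) & 0x0f;
            if ((i & 1) == 1)
                packed[i >> 1] |= (byte)v;        // second digit: low nibble
            else
                packed[i >> 1] = (byte)(v << 4);  // first digit: high nibble
        }
        packed[m - 1] = 0; // terminator
        return packed;
    }

    // Reverses Pack: reads nibbles until a zero nibble is found.
    public static string Unpack(byte[] packed)
    {
        var sb = new StringBuilder();
        foreach (byte b in packed)
        {
            int hi = b >> 4;
            if (hi == 0) break;
            sb.Append((char)(hi - 1 + '0'));
            int lo = b & 0x0f;
            if (lo == 0) break;
            sb.Append((char)(lo - 1 + '0'));
        }
        return sb.ToString();
    }
}
```

In this sketch Pack("0102") produces the three bytes 0x12, 0x13, 0x00, and Unpack returns "0102" from them. An odd-length input such as "5" leaves the low nibble of its last data byte at zero, which also ends the sequence.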
PrintStats()
public override void PrintStats()
SearchPatterns(char[], int, byte[])
Searches for all possible partial matches of word starting at index and updates the interletter values. Conceptually this is equivalent to testing every pattern against the word at that position, but it is done efficiently because the patterns are stored in a ternary tree. This is the whole purpose of the tree: doing the search without having to test every single pattern. The number of patterns for languages such as English ranges from 4000 to 10000, so doing thousands of string comparisons for each word to hyphenate would be very slow without the tree. The tradeoff is memory, but using a ternary tree instead of a trie almost halves the memory used by Lout or TeX. It is also faster than using a hash table.
protected void SearchPatterns(char[] word, int index, byte[] il)
Parameters
word
char[] Null-terminated word to match
index
int Start index from word
il
byte[] Interletter values array to update
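For comparison, the brute-force search that the ternary tree avoids can be sketched as follows. This is deliberately the naive technique, not the tree lookup, and it rests on assumptions: NaiveSearchSketch is a hypothetical name, patterns are held in a dictionary from letters to a digit string, the patterns in the usage note are made up for illustration, and keeping the larger of the old and new value at each position is an assumption about how the values are merged.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; not the iTextSharp implementation.
public static class NaiveSearchSketch
{
    // Tests every pattern against word at index and merges its interletter
    // values into il, keeping the larger value at each position.
    public static void SearchPatterns(string word, int index, byte[] il,
                                      IDictionary<string, string> patterns)
    {
        string tail = word.Substring(index);
        foreach (KeyValuePair<string, string> entry in patterns)
        {
            // One string comparison per pattern: this is the cost the tree removes.
            if (!tail.StartsWith(entry.Key, StringComparison.Ordinal)) continue;

            string values = entry.Value;
            for (int k = 0; k < values.Length && index + k < il.Length; k++)
            {
                byte v = (byte)(values[k] - '0');
                if (v > il[index + k]) il[index + k] = v;
            }
        }
    }
}
```

With the made-up patterns "hy" → "001" and "hyp" → "0020", searching "hyphen" at index 0 leaves il[2] at 2, because the larger of the two competing values is kept. Each call costs one comparison per stored pattern, which is what makes this approach slow with thousands of patterns.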
UnpackValues(int)
protected string UnpackValues(int k)
Parameters
k
int