
Class HyphenationTree

Namespace
iText.Layout.Hyphenation
Assembly
itext.layout.dll

This tree structure stores the hyphenation patterns in an efficient way for fast lookup.

public class HyphenationTree : TernaryTree, IPatternConsumer
Inheritance
TernaryTree
HyphenationTree
Implements
IPatternConsumer

Remarks

This tree structure stores the hyphenation patterns in an efficient way for fast lookup. It also provides the method to hyphenate a word.

This work was authored by Carlos Villegas (cav@uniscope.co.jp).

Constructors

HyphenationTree()

Default constructor.

public HyphenationTree()

Fields

classmap

This map stores the character classes.

protected TernaryTree classmap

Field Value

TernaryTree

stoplist

This map stores the hyphenation exceptions.

protected IDictionary<string, IList> stoplist

Field Value

IDictionary<string, IList>

vspace

Value space: stores the interletter values.

protected ByteVector vspace

Field Value

ByteVector

Methods

AddClass(string)

Add a character class to the tree.

public virtual void AddClass(string chargroup)

Parameters

chargroup string

a character class (group)

Remarks

Add a character class to the tree. It is used by PatternParser as a callback to add character classes. Character classes define the valid word characters for hyphenation: if a word contains a character that is not defined in any class, the word is not hyphenated. Character classes also define how characters are normalized before they are compared with the stored patterns. Pattern files usually use only lowercase characters, in which case the class for the letter 'a', for example, should be defined as "aA", where the first character is the normalization character.
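As a rough illustration of the normalization described above, the following C# sketch builds a lookup from every character of a class to the class's first character and uses it to normalize a word. It is not the iText implementation, which keeps the classes in the classmap TernaryTree; CharClassSketch and its methods are hypothetical names used only for this example.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch of character classes; it does not use the iText API.
// Each class string lists equivalent characters, and its first character is
// the normalized form used when comparing against patterns.
public static class CharClassSketch
{
    // Builds a lookup from every member character to its normalization character.
    public static Dictionary<char, char> BuildClassMap(IEnumerable<string> classes)
    {
        var map = new Dictionary<char, char>();
        foreach (string group in classes)
        {
            if (string.IsNullOrEmpty(group)) continue;
            char normalized = group[0];
            foreach (char c in group)
            {
                map[c] = normalized;
            }
        }
        return map;
    }

    // Returns the normalized word, or null if any character belongs to no class
    // (such a word is not hyphenated).
    public static string Normalize(string word, Dictionary<char, char> classMap)
    {
        var sb = new StringBuilder(word.Length);
        foreach (char c in word)
        {
            if (!classMap.TryGetValue(c, out char n)) return null;
            sb.Append(n);
        }
        return sb.ToString();
    }
}
```

With the classes "aA", "bB" and "cC", the word "CaB" normalizes to "cab", while "ab1" yields null because '1' belongs to no class.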

AddException(string, IList)

Add an exception to the tree.

public virtual void AddException(string word, IList hyphenatedword)

Parameters

word string

normalized word

hyphenatedword IList

a list of alternating strings and hyphen objects.

Remarks

Add an exception to the tree. It is used by the PatternParser class as a callback to store the hyphenation exceptions.

AddPattern(string, string)

Add a pattern to the tree.

public virtual void AddPattern(string pattern, string ivalue)

Parameters

pattern string

the hyphenation pattern

ivalue string

interletter weight values indicating the desirability and priority of hyphenating at a given point within the pattern. It should contain only digit characters ('0' to '9').

Remarks

Add a pattern to the tree. It is mainly used by the PatternParser class as a callback to add a pattern to the tree.
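Pattern files commonly write patterns in Liang's TeX notation, where digits placed between letters are the interletter values (for example "hy1p"). The sketch below shows one way such a pattern could be split into a letter string and a digit string with one value per gap, including both ends, a missing digit counting as 0. It assumes that this is the shape of the pattern and ivalue arguments; PatternSplitSketch is a hypothetical helper, not part of iText.

```csharp
using System.Text;

// Hypothetical helper: splits a TeX-style pattern into letters and
// interletter values (letters.Length + 1 digits, one per gap).
public static class PatternSplitSketch
{
    public static (string Letters, string Values) Split(string texPattern)
    {
        var letters = new StringBuilder();
        var values = new StringBuilder();
        bool gapHasDigit = false; // true once a digit was recorded for the current gap
        foreach (char c in texPattern)
        {
            if (c >= '0' && c <= '9')
            {
                values.Append(c);
                gapHasDigit = true;
            }
            else
            {
                if (!gapHasDigit) values.Append('0'); // no digit before this letter
                letters.Append(c);
                gapHasDigit = false;
            }
        }
        if (!gapHasDigit) values.Append('0'); // gap after the last letter
        return (letters.ToString(), values.ToString());
    }
}
```

For example, "hy1p" splits into the letters "hyp" and the values "0010", and "2bc" into "bc" and "200".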

FindPattern(string)

Find a pattern and return its interletter values.

public virtual string FindPattern(string pat)

Parameters

pat string

a pattern

Returns

string

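the interletter values stored for the pattern as a string of digits, or an empty string if the pattern is not in the tree.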

GetValues(int)

Get the interletter values stored at index k.

protected virtual byte[] GetValues(int k)

Parameters

k int

the index into the vspace array where the packed values start

Returns

byte[]

the unpacked interletter values, one byte per value

Hstrcmp(char[], int, char[], int)

String compare: returns 0 if the strings are equal or if t is a prefix of s.

protected virtual int Hstrcmp(char[] s, int si, char[] t, int ti)

Parameters

s char[]

first character array

si int

starting index into first array

t char[]

second character array

ti int

starting index into second array

Returns

int

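0 if the strings are equal or t is a prefix of s; otherwise the difference between the first mismatching characters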
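The comparison contract can be sketched as follows, assuming that both arrays are terminated with '\0' (as the word passed to SearchPatterns is) and that the match is anchored at s[si], so t must be a prefix of the rest of s. This is an illustrative reimplementation of the described behavior, not iText's source; HstrcmpSketch is a hypothetical name.

```csharp
// Hypothetical sketch of the Hstrcmp contract on '\0'-terminated arrays.
public static class HstrcmpSketch
{
    // Returns 0 if the strings starting at s[si] and t[ti] are equal or if t is
    // a prefix of s; otherwise returns the difference of the first mismatching chars.
    public static int Hstrcmp(char[] s, int si, char[] t, int ti)
    {
        for (; s[si] == t[ti]; si++, ti++)
        {
            if (s[si] == '\0') return 0; // both strings ended together: equal
        }
        if (t[ti] == '\0') return 0; // t ended first: t is a prefix of s
        return s[si] - t[ti];
    }
}
```

For example, comparing "hyphen" with "hyp" returns 0, while comparing "hy" with "hyp" returns a negative number because s ends first.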

Hyphenate(char[], int, int, int, int)

Hyphenate a word and return its hyphenation points as a Hyphenation object.

public virtual Hyphenation Hyphenate(char[] w, int offset, int len, int remainCharCount, int pushCharCount)

Parameters

w char[]

char array that contains the word

offset int

Offset of the first character of the word in w

len int

Length of the word

remainCharCount int

Minimum number of characters allowed before the hyphenation point.

pushCharCount int

Minimum number of characters allowed after the hyphenation point.

Returns

Hyphenation

a Hyphenation object representing the hyphenated word, or null if the word is not hyphenated.

Hyphenate(string, int, int)

Hyphenate word and return a Hyphenation object.

public virtual Hyphenation Hyphenate(string word, int remainCharCount, int pushCharCount)

Parameters

word string

the word to be hyphenated

remainCharCount int

Minimum number of characters allowed before the hyphenation point.

pushCharCount int

Minimum number of characters allowed after the hyphenation point.

Returns

Hyphenation

a Hyphenation object representing the hyphenated word, or null if the word is not hyphenated.
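Once the interletter values for a word are known, Liang's algorithm allows a break wherever the value is odd. The following sketch applies that rule together with the remainCharCount and pushCharCount limits described above and returns the break positions as the number of characters before each break. It is a simplified, hypothetical helper (HyphenPointSketch): the real Hyphenate methods compute the values from the tree and return a Hyphenation object.

```csharp
using System.Collections.Generic;

// Hypothetical helper: selects break positions from precomputed interletter values.
public static class HyphenPointSketch
{
    // interletter[k] is the value for the gap before character k of the word
    // (k = 0 .. wordLength), so the array has wordLength + 1 entries.
    // An odd value allows a break; the break must leave at least
    // remainCharCount characters before it and pushCharCount after it.
    public static List<int> BreakPositions(int wordLength, int[] interletter,
                                           int remainCharCount, int pushCharCount)
    {
        var points = new List<int>();
        for (int k = 1; k < wordLength; k++) // only gaps strictly inside the word
        {
            bool odd = (interletter[k] & 1) == 1;
            if (odd && k >= remainCharCount && wordLength - k >= pushCharCount)
            {
                points.Add(k); // break after the first k characters
            }
        }
        return points;
    }
}
```

With the made-up values {0, 0, 1, 0, 2, 3, 0} for a six-letter word, remainCharCount = 2 and pushCharCount = 2 leave a single break after the second character; lowering pushCharCount to 1 also admits the break after the fifth character.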

LoadPatterns(Stream, string)

Read hyphenation patterns from an XML stream.

public virtual void LoadPatterns(Stream stream, string name)

Parameters

stream Stream

the stream containing the XML pattern data

name string

a unique key representing the language-country combination

LoadPatterns(string)

Read hyphenation patterns from an XML file.

public virtual void LoadPatterns(string filename)

Parameters

filename string

the filename

PackValues(string)

Packs the values by storing each one in 4 bits, two values per byte. Values range from 0 to 9.

protected virtual int PackValues(string values)

Parameters

values string

a string of digits from '0' to '9' representing the interletter values.

Returns

int

the index into the vspace array where the packed values are stored.

Remarks

Packs the values by storing each one in 4 bits, two values per byte. Values range from 0 to 9. Zero is used as the terminator, so 1 is added to each value before it is stored.
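The packing scheme can be sketched as follows. This version returns a standalone byte array instead of an index into vspace, and it assumes that the first value of each pair goes in the high nibble; PackSketch is a hypothetical name.

```csharp
// Hypothetical sketch of nibble packing with a +1 offset and a zero terminator.
public static class PackSketch
{
    public static byte[] Pack(string values)
    {
        int n = values.Length;
        // n/2 full bytes, one more for an odd trailing value, plus a zero terminator byte.
        var packed = new byte[n / 2 + (n % 2) + 1];
        for (int i = 0; i < n; i++)
        {
            int v = (values[i] - '0' + 1) & 0x0F; // store value + 1 so 0 can terminate
            int j = i / 2;
            if (i % 2 == 0)
                packed[j] = (byte)(v << 4);        // first value of the pair: high nibble
            else
                packed[j] = (byte)(packed[j] | v); // second value of the pair: low nibble
        }
        return packed; // the last byte stays 0 as the terminator
    }
}
```

Packing "0102" stores the values 1, 2, 1, 3 and yields the bytes 0x12, 0x13, 0x00. An odd-length string such as "9" leaves the low nibble of its last value byte as 0 (0xA0), which also acts as a terminator.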

SearchPatterns(char[], int, byte[])

Search for all possible partial matches of word starting at index and update the interletter values.

protected virtual void SearchPatterns(char[] word, int index, byte[] il)

Parameters

word char[]

null-terminated word to match

index int

start index in word

il byte[]

interletter values array to update

Remarks

Search for all possible partial matches of word starting at index and update the interletter values. In other words, it does something like:

for (i = 0; i < patterns.length; i++) {
    if (word.substring(index).startsWith(patterns[i])) {
        update_interletter_values(patterns[i]);
    }
}

But it does this efficiently because the patterns are stored in a ternary tree. In fact, this is the whole purpose of the tree: performing this search without testing every single pattern. The number of patterns for languages such as English ranges from 4000 to 10000, so doing thousands of string comparisons for each word to hyphenate would be very slow without the tree. The tradeoff is memory, but using a ternary tree instead of a trie almost halves the memory used by Lout or TeX. It is also faster than using a hash table.
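A brute-force C# version of the loop above may make the update rule concrete. It follows Liang's algorithm in keeping, at each gap, the maximum value contributed by any matching pattern; the real method finds the matching patterns by walking the ternary tree instead of scanning a collection. SearchSketch, the pattern dictionary, and the string-based word are assumptions for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical brute-force equivalent of SearchPatterns.
public static class SearchSketch
{
    // patterns maps pattern letters to their digit string (letters.Length + 1 digits).
    // il must have at least word.Length + 1 entries; il[k] is the gap before word[k].
    public static void SearchPatterns(string word, int index,
                                      IDictionary<string, string> patterns, byte[] il)
    {
        string rest = word.Substring(index);
        foreach (var entry in patterns)
        {
            string letters = entry.Key;
            string values = entry.Value;
            if (!rest.StartsWith(letters, StringComparison.Ordinal)) continue;
            for (int j = 0; j < values.Length; j++)
            {
                int v = values[j] - '0';
                int q = index + j; // gap j of the pattern is gap index + j of the word
                if (v > il[q]) il[q] = (byte)v; // keep the maximum value per gap
            }
        }
    }
}
```

Calling it for every start index of "abc" with the patterns "ab" (values "010"), "a" (values "02") and "b" (values "03") produces the values 0, 2, 3, 0: the gap after 'a' takes the larger of 1 and 2.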

UnpackValues(int)

Unpack the values stored at index k of vspace.

protected virtual string UnpackValues(int k)

Parameters

k int

the index into the vspace array where the packed values start

Returns

string

the interletter values as a string of digits ('0' to '9')
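Unpacking reverses the packing described under PackValues: read two nibbles per byte, stop at the first zero nibble, and subtract the +1 offset from each value. The sketch below works on a plain byte array instead of vspace and assumes the high nibble holds the first value of each pair; UnpackSketch is a hypothetical name.

```csharp
using System.Text;

// Hypothetical sketch: decodes nibble-packed values starting at packed[k].
public static class UnpackSketch
{
    public static string Unpack(byte[] packed, int k)
    {
        var sb = new StringBuilder();
        while (k < packed.Length && packed[k] != 0) // a zero byte terminates
        {
            int v = packed[k++];
            sb.Append((char)('0' + (v >> 4) - 1)); // high nibble, minus the +1 offset
            int low = v & 0x0F;
            if (low == 0) break;                    // odd count: zero low nibble terminates
            sb.Append((char)('0' + low - 1));
        }
        return sb.ToString();
    }
}
```

For example, the bytes 0x12, 0x13, 0x00 unpack to "0102", and 0xA0, 0x00 unpack to "9".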