Class TaggedPdfReaderTool
Converts a tagged PDF document into an XML file.
public class TaggedPdfReaderTool
Inheritance
object
TaggedPdfReaderTool
Constructors
TaggedPdfReaderTool(PdfDocument)
Constructs a TaggedPdfReaderTool via a given PdfDocument.
public TaggedPdfReaderTool(PdfDocument document)
Parameters
document
PdfDocument
the document to read the tag structure from
Fields
document
protected PdfDocument document
Field Value
PdfDocument
out
protected StreamWriter @out
Field Value
StreamWriter
parsedTags
protected IDictionary<PdfDictionary, IDictionary<int, string>> parsedTags
Field Value
IDictionary<PdfDictionary, IDictionary<int, string>>
rootTag
protected string rootTag
Field Value
string
Methods
ConvertToXml(Stream)
Converts the current tag structure into an XML file with the default encoding (UTF-8).
public virtual void ConvertToXml(Stream os)
Parameters
os
Stream
the output stream to save the XML file to
ConvertToXml(Stream, string)
Converts the current tag structure into an XML file with the provided encoding.
public virtual void ConvertToXml(Stream os, string charset)
Parameters
os
Stream
the output stream to save the XML file to
charset
string
the charset of the resultant XML file
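The constructor and the two ConvertToXml overloads can be combined as in the following minimal sketch. It assumes the class lives in the iText.Kernel.Utils namespace and that the input document is tagged; the file names tagged.pdf, tags-default.xml and tags-utf8.xml are placeholders, not names taken from this page.

```csharp
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

public static class TagDumpExample
{
    public static void Main(string[] args)
    {
        // Placeholder input path: any tagged PDF document.
        PdfDocument pdfDoc = new PdfDocument(new PdfReader("tagged.pdf"));
        try
        {
            TaggedPdfReaderTool tool = new TaggedPdfReaderTool(pdfDoc);

            // Overload 1: default encoding (UTF-8).
            using (FileStream os = new FileStream("tags-default.xml", FileMode.Create))
            {
                tool.ConvertToXml(os);
            }

            // Overload 2: encoding passed explicitly as a charset name.
            using (FileStream os = new FileStream("tags-utf8.xml", FileMode.Create))
            {
                tool.ConvertToXml(os, "UTF-8");
            }
        }
        finally
        {
            // Release the reader and the underlying file handle.
            pdfDoc.Close();
        }
    }
}
```

Reusing one tool instance for both calls is a choice made here for brevity; creating a fresh TaggedPdfReaderTool per output is the more cautious option if the tool keeps state between conversions.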
EscapeXML(string, bool)
Escapes a string with the appropriate XML codes. NOTE: copied from the iText 5 XMLUtils class.
protected static string EscapeXML(string s, bool onlyASCII)
Parameters
s
string
the string to be escaped
onlyASCII
bool
if true, codes above 127 will always be escaped with &#nn;
Returns
- string
the escaped string
FixTagName(string)
Fixes the specified tag name so that it is a valid XML tag.
protected static string FixTagName(string tag)
Parameters
tag
string
the tag name to fix
Returns
- string
the fixed tag name
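EscapeXML and FixTagName are both protected static, so they are reachable only from a derived class. The sketch below shows one way to expose them for experimentation. The subclass name XmlHelperProbe and its two wrapper methods are hypothetical and not part of iText; the iText.Kernel.Utils namespace is assumed.

```csharp
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

// Hypothetical helper subclass; only EscapeXML and FixTagName come from the library.
public class XmlHelperProbe : TaggedPdfReaderTool
{
    public XmlHelperProbe(PdfDocument document) : base(document)
    {
    }

    // Forwards to the protected static EscapeXML(string, bool).
    public static string Escape(string text, bool onlyAscii)
    {
        return EscapeXML(text, onlyAscii);
    }

    // Forwards to the protected static FixTagName(string).
    public static string Fix(string tag)
    {
        return FixTagName(tag);
    }
}
```

A call such as XmlHelperProbe.Escape("a < b & c", true) or XmlHelperProbe.Fix("Figure 1") can then be used to inspect what the helpers return for a given input, without needing an open document.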
InspectAttributes(PdfStructElem)
Inspects the attributes dictionary of the StructTreeRoot child.
protected virtual void InspectAttributes(PdfStructElem kid)
Parameters
kid
PdfStructElem
the direct kid of the StructTreeRoot
InspectKid(IStructureNode)
Inspects the child of the StructTreeRoot.
protected virtual void InspectKid(IStructureNode kid)
Parameters
kid
IStructureNode
the direct kid of the StructTreeRoot
InspectKids(IList<IStructureNode>)
Inspects the children of the StructTreeRoot.
protected virtual void InspectKids(IList<IStructureNode> kids)
Parameters
kids
IList<IStructureNode>
list of the direct kids of the StructTreeRoot
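Because the Inspect* methods are protected virtual, a subclass can hook into the traversal. The following sketch counts how many times InspectKid is invoked during a conversion. The class name CountingReaderTool and the VisitedNodes property are hypothetical; the namespaces iText.Kernel.Utils and iText.Kernel.Pdf.Tagging are assumed, as is the expectation that ConvertToXml drives the traversal through InspectKid.

```csharp
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Tagging;
using iText.Kernel.Utils;

// Hypothetical subclass that observes the structure traversal.
public class CountingReaderTool : TaggedPdfReaderTool
{
    // Number of InspectKid calls seen so far.
    public int VisitedNodes { get; private set; }

    public CountingReaderTool(PdfDocument document) : base(document)
    {
    }

    protected override void InspectKid(IStructureNode kid)
    {
        VisitedNodes++;
        // Keep the default behaviour so the XML output is unchanged.
        base.InspectKid(kid);
    }
}
```

After calling ConvertToXml on a CountingReaderTool instance, VisitedNodes holds the number of observed calls. Calling the base implementation first or last is a design choice; it is called last here so the counter is updated even if the base method throws.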
IsValidCharacterValue(int)
Checks if a character value should be escaped/unescaped.
public static bool IsValidCharacterValue(int c)
Parameters
c
int
a character value
Returns
- bool
true if it's OK to escape or unescape this value.
ParseTag(PdfMcr)
Parses the tag of the Marked Content Reference (MCR) kid of the StructTreeRoot.
protected virtual void ParseTag(PdfMcr kid)
Parameters
kid
PdfMcr
the Marked Content Reference (MCR) kid of the StructTreeRoot
SetRootTag(string)
Sets the name of the root tag of the resultant XML file.
public virtual TaggedPdfReaderTool SetRootTag(string rootTagName)
Parameters
rootTagName
string
the name of the root tag
Returns
- TaggedPdfReaderTool
this object
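Since SetRootTag returns the tool itself, it can be chained directly into a conversion call. The sketch below writes the XML into memory and decodes it to a string. The helper name ExtractTagsAsXml and the root tag name "document" are illustrative assumptions; the iText.Kernel.Utils namespace is assumed as well.

```csharp
using System.IO;
using System.Text;
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

public static class RootTagExample
{
    // Hypothetical helper: returns the tag structure of a tagged PDF as an XML string.
    public static string ExtractTagsAsXml(string pdfPath)
    {
        PdfDocument pdfDoc = new PdfDocument(new PdfReader(pdfPath));
        try
        {
            MemoryStream buffer = new MemoryStream();

            // SetRootTag returns the same tool, so the calls can be chained.
            new TaggedPdfReaderTool(pdfDoc)
                .SetRootTag("document")
                .ConvertToXml(buffer, "UTF-8");

            // ToArray stays valid even if the stream has been closed.
            return Encoding.UTF8.GetString(buffer.ToArray());
        }
        finally
        {
            pdfDoc.Close();
        }
    }
}
```

Passing the charset explicitly keeps the decoding step on the last line in agreement with the encoding used for writing.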