Class PdfTextExtractor

Namespace
iText.Kernel.Pdf.Canvas.Parser
Assembly
itext.kernel.dll
public sealed class PdfTextExtractor
Inheritance
PdfTextExtractor

Methods

GetTextFromPage(PdfPage)

Extract text from a specified page using the default strategy.

public static string GetTextFromPage(PdfPage page)

Parameters

page PdfPage

the page from which to extract the text

Returns

string

the extracted text

Remarks

Extract text from a specified page using the default strategy. Note: the default strategy is subject to change. If using a specific strategy is important, use GetTextFromPage(PdfPage, ITextExtractionStrategy) instead.
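The following is a minimal sketch of calling this overload, not a definitive usage pattern. The file name "input.pdf" and the class name DefaultStrategyExample are placeholders introduced for illustration; the sketch assumes the itext.kernel package is referenced and that the document has at least one page.

```csharp
using System;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;

public static class DefaultStrategyExample
{
    public static void Main(string[] args)
    {
        // "input.pdf" is a placeholder path (assumption for this sketch).
        using (PdfDocument pdfDoc = new PdfDocument(new PdfReader("input.pdf")))
        {
            // Page numbers in PdfDocument.GetPage are 1-based.
            PdfPage firstPage = pdfDoc.GetPage(1);

            // Uses whatever strategy the library currently treats as default.
            string text = PdfTextExtractor.GetTextFromPage(firstPage);
            Console.WriteLine(text);
        }
    }
}
```

Because the default strategy may change between releases, output from this overload should not be relied on to stay byte-for-byte stable across library upgrades.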

GetTextFromPage(PdfPage, ITextExtractionStrategy)

Extract text from a specified page using an extraction strategy.

public static string GetTextFromPage(PdfPage page, ITextExtractionStrategy strategy)

Parameters

page PdfPage

the page from which to extract the text

strategy ITextExtractionStrategy

the strategy to use for extracting text

Returns

string

the extracted text

Remarks

Extract text from a specified page using an extraction strategy. A new extraction strategy object must be passed for every page; do not reuse one instance across pages.
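The one-strategy-per-page requirement can be sketched as a loop that constructs a fresh strategy on each iteration. This is an illustrative sketch only: "input.pdf" and the class name PerPageStrategyExample are placeholders, and LocationTextExtractionStrategy (from iText.Kernel.Pdf.Canvas.Parser.Listener) is chosen here merely as one available strategy.

```csharp
using System;
using System.Text;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

public static class PerPageStrategyExample
{
    public static void Main(string[] args)
    {
        StringBuilder allText = new StringBuilder();

        // "input.pdf" is a placeholder path (assumption for this sketch).
        using (PdfDocument pdfDoc = new PdfDocument(new PdfReader("input.pdf")))
        {
            for (int pageNumber = 1; pageNumber <= pdfDoc.GetNumberOfPages(); pageNumber++)
            {
                // Create a new strategy for each page; a strategy accumulates
                // the text it has seen, so reusing one would mix pages together.
                ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();

                string pageText = PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(pageNumber), strategy);
                allText.AppendLine(pageText);
            }
        }

        Console.WriteLine(allText.ToString());
    }
}
```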

GetTextFromPage(PdfPage, ITextExtractionStrategy, IDictionary<string, IContentOperator>)

Extract text from a specified page using an extraction strategy.

public static string GetTextFromPage(PdfPage page, ITextExtractionStrategy strategy, IDictionary<string, IContentOperator> additionalContentOperators)

Parameters

page PdfPage

the page from which to extract the text

strategy ITextExtractionStrategy

the strategy to use for extracting text

additionalContentOperators IDictionary<string, IContentOperator>

an optional map of custom IContentOperators for rendering instructions

Returns

string

the extracted text

Remarks

Extract text from a specified page using an extraction strategy. Also allows registration of custom IContentOperators that can influence how (and whether) the PDF instructions are parsed. A new extraction strategy object must be passed for every page; do not reuse one instance across pages.
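As a rough sketch of how a custom operator might be registered, the example below maps the PDF operator "Do" (paint XObject) to a handler that does nothing, so text inside form XObjects would be skipped. Several things here are assumptions rather than facts stated on this page: the IgnoreOperator and CustomOperatorExample classes and the "input.pdf" path are hypothetical, the IContentOperator.Invoke signature is written as it appears in recent iText .NET releases and may differ in other versions, and the sketch assumes an operator registered under an existing key replaces the built-in handler for that key.

```csharp
using System;
using System.Collections.Generic;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

// Hypothetical operator that swallows an instruction without processing it.
public class IgnoreOperator : IContentOperator
{
    // Signature assumed from recent iText .NET releases; check your version.
    public void Invoke(PdfCanvasProcessor processor, PdfLiteral @operator, IList<PdfObject> operands)
    {
        // Intentionally empty: the instruction is read but has no effect.
    }
}

public static class CustomOperatorExample
{
    public static void Main(string[] args)
    {
        // Map of PDF operator strings to handlers. "Do" paints an XObject;
        // overriding it is assumed to suppress text inside form XObjects.
        IDictionary<string, IContentOperator> operators =
            new Dictionary<string, IContentOperator> { { "Do", new IgnoreOperator() } };

        // "input.pdf" is a placeholder path (assumption for this sketch).
        using (PdfDocument pdfDoc = new PdfDocument(new PdfReader("input.pdf")))
        {
            string text = PdfTextExtractor.GetTextFromPage(
                pdfDoc.GetPage(1),
                new LocationTextExtractionStrategy(), // fresh strategy for this page
                operators);
            Console.WriteLine(text);
        }
    }
}
```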