Class PdfTextExtractor
public sealed class PdfTextExtractor
- Inheritance
- object
PdfTextExtractor
Methods
GetTextFromPage(PdfPage)
Extract text from a specified page using the default strategy.
public static string GetTextFromPage(PdfPage page)
Parameters
page
PdfPage
the page from which to extract text
Returns
- string
the extracted text
Remarks
Note: the default strategy is subject to change. If a specific strategy matters for your use case, call GetTextFromPage(PdfPage, ITextExtractionStrategy) instead.
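As a minimal sketch of this overload, the helper below walks every page of a document and concatenates the text returned by the default strategy. The class name DefaultExtractionSketch, the method ExtractAllText, and the pdfPath parameter are hypothetical names introduced for illustration, and the namespaces assume the iText 7 (or later) .NET packages.

```csharp
using System.Text;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;

public static class DefaultExtractionSketch
{
    // Hypothetical helper (not part of the library): returns the text of
    // every page in the file, one page after another.
    public static string ExtractAllText(string pdfPath)
    {
        var text = new StringBuilder();
        using (var pdfDoc = new PdfDocument(new PdfReader(pdfPath)))
        {
            // Page numbers start at 1.
            for (int i = 1; i <= pdfDoc.GetNumberOfPages(); i++)
            {
                // Uses whatever the current default strategy is.
                text.AppendLine(PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(i)));
            }
        }
        return text.ToString();
    }
}
```

Because the default strategy may change between releases, output produced this way should not be relied on to stay byte-for-byte stable.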
GetTextFromPage(PdfPage, ITextExtractionStrategy)
Extract text from a specified page using an extraction strategy.
public static string GetTextFromPage(PdfPage page, ITextExtractionStrategy strategy)
Parameters
page
PdfPage
the page from which to extract text
strategy
ITextExtractionStrategy
the strategy to use for extracting text
Returns
- string
the extracted text
Remarks
A new strategy object must be passed for every page; do not reuse one strategy instance across pages.
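The per-page requirement can be sketched as a loop that constructs the strategy inside the loop body. PerPageStrategySketch and ExtractPages are hypothetical names; LocationTextExtractionStrategy is used here only as one example of an ITextExtractionStrategy implementation, assuming the iText 7 (or later) .NET namespaces.

```csharp
using System.Collections.Generic;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

public static class PerPageStrategySketch
{
    // Hypothetical helper: returns one string per page of an open document.
    public static IList<string> ExtractPages(PdfDocument pdfDoc)
    {
        var pages = new List<string>();
        for (int i = 1; i <= pdfDoc.GetNumberOfPages(); i++)
        {
            // Create the strategy inside the loop so that each page gets
            // its own instance, as the remarks require.
            ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
            pages.Add(PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(i), strategy));
        }
        return pages;
    }
}
```

Hoisting the strategy out of the loop would look like a harmless optimization, but it is exactly the reuse that the remarks rule out.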
GetTextFromPage(PdfPage, ITextExtractionStrategy, IDictionary<string, IContentOperator>)
Extract text from a specified page using an extraction strategy and optional custom content operators.
public static string GetTextFromPage(PdfPage page, ITextExtractionStrategy strategy, IDictionary<string, IContentOperator> additionalContentOperators)
Parameters
page
PdfPage
the page from which to extract text
strategy
ITextExtractionStrategy
the strategy to use for extracting text
additionalContentOperators
IDictionary<string, IContentOperator>
an optional map of custom IContentOperator instances for rendering instructions
Returns
- string
the extracted text
Remarks
This overload also allows registration of custom IContentOperator instances, which can influence how, and whether, the PDF instructions are parsed. A new strategy object must be passed for every page.
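A rough sketch of registering a custom operator follows. It maps the PDF operator name "Do" (paint XObject) to a handler that does nothing, on the assumption that an entry in the map takes the place of the built-in handling for that operator name. IgnoreOperator, CustomOperatorSketch, and ExtractIgnoringXObjects are hypothetical names, and the Invoke signature shown is the one assumed for the iText 7 (or later) .NET IContentOperator interface; check it against the version you use.

```csharp
using System.Collections.Generic;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

// Hypothetical operator that swallows the instruction it is registered for.
public sealed class IgnoreOperator : IContentOperator
{
    public void Invoke(PdfCanvasProcessor processor, PdfLiteral @operator, IList<PdfObject> operands)
    {
        // Intentionally empty: the instruction is skipped.
    }
}

public static class CustomOperatorSketch
{
    // Hypothetical helper: extracts text from a page while skipping "Do"
    // instructions (assumption: the map entry replaces the default handler).
    public static string ExtractIgnoringXObjects(PdfPage page)
    {
        var operators = new Dictionary<string, IContentOperator>
        {
            { "Do", new IgnoreOperator() }
        };
        // A fresh strategy instance per call, as the remarks require.
        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
        return PdfTextExtractor.GetTextFromPage(page, strategy, operators);
    }
}
```

If the assumption about replacement holds, text drawn directly in the page content stream is still extracted, while anything reached only through an XObject is left out.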