
Class HtmlDocument

Namespace
HtmlAgilityPack
Assembly
HtmlAgilityPack.dll

Represents a complete HTML document.

public class HtmlDocument : IXPathNavigable
Inheritance
HtmlDocument
Implements
IXPathNavigable

Constructors

HtmlDocument()

Creates an instance of an HTML document.

public HtmlDocument()

Fields

BackwardCompatibility

True to stay backward compatible with previous versions of HAP. This option does not guarantee 100% compatibility.

public bool BackwardCompatibility

Field Value

bool

DisableServerSideCode

True to disable server-side code; false to enable it.

public bool DisableServerSideCode

Field Value

bool

GlobalAttributeValueQuote

Defines the global attribute value quote. When specified, it takes precedence over the quote style of individual attributes.

public AttributeValueQuote? GlobalAttributeValueQuote

Field Value

AttributeValueQuote?

OptionAddDebuggingAttributes

Defines whether debugging attributes are added to nodes. Default is false.

public bool OptionAddDebuggingAttributes

Field Value

bool

OptionAutoCloseOnEnd

Defines whether unclosed nodes are closed at the end of the document or directly where they occur in the document. Setting this to true can change how browsers render the page. Default is false.

public bool OptionAutoCloseOnEnd

Field Value

bool

OptionCheckSyntax

Defines whether unclosed nodes are checked at the end of parsing. Default is true.

public bool OptionCheckSyntax

Field Value

bool

OptionComputeChecksum

Defines if a checksum must be computed for the document while parsing. Default is false.

public bool OptionComputeChecksum

Field Value

bool

OptionDefaultStreamEncoding

Defines the default stream encoding to use. Default is System.Text.Encoding.Default.

public Encoding OptionDefaultStreamEncoding

Field Value

Encoding

OptionDefaultUseOriginalName

Defines if attributes should use original names by default, rather than lower case. Default is false.

public bool OptionDefaultUseOriginalName

Field Value

bool

OptionEmptyCollection

Defines whether the SelectNodes method returns null or an empty collection when no node matches the XPath expression. When true, an empty collection is returned; when false, null is returned. Default is false.

public bool OptionEmptyCollection

Field Value

bool
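A minimal sketch of the two behaviors, assuming the HtmlAgilityPack NuGet package is referenced and a C# program with top-level statements. The markup and the `//table` query are illustrative; the query deliberately matches nothing.

```csharp
using System;
using HtmlAgilityPack;

const string html = "<ul><li>a</li><li>b</li></ul>";

// Default (OptionEmptyCollection = false): a query with no match yields null.
var defaultDoc = new HtmlDocument();
defaultDoc.LoadHtml(html);
HtmlNodeCollection defaultResult = defaultDoc.DocumentNode.SelectNodes("//table");

// OptionEmptyCollection = true: the same query yields an empty collection.
var emptyDoc = new HtmlDocument { OptionEmptyCollection = true };
emptyDoc.LoadHtml(html);
HtmlNodeCollection emptyResult = emptyDoc.DocumentNode.SelectNodes("//table");

Console.WriteLine(defaultResult == null);
Console.WriteLine(emptyResult.Count);
```

With the option enabled, the result of SelectNodes can be iterated without a null check.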

OptionExtractErrorSourceText

Defines whether source text is extracted for parse errors. If the document has many errors, or cascading errors, setting this to true can dramatically reduce parsing performance. Default is false.

public bool OptionExtractErrorSourceText

Field Value

bool

OptionExtractErrorSourceTextMaxLength

Defines the maximum length of the source text extracted for parse errors. Default is 100.

public int OptionExtractErrorSourceTextMaxLength

Field Value

int

OptionFixNestedTags

Defines if LI, TR, TH, TD tags must be partially fixed when nesting errors are detected. Default is false.

public bool OptionFixNestedTags

Field Value

bool

OptionMaxNestedChildNodes

The maximum number of nested child nodes. Added to prevent a stack overflow when a page has tens of thousands of opening HTML tags with no closing tags.

public int OptionMaxNestedChildNodes

Field Value

int

OptionOutputAsXml

Defines if output must conform to XML, instead of HTML. Default is false.

public bool OptionOutputAsXml

Field Value

bool
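A sketch of XML output, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements. The input contains an unclosed `<br>`, which is valid HTML but not well-formed XML. Re-parsing the saved text with `System.Xml.Linq.XDocument` is used here only as a convenient well-formedness check; it is not part of the HtmlAgilityPack API.

```csharp
using System;
using System.IO;
using System.Xml.Linq;
using HtmlAgilityPack;

var doc = new HtmlDocument { OptionOutputAsXml = true };
// <br> has no closing tag, so this markup is not well-formed XML as written.
doc.LoadHtml("<html><body><div>line one<br>line two</div></body></html>");

// Save writes the document using the current output options.
var writer = new StringWriter();
doc.Save(writer);
string xml = writer.ToString();

// An XML parser should accept the output if it conforms to XML.
XDocument parsed = XDocument.Parse(xml);
Console.WriteLine(parsed.Root.Name);
```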

OptionOutputOptimizeAttributeValues

Defines whether attribute value output is optimized, that is, values are not enclosed in double quotes when this is possible. Default is false.

public bool OptionOutputOptimizeAttributeValues

Field Value

bool

OptionOutputOriginalCase

Defines whether names are output in their original case. Useful for ASP.NET tags and attributes. Default is false.

public bool OptionOutputOriginalCase

Field Value

bool

OptionOutputUpperCase

Defines whether names are output in uppercase. Default is false.

public bool OptionOutputUpperCase

Field Value

bool

OptionPreserveXmlNamespaces

When used together with OptionOutputAsXml and enabled, XML namespaces in element names are preserved. Default is false.

public bool OptionPreserveXmlNamespaces

Field Value

bool

OptionReadEncoding

Defines if declared encoding must be read from the document. Declared encoding is determined using the meta http-equiv="content-type" content="text/html;charset=XXXXX" html node. Default is true.

public bool OptionReadEncoding

Field Value

bool

OptionStopperNodeName

Defines the name of a node which, when found as an end node, causes a StopperNodeException to be thrown. Default is null.

public string OptionStopperNodeName

Field Value

string

OptionUseIdAttribute

Defines whether the 'id' attribute is tracked specifically, which GetElementbyId relies on. Default is true.

public bool OptionUseIdAttribute

Field Value

bool

OptionWriteEmptyNodes

Defines if empty nodes must be written as closed during output. Default is false.

public bool OptionWriteEmptyNodes

Field Value

bool

OptionXmlForceOriginalComment

Forces the original comment text to be used instead of creating a new comment.

public bool OptionXmlForceOriginalComment

Field Value

bool

Text

The raw text of the document. Modify it with care.

public string Text

Field Value

string

Properties

CheckSum

Gets the document CRC32 checksum if OptionComputeChecksum was set to true before parsing, 0 otherwise.

public int CheckSum { get; }

Property Value

int
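A minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements. It relies only on the documented behavior: CheckSum is 0 unless OptionComputeChecksum was set before parsing, and the same input should produce the same CRC32 value.

```csharp
using System;
using HtmlAgilityPack;

const string html = "<span>checksum me</span>";

// Without OptionComputeChecksum, CheckSum stays 0.
var plain = new HtmlDocument();
plain.LoadHtml(html);

// The option must be set before LoadHtml so the CRC32 is computed during parsing.
var first = new HtmlDocument { OptionComputeChecksum = true };
first.LoadHtml(html);
var second = new HtmlDocument { OptionComputeChecksum = true };
second.LoadHtml(html);

Console.WriteLine(plain.CheckSum);
Console.WriteLine(first.CheckSum == second.CheckSum);
```

Comparing checksums this way could be used to detect whether a page's raw text changed between two downloads.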

DeclaredEncoding

Gets the document's declared encoding. Declared encoding is determined using the meta http-equiv="content-type" content="text/html;charset=XXXXX" html node (pre-HTML5) or the meta charset="XXXXX" html node (HTML5).

public Encoding DeclaredEncoding { get; }

Property Value

Encoding

DefaultBuilder

The default builder that the HtmlDocument constructor applies to each new instance.

public static Action<HtmlDocument> DefaultBuilder { get; set; }

Property Value

Action<HtmlDocument>
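A sketch of applying shared configuration through DefaultBuilder, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements. OptionEmptyCollection is an arbitrary option chosen for illustration. Because the property is static, the example resets it to null afterwards.

```csharp
using System;
using HtmlAgilityPack;

// DefaultBuilder is static, so it affects every HtmlDocument created afterwards.
HtmlDocument.DefaultBuilder = d => d.OptionEmptyCollection = true;

var configuredDoc = new HtmlDocument();
bool configured = configuredDoc.OptionEmptyCollection;

// Restore the global default so unrelated code is not affected.
HtmlDocument.DefaultBuilder = null;

var plainDoc = new HtmlDocument();
Console.WriteLine($"{configured} {plainDoc.OptionEmptyCollection}");
```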

DisableBehaviorTagP

True to disable, false to enable the special parsing behavior of the p tag.

public static bool DisableBehaviorTagP { get; set; }

Property Value

bool

DocumentNode

Gets the root node of the document.

public HtmlNode DocumentNode { get; }

Property Value

HtmlNode

Encoding

Gets the document's output encoding.

public Encoding Encoding { get; }

Property Value

Encoding

MaxDepthLevel

Defines the maximum depth to which the HTML document is parsed. If this depth is exceeded, an exception is thrown.

public static int MaxDepthLevel { get; set; }

Property Value

int

ParseErrors

Gets a list of parse errors found in the document.

public IEnumerable<HtmlParseError> ParseErrors { get; }

Property Value

IEnumerable<HtmlParseError>
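A sketch of inspecting parse errors, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements. The markup contains a stray `</b>` end tag with no matching start tag; with OptionCheckSyntax left at its default of true, this is expected to be reported.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

var doc = new HtmlDocument();
// The </b> end tag has no matching <b> start tag.
doc.LoadHtml("<div><span>text</b></span></div>");

Console.WriteLine($"{doc.ParseErrors.Count()} parse error(s)");
foreach (HtmlParseError error in doc.ParseErrors)
{
    // Line and LinePosition locate the problem; Reason is a human-readable message.
    Console.WriteLine($"{error.Code} at line {error.Line}, position {error.LinePosition}: {error.Reason}");
}
```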

ParseExecuting

Action to execute before the document is parsed.

public Action<HtmlDocument> ParseExecuting { get; set; }

Property Value

Action<HtmlDocument>
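A sketch showing that the action runs once per parse, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements. The counter is purely illustrative; a real hook might adjust options on the document it receives.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
int parseCount = 0;

// Invoked with the document each time a load triggers a parse.
doc.ParseExecuting = _ => parseCount++;

doc.LoadHtml("<span>a</span>");
doc.LoadHtml("<span>b</span>");

Console.WriteLine(parseCount);
```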

ParsedText

Gets the parsed text.

public string ParsedText { get; }

Property Value

string

The parsed text.

Remainder

Gets the remaining text. Will always be null if OptionStopperNodeName is null.

public string Remainder { get; }

Property Value

string

RemainderOffset

Gets the offset of Remainder in the original HTML text. If OptionStopperNodeName is null, this returns the length of the original HTML text.

public int RemainderOffset { get; }

Property Value

int

StreamEncoding

Gets the document's stream encoding.

public Encoding StreamEncoding { get; }

Property Value

Encoding

Methods

CreateAttribute(string)

Creates an HTML attribute with the specified name.

public HtmlAttribute CreateAttribute(string name)

Parameters

name string

The name of the attribute. May not be null.

Returns

HtmlAttribute

The new HTML attribute.

CreateAttribute(string, string)

Creates an HTML attribute with the specified name and value.

public HtmlAttribute CreateAttribute(string name, string value)

Parameters

name string

The name of the attribute. May not be null.

value string

The value of the attribute.

Returns

HtmlAttribute

The new HTML attribute.

CreateComment()

Creates an HTML comment node.

public HtmlCommentNode CreateComment()

Returns

HtmlCommentNode

The new HTML comment node.

CreateComment(string)

Creates an HTML comment node with the specified comment text.

public HtmlCommentNode CreateComment(string comment)

Parameters

comment string

The comment text. May not be null.

Returns

HtmlCommentNode

The new HTML comment node.

CreateElement(string)

Creates an HTML element node with the specified name.

public HtmlNode CreateElement(string name)

Parameters

name string

The qualified name of the element. May not be null.

Returns

HtmlNode

The new HTML node.

CreateNavigator()

Creates a new XPathNavigator object for navigating this HTML document.

public XPathNavigator CreateNavigator()

Returns

XPathNavigator

An XPathNavigator object. The XPathNavigator is positioned on the root of the document.
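A sketch of using the navigator directly, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements. `XPathNavigator.Evaluate` comes from System.Xml.XPath and can return scalar results, such as the number produced by `count()`, that SelectNodes does not.

```csharp
using System;
using System.Xml.XPath;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<ul><li>a</li><li>b</li><li>c</li></ul>");

// The navigator starts at the document root.
XPathNavigator nav = doc.CreateNavigator();

// count() returns an XPath number, which .NET represents as a double.
double items = (double)nav.Evaluate("count(//li)");
Console.WriteLine(items);
```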

CreateTextNode()

Creates an HTML text node.

public HtmlTextNode CreateTextNode()

Returns

HtmlTextNode

The new HTML text node.

CreateTextNode(string)

Creates an HTML text node with the specified text.

public HtmlTextNode CreateTextNode(string text)

Parameters

text string

The text of the node. May not be null.

Returns

HtmlTextNode

The new HTML text node.
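A sketch that combines CreateElement, CreateAttribute, and CreateTextNode to build a node and attach it to a parsed document, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements. `HtmlAttributeCollection.Append` and `HtmlNode.AppendChild` are used to attach the new pieces; the element name and class value are illustrative.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<html><body></body></html>");

// Build <span class="note">Hello</span> from scratch.
HtmlNode note = doc.CreateElement("span");
note.Attributes.Append(doc.CreateAttribute("class", "note"));
note.AppendChild(doc.CreateTextNode("Hello"));

// Attach the new element to the existing body.
HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");
body.AppendChild(note);

Console.WriteLine(body.OuterHtml);
```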

DetectEncoding(Stream)

Detects the encoding of an HTML stream.

public Encoding DetectEncoding(Stream stream)

Parameters

stream Stream

The input stream. May not be null.

Returns

Encoding

The detected encoding.

DetectEncoding(Stream, bool)

Detects the encoding of an HTML stream.

public Encoding DetectEncoding(Stream stream, bool checkHtml)

Parameters

stream Stream

The input stream. May not be null.

checkHtml bool

true to check the HTML content; false otherwise.

Returns

Encoding

The detected encoding.

DetectEncoding(TextReader)

Detects the encoding of HTML text provided by a TextReader.

public Encoding DetectEncoding(TextReader reader)

Parameters

reader TextReader

The TextReader used to feed the HTML. May not be null.

Returns

Encoding

The detected encoding.

DetectEncoding(string)

Detects the encoding of an HTML file.

public Encoding DetectEncoding(string path)

Parameters

path string

Path for the file containing the HTML document to detect. May not be null.

Returns

Encoding

The detected encoding.

DetectEncodingAndLoad(string)

Detects the encoding of an HTML document from a file first, and then loads the file.

public void DetectEncodingAndLoad(string path)

Parameters

path string

The complete file path to be read.

DetectEncodingAndLoad(string, bool)

Detects the encoding of an HTML document from a file first, and then loads the file.

public void DetectEncodingAndLoad(string path, bool detectEncoding)

Parameters

path string

The complete file path to be read. May not be null.

detectEncoding bool

true to detect encoding, false otherwise.

DetectEncodingHtml(string)

Detects the encoding of HTML text.

public Encoding DetectEncodingHtml(string html)

Parameters

html string

The input HTML text. May not be null.

Returns

Encoding

The detected encoding.
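A sketch of reading a declared charset from markup, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements. It uses the pre-HTML5 `http-equiv` form described under OptionReadEncoding; the null-conditional access guards against no encoding being found.

```csharp
using System;
using System.Text;
using HtmlAgilityPack;

var doc = new HtmlDocument();
string html =
    "<html><head>" +
    "<meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\">" +
    "</head><body></body></html>";

// Reads the charset declared in the meta tag of the given text.
Encoding detected = doc.DetectEncodingHtml(html);
Console.WriteLine(detected?.WebName);
```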

GetElementbyId(string)

Gets the HTML node with the specified 'id' attribute value.

public HtmlNode GetElementbyId(string id)

Parameters

id string

The id attribute value to match. May not be null.

Returns

HtmlNode

The HTML node with the matching id or null if not found.
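A minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements. Note the lowercase "b" in the method name, and that an unknown id returns null rather than throwing.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<div id='main'><span id='greeting'>Hi</span></div>");

// Look nodes up by their id attribute (OptionUseIdAttribute defaults to true).
HtmlNode greeting = doc.GetElementbyId("greeting");
HtmlNode missing = doc.GetElementbyId("does-not-exist");

Console.WriteLine(greeting.InnerText);
Console.WriteLine(missing == null);
```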

GetXmlName(string)

Gets a valid XML name.

public static string GetXmlName(string name)

Parameters

name string

Any text.

Returns

string

A string that is a valid XML name.

GetXmlName(string, bool, bool)

public static string GetXmlName(string name, bool isAttribute, bool preserveXmlNamespaces)

Parameters

name string
isAttribute bool
preserveXmlNamespaces bool

Returns

string

HtmlEncode(string)

Applies HTML encoding to a specified string.

public static string HtmlEncode(string html)

Parameters

html string

The input string to encode. May not be null.

Returns

string

The encoded string.
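A minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements. HtmlEncode is static, so no document instance is needed; the input string is illustrative.

```csharp
using System;
using HtmlAgilityPack;

// Markup-significant characters in the input are replaced by entities.
string encoded = HtmlDocument.HtmlEncode("a < b & c");
Console.WriteLine(encoded);
```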

IsWhiteSpace(int)

Determines whether the specified character is considered a whitespace character.

public static bool IsWhiteSpace(int c)

Parameters

c int

The character to check.

Returns

bool

true if the specified character is considered a whitespace character; otherwise, false.
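A minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements. The parameter is an int, so a char argument converts implicitly.

```csharp
using System;
using HtmlAgilityPack;

// char converts implicitly to the int parameter.
bool space = HtmlDocument.IsWhiteSpace(' ');
bool tab = HtmlDocument.IsWhiteSpace('\t');
bool newline = HtmlDocument.IsWhiteSpace('\n');
bool letter = HtmlDocument.IsWhiteSpace('x');

Console.WriteLine($"{space} {tab} {newline} {letter}");
```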

Load(Stream)

Loads an HTML document from a stream.

public void Load(Stream stream)

Parameters

stream Stream

The input stream.

Load(Stream, bool)

Loads an HTML document from a stream.

public void Load(Stream stream, bool detectEncodingFromByteOrderMarks)

Parameters

stream Stream

The input stream.

detectEncodingFromByteOrderMarks bool

Indicates whether to look for byte order marks at the beginning of the stream.

Load(Stream, Encoding)

Loads an HTML document from a stream.

public void Load(Stream stream, Encoding encoding)

Parameters

stream Stream

The input stream.

encoding Encoding

The character encoding to use.
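A sketch of loading from a stream with an explicit encoding, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements. An in-memory stream stands in for a file or network response; the non-ASCII character shows the bytes being decoded as UTF-8.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

// UTF-8 bytes standing in for a downloaded page.
byte[] bytes = Encoding.UTF8.GetBytes("<html><body><h1>Caf\u00e9</h1></body></html>");
using var stream = new MemoryStream(bytes);

var doc = new HtmlDocument();
doc.Load(stream, Encoding.UTF8);

string heading = doc.DocumentNode.SelectSingleNode("//h1").InnerText;
Console.WriteLine(heading);
```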

Load(Stream, Encoding, bool)

Loads an HTML document from a stream.

public void Load(Stream stream, Encoding encoding, bool detectEncodingFromByteOrderMarks)

Parameters

stream Stream

The input stream.

encoding Encoding

The character encoding to use.

detectEncodingFromByteOrderMarks bool

Indicates whether to look for byte order marks at the beginning of the stream.

Load(Stream, Encoding, bool, int)

Loads an HTML document from a stream.

public void Load(Stream stream, Encoding encoding, bool detectEncodingFromByteOrderMarks, int buffersize)

Parameters

stream Stream

The input stream.

encoding Encoding

The character encoding to use.

detectEncodingFromByteOrderMarks bool

Indicates whether to look for byte order marks at the beginning of the stream.

buffersize int

The minimum buffer size.

Load(TextReader)

Loads the HTML document from the specified TextReader.

public void Load(TextReader reader)

Parameters

reader TextReader

The TextReader used to feed the HTML data into the document. May not be null.

Load(string)

Loads an HTML document from a file.

public void Load(string path)

Parameters

path string

The complete file path to be read. May not be null.

Load(string, bool)

Loads an HTML document from a file.

public void Load(string path, bool detectEncodingFromByteOrderMarks)

Parameters

path string

The complete file path to be read. May not be null.

detectEncodingFromByteOrderMarks bool

Indicates whether to look for byte order marks at the beginning of the file.

Load(string, Encoding)

Loads an HTML document from a file.

public void Load(string path, Encoding encoding)

Parameters

path string

The complete file path to be read. May not be null.

encoding Encoding

The character encoding to use. May not be null.

Load(string, Encoding, bool)

Loads an HTML document from a file.

public void Load(string path, Encoding encoding, bool detectEncodingFromByteOrderMarks)

Parameters

path string

The complete file path to be read. May not be null.

encoding Encoding

The character encoding to use. May not be null.

detectEncodingFromByteOrderMarks bool

Indicates whether to look for byte order marks at the beginning of the file.

Load(string, Encoding, bool, int)

Loads an HTML document from a file.

public void Load(string path, Encoding encoding, bool detectEncodingFromByteOrderMarks, int buffersize)

Parameters

path string

The complete file path to be read. May not be null.

encoding Encoding

The character encoding to use. May not be null.

detectEncodingFromByteOrderMarks bool

Indicates whether to look for byte order marks at the beginning of the file.

buffersize int

The minimum buffer size.

LoadHtml(string)

Loads the HTML document from the specified string.

public void LoadHtml(string html)

Parameters

html string

String containing the HTML document to load. May not be null.
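A minimal end-to-end sketch, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements: parse a string, then query DocumentNode with XPath. The markup is illustrative.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<html><body><h1>Demo</h1><span class='x'>Hello</span></body></html>");

// DocumentNode is the root of the parsed tree.
HtmlNode heading = doc.DocumentNode.SelectSingleNode("//h1");
HtmlNode span = doc.DocumentNode.SelectSingleNode("//span[@class='x']");

Console.WriteLine(heading.InnerText);
Console.WriteLine(span.InnerText);
```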

Save(Stream)

Saves the HTML document to the specified stream.

public void Save(Stream outStream)

Parameters

outStream Stream

The stream to which you want to save.

Save(Stream, Encoding)

Saves the HTML document to the specified stream.

public void Save(Stream outStream, Encoding encoding)

Parameters

outStream Stream

The stream to which you want to save. May not be null.

encoding Encoding

The character encoding to use. May not be null.

Save(StreamWriter)

Saves the HTML document to the specified StreamWriter.

public void Save(StreamWriter writer)

Parameters

writer StreamWriter

The StreamWriter to which you want to save.

Save(TextWriter)

Saves the HTML document to the specified TextWriter.

public void Save(TextWriter writer)

Parameters

writer TextWriter

The TextWriter to which you want to save. May not be null.
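A sketch of saving a modified document to a TextWriter, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements. A StringWriter is used so the output can be inspected in memory.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<ul><li>one</li></ul>");

// Modify the tree before saving; the saved output reflects the change.
doc.DocumentNode.SelectSingleNode("//li").InnerHtml = "two";

var writer = new StringWriter();
doc.Save(writer);
string output = writer.ToString();
Console.WriteLine(output);
```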

Save(string)

Saves the HTML document to the specified file.

public void Save(string filename)

Parameters

filename string

The location of the file where you want to save the document.

Save(string, Encoding)

Saves the HTML document to the specified file.

public void Save(string filename, Encoding encoding)

Parameters

filename string

The location of the file where you want to save the document. May not be null.

encoding Encoding

The character encoding to use. May not be null.
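A sketch of a file round trip with Save(string, Encoding) and Load(string, Encoding), assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements. The file name `hap-demo.html` is hypothetical and is written to the temp folder, then deleted.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

// Hypothetical file name, placed in the temp folder.
string path = Path.Combine(Path.GetTempPath(), "hap-demo.html");

var doc = new HtmlDocument();
doc.LoadHtml("<span>Gr\u00fc\u00dfe</span>");
doc.Save(path, Encoding.UTF8);

// Reload with the same encoding so non-ASCII text survives the round trip.
var reloaded = new HtmlDocument();
reloaded.Load(path, Encoding.UTF8);
string text = reloaded.DocumentNode.SelectSingleNode("//span").InnerText;

File.Delete(path);
Console.WriteLine(text);
```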

Save(XmlWriter)

Saves the HTML document to the specified XmlWriter.

public void Save(XmlWriter writer)

Parameters

writer XmlWriter

The XmlWriter to which you want to save.

UseAttributeOriginalName(string)

public void UseAttributeOriginalName(string tagName)

Parameters

tagName string