Class HtmlDocument
- Namespace: HtmlAgilityPack
- Assembly: HtmlAgilityPack.dll
Represents a complete HTML document.
public class HtmlDocument : IXPathNavigable
- Inheritance: object → HtmlDocument
- Implements: IXPathNavigable
Constructors
HtmlDocument()
Creates an instance of an HTML document.
public HtmlDocument()
Fields
BackwardCompatibility
True to stay backward compatible with previous versions of HAP. This option does not guarantee 100% compatibility.
public bool BackwardCompatibility
Field Value
- bool
DisableServerSideCode
True to disable server-side code, false to enable it.
public bool DisableServerSideCode
Field Value
- bool
GlobalAttributeValueQuote
Defines the global quote used for attribute values. When specified, it always takes precedence.
public AttributeValueQuote? GlobalAttributeValueQuote
Field Value
- AttributeValueQuote?
OptionAddDebuggingAttributes
Defines if debugging attributes are added to nodes. Default is false.
public bool OptionAddDebuggingAttributes
Field Value
- bool
OptionAutoCloseOnEnd
Defines if unclosed nodes must be closed at the end of the document (true) or directly where they occur in the document (false). Setting this to true can change how browsers render the page. Default is false.
public bool OptionAutoCloseOnEnd
Field Value
- bool
OptionCheckSyntax
Defines if unclosed nodes are checked at the end of parsing. Default is true.
public bool OptionCheckSyntax
Field Value
- bool
OptionComputeChecksum
Defines if a checksum must be computed for the document while parsing. Default is false.
public bool OptionComputeChecksum
Field Value
- bool
OptionDefaultStreamEncoding
Defines the default stream encoding to use. Default is System.Text.Encoding.Default.
public Encoding OptionDefaultStreamEncoding
Field Value
- Encoding
OptionDefaultUseOriginalName
Defines if attributes should use their original names by default, rather than lowercase names. Default is false.
public bool OptionDefaultUseOriginalName
Field Value
- bool
OptionEmptyCollection
Defines whether the SelectNodes method returns null or an empty collection when no node matches the XPath expression. Set to true to return an empty collection, or false to return null. Default is false.
public bool OptionEmptyCollection
Field Value
- bool
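The difference is easiest to see side by side. This is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced and the code runs as C# top-level statements; the sample markup and variable names are illustrative. SelectNodes is called on an HtmlNode, here DocumentNode.

```csharp
// Sketch: assumes the HtmlAgilityPack package is referenced.
using HtmlAgilityPack;

// Default (OptionEmptyCollection = false): no match yields null.
var legacy = new HtmlDocument();
legacy.LoadHtml("<ul><li>one</li></ul>");
HtmlNodeCollection missing = legacy.DocumentNode.SelectNodes("//table");

// OptionEmptyCollection = true: no match yields an empty collection.
var modern = new HtmlDocument { OptionEmptyCollection = true };
modern.LoadHtml("<ul><li>one</li></ul>");
HtmlNodeCollection empty = modern.DocumentNode.SelectNodes("//table");

System.Console.WriteLine(missing == null);  // True
System.Console.WriteLine(empty.Count);      // 0
```

Enabling the option lets callers iterate the result without a null check.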
OptionExtractErrorSourceText
Defines if the source text must be extracted for parse errors. If the document has many errors, or cascading errors, parsing performance can be dramatically affected when this is set to true. Default is false.
public bool OptionExtractErrorSourceText
Field Value
- bool
OptionExtractErrorSourceTextMaxLength
Defines the maximum length of the source text extracted for parse errors. Default is 100.
public int OptionExtractErrorSourceTextMaxLength
Field Value
- int
OptionFixNestedTags
Defines if LI, TR, TH, TD tags must be partially fixed when nesting errors are detected. Default is false.
public bool OptionFixNestedTags
Field Value
- bool
OptionMaxNestedChildNodes
The maximum number of nested child nodes. Added to prevent a stack overflow when a page has tens of thousands of opening HTML tags with no closing tags.
public int OptionMaxNestedChildNodes
Field Value
- int
OptionOutputAsXml
Defines if output must conform to XML, instead of HTML. Default is false.
public bool OptionOutputAsXml
Field Value
- bool
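A minimal sketch of XML output, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements; the markup is illustrative. The option is set before serializing with Save(TextWriter).

```csharp
// Sketch: assumes the HtmlAgilityPack package is referenced.
using System.IO;
using HtmlAgilityPack;

var doc = new HtmlDocument { OptionOutputAsXml = true };
doc.LoadHtml("<p>Hello<br>world</p>");

// Save(TextWriter) serializes the whole document using the XML output rules.
var writer = new StringWriter();
doc.Save(writer);
string xml = writer.ToString();

// With OptionOutputAsXml, the output starts with an <?xml ...?> declaration.
System.Console.WriteLine(xml);
```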
OptionOutputOptimizeAttributeValues
Defines if attribute value output must be optimized (not enclosed in double quotes when possible). Default is false.
public bool OptionOutputOptimizeAttributeValues
Field Value
- bool
OptionOutputOriginalCase
Defines if names must be output with their original case. Useful for ASP.NET tags and attributes. Default is false.
public bool OptionOutputOriginalCase
Field Value
- bool
OptionOutputUpperCase
Defines if name must be output in uppercase. Default is false.
public bool OptionOutputUpperCase
Field Value
- bool
OptionPreserveXmlNamespaces
When enabled together with OptionOutputAsXml, XML namespaces in element names are preserved. Default is false.
public bool OptionPreserveXmlNamespaces
Field Value
- bool
OptionReadEncoding
Defines if the declared encoding must be read from the document. The declared encoding is determined using the meta http-equiv="content-type" content="text/html;charset=XXXXX" HTML node. Default is true.
public bool OptionReadEncoding
Field Value
- bool
OptionStopperNodeName
Defines the name of a node that will throw the StopperNodeException when found as an end node. Default is null.
public string OptionStopperNodeName
Field Value
- string
OptionUseIdAttribute
Defines if the 'id' attribute must be specifically used, for example by GetElementbyId. Default is true.
public bool OptionUseIdAttribute
Field Value
- bool
OptionWriteEmptyNodes
Defines if empty nodes must be written as closed during output. Default is false.
public bool OptionWriteEmptyNodes
Field Value
- bool
OptionXmlForceOriginalComment
Forces the original comment to be used instead of re-creating it.
public bool OptionXmlForceOriginalComment
Field Value
- bool
Text
The text of the HtmlDocument. Modify it with care.
public string Text
Field Value
- string
Properties
CheckSum
Gets the document CRC32 checksum if OptionComputeChecksum was set to true before parsing, 0 otherwise.
public int CheckSum { get; }
Property Value
- int
DeclaredEncoding
Gets the document's declared encoding. Declared encoding is determined using the meta http-equiv="content-type" content="text/html;charset=XXXXX" html node (pre-HTML5) or the meta charset="XXXXX" html node (HTML5).
public Encoding DeclaredEncoding { get; }
Property Value
- Encoding
DefaultBuilder
The default builder invoked by the HtmlDocument constructor.
public static Action<HtmlDocument> DefaultBuilder { get; set; }
Property Value
- Action<HtmlDocument>
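Because DefaultBuilder is static, it configures every document created afterwards. A minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements; the chosen options are illustrative. The hook is reset at the end so it does not leak into unrelated code.

```csharp
// Sketch: assumes the HtmlAgilityPack package is referenced.
using HtmlAgilityPack;

// Global hook: applied by the HtmlDocument constructor to each new document.
HtmlDocument.DefaultBuilder = d =>
{
    d.OptionEmptyCollection = true;
    d.OptionFixNestedTags = true;
};

var doc = new HtmlDocument();
bool configured = doc.OptionEmptyCollection && doc.OptionFixNestedTags;

// Reset the hook so later documents fall back to the normal defaults.
HtmlDocument.DefaultBuilder = null;
var plain = new HtmlDocument();

System.Console.WriteLine(configured);                  // True
System.Console.WriteLine(plain.OptionEmptyCollection); // False
```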
DisableBehaviorTagP
True to disable, false to enable the special behavior of the p tag.
public static bool DisableBehaviorTagP { get; set; }
Property Value
- bool
DocumentNode
Gets the root node of the document.
public HtmlNode DocumentNode { get; }
Property Value
- HtmlNode
Encoding
Gets the document's output encoding.
public Encoding Encoding { get; }
Property Value
- Encoding
MaxDepthLevel
Defines the maximum depth the parser goes into the HTML document. If this depth is exceeded, an exception is thrown.
public static int MaxDepthLevel { get; set; }
Property Value
- int
ParseErrors
Gets a list of parse errors found in the document.
public IEnumerable<HtmlParseError> ParseErrors { get; }
Property Value
- IEnumerable<HtmlParseError>
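A minimal sketch of inspecting parse errors, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements. The deliberately unclosed div is illustrative input; OptionCheckSyntax keeps its default of true so unclosed nodes are reported.

```csharp
// Sketch: assumes the HtmlAgilityPack package is referenced.
using System.Linq;
using HtmlAgilityPack;

var doc = new HtmlDocument();   // OptionCheckSyntax defaults to true
doc.LoadHtml("<div>Unclosed content");

// Each HtmlParseError carries a code, a position and a readable reason.
foreach (HtmlParseError error in doc.ParseErrors)
{
    System.Console.WriteLine($"{error.Code} at line {error.Line}, column {error.LinePosition}: {error.Reason}");
}

bool hasErrors = doc.ParseErrors.Any();
```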
ParseExecuting
An action executed before the document is parsed.
public Action<HtmlDocument> ParseExecuting { get; set; }
Property Value
- Action<HtmlDocument>
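A minimal sketch of hooking the parse step, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements. The counter is a hypothetical illustration; in practice the callback could adjust options on the document it receives.

```csharp
// Sketch: assumes the HtmlAgilityPack package is referenced.
using HtmlAgilityPack;

var doc = new HtmlDocument();
int parseCount = 0;   // hypothetical counter, for illustration only

// The callback receives the document just before its text is parsed.
doc.ParseExecuting = d => parseCount++;

doc.LoadHtml("<p>first</p>");
doc.LoadHtml("<p>second</p>");

System.Console.WriteLine(parseCount);
```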
ParsedText
Gets the parsed text.
public string ParsedText { get; }
Property Value
- string
The parsed text.
Remainder
Gets the remaining text. Will always be null if OptionStopperNodeName is null.
public string Remainder { get; }
Property Value
- string
RemainderOffset
Gets the offset of Remainder in the original HTML text. If OptionStopperNodeName is null, this returns the length of the original HTML text.
public int RemainderOffset { get; }
Property Value
- int
StreamEncoding
Gets the document's stream encoding.
public Encoding StreamEncoding { get; }
Property Value
- Encoding
Methods
CreateAttribute(string)
Creates an HTML attribute with the specified name.
public HtmlAttribute CreateAttribute(string name)
Parameters
name
- string
The name of the attribute. May not be null.
Returns
- HtmlAttribute
The new HTML attribute.
CreateAttribute(string, string)
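Creates an HTML attribute with the specified name and value.
public HtmlAttribute CreateAttribute(string name, string value)
Parameters
name
- string
The name of the attribute. May not be null.
value
- string
The value of the attribute.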
Returns
- HtmlAttribute
The new HTML attribute.
CreateComment()
Creates an HTML comment node.
public HtmlCommentNode CreateComment()
Returns
- HtmlCommentNode
The new HTML comment node.
CreateComment(string)
Creates an HTML comment node with the specified comment text.
public HtmlCommentNode CreateComment(string comment)
Parameters
comment
- string
The comment text. May not be null.
Returns
- HtmlCommentNode
The new HTML comment node.
CreateElement(string)
Creates an HTML element node with the specified name.
public HtmlNode CreateElement(string name)
Parameters
name
- string
The qualified name of the element. May not be null.
Returns
- HtmlNode
The new HTML node.
CreateNavigator()
Creates a new XPathNavigator object for navigating this HTML document.
public XPathNavigator CreateNavigator()
Returns
- XPathNavigator
An XPathNavigator object. The XPathNavigator is positioned on the root of the document.
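A minimal sketch of using the navigator for XPath expressions that return values rather than nodes, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements. Evaluate and Select are the standard System.Xml.XPath.XPathNavigator methods; the markup is illustrative.

```csharp
// Sketch: assumes the HtmlAgilityPack package is referenced.
using System.Xml.XPath;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<ul><li>a</li><li>b</li></ul>");

// The navigator starts at the document root, so absolute XPath works directly.
XPathNavigator nav = doc.CreateNavigator();

// Evaluate returns a boxed double for numeric XPath functions such as count().
double count = (double)nav.Evaluate("count(//li)");

// Select returns an iterator over the matching nodes.
XPathNodeIterator items = nav.Select("//li");
while (items.MoveNext())
{
    System.Console.WriteLine(items.Current.Value);
}
```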
CreateTextNode()
Creates an HTML text node.
public HtmlTextNode CreateTextNode()
Returns
- HtmlTextNode
The new HTML text node.
CreateTextNode(string)
Creates an HTML text node with the specified text.
public HtmlTextNode CreateTextNode(string text)
Parameters
text
- string
The text of the node. May not be null.
Returns
- HtmlTextNode
The new HTML text node.
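The factory methods above create nodes owned by the document; they still have to be attached to the tree. A minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements; the link markup and URL are illustrative.

```csharp
// Sketch: assumes the HtmlAgilityPack package is referenced.
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<html><body></body></html>");
HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");

// Build <a href="https://example.com">Example</a> from the factory methods.
HtmlNode link = doc.CreateElement("a");
link.Attributes.Append(doc.CreateAttribute("href", "https://example.com"));
link.AppendChild(doc.CreateTextNode("Example"));

// Attach the new element to the existing tree.
body.AppendChild(link);

System.Console.WriteLine(body.InnerHtml);
```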
DetectEncoding(Stream)
Detects the encoding of an HTML stream.
public Encoding DetectEncoding(Stream stream)
Parameters
stream
- Stream
The input stream. May not be null.
Returns
- Encoding
The detected encoding.
DetectEncoding(Stream, bool)
Detects the encoding of an HTML stream.
public Encoding DetectEncoding(Stream stream, bool checkHtml)
Parameters
stream
- Stream
The input stream. May not be null.
checkHtml
- bool
Returns
- Encoding
The detected encoding.
DetectEncoding(TextReader)
Detects the encoding of an HTML text provided on a TextReader.
public Encoding DetectEncoding(TextReader reader)
Parameters
reader
- TextReader
The TextReader used to feed the HTML. May not be null.
Returns
- Encoding
The detected encoding.
DetectEncoding(string)
Detects the encoding of an HTML file.
public Encoding DetectEncoding(string path)
Parameters
path
- string
Path to the file containing the HTML document whose encoding is detected. May not be null.
Returns
- Encoding
The detected encoding.
DetectEncodingAndLoad(string)
Detects the encoding of an HTML document from a file first, and then loads the file.
public void DetectEncodingAndLoad(string path)
Parameters
path
- string
The complete file path to be read.
DetectEncodingAndLoad(string, bool)
Detects the encoding of an HTML document from a file first, and then loads the file.
public void DetectEncodingAndLoad(string path, bool detectEncoding)
Parameters
path
- string
The complete file path to be read. May not be null.
detectEncoding
- bool
true to detect the encoding, false otherwise.
DetectEncodingHtml(string)
Detects the encoding of an HTML text.
public Encoding DetectEncodingHtml(string html)
Parameters
html
- string
The input HTML text. May not be null.
Returns
- Encoding
The detected encoding.
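A minimal sketch of detecting the declared encoding from an HTML string, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements. The sample uses the meta http-equiv form described under OptionReadEncoding; the markup is illustrative.

```csharp
// Sketch: assumes the HtmlAgilityPack package is referenced.
using System.Text;
using HtmlAgilityPack;

var doc = new HtmlDocument();
string html = "<html><head>"
    + "<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">"
    + "</head><body>hi</body></html>";

// Returns the encoding declared by the meta node.
Encoding detected = doc.DetectEncodingHtml(html);
System.Console.WriteLine(detected?.WebName);
```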
GetElementbyId(string)
Gets the HTML node with the specified 'id' attribute value.
public HtmlNode GetElementbyId(string id)
Parameters
id
- string
The id attribute value to match. May not be null.
Returns
- HtmlNode
The HTML node with the matching id or null if not found.
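A minimal sketch of looking up elements by id, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements; the markup and ids are illustrative. OptionUseIdAttribute keeps its default of true.

```csharp
// Sketch: assumes the HtmlAgilityPack package is referenced.
using HtmlAgilityPack;

var doc = new HtmlDocument();   // OptionUseIdAttribute defaults to true
doc.LoadHtml("<div id=\"main\"><p>Hello</p></div>");

HtmlNode main = doc.GetElementbyId("main");
HtmlNode missing = doc.GetElementbyId("nope");   // null: no element has this id

System.Console.WriteLine(main.InnerText);   // Hello
```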
GetXmlName(string)
Gets a valid XML name.
public static string GetXmlName(string name)
Parameters
name
- string
Any text.
Returns
- string
A string that is a valid XML name.
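GetXmlName(string, bool, bool)
Gets a valid XML name.
public static string GetXmlName(string name, bool isAttribute, bool preserveXmlNamespaces)
Parameters
name
- string
Any text.
isAttribute
- bool
preserveXmlNamespaces
- bool
Returns
- string
A string that is a valid XML name.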
HtmlEncode(string)
Applies HTML encoding to a specified string.
public static string HtmlEncode(string html)
Parameters
html
- string
The input string to encode. May not be null.
Returns
- string
The encoded string.
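HtmlEncode is static, so no document instance is needed. A minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements; the input string is illustrative.

```csharp
// Sketch: assumes the HtmlAgilityPack package is referenced.
using HtmlAgilityPack;

// '&', '<' and '>' are replaced by their entity references.
string encoded = HtmlDocument.HtmlEncode("<b>Fish & Chips</b>");
System.Console.WriteLine(encoded);   // &lt;b&gt;Fish &amp; Chips&lt;/b&gt;
```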
IsWhiteSpace(int)
Determines if the specified character is considered a whitespace character.
public static bool IsWhiteSpace(int c)
Parameters
c
- int
The character to check.
Returns
- bool
true if the specified character is considered a whitespace character; otherwise, false.
Load(Stream)
Loads an HTML document from a stream.
public void Load(Stream stream)
Parameters
stream
- Stream
The input stream.
Load(Stream, bool)
Loads an HTML document from a stream.
public void Load(Stream stream, bool detectEncodingFromByteOrderMarks)
Parameters
stream
- Stream
The input stream.
detectEncodingFromByteOrderMarks
- bool
Indicates whether to look for byte order marks at the beginning of the stream.
Load(Stream, Encoding)
Loads an HTML document from a stream.
public void Load(Stream stream, Encoding encoding)
Parameters
stream
- Stream
The input stream.
encoding
- Encoding
The character encoding to use.
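Passing an explicit Encoding avoids relying on OptionDefaultStreamEncoding when the byte encoding is already known. A minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements; the in-memory stream stands in for a file or network stream.

```csharp
// Sketch: assumes the HtmlAgilityPack package is referenced.
using System.IO;
using System.Text;
using HtmlAgilityPack;

byte[] bytes = Encoding.UTF8.GetBytes("<p>Caf\u00e9</p>");

var doc = new HtmlDocument();
using (var stream = new MemoryStream(bytes))
{
    // Decode the bytes explicitly as UTF-8.
    doc.Load(stream, Encoding.UTF8);
}

string text = doc.DocumentNode.InnerText;
System.Console.WriteLine(text);
```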
Load(Stream, Encoding, bool)
Loads an HTML document from a stream.
public void Load(Stream stream, Encoding encoding, bool detectEncodingFromByteOrderMarks)
Parameters
stream
- Stream
The input stream.
encoding
- Encoding
The character encoding to use.
detectEncodingFromByteOrderMarks
- bool
Indicates whether to look for byte order marks at the beginning of the stream.
Load(Stream, Encoding, bool, int)
Loads an HTML document from a stream.
public void Load(Stream stream, Encoding encoding, bool detectEncodingFromByteOrderMarks, int buffersize)
Parameters
stream
- Stream
The input stream.
encoding
- Encoding
The character encoding to use.
detectEncodingFromByteOrderMarks
- bool
Indicates whether to look for byte order marks at the beginning of the stream.
buffersize
- int
The minimum buffer size.
Load(TextReader)
Loads the HTML document from the specified TextReader.
public void Load(TextReader reader)
Parameters
reader
- TextReader
The TextReader used to feed the HTML data into the document. May not be null.
Load(string)
Loads an HTML document from a file.
public void Load(string path)
Parameters
path
- string
The complete file path to be read. May not be null.
Load(string, bool)
Loads an HTML document from a file.
public void Load(string path, bool detectEncodingFromByteOrderMarks)
Parameters
path
- string
The complete file path to be read. May not be null.
detectEncodingFromByteOrderMarks
- bool
Indicates whether to look for byte order marks at the beginning of the file.
Load(string, Encoding)
Loads an HTML document from a file.
public void Load(string path, Encoding encoding)
Parameters
path
- string
The complete file path to be read. May not be null.
encoding
- Encoding
The character encoding to use. May not be null.
Load(string, Encoding, bool)
Loads an HTML document from a file.
public void Load(string path, Encoding encoding, bool detectEncodingFromByteOrderMarks)
Parameters
path
- string
The complete file path to be read. May not be null.
encoding
- Encoding
The character encoding to use. May not be null.
detectEncodingFromByteOrderMarks
- bool
Indicates whether to look for byte order marks at the beginning of the file.
Load(string, Encoding, bool, int)
Loads an HTML document from a file.
public void Load(string path, Encoding encoding, bool detectEncodingFromByteOrderMarks, int buffersize)
Parameters
path
- string
The complete file path to be read. May not be null.
encoding
- Encoding
The character encoding to use. May not be null.
detectEncodingFromByteOrderMarks
- bool
Indicates whether to look for byte order marks at the beginning of the file.
buffersize
- int
The minimum buffer size.
LoadHtml(string)
Loads the HTML document from the specified string.
public void LoadHtml(string html)
Parameters
html
- string
String containing the HTML document to load. May not be null.
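A typical load-and-query flow, as a minimal sketch assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements; the markup and the hrefs list are illustrative. Queries start from DocumentNode, the root of the parsed tree.

```csharp
// Sketch: assumes the HtmlAgilityPack package is referenced.
using System.Collections.Generic;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<html><body><h1>Title</h1><a href=\"/a\">A</a><a href=\"/b\">B</a></body></html>");

// DocumentNode is the root; query it with XPath.
string title = doc.DocumentNode.SelectSingleNode("//h1").InnerText;

// Collect every href value in document order.
var hrefs = new List<string>();
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    hrefs.Add(link.GetAttributeValue("href", string.Empty));
}

System.Console.WriteLine(title);
System.Console.WriteLine(string.Join(", ", hrefs));
```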
Save(Stream)
Saves the HTML document to the specified stream.
public void Save(Stream outStream)
Parameters
outStream
- Stream
The stream to which you want to save.
Save(Stream, Encoding)
Saves the HTML document to the specified stream.
public void Save(Stream outStream, Encoding encoding)
Parameters
outStream
- Stream
The stream to which you want to save. May not be null.
encoding
- Encoding
The character encoding to use. May not be null.
Save(StreamWriter)
Saves the HTML document to the specified StreamWriter.
public void Save(StreamWriter writer)
Parameters
writer
- StreamWriter
The StreamWriter to which you want to save.
Save(TextWriter)
Saves the HTML document to the specified TextWriter.
public void Save(TextWriter writer)
Parameters
writer
- TextWriter
The TextWriter to which you want to save. May not be null.
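Saving to a StringWriter is a convenient way to get the modified document back as a string. A minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced and C# top-level statements; the list markup is illustrative.

```csharp
// Sketch: assumes the HtmlAgilityPack package is referenced.
using System.IO;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<ul><li>one</li></ul>");

// Modify the tree, then serialize it back to text.
HtmlNode list = doc.DocumentNode.SelectSingleNode("//ul");
list.AppendChild(HtmlNode.CreateNode("<li>two</li>"));

using var writer = new StringWriter();
doc.Save(writer);
string html = writer.ToString();

System.Console.WriteLine(html);
```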
Save(string)
Saves the HTML document to the specified file.
public void Save(string filename)
Parameters
filename
- string
The location of the file where you want to save the document.
Save(string, Encoding)
Saves the HTML document to the specified file.
public void Save(string filename, Encoding encoding)
Parameters
filename
- string
The location of the file where you want to save the document. May not be null.
encoding
- Encoding
The character encoding to use. May not be null.
Save(XmlWriter)
Saves the HTML document to the specified XmlWriter.
public void Save(XmlWriter writer)
Parameters
writer
- XmlWriter
The XmlWriter to which you want to save.
UseAttributeOriginalName(string)
public void UseAttributeOriginalName(string tagName)
Parameters
tagName
- string