Class HtmlDocument
Namespace: HtmlAgilityPack
Assembly: HtmlAgilityPack.dll
Represents a complete HTML document.
public class HtmlDocument : IXPathNavigable
Inheritance
- object
- HtmlDocument
Implements
- IXPathNavigable
Constructors
HtmlDocument()
Creates an instance of an HTML document.
public HtmlDocument()
Fields
OptionAddDebuggingAttributes
Adds debugging attributes to nodes. Default is false.
public bool OptionAddDebuggingAttributes
Field Value
- bool
OptionAutoCloseOnEnd
Defines whether closing of non-closed nodes is done at the end of the document or directly in the document. Setting this to true can change how browsers render the page. Default is false.
public bool OptionAutoCloseOnEnd
Field Value
- bool
OptionCheckSyntax
Defines whether non-closed nodes are checked at the end of parsing. Default is true.
public bool OptionCheckSyntax
Field Value
- bool
OptionComputeChecksum
Defines whether a checksum is computed for the document while parsing. Default is false.
public bool OptionComputeChecksum
Field Value
- bool
OptionDefaultStreamEncoding
Defines the default stream encoding to use. Default is System.Text.Encoding.Default.
public Encoding OptionDefaultStreamEncoding
Field Value
- Encoding
OptionExtractErrorSourceText
Defines whether source text is extracted for parse errors. If the document has many errors, or cascading errors, parsing performance can be dramatically affected when this is set to true. Default is false.
public bool OptionExtractErrorSourceText
Field Value
- bool
OptionExtractErrorSourceTextMaxLength
Defines the maximum length of the source text extracted for parse errors. Default is 100.
public int OptionExtractErrorSourceTextMaxLength
Field Value
- int
OptionFixNestedTags
Defines whether LI, TR, TH and TD tags are partially fixed when nesting errors are detected. Default is false.
public bool OptionFixNestedTags
Field Value
- bool
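A sketch of the effect of OptionFixNestedTags on list items whose end tags are omitted, assuming the HtmlAgilityPack NuGet package and C# top-level statements. With the option enabled before loading, each new LI closes the previous unclosed one, so all items become siblings; without it, each LI would typically be parsed as a child of the previous item. SelectNodes and InnerText are HtmlNode members not documented on this page.

```csharp
using System;
using HtmlAgilityPack;

// The <li> end tags are omitted, which browsers accept.
const string Html = "<ul><li>One<li>Two<li>Three</ul>";

// Parse options must be set before LoadHtml is called.
var doc = new HtmlDocument { OptionFixNestedTags = true };
doc.LoadHtml(Html);

// With the option on, each new <li> closes the previous open one,
// so every item ends up as a direct child of <ul>.
HtmlNodeCollection items = doc.DocumentNode.SelectNodes("//ul/li");
foreach (HtmlNode item in items)
{
    Console.WriteLine(item.InnerText);
}
```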
OptionOutputAsXml
Defines whether output must conform to XML instead of HTML.
public bool OptionOutputAsXml
Field Value
- bool
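A sketch of OptionOutputAsXml on markup containing a void element, assuming the HtmlAgilityPack NuGet package and C# top-level statements. The document is written with Save(TextWriter), and the result is handed to XDocument.Parse from System.Xml.Linq as a quick well-formedness check; the exact serialized text (for example, the XML declaration) may differ between library versions.

```csharp
using System;
using System.IO;
using System.Xml.Linq;
using HtmlAgilityPack;

var doc = new HtmlDocument { OptionOutputAsXml = true };
// <br> has no end tag: valid HTML, but not well-formed XML as written.
doc.LoadHtml("<div>One<br>Two</div>");

var writer = new StringWriter();
doc.Save(writer);
string xml = writer.ToString();
Console.WriteLine(xml);

// If the output is well-formed, a standard XML parser accepts it.
XDocument parsed = XDocument.Parse(xml);
Console.WriteLine(parsed.Root.Name);
```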
OptionOutputOptimizeAttributeValues
Defines whether attribute value output is optimized (values are not enclosed in double quotes when that is possible). Default is false.
public bool OptionOutputOptimizeAttributeValues
Field Value
- bool
OptionOutputOriginalCase
Defines whether names are output in their original case. Useful for ASP.NET tags and attributes.
public bool OptionOutputOriginalCase
Field Value
- bool
OptionOutputUpperCase
Defines whether names are output in uppercase. Default is false.
public bool OptionOutputUpperCase
Field Value
- bool
OptionReadEncoding
Defines whether the declared encoding is read from the document. The declared encoding is determined from the meta http-equiv="content-type" content="text/html;charset=XXXXX" HTML node. Default is true.
public bool OptionReadEncoding
Field Value
- bool
OptionStopperNodeName
Defines the name of a node that will throw the StopperNodeException when found as an end node. Default is null.
public string OptionStopperNodeName
Field Value
- string
OptionUseIdAttribute
Defines whether the 'id' attribute is handled specially. Default is true.
public bool OptionUseIdAttribute
Field Value
- bool
OptionWriteEmptyNodes
Defines whether empty nodes are written as closed during output. Default is false.
public bool OptionWriteEmptyNodes
Field Value
- bool
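A sketch comparing two output options, assuming the HtmlAgilityPack NuGet package and C# top-level statements. Render is a hypothetical helper, not part of the library: it loads the same markup with different settings and returns what Save(TextWriter) produces. OptionWriteEmptyNodes changes how a void element such as BR is written, and OptionOutputUpperCase changes the case of element names.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

// Hypothetical helper: parse the markup, apply output options, and return
// whatever Save(TextWriter) writes for the whole document.
static string Render(string html, Action<HtmlDocument> configure)
{
    var doc = new HtmlDocument();
    configure(doc);
    doc.LoadHtml(html);
    var writer = new StringWriter();
    doc.Save(writer);
    return writer.ToString();
}

const string Html = "<div>a<br>b</div>";

string plain = Render(Html, d => { });
string closed = Render(Html, d => d.OptionWriteEmptyNodes = true);
string upper = Render(Html, d => d.OptionOutputUpperCase = true);

Console.WriteLine(plain);
Console.WriteLine(closed);
Console.WriteLine(upper);
```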
Properties
CheckSum
Gets the document CRC32 checksum if OptionComputeChecksum was set to true before parsing, 0 otherwise.
public int CheckSum { get; }
Property Value
- int
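A sketch of checksum-based change detection, assuming the HtmlAgilityPack NuGet package and C# top-level statements; ChecksumOf is a hypothetical helper. Identical input should give the same CheckSum, a one-character change should give a different one, and the value stays 0 when OptionComputeChecksum is false.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper: parse markup and return the document checksum.
static int ChecksumOf(string html, bool compute)
{
    // OptionComputeChecksum has to be set before parsing.
    var doc = new HtmlDocument { OptionComputeChecksum = compute };
    doc.LoadHtml(html);
    return doc.CheckSum;
}

int first = ChecksumOf("<div>version 1</div>", compute: true);
int again = ChecksumOf("<div>version 1</div>", compute: true);
int changed = ChecksumOf("<div>version 2</div>", compute: true);
int disabled = ChecksumOf("<div>version 1</div>", compute: false);

Console.WriteLine($"first={first} again={again} changed={changed} disabled={disabled}");
```

CRC32 is suited to spotting accidental changes, not to security-sensitive comparisons.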
DeclaredEncoding
Gets the document's declared encoding. The declared encoding is determined from the meta http-equiv="content-type" content="text/html;charset=XXXXX" HTML node.
public Encoding DeclaredEncoding { get; }
Property Value
- Encoding
DocumentNode
Gets the root node of the document.
public HtmlNode DocumentNode { get; }
Property Value
- HtmlNode
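A minimal usage sketch, assuming the HtmlAgilityPack NuGet package is referenced and the file uses C# top-level statements. It parses a string with LoadHtml and then queries from DocumentNode; SelectSingleNode, SelectNodes, InnerText and GetAttributeValue are HtmlNode members that are not documented on this page.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<html><head><title>Hello</title></head>"
    + "<body><a href='/a'>A</a><a href='/b'>B</a></body></html>");

// DocumentNode is the root of the parsed tree; XPath queries usually start here.
HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
Console.WriteLine(title.InnerText);

// Guard against a null result when nothing matches.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        Console.WriteLine(link.GetAttributeValue("href", ""));
    }
}
```

In the versions we are aware of, SelectNodes returns null rather than an empty collection when nothing matches, which is why the loop is guarded.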
Encoding
Gets the document's output encoding.
public Encoding Encoding { get; }
Property Value
- Encoding
ParseErrors
Gets a list of parse errors found in the document.
public IEnumerable<HtmlParseError> ParseErrors { get; }
Property Value
- IEnumerable<HtmlParseError>
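A sketch of inspecting ParseErrors after loading markup with unclosed tags, assuming the HtmlAgilityPack NuGet package, C# top-level statements and the default OptionCheckSyntax = true. The Code, Line, LinePosition and Reason properties and the HtmlParseErrorCode enum are documented separately from this page.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

var doc = new HtmlDocument(); // OptionCheckSyntax defaults to true
doc.LoadHtml("<div><span>never closed");

foreach (HtmlParseError error in doc.ParseErrors)
{
    Console.WriteLine($"{error.Code} (line {error.Line}, col {error.LinePosition}): {error.Reason}");
}

// Unclosed elements are reported with the TagNotClosed code.
bool hasUnclosedTag = doc.ParseErrors.Any(e => e.Code == HtmlParseErrorCode.TagNotClosed);
Console.WriteLine(hasUnclosedTag);
```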
Remainder
Gets the remaining text. Will always be null if OptionStopperNodeName is null.
public string Remainder { get; }
Property Value
- string
RemainderOffset
Gets the offset of Remainder in the original HTML text. If OptionStopperNodeName is null, this returns the length of the original HTML text.
public int RemainderOffset { get; }
Property Value
- int
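A sketch of reading only the head of a page with OptionStopperNodeName, assuming the HtmlAgilityPack NuGet package and C# top-level statements. It also assumes LoadHtml returns normally when the stopper node is reached; if your version lets the StopperNodeException escape, catch it around the call. Remainder and RemainderOffset then describe the unparsed tail.

```csharp
using System;
using HtmlAgilityPack;

const string Html =
    "<html><head><title>Report</title></head><body>lots of content</body></html>";

// Parsing stops when the </head> end tag is reached.
var doc = new HtmlDocument { OptionStopperNodeName = "head" };
doc.LoadHtml(Html);

string title = doc.DocumentNode.SelectSingleNode("//title")?.InnerText;
bool bodyParsed = doc.DocumentNode.SelectSingleNode("//body") != null;

Console.WriteLine(title);
Console.WriteLine(bodyParsed);
Console.WriteLine(doc.Remainder);       // the unparsed tail of the input
Console.WriteLine(doc.RemainderOffset); // where that tail starts in Html
```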
StreamEncoding
Gets the document's stream encoding.
public Encoding StreamEncoding { get; }
Property Value
- Encoding
Methods
CreateAttribute(string)
Creates an HTML attribute with the specified name.
public HtmlAttribute CreateAttribute(string name)
Parameters
- name (string)
- The name of the attribute. May not be null. 
Returns
- HtmlAttribute
- The new HTML attribute. 
CreateAttribute(string, string)
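Creates an HTML attribute with the specified name and value.
public HtmlAttribute CreateAttribute(string name, string value)
Parameters
- name (string)
- The name of the attribute. May not be null.
- value (string)
- The value of the attribute.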
Returns
- HtmlAttribute
- The new HTML attribute. 
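A sketch of CreateAttribute, assuming the HtmlAgilityPack NuGet package and C# top-level statements. The attribute returned by CreateAttribute is not attached to anything; here it is appended through the node's Attributes collection, an HtmlNode member not documented on this page.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<a>Docs</a>");
HtmlNode link = doc.DocumentNode.SelectSingleNode("//a");

// CreateAttribute only creates the attribute; it still has to be attached to a node.
HtmlAttribute href = doc.CreateAttribute("href", "https://example.com/docs");
link.Attributes.Append(href);

Console.WriteLine(link.GetAttributeValue("href", "(none)"));
```

HtmlNode.SetAttributeValue is a shorter route when the attribute object itself is not needed.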
CreateComment()
Creates an HTML comment node.
public HtmlCommentNode CreateComment()
Returns
- HtmlCommentNode
- The new HTML comment node. 
CreateComment(string)
Creates an HTML comment node with the specified comment text.
public HtmlCommentNode CreateComment(string comment)
Parameters
- comment (string)
- The comment text. May not be null. 
Returns
- HtmlCommentNode
- The new HTML comment node. 
CreateElement(string)
Creates an HTML element node with the specified name.
public HtmlNode CreateElement(string name)
Parameters
- name (string)
- The qualified name of the element. May not be null. 
Returns
- HtmlNode
- The new HTML node. 
CreateNavigator()
Creates a new XPathNavigator object for navigating this HTML document.
public XPathNavigator CreateNavigator()
Returns
- XPathNavigator
- An XPathNavigator object. The XPathNavigator is positioned on the root of the document. 
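A sketch of using CreateNavigator with the standard System.Xml.XPath API, assuming the HtmlAgilityPack NuGet package and C# top-level statements. Unlike SelectNodes, which returns nodes, XPathNavigator.Evaluate can also return numbers and strings, so expressions such as count() can be evaluated directly.

```csharp
using System;
using System.Xml.XPath;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<ul><li>a</li><li>b</li><li>c</li></ul><a href='/next'>next</a>");

// The navigator exposes the parsed tree to the standard XPath engine.
XPathNavigator nav = doc.CreateNavigator();
double count = (double)nav.Evaluate("count(//li)");
string href = (string)nav.Evaluate("string(//a/@href)");

Console.WriteLine($"{count} items, next page: {href}");
```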
CreateTextNode()
Creates an HTML text node.
public HtmlTextNode CreateTextNode()
Returns
- HtmlTextNode
- The new HTML text node. 
CreateTextNode(string)
Creates an HTML text node with the specified text.
public HtmlTextNode CreateTextNode(string text)
Parameters
- text (string)
- The text of the node. May not be null. 
Returns
- HtmlTextNode
- The new HTML text node. 
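A sketch of building new content with CreateElement and CreateTextNode, assuming the HtmlAgilityPack NuGet package and C# top-level statements. Created nodes are detached until they are inserted, here with HtmlNode.AppendChild, which is not documented on this page.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<ul id='menu'></ul>");
HtmlNode menu = doc.DocumentNode.SelectSingleNode("//ul");

foreach (string label in new[] { "Home", "About" })
{
    // Created nodes are detached until appended somewhere in the tree.
    HtmlNode item = doc.CreateElement("li");
    item.AppendChild(doc.CreateTextNode(label));
    menu.AppendChild(item);
}

Console.WriteLine(menu.OuterHtml);
```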
DetectEncoding(Stream)
Detects the encoding of an HTML stream.
public Encoding DetectEncoding(Stream stream)
Parameters
- stream (Stream)
- The input stream. May not be null. 
Returns
- Encoding
- The detected encoding. 
DetectEncoding(TextReader)
Detects the encoding of HTML text provided by a TextReader.
public Encoding DetectEncoding(TextReader reader)
Parameters
- reader (TextReader)
- The TextReader used to feed the HTML. May not be null. 
Returns
- Encoding
- The detected encoding. 
DetectEncoding(string)
Detects the encoding of an HTML file.
public Encoding DetectEncoding(string path)
Parameters
- path (string)
- Path for the file containing the HTML document to detect. May not be null. 
Returns
- Encoding
- The detected encoding. 
DetectEncodingAndLoad(string)
Detects the encoding of an HTML document from a file first, and then loads the file.
public void DetectEncodingAndLoad(string path)
Parameters
- path (string)
- The complete file path to be read. 
DetectEncodingAndLoad(string, bool)
Detects the encoding of an HTML document from a file first (when detectEncoding is true), and then loads the file.
public void DetectEncodingAndLoad(string path, bool detectEncoding)
Parameters
- path (string)
- The complete file path to be read. May not be null.
- detectEncoding (bool)
- true to detect encoding, false otherwise. 
DetectEncodingHtml(string)
Detects the encoding of HTML text.
public Encoding DetectEncodingHtml(string html)
Parameters
- html (string)
- The input HTML text. May not be null.
Returns
- Encoding
- The detected encoding. 
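A sketch of DetectEncodingHtml, assuming the HtmlAgilityPack NuGet package and C# top-level statements. It relies on the meta http-equiv="content-type" declaration described under OptionReadEncoding; the null result for markup without a declaration is what we expect from the library, but treat it as an assumption to verify in your version.

```csharp
using System;
using System.Text;
using HtmlAgilityPack;

const string WithCharset =
    "<html><head><meta http-equiv='content-type' content='text/html;charset=iso-8859-1'>"
    + "</head><body>text</body></html>";
const string WithoutCharset = "<html><body>no declaration</body></html>";

// Separate instances keep the two detections independent.
Encoding declared = new HtmlDocument().DetectEncodingHtml(WithCharset);
Encoding none = new HtmlDocument().DetectEncodingHtml(WithoutCharset);

Console.WriteLine(declared?.WebName ?? "(none)");
Console.WriteLine(none?.WebName ?? "(none)");
```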
GetElementbyId(string)
Gets the HTML node with the specified 'id' attribute value.
public HtmlNode GetElementbyId(string id)
Parameters
- id (string)
- The attribute id to match. May not be null. 
Returns
- HtmlNode
- The HTML node with the matching id or null if not found. 
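A sketch of GetElementbyId (note the lowercase b in the method name), assuming the HtmlAgilityPack NuGet package, C# top-level statements and the default OptionUseIdAttribute = true.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument(); // OptionUseIdAttribute is true by default
doc.LoadHtml("<div id='main'><h1>Title</h1></div><div id='footer'>Footer</div>");

HtmlNode main = doc.GetElementbyId("main");
HtmlNode footer = doc.GetElementbyId("footer");
HtmlNode missing = doc.GetElementbyId("sidebar"); // no such id: null, not an exception

Console.WriteLine(main.Name);
Console.WriteLine(footer.InnerText);
Console.WriteLine(missing == null);
```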
GetXmlName(string)
Gets a valid XML name.
public static string GetXmlName(string name)
Parameters
- name (string)
- Any text. 
Returns
- string
- A string that is a valid XML name. 
HtmlEncode(string)
Applies HTML encoding to a specified string.
public static string HtmlEncode(string html)
Parameters
- html (string)
- The input string to encode. May not be null. 
Returns
- string
- The encoded string. 
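A sketch of escaping untrusted text with the static HtmlEncode method, assuming the HtmlAgilityPack NuGet package and C# top-level statements. Angle brackets and ampersands are replaced with entities, so the text is no longer interpreted as markup when it is inserted into a page.

```csharp
using System;
using HtmlAgilityPack;

// User-supplied text that must not be interpreted as markup.
string userInput = "<script>alert(\"hi\")</script> & more";
string safe = HtmlDocument.HtmlEncode(userInput);

Console.WriteLine(safe);
```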
IsWhiteSpace(int)
Determines whether the specified character is considered a whitespace character.
public static bool IsWhiteSpace(int c)
Parameters
- c (int)
- The character to check. 
Returns
- bool
- true if the specified character is considered a whitespace character; otherwise, false.
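A sketch probing IsWhiteSpace, assuming the HtmlAgilityPack NuGet package and C# top-level statements. The characters tested are ordinary ASCII whitespace and a letter; the full set the library treats as whitespace is not listed on this page, so check other characters yourself if they matter.

```csharp
using System;
using HtmlAgilityPack;

// IsWhiteSpace takes an int; char arguments convert implicitly.
foreach (char c in new[] { ' ', '\t', '\r', '\n', 'x' })
{
    Console.WriteLine($"U+{(int)c:X4} -> {HtmlDocument.IsWhiteSpace(c)}");
}
```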
Load(Stream)
Loads an HTML document from a stream.
public void Load(Stream stream)
Parameters
- stream (Stream)
- The input stream. 
Load(Stream, bool)
Loads an HTML document from a stream.
public void Load(Stream stream, bool detectEncodingFromByteOrderMarks)
Parameters
- stream (Stream)
- The input stream.
- detectEncodingFromByteOrderMarks (bool)
- Indicates whether to look for byte order marks at the beginning of the stream. 
Load(Stream, Encoding)
Loads an HTML document from a stream.
public void Load(Stream stream, Encoding encoding)
Parameters
- stream (Stream)
- The input stream.
- encoding (Encoding)
- The character encoding to use.
Load(Stream, Encoding, bool)
Loads an HTML document from a stream.
public void Load(Stream stream, Encoding encoding, bool detectEncodingFromByteOrderMarks)
Parameters
- stream (Stream)
- The input stream.
- encoding (Encoding)
- The character encoding to use.
- detectEncodingFromByteOrderMarks (bool)
- Indicates whether to look for byte order marks at the beginning of the stream. 
Load(Stream, Encoding, bool, int)
Loads an HTML document from a stream.
public void Load(Stream stream, Encoding encoding, bool detectEncodingFromByteOrderMarks, int buffersize)
Parameters
- stream (Stream)
- The input stream.
- encoding (Encoding)
- The character encoding to use.
- detectEncodingFromByteOrderMarks (bool)
- Indicates whether to look for byte order marks at the beginning of the stream.
- buffersize (int)
- The minimum buffer size. 
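A sketch of loading from a stream with an explicit encoding, assuming the HtmlAgilityPack NuGet package and C# top-level statements. A MemoryStream stands in for a file or network stream, and StreamEncoding reports the encoding used to read it.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

// Simulate a file or HTTP response body that contains UTF-8 bytes.
byte[] bytes = Encoding.UTF8.GetBytes("<div>caf\u00e9</div>");
using var stream = new MemoryStream(bytes);

var doc = new HtmlDocument();
doc.Load(stream, Encoding.UTF8); // decode explicitly as UTF-8

string text = doc.DocumentNode.SelectSingleNode("//div").InnerText;
Console.WriteLine(text);
Console.WriteLine(doc.StreamEncoding.WebName);
```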
Load(TextReader)
Loads the HTML document from the specified TextReader.
public void Load(TextReader reader)
Parameters
- reader (TextReader)
- The TextReader used to feed the HTML data into the document. May not be null. 
Load(string)
Loads an HTML document from a file.
public void Load(string path)
Parameters
- path (string)
- The complete file path to be read. May not be null. 
Load(string, bool)
Loads an HTML document from a file.
public void Load(string path, bool detectEncodingFromByteOrderMarks)
Parameters
- path (string)
- The complete file path to be read. May not be null.
- detectEncodingFromByteOrderMarks (bool)
- Indicates whether to look for byte order marks at the beginning of the file. 
Load(string, Encoding)
Loads an HTML document from a file.
public void Load(string path, Encoding encoding)
Parameters
- path (string)
- The complete file path to be read. May not be null.
- encoding (Encoding)
- The character encoding to use. May not be null. 
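A sketch of Load(string, Encoding), assuming the HtmlAgilityPack NuGet package and C# top-level statements. The input file is written to a temporary path with System.IO and deleted afterwards; the random file name is purely illustrative.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

// Write a small UTF-8 file (without a byte order mark) to a temporary path for the demo.
string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".html");
File.WriteAllText(path, "<html><body><h1>caf\u00e9</h1></body></html>", new UTF8Encoding(false));

string heading;
try
{
    var doc = new HtmlDocument();
    // Decode the file explicitly as UTF-8.
    doc.Load(path, Encoding.UTF8);
    heading = doc.DocumentNode.SelectSingleNode("//h1").InnerText;
}
finally
{
    File.Delete(path);
}

Console.WriteLine(heading);
```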
Load(string, Encoding, bool)
Loads an HTML document from a file.
public void Load(string path, Encoding encoding, bool detectEncodingFromByteOrderMarks)
Parameters
- path (string)
- The complete file path to be read. May not be null.
- encoding (Encoding)
- The character encoding to use. May not be null.
- detectEncodingFromByteOrderMarks (bool)
- Indicates whether to look for byte order marks at the beginning of the file. 
Load(string, Encoding, bool, int)
Loads an HTML document from a file.
public void Load(string path, Encoding encoding, bool detectEncodingFromByteOrderMarks, int buffersize)
Parameters
- path (string)
- The complete file path to be read. May not be null.
- encoding (Encoding)
- The character encoding to use. May not be null.
- detectEncodingFromByteOrderMarks (bool)
- Indicates whether to look for byte order marks at the beginning of the file.
- buffersize (int)
- The minimum buffer size. 
LoadHtml(string)
Loads the HTML document from the specified string.
public void LoadHtml(string html)
Parameters
- html (string)
- String containing the HTML document to load. May not be null. 
Save(Stream)
Saves the HTML document to the specified stream.
public void Save(Stream outStream)
Parameters
- outStream (Stream)
- The stream to which you want to save. 
Save(Stream, Encoding)
Saves the HTML document to the specified stream.
public void Save(Stream outStream, Encoding encoding)
Parameters
- outStream (Stream)
- The stream to which you want to save. May not be null.
- encoding (Encoding)
- The character encoding to use. May not be null. 
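A sketch of Save(Stream, Encoding), assuming the HtmlAgilityPack NuGet package and C# top-level statements. UTF8Encoding(false) is used so that no byte order mark precedes the markup, and the bytes are read back from a MemoryStream.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<div>caf\u00e9</div>");

using var output = new MemoryStream();
// UTF8Encoding(false) avoids writing a byte order mark at the start of the stream.
doc.Save(output, new UTF8Encoding(false));

// MemoryStream.ToArray also works after the stream is closed, so this does not
// depend on whether Save disposes the writer it creates.
string saved = Encoding.UTF8.GetString(output.ToArray());
Console.WriteLine(saved);
```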
Save(StreamWriter)
Saves the HTML document to the specified StreamWriter.
public void Save(StreamWriter writer)
Parameters
- writer (StreamWriter)
- The StreamWriter to which you want to save. 
Save(TextWriter)
Saves the HTML document to the specified TextWriter.
public void Save(TextWriter writer)
Parameters
- writer (TextWriter)
- The TextWriter to which you want to save. May not be null. 
Save(string)
Saves the HTML document to the specified file.
public void Save(string filename)
Parameters
- filename (string)
- The location of the file where you want to save the document. 
Save(string, Encoding)
Saves the HTML document to the specified file.
public void Save(string filename, Encoding encoding)
Parameters
- filename (string)
- The location of the file where you want to save the document. May not be null.
- encoding (Encoding)
- The character encoding to use. May not be null. 
Save(XmlWriter)
Saves the HTML document to the specified XmlWriter.
public void Save(XmlWriter writer)
Parameters
- writer (XmlWriter)
- The XmlWriter to which you want to save.