Class HtmlDocument
- Namespace
- HtmlAgilityPack
- Assembly
- HtmlAgilityPack.dll
Represents a complete HTML document.
public class HtmlDocument : IXPathNavigable
- Inheritance
-
HtmlDocument
- Implements
-
IXPathNavigable
- Inherited Members
Constructors
HtmlDocument()
Creates an instance of an HTML document.
public HtmlDocument()
Fields
OptionAddDebuggingAttributes
Adds debugging attributes to nodes. Default is false.
public bool OptionAddDebuggingAttributes
Field Value
OptionAutoCloseOnEnd
Defines whether unclosed nodes are closed at the end of the document (true) or directly where they occur (false). Setting this to true can change how browsers render the page. Default is false.
public bool OptionAutoCloseOnEnd
Field Value
OptionCheckSyntax
Defines whether unclosed nodes are checked at the end of parsing. Default is true.
public bool OptionCheckSyntax
Field Value
OptionComputeChecksum
Defines if a checksum must be computed for the document while parsing. Default is false.
public bool OptionComputeChecksum
Field Value
OptionDefaultStreamEncoding
Defines the default stream encoding to use. Default is System.Text.Encoding.Default.
public Encoding OptionDefaultStreamEncoding
Field Value
OptionExtractErrorSourceText
Defines if source text must be extracted while parsing errors. If the document has a lot of errors, or cascading errors, parsing performance can be dramatically affected if set to true. Default is false.
public bool OptionExtractErrorSourceText
Field Value
OptionExtractErrorSourceTextMaxLength
Defines the maximum length of source text or parse errors. Default is 100.
public int OptionExtractErrorSourceTextMaxLength
Field Value
OptionFixNestedTags
Defines if LI, TR, TH, TD tags must be partially fixed when nesting errors are detected. Default is false.
public bool OptionFixNestedTags
Field Value
OptionOutputAsXml
Defines if output must conform to XML, instead of HTML.
public bool OptionOutputAsXml
Field Value
OptionOutputOptimizeAttributeValues
Defines if attribute value output must be optimized (written without surrounding double quotes where possible). Default is false.
public bool OptionOutputOptimizeAttributeValues
Field Value
OptionOutputOriginalCase
Defines if names must be output with their original case. Useful for ASP.NET tags and attributes.
public bool OptionOutputOriginalCase
Field Value
OptionOutputUpperCase
Defines if name must be output in uppercase. Default is false.
public bool OptionOutputUpperCase
Field Value
OptionReadEncoding
Defines if the declared encoding must be read from the document. The declared encoding is determined from the <meta http-equiv="content-type" content="text/html;charset=XXXXX"> HTML node. Default is true.
public bool OptionReadEncoding
Field Value
OptionStopperNodeName
Defines the name of a node that will throw the StopperNodeException when found as an end node. Default is null.
public string OptionStopperNodeName
Field Value
OptionUseIdAttribute
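Defines if the 'id' attribute must be specifically used, that is, whether the document keeps track of nodes by their 'id' attribute value so that GetElementbyId can look them up. Default is true.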
public bool OptionUseIdAttribute
Field Value
OptionWriteEmptyNodes
Defines if empty nodes must be written as closed during output. Default is false.
public bool OptionWriteEmptyNodes
Field Value
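The option fields above are plain public fields, so they are set directly on the instance. The following is a minimal sketch, assuming that parsing options should be set before the document is loaded and that output options are read when the document is written; the sample markup is illustrative only.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

var doc = new HtmlDocument();

// Parsing option: set before loading so it can affect the parse.
doc.OptionFixNestedTags = true;

// Illustrative markup with unclosed <li> elements.
doc.LoadHtml("<ul><li>one<li>two</ul>");

// Output option: assumed to be read when the document is written.
doc.OptionOutputAsXml = true;

using var writer = new StringWriter();
doc.Save(writer);
string xml = writer.ToString();
Console.WriteLine(xml);
```

The exact serialized text depends on the library version, so the sketch prints it rather than asserting a particular form.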
Properties
CheckSum
Gets the document CRC32 checksum if OptionComputeChecksum was set to true before parsing, 0 otherwise.
public int CheckSum { get; }
Property Value
DeclaredEncoding
Gets the document's declared encoding. The declared encoding is determined from the <meta http-equiv="content-type" content="text/html;charset=XXXXX"> HTML node.
public Encoding DeclaredEncoding { get; }
Property Value
DocumentNode
Gets the root node of the document.
public HtmlNode DocumentNode { get; }
Property Value
Encoding
Gets the document's output encoding.
public Encoding Encoding { get; }
Property Value
ParseErrors
Gets a list of parse errors found in the document.
public IEnumerable<HtmlParseError> ParseErrors { get; }
Property Value
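As a hedged illustration of ParseErrors, the sketch below loads deliberately malformed markup and lists whatever errors the parser reports. The sample markup is an assumption, and whether it produces any errors depends on the parser's rules, so the code only enumerates the collection.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

var doc = new HtmlDocument();

// Illustrative malformed input: <span> is never closed.
doc.LoadHtml("<div><span>text</div>");

var errors = doc.ParseErrors.ToList();
Console.WriteLine("Parse errors: " + errors.Count);

foreach (HtmlParseError error in errors)
{
    // Each error carries a code, a position, and a human-readable reason.
    Console.WriteLine(error.Code + " at line " + error.Line +
                      ", column " + error.LinePosition + ": " + error.Reason);
}
```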
Remainder
Gets the remaining text. Will always be null if OptionStopperNodeName is null.
public string Remainder { get; }
Property Value
RemainderOffset
Gets the offset of Remainder in the original HTML text. If OptionStopperNodeName is null, this will return the length of the original HTML text.
public int RemainderOffset { get; }
Property Value
StreamEncoding
Gets the document's stream encoding.
public Encoding StreamEncoding { get; }
Property Value
Methods
CreateAttribute(string)
Creates an HTML attribute with the specified name.
public HtmlAttribute CreateAttribute(string name)
Parameters
name
- string
The name of the attribute. May not be null.
Returns
- HtmlAttribute
The new HTML attribute.
CreateAttribute(string, string)
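Creates an HTML attribute with the specified name and value.
public HtmlAttribute CreateAttribute(string name, string value)
Parameters
name
- string
The name of the attribute. May not be null.
value
- string
The value of the attribute.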
Returns
- HtmlAttribute
The new HTML attribute.
CreateComment()
Creates an HTML comment node.
public HtmlCommentNode CreateComment()
Returns
- HtmlCommentNode
The new HTML comment node.
CreateComment(string)
Creates an HTML comment node with the specified comment text.
public HtmlCommentNode CreateComment(string comment)
Parameters
comment
- string
The comment text. May not be null.
Returns
- HtmlCommentNode
The new HTML comment node.
CreateElement(string)
Creates an HTML element node with the specified name.
public HtmlNode CreateElement(string name)
Parameters
name
- string
The qualified name of the element. May not be null.
Returns
- HtmlNode
The new HTML node.
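The factory methods above create nodes and attributes that belong to the document but are not attached to its tree. The following sketch shows one way to combine them, assuming the caller attaches the new node with HtmlNode.AppendChild; the markup, URL, and variable names are illustrative.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<div id=\"nav\"></div>");

// Create an element, an attribute, and a text node from the document.
HtmlNode link = doc.CreateElement("a");
HtmlAttribute href = doc.CreateAttribute("href", "https://example.com");
link.Attributes.Add(href);
link.AppendChild(doc.CreateTextNode("Example"));

// Attach the new element to an existing node in the tree.
HtmlNode container = doc.DocumentNode.SelectSingleNode("//div");
container.AppendChild(link);

Console.WriteLine(doc.DocumentNode.OuterHtml);
```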
CreateNavigator()
Creates a new XPathNavigator object for navigating this HTML document.
public XPathNavigator CreateNavigator()
Returns
- XPathNavigator
An XPathNavigator object. The XPathNavigator is positioned on the root of the document.
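Because HtmlDocument implements IXPathNavigable, the navigator can be used with the standard System.Xml.XPath types. This is a small sketch under that assumption; the list markup is illustrative.

```csharp
using System;
using System.Xml.XPath;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<ul><li>one</li><li>two</li></ul>");

// The navigator starts at the root of the document.
XPathNavigator navigator = doc.CreateNavigator();

// Select every <li> element and print its value.
XPathNodeIterator items = navigator.Select("//li");
int count = items.Count;
while (items.MoveNext())
{
    Console.WriteLine(items.Current.Value);
}
```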
CreateTextNode()
Creates an HTML text node.
public HtmlTextNode CreateTextNode()
Returns
- HtmlTextNode
The new HTML text node.
CreateTextNode(string)
Creates an HTML text node with the specified text.
public HtmlTextNode CreateTextNode(string text)
Parameters
text
- string
The text of the node. May not be null.
Returns
- HtmlTextNode
The new HTML text node.
DetectEncoding(Stream)
Detects the encoding of an HTML stream.
public Encoding DetectEncoding(Stream stream)
Parameters
stream
- Stream
The input stream. May not be null.
Returns
- Encoding
The detected encoding.
DetectEncoding(TextReader)
Detects the encoding of an HTML text provided on a TextReader.
public Encoding DetectEncoding(TextReader reader)
Parameters
reader
- TextReader
The TextReader used to feed the HTML. May not be null.
Returns
- Encoding
The detected encoding.
DetectEncoding(string)
Detects the encoding of an HTML file.
public Encoding DetectEncoding(string path)
Parameters
path
- string
The path of the file containing the HTML document whose encoding is to be detected. May not be null.
Returns
- Encoding
The detected encoding.
DetectEncodingAndLoad(string)
Detects the encoding of an HTML document from a file first, and then loads the file.
public void DetectEncodingAndLoad(string path)
Parameters
path
- string
The complete file path to be read.
DetectEncodingAndLoad(string, bool)
Detects the encoding of an HTML document from a file first, and then loads the file.
public void DetectEncodingAndLoad(string path, bool detectEncoding)
Parameters
path
- string
The complete file path to be read. May not be null.
detectEncoding
- bool
true to detect encoding, false otherwise.
DetectEncodingHtml(string)
Detects the encoding of an HTML text.
public Encoding DetectEncodingHtml(string html)
Parameters
html
- string
The input HTML text. May not be null.
Returns
- Encoding
The detected encoding.
GetElementbyId(string)
Gets the HTML node with the specified 'id' attribute value.
public HtmlNode GetElementbyId(string id)
Parameters
id
- string
The 'id' attribute value to match. May not be null.
Returns
- HtmlNode
The HTML node with the matching id or null if not found.
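A short sketch of the lookup described above, including the null result for an id that does not exist. The markup and id values are illustrative.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<div id=\"main\"><p>Hello</p></div>");

// Note the lowercase "b" in the method name.
HtmlNode main = doc.GetElementbyId("main");
HtmlNode missing = doc.GetElementbyId("does-not-exist");

Console.WriteLine(main != null ? main.InnerText : "(not found)");
Console.WriteLine(missing == null ? "(not found)" : missing.InnerText);
```

Since the method returns null rather than throwing for an unknown id, callers are expected to check the result before using it.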
GetXmlName(string)
Gets a valid XML name.
public static string GetXmlName(string name)
Parameters
name
- string
Any text.
Returns
- string
A string that is a valid XML name.
HtmlEncode(string)
Applies HTML encoding to a specified string.
public static string HtmlEncode(string html)
Parameters
html
- string
The input string to encode. May not be null.
Returns
- string
The encoded string.
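HtmlEncode is static, so it can be called without a document instance. A minimal sketch with an illustrative input string:

```csharp
using System;
using HtmlAgilityPack;

string raw = "a < b & c";

// Static call: no HtmlDocument instance is needed.
string encoded = HtmlDocument.HtmlEncode(raw);

Console.WriteLine(encoded);
```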
IsWhiteSpace(int)
Determines if the specified character is considered a whitespace character.
public static bool IsWhiteSpace(int c)
Parameters
c
- int
The character to check.
Returns
- bool
true if the specified character is considered a whitespace character.
Load(Stream)
Loads an HTML document from a stream.
public void Load(Stream stream)
Parameters
stream
- Stream
The input stream.
Load(Stream, bool)
Loads an HTML document from a stream.
public void Load(Stream stream, bool detectEncodingFromByteOrderMarks)
Parameters
stream
- Stream
The input stream.
detectEncodingFromByteOrderMarks
- bool
Indicates whether to look for byte order marks at the beginning of the stream.
Load(Stream, Encoding)
Loads an HTML document from a stream.
public void Load(Stream stream, Encoding encoding)
Parameters
stream
- Stream
The input stream.
encoding
- Encoding
The character encoding to use.
Load(Stream, Encoding, bool)
Loads an HTML document from a stream.
public void Load(Stream stream, Encoding encoding, bool detectEncodingFromByteOrderMarks)
Parameters
stream
- Stream
The input stream.
encoding
- Encoding
The character encoding to use.
detectEncodingFromByteOrderMarks
- bool
Indicates whether to look for byte order marks at the beginning of the stream.
Load(Stream, Encoding, bool, int)
Loads an HTML document from a stream.
public void Load(Stream stream, Encoding encoding, bool detectEncodingFromByteOrderMarks, int buffersize)
Parameters
stream
- Stream
The input stream.
encoding
- Encoding
The character encoding to use.
detectEncodingFromByteOrderMarks
- bool
Indicates whether to look for byte order marks at the beginning of the stream.
buffersize
- int
The minimum buffer size.
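The stream overloads above can be exercised without touching the file system. This sketch assumes an in-memory stream of UTF-8 bytes and uses the Load(Stream, Encoding) overload; the markup is illustrative.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

// Illustrative UTF-8 input containing a non-ASCII character.
byte[] bytes = Encoding.UTF8.GetBytes("<html><body><h1>café</h1></body></html>");

using var stream = new MemoryStream(bytes);

var doc = new HtmlDocument();
// Pass the encoding explicitly instead of relying on the default.
doc.Load(stream, Encoding.UTF8);

HtmlNode heading = doc.DocumentNode.SelectSingleNode("//h1");
Console.WriteLine(heading.InnerText);
```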
Load(TextReader)
Loads the HTML document from the specified TextReader.
public void Load(TextReader reader)
Parameters
reader
- TextReader
The TextReader used to feed the HTML data into the document. May not be null.
Load(string)
Loads an HTML document from a file.
public void Load(string path)
Parameters
path
- string
The complete file path to be read. May not be null.
Load(string, bool)
Loads an HTML document from a file.
public void Load(string path, bool detectEncodingFromByteOrderMarks)
Parameters
path
- string
The complete file path to be read. May not be null.
detectEncodingFromByteOrderMarks
- bool
Indicates whether to look for byte order marks at the beginning of the file.
Load(string, Encoding)
Loads an HTML document from a file.
public void Load(string path, Encoding encoding)
Parameters
path
- string
The complete file path to be read. May not be null.
encoding
- Encoding
The character encoding to use. May not be null.
Load(string, Encoding, bool)
Loads an HTML document from a file.
public void Load(string path, Encoding encoding, bool detectEncodingFromByteOrderMarks)
Parameters
path
- string
The complete file path to be read. May not be null.
encoding
- Encoding
The character encoding to use. May not be null.
detectEncodingFromByteOrderMarks
- bool
Indicates whether to look for byte order marks at the beginning of the file.
Load(string, Encoding, bool, int)
Loads an HTML document from a file.
public void Load(string path, Encoding encoding, bool detectEncodingFromByteOrderMarks, int buffersize)
Parameters
path
- string
The complete file path to be read. May not be null.
encoding
- Encoding
The character encoding to use. May not be null.
detectEncodingFromByteOrderMarks
- bool
Indicates whether to look for byte order marks at the beginning of the file.
buffersize
- int
The minimum buffer size.
LoadHtml(string)
Loads the HTML document from the specified string.
public void LoadHtml(string html)
Parameters
html
- string
String containing the HTML document to load. May not be null.
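For HTML that is already in memory, LoadHtml is the most direct entry point. A minimal sketch, assuming the result is then queried through DocumentNode with XPath; the markup and class name are illustrative.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<html><head><title>Demo</title></head>" +
             "<body><p class=\"lead\">Hi</p></body></html>");

// DocumentNode is the root; XPath queries start from it.
HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
HtmlNode lead = doc.DocumentNode.SelectSingleNode("//p[@class='lead']");

Console.WriteLine(title.InnerText);
Console.WriteLine(lead.InnerText);
```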
Save(Stream)
Saves the HTML document to the specified stream.
public void Save(Stream outStream)
Parameters
outStream
- Stream
The stream to which you want to save.
Save(Stream, Encoding)
Saves the HTML document to the specified stream.
public void Save(Stream outStream, Encoding encoding)
Parameters
outStream
- Stream
The stream to which you want to save. May not be null.
encoding
- Encoding
The character encoding to use. May not be null.
Save(StreamWriter)
Saves the HTML document to the specified StreamWriter.
public void Save(StreamWriter writer)
Parameters
writer
- StreamWriter
The StreamWriter to which you want to save.
Save(TextWriter)
Saves the HTML document to the specified TextWriter.
public void Save(TextWriter writer)
Parameters
writer
- TextWriter
The TextWriter to which you want to save. May not be null.
Save(string)
Saves the HTML document to the specified file.
public void Save(string filename)
Parameters
filename
- string
The location of the file where you want to save the document.
Save(string, Encoding)
Saves the HTML document to the specified file.
public void Save(string filename, Encoding encoding)
Parameters
filename
- string
The location of the file where you want to save the document. May not be null.
encoding
- Encoding
The character encoding to use. May not be null.
Save(XmlWriter)
Saves the HTML document to the specified XmlWriter.
public void Save(XmlWriter writer)
Parameters
writer
- XmlWriter
The XmlWriter to which you want to save.
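To round off the Save overloads, the sketch below writes a document to a StringWriter through Save(TextWriter), which is convenient when the serialized HTML is needed as a string. The markup is illustrative, and the same pattern is assumed to apply to the stream and file overloads.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<p>Hello</p>");

// StringWriter is a TextWriter, so it matches Save(TextWriter).
using var writer = new StringWriter();
doc.Save(writer);

string html = writer.ToString();
Console.WriteLine(html);
```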