Class HTMLParser
java.lang.Object
com.webmethods.rtl.markup.xml.parser.BaseParser
com.webmethods.rtl.markup.html.parser.HTMLParser
- All Implemented Interfaces:
XMLReader
SAX parser that parses non-well-formed HTML.
Only one thread at a time may use this parser.
-
Field Summary
Fields
Modifier and Type         Field                  Description
static final int          ELEMENT_TYPE_EMPTY     Empty element.
static final int          ELEMENT_TYPE_FORM      Element that doesn't have to follow any nesting rules or even be well-formed.
static final int          ELEMENT_TYPE_INLINE    Inline element.
static final int          ELEMENT_TYPE_LI        List item closed by any other li element.
static final int          ELEMENT_TYPE_LIST      Block element closed by an element of the same type.
static final int          ELEMENT_TYPE_NESTED    Block element that can be nested.
static final int          ELEMENT_TYPE_P         Block element closed by any other block element.
static final int          ELEMENT_TYPE_SCRIPT    Block element that doesn't contain markup.
static final int          ELEMENT_TYPE_TABLE     Table element (other than table) closed by any other table element.
static final int          ELEMENT_TYPE_WHATEVER  Element can be either inline or block or whatever.
static Map                HTMLElementTypes
static Map                HTMLEntities
protected AttributesImpl  m_attrs
protected char[]          m_buf
protected List            m_elemStack
Fields inherited from class com.webmethods.rtl.markup.xml.parser.BaseParser
ACCEPT_CHARSET, CONTENT_CHARSET, DEFAULT_CHARSET, m_contentHandler, m_dtdHandler, m_entityResolver, m_errorHandler, m_features, m_lexicalHandler, m_properties, m_recognizedFeatures, m_recognizedProperties, PROPERTY_LEXICAL_HANDLER
-
Constructor Summary
Constructors
HTMLParser()
-
Method Summary
Modifier and Type  Method                                                 Description
protected int      append(char[] buf, int off, int ch)                    Appends the specified char to the specified buf at the specified offset.
protected int      append                                                 Appends the specified string to the specified buf at the specified offset.
protected int      appendComment(char[] buf, int off, int ch)             Appends the specified char to the specified buf at the specified offset.
protected int      appendEntityRef(char[] buf, int off, String ref)       Appends the specified string to the specified buf at the specified offset -- or reports the entity ref.
protected void     closeOptionalElements(List elemStack, String curElem)
protected int      flush(char[] buf, int off)                             Flushes chars to content handler.
protected boolean  isEmptyElement(String curElem)
protected boolean  isFormElement(String curElem)
protected boolean  isScriptElement(String curElem)
void               parse(InputSource input)                               Parse an XML document.
protected int      parseCharRef(Reader src, char[] buf, int off)          Decodes char ref starting after &# (e.g. '169;' or 'xa0;').
Methods inherited from class com.webmethods.rtl.markup.xml.parser.BaseParser
getCharacterStream, getContentHandler, getDTDHandler, getEntityResolver, getErrorHandler, getFeature, getProperty, parse, setContentHandler, setDTDHandler, setEntityResolver, setErrorHandler, setFeature, setProperty
-
Field Details
-
ELEMENT_TYPE_WHATEVER
public static final int ELEMENT_TYPE_WHATEVER
Element can be either inline or block or whatever.
-
ELEMENT_TYPE_EMPTY
public static final int ELEMENT_TYPE_EMPTY
Empty element. For example, br.
-
ELEMENT_TYPE_INLINE
public static final int ELEMENT_TYPE_INLINE
Inline element. For example, span.
-
ELEMENT_TYPE_P
public static final int ELEMENT_TYPE_P
Block element closed by any other block element. For example, p.
-
ELEMENT_TYPE_TABLE
public static final int ELEMENT_TYPE_TABLE
Table element (other than table) closed by any other table element. For example, thead, tr, or td.
-
ELEMENT_TYPE_LI
public static final int ELEMENT_TYPE_LI
List item closed by any other li element. For example, li, dt, or dd.
-
ELEMENT_TYPE_LIST
public static final int ELEMENT_TYPE_LIST
Block element closed by an element of the same type. For example, address.
-
ELEMENT_TYPE_NESTED
public static final int ELEMENT_TYPE_NESTED
Block element that can be nested. For example, div.
-
ELEMENT_TYPE_SCRIPT
public static final int ELEMENT_TYPE_SCRIPT
Block element that doesn't contain markup. For example, script or style.
-
ELEMENT_TYPE_FORM
public static final int ELEMENT_TYPE_FORM
Element that doesn't have to follow any nesting rules or even be well-formed.
-
m_buf
protected char[] m_buf
-
m_elemStack
-
m_attrs
-
HTMLElementTypes
-
HTMLEntities
-
-
Constructor Details
-
HTMLParser
public HTMLParser()
-
-
Method Details
-
parse
Parse an XML document.
The application can use this method to instruct the XML reader to begin parsing an XML document from any valid input source (a character stream, a byte stream, or a URI).
Applications may not invoke this method while a parse is in progress (they should create a new XMLReader instead for each nested XML document). Once a parse is complete, an application may reuse the same XMLReader object, possibly with a different input source.
During the parse, the XMLReader will provide information about the XML document through the registered event handlers.
This method is synchronous: it will not return until parsing has ended. If a client application wants to terminate parsing early, it should throw an exception.
- Parameters:
input - The input source for the top-level of the XML document.
- Throws:
SAXException - Any SAX exception, possibly wrapping another exception.
IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
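Because HTMLParser implements XMLReader, code written against that interface should be able to drive it. The following is a minimal sketch of the register-handler-then-parse flow described above. The class name ParseUsageSketch and its helper methods are hypothetical, and the JDK's built-in reader is used as a stand-in so the example runs without the webMethods library on the classpath. With the library available, passing new HTMLParser() as the reader is the assumed substitution.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class ParseUsageSketch {
    /** Parses markup with the given reader and returns element names in document order. */
    static List<String> elementNames(XMLReader reader, String markup) {
        List<String> names = new ArrayList<>();
        // Events arrive through the registered handler while parse() is running.
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                names.add(qName);
            }
        });
        try {
            // parse() is synchronous: it returns only once parsing has ended.
            reader.parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) { // IOException or SAXException
            throw new IllegalStateException("parse failed", e);
        }
        return names;
    }

    /** Stand-in reader from the JDK (assumption: swap in new HTMLParser() when available). */
    static XMLReader standInReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("no SAX parser available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementNames(standInReader(), "<html><body><p>hi</p></body></html>"));
    }
}
```

The stand-in reader only accepts well-formed input. Since only one thread at a time may use an HTMLParser, a caller would likely create one instance per thread or per parse.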
-
flush
Flushes chars to content handler. Returns the new offset (i.e. 0).
- Throws:
IOException
SAXException
-
appendComment
Appends the specified char to the specified buf at the specified offset. Returns the new offset. If the buf is full, flushes buf to configured content handler.
- Throws:
IOException
SAXException
-
append
Appends the specified char to the specified buf at the specified offset. Returns the new offset. If the buf is full, flushes buf to configured content handler.
- Throws:
IOException
SAXException
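The offset-returning contract shared by flush and append can be illustrated with a small sketch. This is not the library's implementation: the class BufferSketch and the helper roundTrip are hypothetical, the sketch assumes the buffer is flushed before a char is stored into a full buffer, and it narrows the int argument to a single char.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class BufferSketch {
    private final ContentHandler handler;

    BufferSketch(ContentHandler handler) {
        this.handler = handler;
    }

    /** Sends buf[0..off) to the content handler and returns the new offset, 0. */
    int flush(char[] buf, int off) throws SAXException {
        if (off > 0) {
            handler.characters(buf, 0, off);
        }
        return 0;
    }

    /** Stores ch at buf[off], flushing first when buf is full; returns the new offset. */
    int append(char[] buf, int off, int ch) throws SAXException {
        if (off >= buf.length) {
            off = flush(buf, off);
        }
        buf[off] = (char) ch; // assumption: ch fits in one char
        return off + 1;
    }

    /** Pushes text through a buffer of bufSize chars and returns what the handler received. */
    static String roundTrip(String text, int bufSize) {
        StringBuilder out = new StringBuilder();
        BufferSketch sketch = new BufferSketch(new DefaultHandler() {
            @Override
            public void characters(char[] c, int start, int len) {
                out.append(c, start, len);
            }
        });
        char[] buf = new char[bufSize];
        int off = 0;
        try {
            for (int i = 0; i < text.length(); i++) {
                off = sketch.append(buf, off, text.charAt(i));
            }
            sketch.flush(buf, off); // deliver whatever is still buffered
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello world", 4));
    }
}
```

Returning the offset instead of tracking it in a field lets the caller keep the write position in a local variable, which is presumably why every append variant here returns the new offset.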
-
append
Appends the specified string to the specified buf at the specified offset. Returns the new offset. If the buf is full, flushes buf to configured content handler.
- Throws:
IOException
SAXException
-
appendEntityRef
Appends the specified string to the specified buf at the specified offset -- or reports the entity ref. Returns the new offset. If the buf is full, flushes buf to configured content handler.
- Throws:
IOException
SAXException
-
parseCharRef
Decodes char ref starting after &# (e.g. '169;' or 'xa0;'). Returns the new offset.
- Throws:
IOException
SAXException
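As an illustration of the decoding step, the sketch below reads the text that follows &# and turns it into a code point. It is a simplified stand-in, not the library's code: the class CharRefSketch is hypothetical, it returns the decoded code point (or -1 for a malformed reference) rather than writing into a buffer and returning an offset, and its handling of malformed input is an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class CharRefSketch {
    /**
     * Reads a numeric character reference body such as "169;" or "xa0;"
     * (the text after "&#") and returns the decoded code point,
     * or -1 if the reference is malformed.
     */
    static int decode(Reader src) throws IOException {
        int ch = src.read();
        int radix = 10;
        if (ch == 'x' || ch == 'X') { // hexadecimal form
            radix = 16;
            ch = src.read();
        }
        int value = 0;
        int digits = 0;
        while (ch != -1 && ch != ';') {
            int d = Character.digit(ch, radix);
            if (d < 0) {
                return -1; // not a digit in this radix
            }
            value = value * radix + d;
            if (value > Character.MAX_CODE_POINT) {
                return -1; // outside the Unicode range
            }
            digits++;
            ch = src.read();
        }
        // Require at least one digit and a terminating semicolon.
        return (digits > 0 && ch == ';') ? value : -1;
    }

    /** Convenience overload for in-memory text. */
    static int decode(String body) {
        try {
            return decode(new StringReader(body));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("169;")); // decimal form
        System.out.println(decode("xa0;")); // hexadecimal form
    }
}
```

A lenient HTML parser might also accept a reference with no closing semicolon; this sketch rejects it to keep the logic short.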
-
isEmptyElement
- Throws:
IOException
SAXException
-
isScriptElement
- Throws:
IOException
SAXException
-
isFormElement
- Throws:
IOException
SAXException
-
closeOptionalElements
protected void closeOptionalElements(List elemStack, String curElem) throws IOException, SAXException
- Throws:
IOException
SAXException
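This page does not describe how closeOptionalElements works. The sketch below is one plausible reading, based only on the closing rules stated for ELEMENT_TYPE_P and ELEMENT_TYPE_LI in the field descriptions. Everything in it is hypothetical: the class OptionalCloseSketch, the Kind enum standing in for the int constants, the small name table, and the choice to treat only P and NESTED kinds as block elements.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OptionalCloseSketch {
    // Hypothetical stand-ins for a few ELEMENT_TYPE_* constants.
    enum Kind { INLINE, P, LI, NESTED }

    // Hypothetical name table; the contents of the real HTMLElementTypes map are not shown on this page.
    static final Map<String, Kind> KINDS = new HashMap<>();
    static {
        KINDS.put("span", Kind.INLINE);
        KINDS.put("p", Kind.P);
        KINDS.put("li", Kind.LI);
        KINDS.put("dt", Kind.LI);
        KINDS.put("dd", Kind.LI);
        KINDS.put("div", Kind.NESTED);
    }

    /** True if an open element of kind open is implicitly closed by a start tag of kind incoming. */
    static boolean closes(Kind open, Kind incoming) {
        if (open == Kind.P) {
            // "Block element closed by any other block element."
            return incoming == Kind.P || incoming == Kind.NESTED;
        }
        if (open == Kind.LI) {
            // "List item closed by any other li element."
            return incoming == Kind.LI;
        }
        return false;
    }

    /**
     * Pops elements from the top of elemStack for as long as curElem implicitly
     * closes them, and returns the names that were closed, innermost first.
     */
    static List<String> closeOptional(List<String> elemStack, String curElem) {
        List<String> closed = new ArrayList<>();
        Kind incoming = KINDS.get(curElem);
        while (!elemStack.isEmpty()) {
            int top = elemStack.size() - 1;
            if (!closes(KINDS.get(elemStack.get(top)), incoming)) {
                break;
            }
            closed.add(elemStack.remove(top));
        }
        return closed;
    }

    public static void main(String[] args) {
        List<String> stack = new ArrayList<>(List.of("body", "p"));
        System.out.println(closeOptional(stack, "div") + " closed, stack is now " + stack);
    }
}
```

The sketch inspects only the top of the stack, so an unclosed inline element sitting above a p would stop the loop. A real parser would need a rule for that case, and a SAX parser would also report an endElement event for each element it closes.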
-