public class ScriptSafeHTMLSerializer extends HTMLSerializer
Nested classes/interfaces inherited from class XMLSerializer: XMLSerializer.PrefixMapping
Modifier and Type | Field and Description |
---|---|
static Set | DANGEROUS_ATTRS - URI attributes |
protected AttributesImpl | m_atts |
protected Set | m_dangerousAttrs |
protected Set | m_safeElements |
protected Set | m_skipElements |
protected int | m_unsafe |
protected Set | m_unsafeElements |
static Pattern | RE_HTML_NAME - (HTML 4.01, section 6.2) ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods ("."). |
static Pattern | RE_SAFE_URL_SCHEME |
static Pattern | RE_UNSAFE_CSS_CONTENT |
static Set | SAFE_ELEMENTS - whitelist |
static Set | SKIP_ELEMENTS - skip tag, process content |
static Set | UNSAFE_ELEMENTS - blacklist |
Fields inherited from class HTMLSerializer: m_entity, m_script, NOT_EMPTY_ELEMENTS, SCRIPT_ELEMENTS
Fields inherited from class XMLSerializer: m_cdata, m_documentLocator, m_emptyElement, m_newPrefixesSize, m_out, m_prefixes, m_prefixesSize
Constructor and Description |
---|
ScriptSafeHTMLSerializer() |
ScriptSafeHTMLSerializer(Writer out) |
Modifier and Type | Method and Description |
---|---|
void | characters(char[] ch, int start, int length) - Receive notification of character data. |
void | comment(char[] ch, int start, int length) - Report an XML comment anywhere in the document. |
void | endCDATA() - Report the end of a CDATA section. |
void | endElement(String uri, String localName, String qName) - Receive notification of the end of an element. |
void | endEntity(String name) - Report the end of an entity. |
static boolean | equalsIgnoreCaseAscii(String s1, String s2) |
Set | getSafeElements() - Uses safe elements (whitelist) if unsafe elements (blacklist) not set or empty. |
Set | getSkipElements() - Ignores elements, processes content. |
Set | getUnsafeElements() - Uses safe elements (whitelist) if unsafe elements (blacklist) not set or empty. |
protected void | initConfiguration() |
static boolean | isUnsafeElement(String qname, Set unsafeElements, Set safeElements) |
static String | processHTML(String html) - Processes tag-soup HTML, stripping JavaScript blocks and event handlers (and other dangerous tags, like APPLET or IFRAME). |
void | processingInstruction(String target, String data) - Receive notification of a processing instruction. |
void | setSafeElements(Set safeElements) - Uses safe elements (whitelist) if unsafe elements (blacklist) not set or empty. |
void | setSkipElements(Set skipElements) - Ignores elements, processes content. |
void | setUnsafeElements(Set unsafeElements) - Uses safe elements (whitelist) if unsafe elements (blacklist) not set or empty. |
void | skippedEntity(String name) - Receive notification of a skipped entity. |
void | startCDATA() - Report the start of a CDATA section. |
void | startDocument() - Receive notification of the beginning of a document. |
void | startElement(String uri, String localName, String qName, Attributes atts) - Receive notification of the beginning of an element. |
void | startEntity(String name) - Report the beginning of some internal and external XML entities. |
static boolean | startsWithIgnoreCaseAscii(String s1, String s2) |
static String | toLowerCaseAscii(String s) |
Methods inherited from class HTMLSerializer: escapeAttrValue
Methods inherited from class XMLSerializer: closeStartElement, endDocument, endDTD, endPrefixMapping, getDocumentLocator, getOut, ignorableWhitespace, lookupPrefix, setDocumentLocator, setOut, startDTD, startPrefixMapping
Methods inherited from class DefaultHandler: error, fatalError, notationDecl, resolveEntity, unparsedEntityDecl, warning
protected int m_unsafe
protected AttributesImpl m_atts
protected Set m_safeElements
protected Set m_unsafeElements
protected Set m_skipElements
protected Set m_dangerousAttrs
public static Set SAFE_ELEMENTS
public static Set UNSAFE_ELEMENTS
public static Set SKIP_ELEMENTS
public static Set DANGEROUS_ATTRS
public static final Pattern RE_HTML_NAME
public static final Pattern RE_SAFE_URL_SCHEME
public static final Pattern RE_UNSAFE_CSS_CONTENT
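The field summary describes RE_HTML_NAME by quoting HTML 4.01, section 6.2. As a rough sketch, that rule could be written as the regular expression below. The class HtmlNameCheck and the constant HTML_NAME are hypothetical names introduced for illustration, and the pattern actually compiled by the library may differ.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of ScriptSafeHTMLSerializer.
final class HtmlNameCheck {
    // Assumed transcription of HTML 4.01, section 6.2: one letter, then any number of
    // letters, digits, hyphens, underscores, colons, and periods.
    static final Pattern HTML_NAME = Pattern.compile("[A-Za-z][A-Za-z0-9_:.-]*");

    static boolean isValidName(String s) {
        // matches() requires the whole input to fit the pattern.
        return s != null && HTML_NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("section-1.2")); // true
        System.out.println(isValidName("1st"));         // false: starts with a digit
        System.out.println(isValidName("on click"));    // false: contains a space
    }
}
```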
public ScriptSafeHTMLSerializer()
public ScriptSafeHTMLSerializer(Writer out)
public Set getSafeElements()
public void setSafeElements(Set safeElements)
public Set getUnsafeElements()
public void setUnsafeElements(Set unsafeElements)
public Set getSkipElements()
public void setSkipElements(Set skipElements)
public static String processHTML(String html) throws SAXException, IOException
Parameters:
html - HTML to process.
Throws: SAXException
Throws: IOException
public void startDocument() throws SAXException
Receive notification of the beginning of a document.
The SAX parser will invoke this method only once, before any
other event callbacks (except for setDocumentLocator).
Specified by: startDocument in interface ContentHandler
Overrides: startDocument in class HTMLSerializer
Throws: SAXException - Any SAX exception, possibly
wrapping another exception.
See Also: XMLSerializer.endDocument()
public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException
Receive notification of the beginning of an element.
The Parser will invoke this method at the beginning of every
element in the XML document; there will be a corresponding
endElement event for every startElement event
(even when the element is empty). All of the element's content will be
reported, in order, before the corresponding endElement event.
This event allows up to three name components for each element:
- the Namespace URI;
- the local name; and
- the qualified (prefixed) name.
Any or all of these may be provided, depending on the values of the http://xml.org/sax/features/namespaces and the http://xml.org/sax/features/namespace-prefixes properties:
- the Namespace URI and local name are required when the namespaces property is true (the default), and are optional when the namespaces property is false (if one is specified, both must be);
- the qualified name is required when the namespace-prefixes property is true, and is optional when the namespace-prefixes property is false (the default).
Note that the attribute list provided will contain only
attributes with explicit values (specified or defaulted):
#IMPLIED attributes will be omitted. The attribute list
will contain attributes used for Namespace declarations
(xmlns* attributes) only if the
http://xml.org/sax/features/namespace-prefixes
property is true (it is false by default, and support for a
true value is optional).
Like characters(), attribute values may have
characters that need more than one char value.
Specified by: startElement in interface ContentHandler
Overrides: startElement in class HTMLSerializer
Parameters:
uri - The Namespace URI, or the empty string if the
element has no Namespace URI or if Namespace
processing is not being performed.
localName - The local name (without prefix), or the
empty string if Namespace processing is not being
performed.
qName - The qualified name (with prefix), or the
empty string if qualified names are not available.
atts - The attributes attached to the element. If
there are no attributes, it shall be an empty
Attributes object.
Throws: SAXException - Any SAX exception, possibly
wrapping another exception.
See Also: endElement(java.lang.String, java.lang.String, java.lang.String), Attributes
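The pairing of startElement and endElement events described above, including for empty elements, can be observed with a small handler. This is a minimal sketch that uses only the JDK's own SAX parser and a plain DefaultHandler. ElementEventDemo is a hypothetical class and does not involve ScriptSafeHTMLSerializer. It assumes the default (non namespace-aware) parser configuration, so it logs qName rather than localName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; records the order of element events.
final class ElementEventDemo {
    static List<String> events(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Log the qualified name and the number of attributes.
                log.add("start:" + qName + "[" + atts.getLength() + "]");
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        // The empty element <br/> still produces both a start and an end event.
        System.out.println(events("<a href=\"x\"><br/></a>"));
        // prints [start:a[1], start:br[0], end:br, end:a]
    }
}
```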
public void endElement(String uri, String localName, String qName) throws SAXException
Receive notification of the end of an element.
The SAX parser will invoke this method at the end of every
element in the XML document; there will be a corresponding
startElement event for every endElement
event (even when the element is empty).
For information on the names, see startElement.
Specified by: endElement in interface ContentHandler
Overrides: endElement in class HTMLSerializer
Parameters:
uri - The Namespace URI, or the empty string if the
element has no Namespace URI or if Namespace
processing is not being performed.
localName - The local name (without prefix), or the
empty string if Namespace processing is not being
performed.
qName - The qualified XML 1.0 name (with prefix), or the
empty string if qualified names are not available.
Throws: SAXException - Any SAX exception, possibly
wrapping another exception.
public void characters(char[] ch, int start, int length) throws SAXException
Receive notification of character data.
The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.
The application must not attempt to read from the array outside of the specified range.
Individual characters may consist of more than one Java
char value. There are two important cases where this
happens, because characters can't be represented in just sixteen bits.
In one case, characters are represented in a Surrogate Pair,
using two special Unicode values. Such characters are in the so-called
"Astral Planes", with a code point above U+FFFF. A second case involves
composite characters, such as a base character combining with one or
more accent characters.
Your code should not assume that algorithms using
char-at-a-time idioms will be working in character
units; in some cases they will split characters. This is relevant
wherever XML permits arbitrary characters, such as attribute values,
processing instruction data, and comments as well as in data reported
from this method. It's also generally relevant whenever Java code
manipulates internationalized text; the issue isn't unique to XML.
Note that some parsers will report whitespace in element
content using the ignorableWhitespace
method rather than this one (validating parsers must
do so).
Specified by: characters in interface ContentHandler
Overrides: characters in class HTMLSerializer
Parameters:
ch - The characters from the XML document.
start - The start position in the array.
length - The number of characters to read from the array.
Throws: SAXException - Any SAX exception, possibly
wrapping another exception.
See Also: XMLSerializer.ignorableWhitespace(char[], int, int), Locator
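To illustrate the warning about char-at-a-time idioms, the sketch below walks a characters()-style range by code point, so that a surrogate pair stays together. CodePointDemo and its methods are hypothetical names. The sketch covers only surrogate pairs, not composite (combining) sequences, and it assumes the range does not begin or end in the middle of a pair.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper contrasting char units with Unicode code points.
final class CodePointDemo {
    // Number of code points in the range [start, start + length).
    static int codePoints(char[] ch, int start, int length) {
        return Character.codePointCount(ch, start, length);
    }

    // Splits the range into one string per code point; a surrogate pair yields a two-char string.
    static List<String> split(char[] ch, int start, int length) {
        List<String> out = new ArrayList<>();
        int end = start + length;
        for (int i = start; i < end; ) {
            int cp = Character.codePointAt(ch, i, end);
            out.add(new String(Character.toChars(cp)));
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return out;
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (encoded as a surrogate pair), 'b'
        char[] ch = "a\uD83D\uDE00b".toCharArray();
        System.out.println(ch.length);                    // 4
        System.out.println(codePoints(ch, 0, ch.length)); // 3
    }
}
```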
public void processingInstruction(String target, String data) throws SAXException
Receive notification of a processing instruction.
The Parser will invoke this method once for each processing instruction found: note that processing instructions may occur before or after the main document element.
A SAX parser must never report an XML declaration (XML 1.0, section 2.8) or a text declaration (XML 1.0, section 4.3.1) using this method.
Like characters(), processing instruction
data may have characters that need more than one char value.
Specified by: processingInstruction in interface ContentHandler
Overrides: processingInstruction in class XMLSerializer
Parameters:
target - The processing instruction target.
data - The processing instruction data, or null if
none was supplied. The data does not include any
whitespace separating it from the target.
Throws: SAXException - Any SAX exception, possibly
wrapping another exception.
public void skippedEntity(String name) throws SAXException
Receive notification of a skipped entity.
The Parser will invoke this method each time the entity is
skipped. Non-validating processors may skip entities if they
have not seen the declarations (because, for example, the
entity was declared in an external DTD subset). All processors
may skip external entities, depending on the values of the
http://xml.org/sax/features/external-general-entities
and the
http://xml.org/sax/features/external-parameter-entities
properties.
Specified by: skippedEntity in interface ContentHandler
Overrides: skippedEntity in class HTMLSerializer
Parameters:
name - The name of the skipped entity. If it is a
parameter entity, the name will begin with '%', and if
it is the external DTD subset, it will be the string
"[dtd]".
Throws: SAXException - Any SAX exception, possibly
wrapping another exception.
public void startEntity(String name) throws SAXException
Report the beginning of some internal and external XML entities.
The reporting of parameter entities (including
the external DTD subset) is optional, and SAX2 drivers that
report LexicalHandler events may not implement it; you can use the
http://xml.org/sax/features/lexical-handler/parameter-entities
feature to query or control the reporting of parameter entities.
General entities are reported with their regular names, parameter entities have '%' prepended to their names, and the external DTD subset has the pseudo-entity name "[dtd]".
When a SAX2 driver is providing these events, all other
events must be properly nested within start/end entity
events. There is no additional requirement that events from
DeclHandler or DTDHandler be properly ordered.
Note that skipped entities will be reported through the
skippedEntity event, which is part of the ContentHandler interface.
Because of the streaming event model that SAX uses, some entity boundaries cannot be reported under any circumstances:
- general entities within attribute values
- parameter entities within declarations
These will be silently expanded, with no indication of where the original entity boundaries were.
Note also that the boundaries of character references (which are not really entities anyway) are not reported.
All start/endEntity events must be properly nested.
Specified by: startEntity in interface LexicalHandler
Overrides: startEntity in class HTMLSerializer
Parameters:
name - The name of the entity. If it is a parameter
entity, the name will begin with '%', and if it is the
external DTD subset, it will be "[dtd]".
Throws: SAXException - The application may raise an exception.
See Also: endEntity(java.lang.String), DeclHandler.internalEntityDecl(java.lang.String, java.lang.String), DeclHandler.externalEntityDecl(java.lang.String, java.lang.String, java.lang.String)
public void endEntity(String name) throws SAXException
Report the end of an entity.
Specified by: endEntity in interface LexicalHandler
Overrides: endEntity in class HTMLSerializer
Parameters:
name - The name of the entity that is ending.
Throws: SAXException - The application may raise an exception.
See Also: startEntity(java.lang.String)
public void startCDATA() throws SAXException
Report the start of a CDATA section.
The contents of the CDATA section will be reported through
the regular characters event; this event is intended only to report
the boundary.
Specified by: startCDATA in interface LexicalHandler
Overrides: startCDATA in class HTMLSerializer
Throws: SAXException - The application may raise an exception.
See Also: endCDATA()
public void endCDATA() throws SAXException
Report the end of a CDATA section.
Specified by: endCDATA in interface LexicalHandler
Overrides: endCDATA in class HTMLSerializer
Throws: SAXException - The application may raise an exception.
See Also: startCDATA()
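The relationship between the CDATA boundary events and the regular characters event can be sketched with the JDK parser and org.xml.sax.ext.DefaultHandler2, registered through the standard http://xml.org/sax/properties/lexical-handler property. CdataEventDemo is a hypothetical class and does not use ScriptSafeHTMLSerializer. Text is buffered before logging because a parser may deliver character data in several chunks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class; logs CDATA boundaries and the text between them.
final class CdataEventDemo {
    static List<String> events(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override public void startCDATA() { flush(); log.add("startCDATA"); }
            @Override public void endCDATA() { flush(); log.add("endCDATA"); }
            @Override public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // CDATA content arrives here as ordinary text
            }
            @Override public void endDocument() { flush(); }

            // Writes any buffered text to the log as a single entry.
            private void flush() {
                if (text.length() > 0) {
                    log.add("text:" + text);
                    text.setLength(0);
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // Lexical events are delivered only to a handler registered under this property.
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(events("<r>a<![CDATA[x<y]]>b</r>"));
        // prints [text:a, startCDATA, text:x<y, endCDATA, text:b]
    }
}
```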
public void comment(char[] ch, int start, int length) throws SAXException
Report an XML comment anywhere in the document.
This callback will be used for comments inside or outside the document element, including comments in the external DTD subset (if read). Comments in the DTD must be properly nested inside start/endDTD and start/endEntity events (if used).
Specified by: comment in interface LexicalHandler
Overrides: comment in class XMLSerializer
Parameters:
ch - An array holding the characters in the comment.
start - The starting position in the array.
length - The number of characters to use from the array.
Throws: SAXException - The application may raise an exception.
protected void initConfiguration()
public static boolean isUnsafeElement(String qname, Set unsafeElements, Set safeElements)
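The page gives no description for isUnsafeElement, but the accessor summaries state that the safe elements (whitelist) are used if the unsafe elements (blacklist) are not set or empty. One plausible reading of that rule is sketched below. ElementPolicy and isUnsafe are hypothetical names. Lower-casing the element name and treating a missing whitelist as "everything is unsafe" are assumptions, not documented behavior.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the whitelist/blacklist decision; not the library's implementation.
final class ElementPolicy {
    static boolean isUnsafe(String qname, Set<String> unsafeElements, Set<String> safeElements) {
        // Assumption: element names are compared in lower case.
        String name = qname.toLowerCase(Locale.ROOT);
        if (unsafeElements != null && !unsafeElements.isEmpty()) {
            // A non-empty blacklist decides on its own.
            return unsafeElements.contains(name);
        }
        // Blacklist not set or empty: anything outside the whitelist counts as unsafe.
        // Assumption: with no whitelist either, every element is treated as unsafe.
        return safeElements == null || !safeElements.contains(name);
    }
}
```

Under this reading, a caller that sets only a whitelist gets the stricter default-deny behavior, while setting a blacklist switches to default-allow.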
public static boolean startsWithIgnoreCaseAscii(String s1, String s2)
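The three *Ascii helpers are listed without descriptions. The names suggest case folding restricted to A-Z, which keeps comparisons independent of locale and of Unicode case mappings. The sketch below shows one way such helpers could look. AsciiCase is a hypothetical class, and the argument order of startsWithIgnoreCaseAscii (string first, prefix second) is an assumption.

```java
// Hypothetical ASCII-only case helpers; not the library's implementation.
final class AsciiCase {
    // Folds only 'A'..'Z'; every other char is returned unchanged.
    private static char lower(char c) {
        return (c >= 'A' && c <= 'Z') ? (char) (c + 32) : c;
    }

    static String toLowerCaseAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(lower(s.charAt(i)));
        }
        return sb.toString();
    }

    // Assumption: s1 is the string being tested and s2 is the prefix.
    static boolean startsWithIgnoreCaseAscii(String s1, String s2) {
        if (s2.length() > s1.length()) {
            return false;
        }
        for (int i = 0; i < s2.length(); i++) {
            if (lower(s1.charAt(i)) != lower(s2.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    static boolean equalsIgnoreCaseAscii(String s1, String s2) {
        return s1.length() == s2.length() && startsWithIgnoreCaseAscii(s1, s2);
    }

    public static void main(String[] args) {
        System.out.println(toLowerCaseAscii("ScRiPt"));                                       // script
        System.out.println(startsWithIgnoreCaseAscii("JavaScript:alert(1)", "javascript:")); // true
        System.out.println(equalsIgnoreCaseAscii("\u212A", "k"));                             // false: KELVIN SIGN is not ASCII
    }
}
```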