Class XmlParser

java.lang.Object
com.pnfsoftware.jeb.util.encoding.xml.XmlParser

public class XmlParser extends Object
A limited, simple, lenient, fast, and read-only XML parser. It can be used as a back-up when the JDK implementation (typically, Apache Xerces) fails on documents deviating from the XML specifications.
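To make the fallback scenario concrete, the sketch below uses only the JDK's own DOM parser to show the kind of input it rejects and that a lenient parser is meant to tolerate. The class name StrictParserDemo and the helper jdkAccepts are hypothetical names introduced for illustration; XmlParser itself is not invoked here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the JEB API.
class StrictParserDemo {
    /** Returns true if the JDK DOM parser accepts the input, false if it rejects it. */
    static boolean jdkAccepts(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // The document is not well-formed XML.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<a><b/></a>",   // well-formed
            "<a/><b/>",      // multiple root elements
            "<a b=c/>",      // unquoted attribute value
            "<a><br></a>"    // unclosed tag
        };
        for (String sample : samples) {
            System.out.println(sample + " accepted by JDK parser: " + jdkAccepts(sample));
        }
    }
}
```

When jdkAccepts returns false, an application could retry the same input with XmlParser.parse(String), configured with the setters documented under Method Details.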

Features and limitations:

  • XML version must be 1.x, encoding must be UTF-8.
  • All Unicode characters are accepted as long as there is no parsing ambiguity.
  • Multiple root elements are allowed.
  • Android-style backslash escapes in attribute values can be supported (see setHandleBackslashAxmlStyle(boolean)).
  • This parser returns XML documents implementing the read-only parts of the standard org.w3c.dom API (refer to the X... classes in this package).
  • The following XML node types are NOT supported: DocumentFragment, Entity, EntityReference, Notation, ProcessingInstruction.
  • Calls to unsupported methods will raise UnsupportedOperationException.
  • Unclosed tags (such as HTML's <br>) can be allowed (see setAllowUnclosedTags(boolean)).
  • Unquoted attribute values are allowed.

Constructor Details

    • XmlParser

      public XmlParser()

Method Details

    • setAssignParentNodes

      public void setAssignParentNodes(boolean assignParentNodes)
      Parameters:
      assignParentNodes - if true, the internal parent and/or owner fields are set, allowing the use of methods like Node.getParentNode(), Node.getOwnerDocument(), Attr.getOwnerElement(), etc.
    • isAssignParentNodes

      public boolean isAssignParentNodes()
      Returns:
      true if parent and owner nodes are assigned; the default is false
    • setSortAttributes

      public void setSortAttributes(boolean sortAttributes)
      Parameters:
      sortAttributes - if true, the element attributes are sorted by name, alphabetically; if false, the original order is maintained (note: Xerces does alphasort)
    • isSortAttributes

      public boolean isSortAttributes()
      Returns:
      true if element attributes are sorted by name; the default is false
    • setHandleBackslashAxmlStyle

      public void setHandleBackslashAxmlStyle(boolean handleBackslashAxmlStyle)
      Parameters:
      handleBackslashAxmlStyle - if true, the \n and \t escapes are recognized in attribute values, and any other escape (\x) yields the character "x". If false, normal XML behavior applies: \ is a regular character, so \x is literally "\x".
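
      The escape rules described above can be sketched as a standalone function. This is only an illustration of the documented behavior, not the parser's actual code: the names BackslashEscapeSketch and unescape are hypothetical, and keeping a trailing lone backslash as-is is an assumption, since that case is not documented.

```java
// Hypothetical illustration of the documented AXML-style escape rules.
class BackslashEscapeSketch {
    /** Applies the rules: \n -> newline, \t -> tab, any other \x -> x. */
    static String unescape(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            // Assumption: a backslash at the very end has nothing to escape and is kept.
            if (c != '\\' || i + 1 >= value.length()) {
                out.append(c);
                continue;
            }
            char next = value.charAt(++i);
            switch (next) {
                case 'n':
                    out.append('\n');
                    break;
                case 't':
                    out.append('\t');
                    break;
                default:
                    // Any other escaped character stands for itself.
                    out.append(next);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("tab\\there"));
        System.out.println(unescape("\\x"));
    }
}
```

      With the option disabled, no such transformation would take place and the attribute value would keep its backslashes.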
    • isHandleBackslashAxmlStyle

      public boolean isHandleBackslashAxmlStyle()
      Returns:
      true if Android-style backslash escapes are handled in attribute values; the default is false
    • setAllowUnclosedTags

      public void setAllowUnclosedTags(boolean allowUnclosedTags)
      Parameters:
      allowUnclosedTags - if true, the parser will consider unclosed tags as part of XText
    • isAllowUnclosedTags

      public boolean isAllowUnclosedTags()
      Returns:
      true if unclosed tags are allowed; the default is false
    • setAllowMismatchedTags

      public void setAllowMismatchedTags(boolean allowMismatchedTags)
      Parameters:
      allowMismatchedTags - if true, the parser will match the tags case-insensitively
    • isAllowMismatchedTags

      public boolean isAllowMismatchedTags()
      Returns:
      true if mismatched tags are allowed; the default is false
    • isAllowNoXmlDeclaration

      public boolean isAllowNoXmlDeclaration()
      Returns:
      true if a missing XML declaration is allowed; the default is false
    • setAllowNoXmlDeclaration

      public void setAllowNoXmlDeclaration(boolean allowNoXmlDeclaration)
      Parameters:
      allowNoXmlDeclaration - if true, the parser will process the input even if no XML declaration (<?xml...) is found
    • parse

      public XDocument parse(String str) throws ParseException
      Parse the provided XML string and return an XML Document object. This method will throw if an error occurs.
      Parameters:
      str - XML string (the encoding attribute is disregarded)
      Returns:
      a document object, never null (the method throws on error)
      Throws:
      ParseException
    • parse

      public XDocument parse(byte[] bytes) throws ParseException
      Parse the provided XML data and return an XML Document object. This method will throw if an error occurs.
      Parameters:
      bytes - XML data
      Returns:
      a document object, never null (the method throws on error)
      Throws:
      ParseException
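
Because the returned document implements the read-only parts of org.w3c.dom, code that consumes it can be written against the standard interfaces. The sketch below is a hedged example of such a consumer. The names ReadOnlyDomWalk, elementNames and parseWithJdk are hypothetical, the JDK parser is used as a stand-in document source so the example runs without JEB, and it is an assumption that the X... classes support the few read-only calls used here (getNodeType, getNodeName, getChildNodes).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical consumer of a read-only org.w3c.dom tree.
class ReadOnlyDomWalk {
    /** Collects element names in document order, starting at the given node. */
    static List<String> elementNames(Node node) {
        List<String> names = new ArrayList<>();
        collect(node, names);
        return names;
    }

    // Walks downward only: no getParentNode() calls, so the walk does not
    // rely on parent links being assigned (see setAssignParentNodes).
    private static void collect(Node node, List<String> names) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), names);
        }
    }

    /** Stand-in document source using the JDK parser; not the JEB XmlParser. */
    static Document parseWithJdk(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementNames(parseWithJdk("<a><b/><c><d/></c></a>")));
    }
}
```

Starting the walk from the document node rather than from getDocumentElement() is a deliberate choice here, since this parser allows multiple root elements.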