Solving the exception thrown when Java parses an xml file containing the special symbol &

By Zhu Jiqian

While developing a Java program that parses xml files with SAX, I ran into this exception message:

Error on line 60 of document : References to the entity "xxx" must end with a ';' separator;

When I opened the xml file, I found that "xxx" was immediately preceded by a raw "&" symbol. I later learned that & is a special symbol in xml: the parser treats the text after it as an entity reference, which must end with ';'. If a special symbol like this is written directly in the xml file instead of as an escape sequence, SAX and other parsers throw exceptions such as the one above.
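The failure is easy to reproduce with the SAX parser that ships with the JDK. The sketch below is only an illustration: the class name BareAmpersandDemo and the sample xml strings are made up for this example, and the exact wording of the error message depends on the parser implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original project.
final class BareAmpersandDemo {

    // Returns true if the JDK's SAX parser accepts the string as well-formed xml.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // A bare '&' ends up here; the message text varies between parsers.
            System.out.println("Parse failed: " + e.getMessage());
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<shop>A&xxx</shop>"));     // bare '&'
        System.out.println(isWellFormed("<shop>A&amp;xxx</shop>")); // escaped '&'
    }
}
```

The first call fails because the parser reads "&xxx" as the start of an entity reference and never finds the closing ';'.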

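Special symbols in XML include <, >, &, ' and ". Of these, < and & may never appear literally in character data (PCDATA), while >, ' and " only need escaping in certain positions, for example a quote inside an attribute value delimited by that same quote. The safe habit is to write all five as escape sequences: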

&lt;    <
&gt;    >
&amp;   &
&quot;  "
&apos;  '
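When the xml is generated by your own code, the table above can be applied to each text value before it is written out. The following is a minimal sketch of that mapping; the class name XmlText and the method name escape are hypothetical, and it is meant for plain text values only, not for strings that already contain markup.

```java
// Hypothetical helper, shown only to illustrate the escape table.
final class XmlText {

    // Replaces each of the five special characters with its escape sequence.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, XmlText.escape("Tom & Jerry") returns "Tom &amp; Jerry", which can be placed safely between two tags.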

So, if you want the xml file data to be read normally, how do you get the escape sequences into it?

At first I searched Baidu for a solution, but many of the posts were several years old and none gave a clear fix. Most mentioned that special symbols cause parsing exceptions, but were vague about how to filter them out. So I had to muddle through on my own and work out a reasonable way of filtering the special characters.

The idea is actually very simple. Before handing the xml file to the parser, read it through a Reader line by line and concatenate the lines into a single String. Then use the string replacement method replaceAll() to escape the special symbol, and convert the repaired xml string directly into a Document object for parsing. Replacing every & blindly would also turn an already valid &lt; into &amp;lt;, so the pattern skips any & that already begins one of the five escape sequences or a numeric character reference:

  String xmlStr = s.replaceAll("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)", "&amp;");
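To see the difference between replacing every & and replacing only the bare ones, here is a small comparison. It is a sketch under stated assumptions: the class name AmpersandFixer, the method name escapeBareAmpersands and the sample line are invented for illustration, and the pattern only recognizes the five predefined escape sequences and numeric character references. An entity declared in a DTD, or an & inside a CDATA section, would still be rewritten.

```java
import java.util.regex.Pattern;

// Hypothetical helper comparing the two replacement strategies.
final class AmpersandFixer {

    // Matches '&' only when it does not already start a predefined
    // escape sequence or a numeric character reference.
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)");

    static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String line = "<name>A&B &lt;C&gt; D&amp;E</name>";
        // Naive version: also rewrites the '&' of &lt;, &gt; and &amp;
        System.out.println(line.replaceAll("&", "&amp;"));
        // Lookahead version: touches only the bare '&' between A and B
        System.out.println(escapeBareAmpersands(line));
    }
}
```

A useful side effect of the lookahead is that the replacement can be run twice on the same text without changing it further.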

The complete conversion code is as follows:

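  import java.io.BufferedReader;
  import java.nio.charset.StandardCharsets;
  import java.nio.file.Files;
  import java.nio.file.Paths;
  import org.dom4j.Document;
  import org.dom4j.DocumentHelper;

  StringBuilder buffer = new StringBuilder();
  // Assumption: the file is UTF-8 encoded; change the charset if yours differs.
  // try-with-resources closes the reader even if reading fails
  try (BufferedReader bf = Files.newBufferedReader(Paths.get("D:\\测试.xml"), StandardCharsets.UTF_8)) {
     String s;
     while ((s = bf.readLine()) != null) {
        // Keep the line break so text on adjacent lines is not glued together
        buffer.append(s).append('\n');
     }
  }

  String str = buffer.toString();
  // Escape only bare '&' characters; escape sequences that are already valid are left alone
  String xml = str.replaceAll("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)", "&amp;");

  // The repaired xml string can now be parsed (dom4j's parseText throws DocumentException)
  Document document = DocumentHelper.parseText(xml);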
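Escaping does not change the data the program finally sees, because the parser turns &amp; back into & when it builds the document. The sketch below checks this with the JDK's built-in DOM parser rather than dom4j, purely so that it runs without extra libraries; the class name RoundTripCheck and the sample xml are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical check, not part of the original code.
final class RoundTripCheck {

    // Parses an xml string and returns the text content of its root element.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse xml", e);
        }
    }

    public static void main(String[] args) {
        // The escaped form goes in, the original character comes out.
        System.out.println(rootText("<name>A&amp;B</name>")); // prints A&B
    }
}
```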

With this, the exception caused by the special symbol & when parsing xml files in Java is resolved.