Solving dom4j's failure to parse xmlns namespaces and the garbled characters in generated non-UTF-8 output - 白红宇



Published: 2019-05-25


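I recently solved the problem of dom4j XPath queries finding nothing in an XML file that declares a namespace. If the same problem is troubling you, the notes below may give you some ideas. Take the following XML file: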

    <?xml version="1.0" encoding="UTF-8"?>
    <MyXML xmlns="http://www.ttt.com/ttt-TrdInfo-1-0"
           xmlns:x="http://www.ttt.com/ttt/metadata.xsd"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="res286.xsd">
        <Hdr>
            <ReqId>001</ReqId>
            <Tid>1002</Tid>
            <Cid>500</Cid>
            <user>cuishen</user>
            <Mname>supermarket</Mname>
            <pwd>543200210</pwd>
        </Hdr>
        <Car>
            <Flg>T</Flg>
            <Cod>ccc</Cod>
            <Door>kkk</Door>
            <mktId>b01</mktId>
            <Key>
                <KeyID>t01</KeyID>
            </Key>
        </Car>
    </MyXML>

Parsing code

    import java.io.File;
    import java.util.HashMap;
    import java.util.Map;

    import org.dom4j.Document;
    import org.dom4j.DocumentException;
    import org.dom4j.Element;
    import org.dom4j.XPath;
    import org.dom4j.io.SAXReader;

    public class ReadMyXML {
        public static void main(String[] args) {
            File xmlFile = new File("c:/myXML.xml");
            SAXReader xmlReader = new SAXReader();
            try {
                Document document = xmlReader.read(xmlFile);

                // Test code: read an element node of the XML.
                // Bind an arbitrary prefix ("mo") to the default namespace URI.
                Map<String, String> xmlMap = new HashMap<>();
                xmlMap.put("mo", "http://www.ttt.com/ttt-TrdInfo-1-0");

                // Address the element through that prefix.
                XPath x = document.createXPath("//mo:ReqId");
                x.setNamespaceURIs(xmlMap);
                Element valueElement = (Element) x.selectSingleNode(document);
                System.out.println(valueElement.getText());
            } catch (DocumentException e) {
                e.printStackTrace();
            }
        }
    }

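That is an example of using dom4j to read an element from an XML file that has a namespace. All you need to do is tell the XPath about the default namespace. The file declares other namespaces as well, but it never uses them, so they can be ignored. The key in the HashMap is any string you like, which serves as the prefix. The value is the URI of the default namespace. The argument to document.createXPath() is the XPath of the node to read, that is, "//" + the key in the HashMap + ":" + the name of the node. Simple, right? The rest should need no explanation ^_^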
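Why does a plain //ReqId find nothing in the first place? In XPath 1.0 an unprefixed element name matches only elements that are in no namespace, while every element in the file above inherits the default namespace. The sketch below reproduces that rule with the JDK's built-in javax.xml.xpath API rather than dom4j (a swapped-in technique, chosen only so that it runs without extra jars). The class name DefaultNamespaceDemo, the helper evaluate, and the trimmed-down XML string are illustrative assumptions, not part of the original code.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class DefaultNamespaceDemo {
    // Default namespace URI taken from the article's sample file.
    static final String NS = "http://www.ttt.com/ttt-TrdInfo-1-0";

    // Hypothetical, shortened version of the sample document.
    static final String XML =
            "<MyXML xmlns=\"" + NS + "\"><Hdr><ReqId>001</ReqId></Hdr></MyXML>";

    /** Evaluates an XPath against XML with the prefix "mo" bound to NS. */
    static String evaluate(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this, namespaces are ignored
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "mo".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String uri) {
                    return NS.equals(uri) ? "mo" : null;
                }

                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return NS.equals(uri)
                            ? Collections.singletonList("mo").iterator()
                            : Collections.<String>emptyList().iterator();
                }
            });
            // Returns the string value of the first match, or "" if nothing matches.
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + evaluate("//ReqId") + "]");    // unprefixed: no match
        System.out.println("[" + evaluate("//mo:ReqId") + "]"); // prefixed: matches
    }
}
```

The prefix in the query does not have to match anything in the file. Only the URI it is bound to matters, which is why "mo" works even though the document itself never declares it.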

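What if you need to read an attribute rather than an element? No need to worry: the example below makes it clear. The principle is the same, and you only have to put an "@" in front of the attribute name when building the XPath string. Note that an attribute without a prefix does not belong to the default namespace, so the attribute name itself takes no prefix. Only the element name does.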

XML

    <?xml version="1.0" encoding="UTF-8"?>
    <MyXML xmlns="http://www.ttt.com/ttt-TrdInfo-1-0"
           xmlns:x="http://www.ttt.com/ttt/metadata.xsd"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="res286.xsd">
        <Hdr ReqId="001" Tid="1002" Cid="500" user="cuishen" Mname="supermarket" pwd="543200210"/>
        <Car Flg="T" Cod="ccc" Door="kkk" mktId="b01">
            <Key KeyID="t01"/>
        </Car>
    </MyXML>

Parsing code

    import java.io.File;
    import java.util.HashMap;
    import java.util.Map;

    import org.dom4j.Attribute;
    import org.dom4j.Document;
    import org.dom4j.DocumentException;
    import org.dom4j.XPath;
    import org.dom4j.io.SAXReader;

    public class ReadMyXML2 {
        public static void main(String[] args) {
            File xmlFile = new File("c:/myXML2.xml");
            SAXReader xmlReader = new SAXReader();
            try {
                Document document = xmlReader.read(xmlFile);

                // Test code: read an attribute of the XML.
                // Bind an arbitrary prefix ("mo") to the default namespace URI.
                Map<String, String> xmlMap = new HashMap<>();
                xmlMap.put("mo", "http://www.ttt.com/ttt-TrdInfo-1-0");

                // The element takes the prefix; the attribute only takes "@".
                XPath x = document.createXPath("//mo:Hdr/@ReqId");
                x.setNamespaceURIs(xmlMap);
                Attribute valueAttribute = (Attribute) x.selectSingleNode(document);
                System.out.println(valueAttribute.getText());
            } catch (DocumentException e) {
                e.printStackTrace();
            }
        }
    }

When dom4j's XMLWriter is used to output a UTF-8 encoded XML file, the characters come out garbled.

First, set the output encoding. Here we use UTF-8.

    OutputFormat format = OutputFormat.createPrettyPrint();
    format.setEncoding("utf-8");

Output code

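    try {
        // FileWriter encodes with the platform default charset,
        // regardless of the encoding set on format.
        XMLWriter output = new XMLWriter(new FileWriter("entity.xml"), format);
        output.write(document);
        output.close();
    } catch (IOException e) {
        e.printStackTrace();
    }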

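If the output above contains Chinese characters, they may come out garbled. Changing the FileWriter to a FileOutputStream fixes it, because XMLWriter then creates its own writer using the encoding set on format.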

    try {
        // Given a raw OutputStream, XMLWriter encodes with format's encoding.
        XMLWriter output = new XMLWriter(new FileOutputStream("entity.xml"), format);
        output.write(document);
        output.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
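If a Writer is unavoidable, the encoding can still be pinned down by building the Writer on top of an OutputStream with an explicit charset. The sketch below uses only the JDK. The class name ExplicitCharsetWriterDemo, the helper writeAndReadBack, and the temporary file are assumptions made for illustration, and the sample text is written with Unicode escapes (李四) so that the source file's own encoding does not matter.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ExplicitCharsetWriterDemo {
    /** Writes text through a UTF-8 Writer and returns the bytes that landed on disk. */
    static byte[] writeAndReadBack(String text) {
        try {
            Path file = Files.createTempFile("entity", ".xml");
            // OutputStreamWriter is a Writer whose charset is chosen by the caller,
            // so the result does not depend on the platform default.
            try (Writer writer = new OutputStreamWriter(
                    new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)) {
                writer.write(text);
            }
            byte[] bytes = Files.readAllBytes(file);
            Files.delete(file);
            return bytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = writeAndReadBack("\u674e\u56db");
        // Each of the two characters takes three bytes in UTF-8.
        System.out.println(bytes.length + " bytes written");
    }
}
```

Since the original code already passes a Writer to XMLWriter together with format, a writer built this way could presumably be handed over in place of the FileWriter, as long as its charset matches the encoding set on format.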

Appendix: another write-up on fixing the encoding

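I started learning dom4j a few days ago. I found an article online and dived straight in, and getting started was very quick. But I ran into a problem: I could not save an XML file as UTF-8. Reading the saved file back raised the error "Invalid byte 2 of 2-byte UTF-8 sequence." On inspection, the file generated by dom4j showed garbled Chinese in any editor that correctly honors the XML encoding, whereas Notepad displayed the Chinese correctly. This gave me quite a headache. XML files generated with the GBK or gb2312 encoding, on the other hand, parsed without trouble. So I suspected that dom4j was not handling UTF-8 properly and started reading the dom4j source code. In the end I found the cause, and it turned out to be a problem in my own program.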

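In the dom4j samples and in the popular online tutorial "DOM4J 使用简介" (A Brief Introduction to Using DOM4J), the code for creating a new XML document looks roughly like this: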

    public void createXML(String fileName) {
        Document doc = org.dom4j.DocumentHelper.createDocument();
        Element root = doc.addElement("book");
        root.addAttribute("name", "我的图书");
        Element childTmp;
        childTmp = root.addElement("price");
        childTmp.setText("21.22");
        Element writer = root.addElement("author");
        writer.setText("李四");
        writer.addAttribute("ID", "001");
        try {
            // Problem: FileWriter encodes with the platform default charset.
            org.dom4j.io.XMLWriter xmlWriter = new org.dom4j.io.XMLWriter(
                    new FileWriter(fileName));
            xmlWriter.write(doc);
            xmlWriter.close();
        }
        catch (Exception e) {
            System.out.println(e);
        }
    }

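The code above writes the file through a FileWriter object, and that is why the file ends up in the wrong encoding. A FileWriter created this way offers no means of choosing the encoding (constructors that take a Charset were only added in Java 11), and once dom4j has been handed a ready-made Writer it can no longer control how characters are turned into bytes. The file is therefore saved in the system's default encoding. On Chinese-language Windows, Java's default encoding is GBK (on Java versions before JDK 18, which made UTF-8 the default). In other words, although we declared that the XML should be saved as UTF-8, the file was actually saved as GBK. That is why XML files generated with GBK or GB2312 parse correctly, while files generated as UTF-8 cannot be parsed by an XML parser.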
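The mismatch can be reproduced without dom4j. The sketch below, which uses only the JDK, encodes the author name 李四 from the example as GBK and then checks whether those bytes are well-formed UTF-8, which is essentially what an XML parser does when the declaration says UTF-8. The class name EncodingMismatchDemo and the helper decodesAs are illustrative assumptions, and the code assumes the JDK in use ships the GBK charset, as full JDK distributions normally do.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EncodingMismatchDemo {
    // The author name from the article's example, written with Unicode escapes
    // so that the encoding of this source file does not matter.
    static final String NAME = "\u674e\u56db";

    /** Returns true if the bytes are a well-formed sequence in the given charset. */
    static boolean decodesAs(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false; // a strict parser would reject the input here
        }
    }

    public static void main(String[] args) {
        // What a GBK-default FileWriter would put on disk.
        byte[] gbkBytes = NAME.getBytes(Charset.forName("GBK"));
        // What the XML declaration promises.
        byte[] utf8Bytes = NAME.getBytes(StandardCharsets.UTF_8);

        System.out.println("GBK bytes valid as UTF-8?   "
                + decodesAs(gbkBytes, StandardCharsets.UTF_8));
        System.out.println("UTF-8 bytes valid as UTF-8? "
                + decodesAs(utf8Bytes, StandardCharsets.UTF_8));
    }
}
```

The first check reports false and the second true: the GBK bytes are not a legal UTF-8 sequence, which is the kind of input that leads a parser to complain about an invalid byte in a UTF-8 sequence.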

Now that we have found the cause, let's look for a solution. First, let's see how dom4j itself handles the encoding:

    public XMLWriter(OutputStream out) throws UnsupportedEncodingException {
        //System.out.println("In OutputStream");
        this.format = DEFAULT_FORMAT;
        this.writer = createWriter(out, format.getEncoding());
        this.autoFlush = true;
        namespaceStack.push(Namespace.NO_NAMESPACE);
    }

    public XMLWriter(OutputStream out, OutputFormat format) throws UnsupportedEncodingException {
        //System.out.println("In OutputStream,OutputFormat");
        this.format = format;