Java XML: reading identical tags that appear in different sections


Below is the XML file:

<maindata>
  <publication-reference>
    <document-id document-id-type="docdb">
      <country>US</country>
      <doc-number>9820394ASD</doc-number>
      <date>20111101</date>
    </document-id>
    <document-id document-id-type="docmain">
      <doc-number>9820394</doc-number>
      <date>20111101</date>
    </document-id>
  </publication-reference>
</maindata>

I want to extract the value of the <doc-number> tag under document-id-type="docmain". Below is my Java code; when I run it, it extracts 9820394ASD instead of 9820394.

public static void main(String[] args) {
  String filePath = "D:/bs.xml";
  File xmlFile = new File(filePath);
  DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
  DocumentBuilder dBuilder;
  try {
    dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(xmlFile);
    doc.getDocumentElement().normalize();
    System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
    NodeList nodeList = doc.getElementsByTagName("publication-reference");
    List<Biblio> docList = new ArrayList<Biblio>();
    for (int i = 0; i < nodeList.getLength(); i++) {
      docList.add(getdoc(nodeList.item(i)));
    }
  } catch (SAXException | ParserConfigurationException | IOException e1) {
    e1.printStackTrace();
  }
}

private static Biblio getdoc(Node node) {
  Biblio bib = new Biblio();
  if (node.getNodeType() == Node.ELEMENT_NODE) {
    Element element = (Element) node;
    bib.setCountry(getTagValue("country", element));
    bib.setDocnumber(getTagValue("doc-number", element));
    bib.setDate(getTagValue("date", element));
  }
  return bib;
}

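Please let me know how to check the type of the element (docmain or some other type), so that the value is extracted only when the type is docmain and the element is skipped otherwise.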

Added the getTagValue method:

private static String getTagValue(String tag, Element element) {
  NodeList nodeList = element.getElementsByTagName(tag).item(0).getChildNodes();
  Node node = (Node) nodeList.item(0);
  return node.getNodeValue();
}

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new File(...));
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("//document-id[@document-id-type=\"docmain\"]/doc-number/text()");
String value = expr.evaluate(doc);
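For anyone who wants to try this XPath lookup without a file on disk, here is a minimal self-contained sketch. The class name `DocMainNumber`, the `SAMPLE_XML` constant (the question's XML inlined as a string), and the method `docMainNumber` are assumptions made for illustration; only the XPath expression comes from the snippet above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DocMainNumber {
    // The XML from the question, inlined so the sketch needs no file (assumption).
    static final String SAMPLE_XML =
        "<maindata><publication-reference>"
        + "<document-id document-id-type=\"docdb\">"
        + "<country>US</country><doc-number>9820394ASD</doc-number><date>20111101</date>"
        + "</document-id>"
        + "<document-id document-id-type=\"docmain\">"
        + "<doc-number>9820394</doc-number><date>20111101</date>"
        + "</document-id>"
        + "</publication-reference></maindata>";

    // Returns the doc-number text of the first document-id whose type is docmain.
    static String docMainNumber(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(
                    "//document-id[@document-id-type='docmain']/doc-number/text()", doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(docMainNumber(SAMPLE_XML)); // prints 9820394
    }
}
```

Evaluating to a string returns the value of the first matching node in document order, which is enough here because the sample contains a single docmain entry.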

# Answer 3

Change the getdoc() method so that it only creates a Biblio object for the 'docmain' type:

private static Biblio getdoc(Node node) {
  Biblio bib = null;
  if (node.getNodeType() == Node.ELEMENT_NODE) {
    Element element = (Element) node;
    String type = element.getAttribute("document-id-type");
    if(type != null && type.equals("docmain")) {
      bib = new Biblio();
      bib.setCountry(getTagValue("country",element));
      bib.setDocnumber(getTagValue("doc-number",element));
      bib.setDate(getTagValue("date",element));          
    }
  }
  return bib;
}

Then, in the main method, add the getdoc() result to the list only if it is not null:

for (int i = 0; i < nodeList.getLength(); i++) {
  Biblio biblio = getdoc(nodeList.item(i));
  if(biblio != null) {
    docList.add(biblio);
  }
}
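A plain-DOM variant of this filtering idea might look like the sketch below. It rests on two observations about the sample XML: the `document-id-type` attribute sits on `document-id` (not on `publication-reference`), so the loop runs over `document-id` elements, and the docmain entry has no `country` child, so a missing tag has to be tolerated. `Element.getAttribute` returns an empty string when the attribute is absent. The nested `Biblio` class, the `childText` helper, and `SAMPLE_XML` are hypothetical stand-ins, since the real `Biblio` is not shown in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomDocMainFilter {
    // Hypothetical stand-in for the Biblio class, which the question does not show.
    static class Biblio {
        String country, docnumber, date;
    }

    // The XML from the question, inlined as a string (assumption).
    static final String SAMPLE_XML =
        "<maindata><publication-reference>"
        + "<document-id document-id-type=\"docdb\">"
        + "<country>US</country><doc-number>9820394ASD</doc-number><date>20111101</date>"
        + "</document-id>"
        + "<document-id document-id-type=\"docmain\">"
        + "<doc-number>9820394</doc-number><date>20111101</date>"
        + "</document-id>"
        + "</publication-reference></maindata>";

    // Text of the first descendant <tag>, or null when the element has none.
    static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent();
    }

    // Builds one Biblio per document-id element whose type attribute is docmain.
    static List<Biblio> docMainEntries(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList ids = doc.getElementsByTagName("document-id");
            List<Biblio> result = new ArrayList<>();
            for (int i = 0; i < ids.getLength(); i++) {
                Element id = (Element) ids.item(i);
                if (!"docmain".equals(id.getAttribute("document-id-type"))) {
                    continue; // skip docdb and any other type
                }
                Biblio bib = new Biblio();
                bib.country = childText(id, "country");
                bib.docnumber = childText(id, "doc-number");
                bib.date = childText(id, "date");
                result.add(bib);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
    }

    public static void main(String[] args) {
        for (Biblio bib : docMainEntries(SAMPLE_XML)) {
            System.out.println(bib.docnumber + " " + bib.date + " " + bib.country);
        }
    }
}
```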

Update: OK, that was terrible, sorry. You should really learn a bit of XPath. I have tried to rewrite it using XPath expressions.

First, we need four XPath expressions. One extracts a node list containing all document-id elements of type docmain.

Its XPath expression is: /maindata/publication-reference/document-id[@document-id-type='docmain'] (with the whole XML document as context).

Here the predicate in [] ensures that only document-id elements of type docmain are extracted.
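To see the effect of the predicate, the following sketch counts the nodes selected with and without it on the sample XML. The class name `PredicateDemo`, the `count` helper, and the inlined `SAMPLE_XML` are assumptions for illustration only.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PredicateDemo {
    // The XML from the question, inlined as a string (assumption).
    static final String SAMPLE_XML =
        "<maindata><publication-reference>"
        + "<document-id document-id-type=\"docdb\">"
        + "<country>US</country><doc-number>9820394ASD</doc-number><date>20111101</date>"
        + "</document-id>"
        + "<document-id document-id-type=\"docmain\">"
        + "<doc-number>9820394</doc-number><date>20111101</date>"
        + "</document-id>"
        + "</publication-reference></maindata>";

    // Number of nodes the expression selects in SAMPLE_XML.
    static int count(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression,
                    new InputSource(new StringReader(SAMPLE_XML)),
                    XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String base = "/maindata/publication-reference/document-id";
        // Without the predicate both document-id elements match.
        System.out.println(count(base)); // prints 2
        // With the predicate only the docmain element remains.
        System.out.println(count(base + "[@document-id-type='docmain']")); // prints 1
    }
}
```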

Then, for each field within a document-id element (with the document-id element as context):

Country: country
Document number: doc-number
Date: date
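The role of the context node can be illustrated with a short sketch: the same relative expression is evaluated once per document-id element and yields that element's own child value. When the child is missing (the docmain entry has no country), evaluating to a string gives an empty string rather than null. The class `RelativeXPathDemo`, the `fieldPerDocumentId` helper, and `SAMPLE_XML` are hypothetical names used only for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RelativeXPathDemo {
    // The XML from the question, inlined as a string (assumption).
    static final String SAMPLE_XML =
        "<maindata><publication-reference>"
        + "<document-id document-id-type=\"docdb\">"
        + "<country>US</country><doc-number>9820394ASD</doc-number><date>20111101</date>"
        + "</document-id>"
        + "<document-id document-id-type=\"docmain\">"
        + "<doc-number>9820394</doc-number><date>20111101</date>"
        + "</document-id>"
        + "</publication-reference></maindata>";

    // Evaluates a relative expression against every document-id element in turn.
    static List<String> fieldPerDocumentId(String field) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList ids = (NodeList) xpath.evaluate(
                    "/maindata/publication-reference/document-id",
                    new InputSource(new StringReader(SAMPLE_XML)),
                    XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < ids.getLength(); i++) {
                // The context is the current document-id, so "country" means its child.
                values.add(xpath.evaluate(field, ids.item(i)));
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid expression: " + field, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fieldPerDocumentId("doc-number")); // prints [9820394ASD, 9820394]
        System.out.println(fieldPerDocumentId("country"));    // prints [US, ]
    }
}
```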

We use a static initializer:

private static XPathExpression xpathDocId;
private static XPathExpression xpathCountry;
private static XPathExpression xpathDocnumber;
private static XPathExpression xpathDate;

static {
  try {
    XPath xpath = XPathFactory.newInstance().newXPath();
    // Context is the whole document. Find all document-id elements with type docmain
    xpathDocId = xpath.compile("/maindata/publication-reference/document-id[@document-id-type='docmain']");

    // Context is a document-id element. 
    xpathCountry = xpath.compile("country");
    xpathDocnumber = xpath.compile("doc-number");
    xpathDate = xpath.compile("date");
  } catch (XPathExpressionException e) {
    e.printStackTrace();
  }
}

Then we rewrite the getdoc method. It now receives a document-id element as input and uses the XPath expressions to create a Biblio instance from it:

private static Biblio getdoc(Node element) throws XPathExpressionException {
  Biblio biblio = new Biblio();
  biblio.setCountry((String) xpathCountry.evaluate(element, XPathConstants.STRING));
  biblio.setDocnumber((String) xpathDocnumber.evaluate(element, XPathConstants.STRING));
  biblio.setDate((String) xpathDate.evaluate(element, XPathConstants.STRING));
  return biblio;
}

Then, in the main() method, use the XPath expression to extract only the required elements:

  NodeList nodeList = (NodeList) xpathDocId.evaluate(doc, XPathConstants.NODESET);
  List<Biblio> docList = new ArrayList<Biblio>();
  for (int i = 0; i < nodeList.getLength(); i++) {
    docList.add(getdoc(nodeList.item(i)));
  }
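Put together, the pieces above could be assembled into one runnable class roughly as follows. This is only a sketch: the nested `Biblio` class with public fields is a hypothetical stand-in for the real one (which presumably has setters), the XML is read from an inlined `SAMPLE_XML` string rather than from a file, and the expressions are compiled inside the method instead of a static initializer to keep the example short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BiblioXPathDemo {
    // Hypothetical minimal Biblio; the real class is not shown in the question.
    static class Biblio {
        String country, docnumber, date;
    }

    // The XML from the question, inlined as a string (assumption).
    static final String SAMPLE_XML =
        "<maindata><publication-reference>"
        + "<document-id document-id-type=\"docdb\">"
        + "<country>US</country><doc-number>9820394ASD</doc-number><date>20111101</date>"
        + "</document-id>"
        + "<document-id document-id-type=\"docmain\">"
        + "<doc-number>9820394</doc-number><date>20111101</date>"
        + "</document-id>"
        + "</publication-reference></maindata>";

    // Parses the XML and returns one Biblio per docmain document-id element.
    static List<Biblio> parse(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            XPathExpression docId = xpath.compile(
                    "/maindata/publication-reference/document-id[@document-id-type='docmain']");
            XPathExpression country = xpath.compile("country");
            XPathExpression docnumber = xpath.compile("doc-number");
            XPathExpression date = xpath.compile("date");

            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) docId.evaluate(doc, XPathConstants.NODESET);

            List<Biblio> docList = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node element = nodes.item(i);
                Biblio biblio = new Biblio();
                // A missing child (country here) evaluates to an empty string.
                biblio.country = country.evaluate(element);
                biblio.docnumber = docnumber.evaluate(element);
                biblio.date = date.evaluate(element);
                docList.add(biblio);
            }
            return docList;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        for (Biblio b : parse(SAMPLE_XML)) {
            System.out.println(b.docnumber + " " + b.date); // prints 9820394 20111101
        }
    }
}
```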
