Java XML: reading the same tag in different sections

Below is the XML file:

<maindata>
        <publication-reference>
          <document-id document-id-type="docdb">
            <country>US</country>
            <doc-number>9820394ASD</doc-number>
            <date>20111101</date>
          </document-id>
          <document-id document-id-type="docmain">
            <doc-number>9820394</doc-number>
            <date>20111101</date>
          </document-id>
        </publication-reference>
</maindata>

I want to extract the value of the <doc-number> tag under document-id-type="docmain". Below is my Java code; when I run it, it extracts 9820394ASD instead of 9820394.

public static void main(String[] args) {
        String filePath ="D:/bs.xml";
        File xmlFile = new File(filePath);
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder;
        try {
            dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(xmlFile);
            doc.getDocumentElement().normalize();
            System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
            NodeList nodeList = doc.getElementsByTagName("publication-reference");
            List<Biblio> docList = new ArrayList<Biblio>();
            for (int i = 0; i < nodeList.getLength(); i++) {
                docList.add(getdoc(nodeList.item(i)));
            }

        } catch (SAXException | ParserConfigurationException | IOException e1) {
            e1.printStackTrace();
        }
    }
    private static Biblio getdoc(Node node) {
           Biblio bib = new Biblio();
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            bib.setCountry(getTagValue("country",element));
            bib.setDocnumber(getTagValue("doc-number",element));
            bib.setDate(getTagValue("date",element));          
        }
        return bib;
    }

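Please let me know how to check whether the type is docmain or some other type. The values should be extracted only when the type is docmain; otherwise the element should be skipped.

Added the getTagValue method: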

private static String getTagValue(String tag, Element element) {
        NodeList nodeList = element.getElementsByTagName(tag).item(0).getChildNodes();
        Node node = (Node) nodeList.item(0);
        return node.getNodeValue();
    }

3 answers

  1. # Answer 1

    Thanks for the help. Here is the code:

    String Number = xPath.compile("//publication-reference//document-id[@document-id-type=\"docmain\"]/doc-number").evaluate(xmlDocument);
    
  2. # Answer 2

    You can retrieve the value using the DOM and XPath APIs with the following XPath expression:

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new File(...) );
        XPathFactory xPathfactory = XPathFactory.newInstance();
        XPath xpath = xPathfactory.newXPath();
        XPathExpression expr = xpath.compile("//document-id[@document-id-type=\"docmain\"]/doc-number/text()");
        String value = expr.evaluate(doc);
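
    For anyone who wants to try this without a file on disk, here is a minimal, self-contained sketch. It is only an illustration: the class name DocNumberLookup, the method docmainNumber and the inlined SAMPLE_XML constant are my own assumptions, not part of the original code. The XML is the sample from the question, parsed from a string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name is illustrative only.
class DocNumberLookup {
    // Sample XML from the question, inlined so the sketch needs no file.
    static final String SAMPLE_XML =
        "<maindata><publication-reference>"
        + "<document-id document-id-type=\"docdb\">"
        + "<country>US</country><doc-number>9820394ASD</doc-number>"
        + "<date>20111101</date></document-id>"
        + "<document-id document-id-type=\"docmain\">"
        + "<doc-number>9820394</doc-number><date>20111101</date>"
        + "</document-id>"
        + "</publication-reference></maindata>";

    // Returns the doc-number of the first document-id whose type is docmain.
    static String docmainNumber(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The predicate in [] filters on the attribute value.
            return xpath.evaluate(
                    "//document-id[@document-id-type='docmain']/doc-number", doc);
        } catch (Exception e) {
            throw new IllegalStateException("could not read doc-number", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(docmainNumber(SAMPLE_XML)); // prints 9820394
    }
}
```

    The string-returning evaluate overload gives the text of the first matching node, so the trailing /text() step is optional here.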
    
  3. # Answer 3

    Change the getdoc() method so that it creates a Biblio object only for the 'docmain' type:

    private static Biblio getdoc(Node node) {
      Biblio bib = null;
      if (node.getNodeType() == Node.ELEMENT_NODE) {
        Element element = (Element) node;
        String type = element.getAttribute("document-id-type");
        if(type != null && type.equals("docmain")) {
          bib = new Biblio();
          bib.setCountry(getTagValue("country",element));
          bib.setDocnumber(getTagValue("doc-number",element));
          bib.setDate(getTagValue("date",element));          
        }
      }
      return bib;
    }
    

    Then, in the main method, add the getdoc() result to the list only if it is not null:

    for (int i = 0; i < nodeList.getLength(); i++) {
      Biblio biblio = getdoc(nodeList.item(i));
      if(biblio != null) {
        docList.add(biblio);
      }
    }
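
    One caveat with the attribute check: document-id-type is an attribute of document-id, while the question's main method passes publication-reference nodes to getdoc(). Element.getAttribute also returns an empty string, not null, when the attribute is absent. A DOM-only sketch that iterates over the document-id elements directly might look like this; the class and method names (DocmainDomFilter, docNumbers, descendantText) are hypothetical, and the XML is read from a string only to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of the original answer.
class DocmainDomFilter {
    // Text of the first descendant element named `tag`, or null if there is none.
    static String descendantText(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }

    // Collects doc-number values from document-id elements of the wanted type.
    static List<String> docNumbers(String xml, String wantedType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList ids = doc.getElementsByTagName("document-id");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < ids.getLength(); i++) {
                Element id = (Element) ids.item(i);
                // getAttribute returns "" when the attribute is missing.
                if (wantedType.equals(id.getAttribute("document-id-type"))) {
                    result.add(descendantText(id, "doc-number"));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

    Calling docNumbers(xml, "docmain") on the question's sample should yield a single entry, 9820394.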
    

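    Update: OK, that was awful, sorry. You should really learn a bit of XPath. I have tried to rewrite it using XPath expressions.

    First, we need four XPath expressions. One extracts a node list containing all document-id elements of type docmain.

    Its XPath expression is: /maindata/publication-reference/document-id[@document-id-type='docmain'] (with the whole XML document as context).

    The predicate in [] ensures that only document-id elements of type docmain are selected.

    Then, for each field in the document-id element (with the document-id element as context):

    • Country: country
    • Document number: doc-number
    • Date: date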
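
    To make the "context" idea concrete, here is a small sketch that first selects the docmain document-id node and then evaluates the three relative expressions against it. RelativeXPathDemo and docmainFields are hypothetical names introduced for illustration. Note that the docmain entry in the sample has no country element; an XPath string evaluation of an empty node set gives an empty string, not null.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original answer.
class RelativeXPathDemo {
    // Returns {country, doc-number, date} of the first docmain document-id,
    // or null when the document has no such element.
    static String[] docmainFields(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Absolute expression: the context is the whole document.
            Node docId = (Node) xpath.evaluate(
                    "/maindata/publication-reference/"
                    + "document-id[@document-id-type='docmain']",
                    doc, XPathConstants.NODE);
            if (docId == null) {
                return null;
            }
            // Relative expressions: the context is the document-id node.
            return new String[] {
                xpath.evaluate("country", docId),
                xpath.evaluate("doc-number", docId),
                xpath.evaluate("date", docId)
            };
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }
}
```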

    We compile them once in a static initializer:

    private static XPathExpression xpathDocId;
    private static XPathExpression xpathCountry;
    private static XPathExpression xpathDocnumber;
    private static XPathExpression xpathDate;
    
    static {
      try {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Context is the whole document. Find all document-id elements with type docmain
        xpathDocId = xpath.compile("/maindata/publication-reference/document-id[@document-id-type='docmain']");
    
        // Context is a document-id element. 
        xpathCountry = xpath.compile("country");
        xpathDocnumber = xpath.compile("doc-number");
        xpathDate = xpath.compile("date");
      } catch (XPathExpressionException e) {
        e.printStackTrace();
      }
    }
    

    Then we rewrite the getdoc method. It now takes a document-id element as input and uses the XPath expressions to create a Biblio instance from it:

    private static Biblio getdoc(Node element) throws XPathExpressionException {
      Biblio biblio = new Biblio();
      biblio.setCountry((String) xpathCountry.evaluate(element, XPathConstants.STRING));
      biblio.setDocnumber((String) xpathDocnumber.evaluate(element, XPathConstants.STRING));
      biblio.setDate((String) xpathDate.evaluate(element, XPathConstants.STRING));
      return biblio;
    }
    

    Then, in the main() method, use the XPath expression to extract only the required elements:

      NodeList nodeList = (NodeList) xpathDocId.evaluate(doc, XPathConstants.NODESET);
      List<Biblio> docList = new ArrayList<Biblio>();
      for (int i = 0; i < nodeList.getLength(); i++) {
        docList.add(getdoc(nodeList.item(i)));
      }
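
    Put together, the pieces above could look roughly like the sketch below. The question does not show the Biblio class, so the one here is a hypothetical stand-in with just the three setters used in the thread plus getters; DocmainExtractor and extract are also illustrative names, and the XML is parsed from a string rather than from D:/bs.xml so the example is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical stand-in for the question's Biblio class, which is not shown.
class Biblio {
    private String country, docnumber, date;
    void setCountry(String v) { country = v; }
    void setDocnumber(String v) { docnumber = v; }
    void setDate(String v) { date = v; }
    String getCountry() { return country; }
    String getDocnumber() { return docnumber; }
    String getDate() { return date; }
}

class DocmainExtractor {
    private static final XPathExpression DOC_ID, COUNTRY, DOC_NUMBER, DATE;

    static {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Context: whole document. Selects only docmain document-id elements.
            DOC_ID = xpath.compile("/maindata/publication-reference/"
                    + "document-id[@document-id-type='docmain']");
            // Context: a single document-id element.
            COUNTRY = xpath.compile("country");
            DOC_NUMBER = xpath.compile("doc-number");
            DATE = xpath.compile("date");
        } catch (XPathExpressionException e) {
            // Fail fast instead of leaving the expressions null.
            throw new ExceptionInInitializerError(e);
        }
    }

    // Builds one Biblio per docmain document-id element in the given XML.
    static List<Biblio> extract(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList ids = (NodeList) DOC_ID.evaluate(doc, XPathConstants.NODESET);
            List<Biblio> result = new ArrayList<>();
            for (int i = 0; i < ids.getLength(); i++) {
                Node id = ids.item(i);
                Biblio b = new Biblio();
                b.setCountry(COUNTRY.evaluate(id));     // "" when country is absent
                b.setDocnumber(DOC_NUMBER.evaluate(id));
                b.setDate(DATE.evaluate(id));
                result.add(b);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not extract docmain entries", e);
        }
    }
}
```

    Unlike the initializer above, this sketch rethrows a compile failure as ExceptionInInitializerError, so a bad expression surfaces at class load time rather than as a later NullPointerException.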