Java: getting an XSLT transformation to recognize ASCII character codes

I have an XSLT stylesheet that converts an HTML table to CSV, defined as follows:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
                xmlns:fo="http://www.w3.org/1999/XSL/Format" >
    <xsl:output method="text" omit-xml-declaration="yes" indent="yes"/>
    <xsl:template match="/">
         <xsl:for-each select="//tr">
            <xsl:for-each select="td">
                <xsl:if test="position() > 1">,</xsl:if>
                <xsl:value-of select="."/>
            </xsl:for-each>
         <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

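The problem I have now is that the markup of these tables arrives escaped: the angle brackets are written as character entities such as &lt; and &gt; (what I call "ASCII codes" here).

Sample input: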

&lt;table&gt;&lt;tr&gt;
        &lt;th&gt;Order ID&lt;/th&gt;
        &lt;th&gt;Item ID&lt;/th&gt;
        &lt;th&gt;Participant ID&lt;/th&gt;
        &lt;th&gt;Status&lt;/th&gt;
        &lt;th&gt;Shipping Provider&lt;/th&gt;
        &lt;th&gt;Tracking Number&lt;/th&gt;
        &lt;th&gt;Shipped Date&lt;/th&gt;
        &lt;th&gt;Shipping Method&lt;/th&gt;&lt;/tr&gt;
            &lt;tr&gt;
            &lt;td align="center"&gt; Choice_DJ4&lt;/td&gt;
            &lt;td align="center"&gt; 4&lt;/td&gt;
            &lt;td align="center"&gt; DXM09902&lt;/td&gt;
            &lt;td align="center"&gt; Shipped&lt;/td&gt; 
            &lt;td align="center"&gt; USPS&lt;/td&gt; 
            &lt;td align="center"&gt; &lt;/td&gt; 
            &lt;td align="center"&gt; 04/13/2017&lt;/td&gt; 
            &lt;td align="center"&gt; Standard Ground&lt;/td&gt; 
            &lt;/tr&gt;
    &lt;/table&gt;

My question is: is there a way to make the XSL file recognize these ASCII codes as the characters they stand for? Update: here is my Java code

String data = readFile("config/email.xml");

    System.out.println("Data: \n" + data);
    InputSource is = new InputSource(new StringReader(data));

    String configFile = "config/email-xslt.xsl";

    File stylesheet = new File(configFile);

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.parse(is);

    StreamSource stylesource = new StreamSource(stylesheet);
    Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(stylesource);
    Source source = new DOMSource(document);
    StringWriter sw = new StringWriter();
    Result outputTarget = new StreamResult(sw);

    transformer.transform(source, outputTarget);
    data = sw.toString();
    System.out.println("Output: " + data);

2 answers

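  1. # Answer 1

    Finally solved this issue... using org.apache.commons.lang3.StringEscapeUtils.unescapeXml(str);

    My XSL file and input data (config/email.xml) are still the same as in the OP, but I had to modify the Java code to unescape these characters before passing the data to the XSL transformer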

    String data = readFile("config/email.xml");
    data = StringEscapeUtils.unescapeXml(data);
    System.out.println("Data: \n" + data);
    InputSource is = new InputSource(new StringReader(data));
    
    String configFile = "config/email-xslt.xsl";
    
    File stylesheet = new File(configFile);
    
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.parse(is);
    
    StreamSource stylesource = new StreamSource(stylesheet);
    Transformer transformer = TransformerFactory.newInstance()
         .newTransformer(stylesource);
    Source source = new DOMSource(document);
    StringWriter sw = new StringWriter();
    Result outputTarget = new StreamResult(sw);
    
    transformer.transform(source, outputTarget);
    data = sw.toString();
    System.out.println("Output: " + data);
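
For readers who want to see what the unescape step does to one line of the sample input without adding Commons Lang, here is a minimal sketch. The class and method names (EntityUnescapeSketch, unescapeBasicXml) are hypothetical, and the helper is only a rough stand-in: it handles the five predefined XML entities and nothing else, so it is not a replacement for StringEscapeUtils.unescapeXml.

```java
public class EntityUnescapeSketch {

    // Hypothetical stand-in for the unescape step: decodes only the five
    // predefined XML entities, not numeric references such as &#60;.
    static String unescapeBasicXml(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                // "&amp;" goes last so that "&amp;lt;" becomes "&lt;" and not "<".
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        // One cell taken from the sample input in the question.
        String escaped = "&lt;td align=\"center\"&gt; USPS&lt;/td&gt;";
        System.out.println(unescapeBasicXml(escaped));
    }
}
```

After this step the string is ordinary markup again, which is why the unchanged stylesheet can match its tr and td elements.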
    
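  2. # Answer 2

    With XSLT 3.0, you can use unparsed-text() to load the text, parse-xml-fragment() to unescape the entities, and parse-xml() to parse the XML string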

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        version="3.0">
        <xsl:output method="text" omit-xml-declaration="yes" indent="yes"/>
        <xsl:template match="/">
            <!-- first, load the contents of the document (adjust path to your document) -->
            <xsl:variable name="input" select="unparsed-text('table.txt')" as="item()"/>
            <!-- second, unescape the angle bracket entities -->
            <xsl:variable name="table-text" select="parse-xml-fragment($input)" as="item()"/>
            <!-- third, parse the serialized XML string -->
            <xsl:variable name="table" select="parse-xml($table-text)" as="item()"/>
            <xsl:for-each select="$table//tr">
                <!-- a more simplified way of generating the CSV for each row -->
                <xsl:value-of select="td" separator=","/>
                <xsl:value-of select="td" separator=","/>
                <xsl:text>&#xA;</xsl:text>
            </xsl:for-each>
        </xsl:template>
    </xsl:stylesheet>
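
The same decode-then-parse idea can probably also be done with only the JDK and its built-in XSLT 1.0 processor, in case an XSLT 3.0 processor is not available. The sketch below is an assumption-laden illustration, not code from either answer: the class name EscapedTableToCsv, the method toCsv, and the throwaway wrapper element are all hypothetical, and the stylesheet is a condensed inline copy of the one in the question. It assumes the escaped text decodes to a single well-formed root element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EscapedTableToCsv {

    // Condensed inline copy of the question's XSLT 1.0 stylesheet.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='//tr'>"
        + "<xsl:for-each select='td'>"
        + "<xsl:if test='position() > 1'>,</xsl:if>"
        + "<xsl:value-of select='.'/>"
        + "</xsl:for-each>"
        + "<xsl:text>&#xA;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    private static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // generally advisable for DOM trees fed to XSLT
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    static String toCsv(String escapedMarkup) {
        try {
            // Pass 1: wrap the escaped text in a throwaway root element so the
            // XML parser itself decodes &lt; and &gt; into plain text.
            String markup = parse("<wrapper>" + escapedMarkup + "</wrapper>")
                    .getDocumentElement().getTextContent();
            // Pass 2: parse the decoded text as real markup.
            Document table = parse(markup);

            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(table), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not convert escaped table", e);
        }
    }

    public static void main(String[] args) {
        String escaped = "&lt;table&gt;&lt;tr&gt;&lt;td&gt;A&lt;/td&gt;&lt;td&gt;B&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;";
        System.out.print(toCsv(escaped));
    }
}
```

Because the stylesheet only selects td, a header row made of th cells yields an empty line, just as with the original stylesheet.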