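How can I use POI in Java to read formatted text from an MS Word (.doc) file as HTML?

I want to read the formatted text as HTML, something like (<html><b>boldvalue</b><img src="link"></html>). I also want the images to come through as image tags with links. I am using POI. Does POI have any option for getting the data in HTML format like this?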


1 answer

  1. # Answer 1

    Try this:

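    import java.io.ByteArrayOutputStream;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.nio.charset.StandardCharsets;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.apache.poi.hwpf.HWPFDocumentCore;
    import org.apache.poi.hwpf.converter.WordToHtmlConverter;
    import org.apache.poi.hwpf.converter.WordToHtmlUtils;
    import org.w3c.dom.Document;

    // Load the .doc file; try-with-resources closes the input stream.
    HWPFDocumentCore wordDocument;
    try (InputStream in = new FileInputStream("D:\\temp\\seo\\1.doc")) {
        wordDocument = WordToHtmlUtils.loadDoc(in);
    }

    // Convert the Word document into an HTML DOM tree.
    WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
            DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
    wordToHtmlConverter.processDocument(wordDocument);
    Document htmlDocument = wordToHtmlConverter.getDocument();

    // Serialize the DOM as UTF-8 HTML.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Transformer serializer = TransformerFactory.newInstance().newTransformer();
    serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    serializer.setOutputProperty(OutputKeys.INDENT, "yes");
    serializer.setOutputProperty(OutputKeys.METHOD, "html");
    serializer.transform(new DOMSource(htmlDocument), new StreamResult(out));

    // Decode with the same charset the serializer wrote, not the platform default.
    String result = new String(out.toByteArray(), StandardCharsets.UTF_8);
    System.out.println(result);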
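
The serialization half of the answer uses only the JDK, so it can be tried without POI or a .doc file. The sketch below is an illustration, not part of the original answer: the class name `DomToHtmlSketch` and its two helpers are hypothetical, and the hand-built DOM is only a stand-in for whatever tree `WordToHtmlConverter.getDocument()` would return, shaped like the markup in the question (bold text plus an image tag).

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomToHtmlSketch {

    // Hypothetical stand-in for the DOM a Word-to-HTML converter would produce.
    static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element bold = doc.createElement("b");
        bold.setTextContent("boldvalue");
        Element img = doc.createElement("img");
        img.setAttribute("src", "link");
        doc.appendChild(html);
        html.appendChild(body);
        body.appendChild(bold);
        body.appendChild(img);
        return doc;
    }

    // Serialize a DOM tree to an HTML string using the identity transformer.
    static String toHtml(Document doc) throws Exception {
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        // Writing to a StringWriter avoids a separate byte-to-String decoding step.
        StringWriter writer = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toHtml(sampleDocument()));
    }
}
```

Setting the output method to "html" is what makes the serializer write HTML rather than XML syntax, so the `img` element should come out as a plain `<img src="link">` tag without a self-closing slash.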