Eclipse: how can I use TikaCLI functionality from Java source code?

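I am trying to use Apache Tika to extract embedded files from Office documents. With the Tika CLI (from cmd) everything works fine, but I have to integrate it into Java source code in Eclipse.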

So this is what I did:

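public static void saveEmbedds(String inputfile, String outputfile) throws Exception {
    // Embedded files go into a directory named after the output file, minus its extension.
    String extractDir = removeExtension(outputfile);
    try {
        // -z extracts all attachments; --extract-dir sets the target directory for -z.
        String[] arguments = new String[]{"-z", "--extract-dir=" + extractDir, inputfile};
        System.out.println("Using TIKA CLI to detect embedded files. Target directory: " + extractDir);
        TikaCLI.main(arguments);
    } catch (Exception e) {
        // Note: this does not catch java.lang.Error subclasses such as NoSuchMethodError.
        logger.info("Exception in saveEmbedds, during search in File: " + inputfile + "\r\nDetails: " + e);
    }
}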

This works for every file type except .pptx. When inputfile is a .pptx file, it produces a lot of errors. The same file works fine when I run it through cmd.

12.04.2016 15:31:33 945     Exception in thread "main" java.lang.NoSuchMethodError: org.apache.poi.xslf.usermodel.XSLFTextShape.getTextType()Lorg/apache/poi/xslf/usermodel/Placeholder; 
12.04.2016 15:31:33 945     at org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.extractContent(XSLFPowerPointExtractorDecorator.java:154) 
12.04.2016 15:31:33 945     at org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.buildXHTML(XSLFPowerPointExtractorDecorator.java:88) 
12.04.2016 15:31:33 945     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:110) 
12.04.2016 15:31:33 945     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112) 
12.04.2016 15:31:33 945     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87) 
12.04.2016 15:31:33 945     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) 
12.04.2016 15:31:33 945     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) 
12.04.2016 15:31:33 945     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) 
12.04.2016 15:31:33 945     at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:190) 
12.04.2016 15:31:33 945     at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:491) 
12.04.2016 15:31:33 945     at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:144) 

Is there a better way to use the functionality of the Apache Tika CLI? I also tried the ExtractEmbeddedFiles example code, but it did not work for embedded .ppt files.
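A NoSuchMethodError like the one in the trace usually means that the class loaded at run time is a different version from the one the caller was compiled against. Here that could mean an Apache POI jar on the Eclipse build path that does not match the one Tika expects, which would also explain why the standalone CLI works. That is an assumption, not something the trace proves. A minimal sketch for checking it prints the location each class is actually loaded from; ClassOrigin and locate are hypothetical names, and only JDK calls are used.

```java
import java.security.CodeSource;

public class ClassOrigin {

    /**
     * Returns the jar or directory a class was loaded from,
     * or a short note when that location is not available.
     */
    public static String locate(String className) {
        try {
            // Load without initializing, using the same class loader as this class.
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK (bootstrap) classes have no code source.
            return src == null ? "(bootstrap/JDK, no code source)" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not loadable: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace above.
        System.out.println(locate("org.apache.poi.xslf.usermodel.XSLFTextShape"));
        System.out.println(locate("org.apache.tika.cli.TikaCLI"));
    }
}
```

Run from the same Eclipse launch configuration as saveEmbedds. If XSLFTextShape turns out to come from a separate POI jar rather than the one shipped with Tika, the two versions may be in conflict.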

