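Eclipse: How do I use TikaCLI functionality from Java source code?
I am trying to use Apache Tika to extract embedded files from Office documents. With the Tika CLI (cmd), everything works fine. But I have to integrate it into my Java source code in Eclipse.
So this is what I did: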
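import org.apache.tika.cli.TikaCLI;

public static void saveEmbedds(String inputfile, String outputfile) throws Exception {
    // Directory that receives the extracted embedded files
    String extractDir = removeExtension(outputfile);
    try {
        // Same arguments as on the command line: -z extracts all attachments
        String[] arguments = new String[]{"-z", "--extract-dir=" + extractDir, inputfile};
        System.out.println("Using TIKA CLI to detect embedded Files. Target Directory: " + extractDir);
        TikaCLI.main(arguments);
    } catch (Exception e) {
        // Note: a java.lang.Error such as NoSuchMethodError is not caught here
        logger.info("Exception in saveEmbedds, during search in File: " + inputfile + "\r\nDetails: " + e);
    }
}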
This actually works for every file type except .pptx. When inputfile is a .pptx file, it produces a lot of errors. The same file works fine from cmd.
12.04.2016 15:31:33 945 Exception in thread "main" java.lang.NoSuchMethodError: org.apache.poi.xslf.usermodel.XSLFTextShape.getTextType()Lorg/apache/poi/xslf/usermodel/Placeholder;
12.04.2016 15:31:33 945 at org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.extractContent(XSLFPowerPointExtractorDecorator.java:154)
12.04.2016 15:31:33 945 at org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.buildXHTML(XSLFPowerPointExtractorDecorator.java:88)
12.04.2016 15:31:33 945 at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:110)
12.04.2016 15:31:33 945 at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
12.04.2016 15:31:33 945 at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
12.04.2016 15:31:33 945 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
12.04.2016 15:31:33 945 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
12.04.2016 15:31:33 945 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
12.04.2016 15:31:33 945 at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:190)
12.04.2016 15:31:33 945 at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:491)
12.04.2016 15:31:33 945 at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:144)
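A NoSuchMethodError at runtime generally means a class was loaded in a different version than the one the calling code was compiled against. Here that would point to a POI jar on the Eclipse classpath that does not match what Tika expects, which would also be consistent with cmd working if that run uses only the self-contained tika-app jar (an assumption, I have not verified it). A minimal diagnostic sketch to see which jar each class is loaded from; the class name ClassOrigin and its describe helpers are made up for illustration:

```java
import java.security.CodeSource;

class ClassOrigin {

    /** Describes where an already loaded class came from (jar or directory). */
    static String describe(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Classes from the JDK itself usually have no code source
            return type.getName() + " <- (no code source, probably a JDK class)";
        }
        return type.getName() + " <- " + source.getLocation();
    }

    /** Same as above, but looks the class up by name so nothing has to be on the compile classpath. */
    static String describe(String className) {
        try {
            return describe(Class.forName(className));
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " <- not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace above
        System.out.println(describe("org.apache.poi.xslf.usermodel.XSLFTextShape"));
        System.out.println(describe("org.apache.tika.cli.TikaCLI"));
    }
}
```

Run inside the Eclipse project, this should print one location per class. If XSLFTextShape is reported from a standalone POI jar rather than from the same place as TikaCLI, a version conflict on the build path is the likely cause.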
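Is there a better way to use the functionality of the Apache Tika CLI? I also tried the ExtractEmbeddedFiles example code, but I could not get it to work for embedded .ppt files.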