Java text extracted with PDFBox contains strange question mark symbols in place of spaces


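When I try to extract text from a PDF using Apache PDFBox 2.0.18, the output looks like this:

How can I avoid those question mark symbols? Below is my PDF extraction method: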

import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public static String getPDFContent(File pdfFile) throws IOException {
    // try-with-resources closes the document even if extraction fails
    try (PDDocument doc = PDDocument.load(pdfFile)) {
        return new PDFTextStripper().getText(doc);
    }
    catch (Exception e) {
        // Pass the exception to the logger so the cause is not lost
        logger.error("An exception occurred while extracting text from pdf using Apache PDFBox.", e);
        return null;
    }
}
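
One possible cause, which the question does not confirm for this particular PDF, is that the extracted text contains Unicode space characters other than the ordinary ASCII space (for example U+00A0 NO-BREAK SPACE), and the console or log output uses an encoding that cannot represent them, so they are printed as `?`. Under that assumption, a minimal sketch is to normalize such characters after extraction. The class name `SpaceNormalizer` and the method `normalizeSpaces` are hypothetical helpers for illustration, not part of PDFBox:

```java
import java.util.regex.Pattern;

public class SpaceNormalizer {
    // Matches any character in Unicode category Zs (space separators),
    // which includes U+00A0 NO-BREAK SPACE and U+2003 EM SPACE.
    private static final Pattern UNICODE_SPACES = Pattern.compile("\\p{Zs}");

    // Hypothetical helper: replaces every Unicode space separator
    // with a plain ASCII space. Returns null for null input.
    public static String normalizeSpaces(String text) {
        if (text == null) {
            return null;
        }
        return UNICODE_SPACES.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Stand-in for text returned by PDFTextStripper.getText(doc)
        String extracted = "Hello\u00A0PDF\u2003world";
        System.out.println(normalizeSpaces(extracted));
    }
}
```

In the method above this would be applied to the result of `getText(doc)` before returning it. If the question marks persist, it may be worth checking whether the output stream or log file is written as UTF-8, since the characters can be correct in memory and only garbled on display.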
