java: Apache PDFBox cannot read all web links from a PDF


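I am trying to extract all hyperlinks from a PDF file. I am using Apache PDFBox version 2.0.11. I use the code snippet below, but for some PDF files the page annotation list has size 0, even though hyperlinks are present on that particular page. The problematic PDF file is at https://drive.google.com/open?id=1GpbPsZr_OvunLBRr2iD5ElkNeKFPaRfy. Page 2 contains hyperlinks. Please take a look and help me extract them.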

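    import java.io.File;
    import java.util.List;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
    import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationLink;

    // try-with-resources closes the document even if an exception is thrown
    try (PDDocument doc = PDDocument.load(new File("C:\\Users\\A883\\Desktop\\AEM.01938-18.pdf")))
    {
        for (int i = 0; i < doc.getNumberOfPages(); ++i)
        {
            PDPage page = doc.getPage(i);
            // getAnnotations() returns only annotation objects stored on the page
            List<PDAnnotation> annots = page.getAnnotations();
            System.out.println("Size of annotations " + annots.size());
            for (PDAnnotation annot : annots) {
                if (annot instanceof PDAnnotationLink) {
                    System.out.println("Page " + (i + 1) + " contains link.");
                }
            }
        }
    }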
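
One possible explanation, offered as an assumption and not verified against this file: the URLs on that page may be plain text rather than link annotations. Some PDF viewers make URL-like text clickable on their own, so a page can appear to contain hyperlinks while `getAnnotations()` returns an empty list. If that is the case here, a fallback is to extract the page text (for example with PDFBox's `PDFTextStripper`) and scan it for URL patterns. The sketch below shows only the scanning step, using the standard library; the class name `PlainTextUrlFinder`, the method `findUrls`, and the sample text are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlainTextUrlFinder {
    // Deliberately simple pattern: scheme, then any run of non-whitespace characters.
    private static final Pattern URL = Pattern.compile("https?://\\S+");

    /** Returns every http(s) URL found in the given text, in order of appearance. */
    public static List<String> findUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            // Strip trailing punctuation that usually belongs to the sentence, not the URL.
            urls.add(m.group().replaceAll("[.,;:)\\]]+$", ""));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical sample; in real use the text would come from the PDF page,
        // e.g. via PDFTextStripper.getText(doc) restricted to one page.
        String pageText = "See https://pdfbox.apache.org/ and (http://example.com/a?b=1).";
        for (String url : findUrls(pageText)) {
            System.out.println(url);
        }
    }
}
```

This approach has limits: a URL that wraps across two lines in the PDF is cut at the line break, and links whose visible text is not a URL cannot be recovered this way.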


0 answers