pdfformulas dumps the formulas of a PDF as PNG files into the "formulas" subfolder.
Detailed description of the pdfformulas Python project
- Usage: pdfformulas.py [-h] [--dxmin DXMIN] [--frompage FROMPAGE] [--topage TOPAGE] [--page PAGE] [--formulaid FORMULAID] [--stats] pdffile
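Dumps the formulas of a PDF as PNG files into the formulas subfolder. The formulas subfolder is created if it does not already exist. The PDF content must be accessible as text.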
- Positional arguments:
- pdffile: the PDF file to parse and dump formulas from
- Optional arguments:
- -h, --help: show this help message and exit
- --dxmin DXMIN: Additional left margin, which defines what is normal text. If the text before a formula is the beginning of a paragraph, it might start a little indented. In this case it helps to move dxmin to the right. Units are those used in the PDF. Try 10.
- --frompage FROMPAGE: PDF page number to start with.
- --topage TOPAGE: PDF page number to stop at.
- --page PAGE: PDF page number.
- --formulaid FORMULAID: The regular expression by which a formula is found. Formulas are recognized by their ID on the right. The regular expression used is r'^\s*\(\d*\.\d*\)\s*', which matches e.g. (2.13). To find the rectangle comprising the formula, the text before and after it is located, which begins on the left of the page (dxmin). The formula is assumed to be indented with regard to normal text.
- --stats: Only print (formula, page)-refs statistics. This tells which formulas are most often referenced in normal text and are thus likely the most important ones.
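To illustrate the --formulaid and --stats options, here is a minimal sketch of how a formula ID such as (2.13) could be recognized and how references to it could be counted. It is not the tool's actual code: the pattern assumes the expression from the help text with its backslashes restored, and `is_formula_id`, `count_references`, and the sample page texts are hypothetical names and data made up for this example.

```python
import re
from collections import Counter

# Assumption: the help text's expression with its backslashes restored.
FORMULA_ID = re.compile(r'^\s*\(\d*\.\d*\)\s*')

# Hypothetical helper pattern for finding IDs anywhere in running text.
ID_IN_TEXT = re.compile(r'\(\d+\.\d+\)')


def is_formula_id(fragment):
    """Return True if a text fragment starts with an ID such as '(2.13)'."""
    return FORMULA_ID.match(fragment) is not None


def count_references(page_texts):
    """Count how often each formula ID is mentioned in the given texts.

    `page_texts` is a hypothetical list of plain-text pages; a real run
    would extract the text from the PDF first.
    """
    refs = Counter()
    for text in page_texts:
        refs.update(ID_IN_TEXT.findall(text))
    return refs


if __name__ == "__main__":
    pages = ["By (2.13) and (2.1) we obtain ...", "Using (2.13) again ..."]
    print(is_formula_id("  (2.13) "))
    print(count_references(pages).most_common())
```

A custom expression passed via --formulaid would replace `FORMULA_ID` in this sketch, for example to match IDs written with square brackets.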
Requires: Pillow, pymupdf (needs a compatible mupdf installed), pdfminer
Installation: libmupdf and pymupdf must be installed beforehand.