PDF minification tool

pdfminify

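pdfminify aims at recompressing images in PDFs while operating directly on the PDF level (i.e., without going through a PostScript recompression). It parses the PDF file, hashes all image references, relinks resources that are duplicates (i.e., have the same MD5 hash) and can also recompress images that are stored with lossless compression so that they use JPEG instead. It tries to calculate the physical extents of the images (which, depending on the method of inclusion, can be somewhat messy) and is then able to calculate the actual image resolution. If that exceeds a given target resolution, it can also resample the images (i.e., rescale them using ImageMagick) before reintegrating them into the target PDF.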
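
The two core ideas above, duplicate detection by MD5 hash and resolution calculation from physical extents, can be illustrated with a minimal sketch. This is not pdfminify's actual code: the function names and the `{object_id: raw_bytes}` input are hypothetical, and the extents are assumed to be given in native PDF units (1/72 inch). The 150 dpi default matches the documented default of `--target-dpi`.

```python
import hashlib


def find_duplicate_images(images):
    """Map each duplicate object id to the first object id with identical data.

    `images` is a hypothetical dict of {object_id: raw_image_bytes}.
    """
    first_seen = {}  # MD5 digest -> first object id that carried this data
    relink = {}      # duplicate object id -> canonical object id
    for obj_id in sorted(images):
        digest = hashlib.md5(images[obj_id]).hexdigest()
        if digest in first_seen:
            relink[obj_id] = first_seen[digest]
        else:
            first_seen[digest] = obj_id
    return relink


def effective_dpi(pixel_width, pixel_height, extent_width, extent_height):
    """Resolution of an image drawn at the given extent (native 1/72 inch units)."""
    dpi_x = pixel_width / (extent_width / 72)
    dpi_y = pixel_height / (extent_height / 72)
    return dpi_x, dpi_y


def downscale_factor(dpi, target_dpi=150):
    """Scale factor (at most 1.0) for an image that exceeds the target resolution."""
    return min(1.0, target_dpi / dpi)
```

For example, a 600x300 pixel image drawn on a 2x1 inch area (144x72 native units) has an effective resolution of 300 dpi, so reaching 150 dpi would mean scaling it by 0.5.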

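In particular, I wrote this software because the PDFs generated by the libcairo export were huge. Images that were used a dozen times were included a dozen times, each with lossless compression. I therefore use pdfminify to reduce the file size afterwards.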

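Another use of pdfminify is its ability to convert PDFs into PDF/A-1b compliant PDF files. Since this is really hard to do, there are no guarantees for the resulting PDF. Please check that the result still behaves identically to the source version.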

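Lastly, pdfminify is able to sign PDF files digitally. For this you will need an X.509 certificate and the corresponding key pair. The signature is included in the PDF in the form of a banner with metadata, and sophisticated PDF readers will be able to verify that the PDF has not been tampered with.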

Requirements

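pdfminify requires at least Python 3.5 and the llpdf Python package in version v0.0.4 or later. Furthermore, it uses the "identify" and "convert" utilities of ImageMagick. It uses the former to determine the width, height, colorspace and bits per component of image files, and the latter to convert images from PNM (the internal format that pdfminify is able to write natively) to JPEG.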
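
To make the division of labor concrete, here is a small sketch of how the two ImageMagick utilities could be driven from Python and how a binary PNM (PPM, "P6") file can be written without external tools. The exact command lines pdfminify uses are not documented here, so the helper names, the `-format` string and the default quality of 85 are assumptions made for illustration only.

```python
def build_identify_command(path):
    """Command asking ImageMagick for width, height, colorspace and bit depth."""
    # %w, %h, %[colorspace] and %z are ImageMagick format escapes.
    return ["identify", "-format", "%w %h %[colorspace] %z", path]


def parse_identify_output(text):
    """Parse the output produced by the format string above."""
    width, height, colorspace, depth = text.split()
    return {
        "width": int(width),
        "height": int(height),
        "colorspace": colorspace,
        "bits_per_component": int(depth),
    }


def encode_ppm(width, height, rgb):
    """Encode 8-bit RGB pixel data as a binary PPM (P6) file."""
    if len(rgb) != width * height * 3:
        raise ValueError("pixel data does not match the image dimensions")
    header = "P6\n%d %d\n255\n" % (width, height)
    return header.encode("ascii") + bytes(rgb)


def build_convert_command(pnm_path, jpeg_path, quality=85):
    """Command converting a PNM file to JPEG at the given quality."""
    return ["convert", pnm_path, "-quality", str(quality), jpeg_path]
```

Either command list can be passed to `subprocess.run(..., check=True, capture_output=True)`; the output of "identify" would then go through `parse_identify_output`.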

Acknowledgments

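pdfminify uses the Toy Parser Generator (TPG) by Christophe Delord (http://cdsoft.fr/tpg/). It is included (the tpg.py file) and licensed under the terms of the GNU LGPL v2.1 or any later version. Despite its name, it is far from a toy. In fact, it is the most beautiful parser generator I have ever worked with, and it makes grammars and parsing exceptionally easy, even for people without any previous parsing experience. If you need parsing and use Python, TPG is the go-to solution I would recommend. Seriously, it is amazing. Check it out. Copyright and license details can be found in EXTERNAL_LICENSES.md.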

Usage

$ pdfminify
usage: pdfminify [-h] [-d dpi] [-j] [--jpeg-quality percent]
                 [--no-downscaling] [--cropbox x,y,w,h]
                 [--unit {cm,inch,mm,native}] [--one-bit-alpha]
                 [--remove-alpha] [--background-color color]
                 [--strip-metadata] [--saveimgdir path] [--raw-output]
                 [--pretty-pdf] [--no-xref-stream] [--no-object-streams]
                 [--pdfa-1b] [--color-profile iccfile] [--sign-cert certfile]
                 [--sign-key keyfile] [--sign-chain pemfile] [--signer name]
                 [--sign-location hostname] [--sign-contact-info infotext]
                 [--sign-reason reason] [--sign-page pageno]
                 [--sign-font pfbfile] [--sign-pos x,y] [--embed-payload path]
                 [--no-pdf-tagging] [--decompress-data] [--analyze]
                 [--dump-xref-table] [--no-filters] [-v]
                 pdf_in pdf_out

Minifies PDF files, can crop them, convert them to PDF/A-1b and sign them
cryptographically.

positional arguments:
  pdf_in                Input PDF file.
  pdf_out               Output PDF file.

optional arguments:
  -h, --help            show this help message and exit
  -d dpi, --target-dpi dpi
                        Default resolution to which images will be resampled
                        at. Defaults to 150 dots per inch (dpi).
  -j, --jpeg-images     Convert images to JPEG format. This means that lossy
                        compression is used that however often yields a much
                        higher compression ratio.
  --jpeg-quality percent
                        When converting images to JPEG format, the parameter
                        gives the compression quality. It is an integer from
                        0-100 (higher is better, but creates also larger
                        output files).
  --no-downscaling      Do not apply downscaling filter on the PDF, take all
                        images as they are.
  --cropbox x,y,w,h     Crop pages by additionally adding a /CropBox to all
                        pages of the PDF file. Pages will be cropped at offset
                        (x, y) to a width (w, h). The unit in which offset,
                        width and height are given can be specified using the
                        --unit parameter.
  --unit {cm,inch,mm,native}
                        Specify the unit of measurement that is used for input
                        and output. Can be any of cm, inch, mm, native,
                        defaults to native. One native PDF unit equals 1/72th
                        of an inch.
  --one-bit-alpha       Force all alpha channels in images to use a color
                        depth of one bit. This will make transparent images
                        have rougher edges, but saves additional space.
  --remove-alpha        Entirely remove the alpha channel (i.e., transparency)
                        of all images. The color which with transparent areas
                        are replaced with can be specified using the
                        --background-color command line option.
  --background-color color
                        When removing alpha channels, specifies the color that
                        should be used as background. Defaults to white.
                        Hexadecimal values can be specified as well in the
                        format '#rrggbb'.
  --strip-metadata      Strip metadata inside PDF objects that is not strictly
                        required, such as /PTEX.* entries inside object
                        content.
  --saveimgdir path     When specified, save all handled images as individual
                        files into the specified directory. Useful for image
                        extraction from a PDF as well as debugging.
  --raw-output          When saving images externally, save them in exactly
                        the format in which they're also present inside the
                        PDF. Note that this will produce raw image files in
                        some cases which won't have any header (but just
                        contain pixel data). Less useful for image extraction,
                        but can make sense for debugging.
  --pretty-pdf          Write pretty PDF files, i.e., format all dictionaries
                        so they're well-readable regarding indentation.
                        Increases required file size a tiny bit and increases
                        generation time of the PDF a little, but produces
                        easily debuggable PDFs.
  --no-xref-stream      Do not write the XRef table as a XRef stream, but
                        instead write a classical PDF XRef table and trailer.
                        This will increase the file size a bit, but might
                        improve compatibility with old PDF readers (XRef
                        streams are supported only starting with PDF 1.5).
                        XRef-streams are a prerequisite to object stream
                        compression, so if XRef-streams are disabled, so will
                        also be object streams (i.e., --no-object-streams is
                        implied).
  --no-object-streams   Do not compress objects into object-streams. Object
                        stream compression is introduced with PDF 1.5 and
                        means that multiple simple objects (without any stream
                        data) are concatenated together and compressed
                        together into one large stream object.
  --pdfa-1b             Try to create a PDF/A-1b compliant PDF document.
                        Implies --no-xref-stream, --no-object-streams,
                        --remove-alpha, removes transparency groups and adds a
                        PDF/A entry into XMP metadata.
  --color-profile iccfile
                        When creating a PDF/A-1b PDF, gives the Internal Color
                        Consortium (ICC) color profile that should be embedded
                        into the PDF as part of the output intent. When
                        omitted, it defaults to the sRGB IEC61966 v2 "black
                        scaled" profile which is included within pdfminify.
  --sign-cert certfile  pdfminify can additionally cryptographically sign your
                        result PDF file with an X.509 certificate and
                        corresponding key. This parameter specifies the
                        certificate filename.
  --sign-key keyfile    This parameter specifies the key filename, also in PEM
                        format.
  --sign-chain pemfile  When signing a PDF, this gives the PEM-formatted
                        certificate chain file. Can be omitted if this should
                        not be included in the PKCS#7 signature.
  --signer name         The name of the person responsible for signing the
                        document.
  --sign-location hostname
                        The location of the signing; usually this is the
                        hostname of the computer that the signature is
                        generated on.
  --sign-contact-info infotext
                        A contact information field under which the signer can
                        be reached. Usually a phone number or email address.
  --sign-reason reason  The reason why the document was signed.
  --sign-page pageno    Page number on which the signature should be
                        displayed. Defaults to 1.
  --sign-font pfbfile   To be able to include metadata text in the signature
                        form, a T1 font must be included into the PDF. This
                        gives the filename of the font that is to be used for
                        that purpose. Must be in PFB (PostScript Font Binary)
                        file format and will be included in the result PDF in
                        full (i.e., not reduced to the glyphs that are
                        actually needed). Defaults to the Bitstream Charter
                        Serif font that is included within pdfminify.
  --sign-pos x,y        Determines where the signature will be placed on the
                        page. Units are determined by the --unit variable and
                        the position is relative to lower left corner.
  --embed-payload path  Embed an opaque file as a payload into the PDF as a
                        valid PDF object. This is useful only if you want to
                        place an easter egg inside your PDF file.
  --no-pdf-tagging      Omit tagging the PDF file with a reference to
                        pdfminify and the used version.
  --decompress-data     Decompress all FlateDecode compressed data in the
                        file. Useful only for debugging.
  --analyze             Perform an analysis of the read PDF file and dump out
                        useful information about it.
  --dump-xref-table     Dump out the XRef table that was read from the input
                        PDF file. Mainly useful for debugging.
  --no-filters          Do not apply any filters on the source PDF whatsoever,
                        just read it in and write it back out. This is useful
                        to reformat a PDF and/or debug the PDF reader/writer
                        facilities without introducing other sources of
                        malformed PDF generation.
  -v, --verbose         Show verbose messages during conversion. Can be
                        specified multiple times to increase log level.

pdfminify version 0.2.1; llpdf version: 0.0.4
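
As an illustration of how the options combine, the following sketch builds a pdfminify command line from Python and shows the unit arithmetic behind `--unit` (one native PDF unit is 1/72 inch). Only flags listed in the help text above are used; the helper names `to_native` and `build_pdfminify_command` are hypothetical and not part of pdfminify.

```python
# Native PDF units (1/72 inch) per supported unit of measurement.
NATIVE_UNITS_PER = {
    "native": 1.0,
    "inch": 72.0,
    "cm": 72.0 / 2.54,
    "mm": 72.0 / 25.4,
}


def to_native(value, unit):
    """Convert a length in the given unit to native PDF units."""
    return value * NATIVE_UNITS_PER[unit]


def build_pdfminify_command(pdf_in, pdf_out, target_dpi=None,
                            jpeg_quality=None, cropbox=None, unit=None):
    """Assemble an argument list for the pdfminify command line tool."""
    cmd = ["pdfminify"]
    if target_dpi is not None:
        cmd += ["--target-dpi", str(target_dpi)]
    if jpeg_quality is not None:
        # JPEG quality only matters when JPEG conversion is enabled.
        cmd += ["--jpeg-images", "--jpeg-quality", str(jpeg_quality)]
    if cropbox is not None:
        # cropbox is a sequence (x, y, w, h), interpreted according to --unit.
        cmd += ["--cropbox", ",".join(str(v) for v in cropbox)]
    if unit is not None:
        cmd += ["--unit", unit]
    cmd += [pdf_in, pdf_out]
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`. For instance, `build_pdfminify_command("in.pdf", "out.pdf", target_dpi=100, jpeg_quality=80)` corresponds to running `pdfminify --target-dpi 100 --jpeg-images --jpeg-quality 80 in.pdf out.pdf`.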

Bugs

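PDF is inherently a messy format, and parsing it is really not pretty. I have only implemented what I needed in order to get my job done. I am sure there are numerous examples out there where pdfminify plainly does not work or creates broken output PDFs. Please feel free to fix these bugs and send a pull request. I think this is a really useful tool, so it would be great if it supported a larger variety of PDFs, not just the ones I happen to generate.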

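If you encounter a problem but cannot fix it yourself because you do not know enough about Python, PDF (or both), it would be great if you reported it on GitHub. Due to lack of time, however, I cannot guarantee that I will be able to fix it; to be honest, PDF is so complicated that I am not even sure I would find out what the issue is. In any case, be sure to include in the bug report a minimal example PDF file that demonstrates the problem.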

License

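pdfminify is licensed under the GNU GPL v3 (except for external components such as TPG, which has its own license). Later versions of the GPL are explicitly excluded.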

TPG (Toy Parser Generator) is licensed under its respective license (see EXTERNAL_LICENSES.md).
