How to convert a PDF to grayscale in Python

import locale
from io import BytesIO
import ghostscript as gs

ENCO = locale.getpreferredencoding()
STDOUT = BytesIO()
STDERR = BytesIO()

with open('adob_in.pdf', 'r') as infile:
    ARGS = f"""DUMMY -sOutputFile=adob_out.pdf -sDEVICE=pdfwrite
    -sColorConversionStrategy=Gray -dProcessColorModel=/DeviceGray
    -dNOPAUSE -dBATCH {infile.name}"""
    ARGSB = [arg.encode(ENCO) for arg in ARGS.split()]
    gs.Ghostscript(*ARGSB, stdout=STDOUT, stderr=STDERR)

print(STDOUT.getvalue().decode(ENCO))
print(STDERR.getvalue().decode(ENCO))

GPL Ghostscript 9.52 (2020-03-19)
Copyright (C) 2020 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 1.
Page 1

GPL Ghostscript 9.52 (2020-03-19)
Copyright (C) 2020 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
**** Error: Cannot find a 'startxref' anywhere in the file.
Output may be incorrect.
**** Error: An error occurred while reading an XREF table.
**** The file has been damaged. This may have been caused
**** by a problem while converting or transfering the file.
**** Ghostscript will attempt to recover the data.
**** However, the output may be incorrect.
**** Error: Trailer dictionary not found.
Output may be incorrect.
No pages will be processed (FirstPage > LastPage).
**** This file had errors that were repaired or ignored.
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.
**** The rendered output from this file may be incorrect.
GS>

1 answer

User

Reply #1 · Posted on 2024-09-30 01:35:43

I don't know how to do this with ghostscript, but the following code, which uses pdf2image and img2pdf, gets the job done:

from os.path import join
from tempfile import TemporaryDirectory
from pdf2image import convert_from_path # https://pypi.org/project/pdf2image/
from img2pdf import convert # https://pypi.org/project/img2pdf/

with TemporaryDirectory() as temp_dir: # Saves images temporarily on disk rather than in RAM to speed up parsing
    # Converting pages to images
    print("Parsing pages to grayscale images. This may take a while")
    images = convert_from_path(
        "your_pdf_path.pdf",
        output_folder=temp_dir,
        grayscale=True,
        fmt="jpeg",
        thread_count=4
    )

    image_list = list()
    for page_number in range(1, len(images) + 1):
        path = join(temp_dir, "page_" + str(page_number) + ".jpeg")
        image_list.append(path)
        images[page_number-1].save(path, "JPEG") # (page_number - 1) because index starts from 0

    with open("Gray_PDF.pdf", "bw") as gray_pdf:
        gray_pdf.write(convert(image_list))

    print("The new PDF is saved as Gray_PDF.pdf in the current directory.")

The PDF containing the grayscale images is saved as Gray_PDF.pdf in the current working directory.

Explanation: the following code:

with TemporaryDirectory() as temp_dir: # Saves images temporarily on disk rather than in RAM. This speeds up parsing
    # Converting pages to images
    print("Parsing pages to grayscale images. This may take a while")
    images = convert_from_path(
        "your_pdf_path.pdf",
        output_folder=temp_dir,
        grayscale=True,
        fmt="jpeg",
        thread_count=4
    )

performs the following tasks:

Converts the PDF pages to grayscale images
Stores them temporarily in a directory
Creates images, a list of PIL image objects
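
A quick way to confirm that the pages really came back as grayscale is to look at each image's mode. The sketch below assumes the objects in images are Pillow images and that an 8-bit single-channel grayscale page reports the mode "L". The helper name non_grayscale_pages is hypothetical and is not part of pdf2image.

```python
def non_grayscale_pages(images):
    """Return the 1-based page numbers whose image is not single-channel grayscale.

    Assumes each item exposes a Pillow-style ``mode`` attribute, where "L"
    means 8-bit grayscale. Hypothetical helper, not part of pdf2image.
    """
    return [
        page_number
        for page_number, image in enumerate(images, start=1)
        if image.mode != "L"
    ]
```

Calling non_grayscale_pages(images) right after convert_from_path should give an empty list if every page was rendered in grayscale.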

Next, the following code:

    image_list = list()
    for page_number in range(1, len(images) + 1):
        path = join(temp_dir, "page_" + str(page_number) + ".jpeg")
        image_list.append(path)
        images[page_number-1].save(path, "JPEG") # (page_number - 1) because index starts from 0

saves the images again in the same directory as page_1.jpeg, page_2.jpeg, and so on. It also builds a list of the paths to these new images.
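
The same loop can also be written with enumerate, which removes the manual index arithmetic. This is a sketch of an equivalent helper rather than the answer's original code. The name save_pages is hypothetical, and it only assumes that each image has a Pillow-style save(path, format) method.

```python
from os.path import join


def save_pages(images, folder):
    """Save each page as page_<n>.jpeg inside folder and return the paths in page order."""
    paths = []
    # enumerate(..., start=1) yields 1-based page numbers alongside each image
    for page_number, image in enumerate(images, start=1):
        path = join(folder, f"page_{page_number}.jpeg")
        image.save(path, "JPEG")
        paths.append(path)
    return paths
```

Inside the with block, image_list = save_pages(images, temp_dir) would then take the place of the loop.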

Finally, the following code:

    with open("Gray_PDF.pdf", "bw") as gray_pdf:
        gray_pdf.write(convert(image_list))

creates a PDF named Gray_PDF.pdf from the previously created grayscale images and saves it in the working directory.
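
Wrapped as a function, this step might look like the sketch below. write_pdf is a hypothetical helper. It assumes that img2pdf's convert accepts a list of image paths and returns the PDF as bytes, which is how the code above uses it. The converter is passed in as a parameter so that the function can be tried out without img2pdf installed.

```python
def write_pdf(image_paths, pdf_path, convert=None):
    """Combine the images at image_paths into one PDF written to pdf_path.

    convert defaults to img2pdf.convert (imported lazily). Pages appear in
    the order of image_paths. Hypothetical helper, not part of img2pdf.
    """
    if not image_paths:
        raise ValueError("no page images to combine")
    if convert is None:
        from img2pdf import convert  # https://pypi.org/project/img2pdf/
    pdf_bytes = convert(list(image_paths))
    with open(pdf_path, "wb") as pdf_file:
        pdf_file.write(pdf_bytes)
    return pdf_path
```

Because the page order follows the order of image_paths, the list should be built in page order, as the loop above does.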

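Bonus tip: this approach gives you a lot of flexibility if you want to do further image processing with OpenCV, because every page is now an image. Just make sure that all of those operations happen inside the first with statement, i.e.: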

with TemporaryDirectory() as temp_dir: # Saves images temporarily on disk rather than in RAM. This speeds up parsing
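
As an illustration of that tip, the sketch below binarizes each saved page with Otsu thresholding before the PDF is built. It assumes the opencv-python package is installed and that image_list holds the paths of the JPEG files created earlier. Both function names are hypothetical, and thresholding is only one example of an operation you might apply.

```python
def binarize_page(path):
    """Overwrite the image at path with an Otsu-thresholded black-and-white version."""
    import cv2  # imported lazily; requires the opencv-python package

    page = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if page is None:
        raise FileNotFoundError(f"could not read image: {path}")
    # With THRESH_OTSU the threshold argument (0) is ignored and chosen automatically
    _, binary = cv2.threshold(page, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    cv2.imwrite(path, binary)


def process_pages(image_paths, step=binarize_page):
    """Apply step to every page image in order and return the list of paths."""
    for path in image_paths:
        step(path)
    return list(image_paths)
```

process_pages(image_list) would be called inside the with block, after the images are saved and before convert(image_list), because the temporary directory is deleted as soon as the block exits.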
