Combining two lists of PDFs one-to-one using Python

3 Answers

User

Answer 1 · edited 2024-09-30 16:25:47

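This should correctly find and collate all the files to be merged; it still needs the actual .pdf merging code.

Edit: I added PDF-writing code based on the pyPdf example code. It is untested, but as far as I can tell it should work.

Edit2: I realized I had the map numbering crossed the wrong way; I rearranged it so that the correct sets of maps are merged.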

import collections
import glob
import re

# probably need to install this module -
#   pip install pyPdf
from pyPdf import PdfFileWriter, PdfFileReader

def group_matched_files(filespec, reg, keyFn, dataFn):
    res = collections.defaultdict(list)
    reg = re.compile(reg)
    for fname in glob.glob(filespec):
        data = reg.match(fname)
        if data is not None:
            res[keyFn(data)].append(dataFn(data))
    return res

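def merge_pdfs(fnames, newname):
    # materialize first: a generator would be exhausted by the join() below
    fnames = list(fnames)
    print("Merging {} to {}".format(",".join(fnames), newname))

    # create new output pdf
    newpdf = PdfFileWriter()

    # keep every input open until the output is written, since pyPdf
    # resolves page contents lazily when write() is called
    infiles = [open(fname, "rb") for fname in fnames]
    try:
        for inf in infiles:
            oldpdf = PdfFileReader(inf)
            # copy each page of this file to the output
            for pg in range(oldpdf.getNumPages()):
                newpdf.addPage(oldpdf.getPage(pg))

        # write finished output
        with open(newname, "wb") as outf:
            newpdf.write(outf)
    finally:
        for inf in infiles:
            inf.close()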

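def main():
    matches = group_matched_files(
        "map*.pdf",
        r"map(\d+)_(\d+)\.pdf$",    # raw string, literal dot before "pdf"
        lambda d: d.group(2),       # key: the number shared by both lists
        lambda d: d.group(1)        # value: the map-set number
    )
    for suffix, sets in matches.items():
        # sort numerically so that map2_ comes before map10_
        fnames = ["map{}_{}.pdf".format(n, suffix) for n in sorted(sets, key=int)]
        merge_pdfs(fnames, "merged{}.pdf".format(suffix))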

if __name__=="__main__":
    main()
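
The grouping step can be tried without any PDFs on disk. The sketch below is an illustration rather than the answer's own code: `pair_by_suffix` is a hypothetical helper that applies the same `map<set>_<number>.pdf` naming pattern to a plain list of names, sorts the set numbers numerically, and returns the resulting merge plan. The file names in the sample list are made up.

```python
import collections
import re

# same naming assumption as the answer: map<set>_<number>.pdf
NAME_RE = re.compile(r"map(\d+)_(\d+)\.pdf$")

def pair_by_suffix(fnames):
    """Group names like map1_7.pdf by their second number (hypothetical helper)."""
    groups = collections.defaultdict(list)
    for fname in fnames:
        m = NAME_RE.match(fname)
        if m is not None:
            groups[m.group(2)].append((int(m.group(1)), fname))
    # sort each group by the numeric set number so map2_ precedes map10_
    return {suffix: [fname for _, fname in sorted(items)]
            for suffix, items in groups.items()}

plan = pair_by_suffix(
    ["map2_1.pdf", "map1_1.pdf", "map10_1.pdf", "map1_2.pdf", "notes.txt"]
)
for suffix in sorted(plan, key=int):
    print("merged{}.pdf <- {}".format(suffix, ", ".join(plan[suffix])))
```

Printing the plan before merging makes it easy to spot a file that was paired with the wrong partner or skipped by the pattern.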

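User

Answer 2 · edited 2024-09-30 16:25:47

I don't have any test PDFs to try combining, but I tested this on text files using a cat command. You can give it a try (I'm assuming a Unix-based system), saved as merge.py: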

import os, re, subprocess

directory = "/home/user/directory_with_maps/"
for current in sorted(os.listdir(directory)):
    # only the map1_ files drive the loop; the matching map2_ file is assumed to exist
    search = re.match(r"map1_(\d+)\.pdf$", current)
    if search:
        name = search.group(1)
        cmd = ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
               "-sOutputFile=FULLMAP_%s.pdf" % name,
               current, "map2_%s.pdf" % name]
        # run inside the maps directory so the relative file names resolve
        subprocess.call(cmd, cwd=directory)

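Basically, it collects the list of map1 files, then walks through them one by one, assuming that a map2 file with the same number exists for each. (I can see this being done with a counter instead, zero-padded to match the file names, for a similar effect.)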
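
The counter variant mentioned in passing might look like the sketch below. It assumes the file numbers run consecutively from 1 and are zero-padded to a fixed width; `paired_names`, `count`, and `width` are hypothetical names, not part of the answer.

```python
def paired_names(count, width=2):
    """Yield (map1 file, map2 file, output file) for indices 1..count."""
    for i in range(1, count + 1):
        num = str(i).zfill(width)  # e.g. 7 -> "07" when width is 2
        yield ("map1_%s.pdf" % num, "map2_%s.pdf" % num, "FULLMAP_%s.pdf" % num)

for first, second, out in paired_names(3):
    print(first, second, "->", out)
```

Unlike the directory scan, this approach names files that may not exist, so each pair should be checked before it is handed to the merge command.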

Test the gs command on its own first; I just grabbed it from http://hints.macworld.com/article.php?story=2003083122212228.
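
One way to check the command before running the whole loop is to build it, print it, and confirm that Ghostscript is installed. This is only a sketch: `build_gs_command` is a hypothetical helper, the flags are the ones already used in the answer, and the file names are examples.

```python
import shutil

def build_gs_command(output, inputs):
    """Return the argument list for merging `inputs` into `output` with gs."""
    return ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
            "-sOutputFile=" + output] + list(inputs)

cmd = build_gs_command("FULLMAP_1.pdf", ["map1_1.pdf", "map2_1.pdf"])
print(" ".join(cmd))  # inspect the command, or paste it into a shell

# shutil.which returns None when the executable is not on PATH
if shutil.which("gs") is None:
    print("gs not found on PATH; install Ghostscript before running the loop")
```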

User

Answer 3 · edited 2024-09-30 16:25:47

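PDF files are structured differently from plain-text files. Simply concatenating two PDF files will not work, because the structure and contents of the files can end up overwritten or corrupted. You could of course write the merge yourself, but that takes considerable time and an in-depth knowledge of PDF internals.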
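
One concrete reason concatenation fails: a PDF ends with a `startxref` entry that gives the byte offset of its cross-reference table, counted from the start of the file. The sketch below illustrates this with deliberately simplified PDF-like byte strings (they are not valid PDFs); `make_fake_pdf` and `startxref_points_at_xref` are hypothetical helpers written for this example.

```python
def make_fake_pdf(body):
    """Build PDF-like bytes: header, body, xref table, trailer (NOT a valid PDF)."""
    content = b"%PDF-1.4\n" + body + b"\n"
    xref_offset = len(content)  # offset of the xref table from the file start
    return (content
            + b"xref\n0 1\n0000000000 65535 f \n"
            + b"trailer\n<< /Size 1 >>\nstartxref\n"
            + str(xref_offset).encode("ascii") + b"\n%%EOF\n")

def startxref_points_at_xref(data):
    """Read the last startxref offset and check that an xref table starts there."""
    idx = data.rfind(b"startxref")
    offset = int(data[idx + len(b"startxref"):].split()[0])
    return data[offset:offset + 4] == b"xref"

first = make_fake_pdf(b"% first document body")
second = make_fake_pdf(b"% second, longer document body with more bytes")

print(startxref_points_at_xref(first))
print(startxref_points_at_xref(second))
print(startxref_points_at_xref(first + second))
```

Each file is consistent on its own, so the first two checks print True. After concatenation the last offset still refers to the second file's original layout and now lands inside the first file's bytes, so the third check prints False.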

That said, I suggest you look into {a1}. It supports the merging functionality you are looking for.
