Extracting images from Word documents with Python docx2txt

import docx2txt
import os

path = "whatever the path is"
savepath = "wherever one would want to save this"

files = []
for file in os.listdir(path):
    if file.endswith('.docx'):
        files.append(file)

for i in range(len(files)):
    image = docx2txt.process(path + "/" + files[i], savepath)  ## this is the line that overwrites each new image

2 Answers

User

#1 · edited 2024-06-14 17:36:35

https://github.com/ankushshah89/python-docx2txt/blob/c94663234d2882aa75932f9c9973eb5a804df13b/docx2txt/docx2txt.py#L72

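The second argument to process specifies the directory the images are written to, so with

for i in range(len(files)):
    image = docx2txt.process(path+ "/" +files[i], savepath) ## this is the line that overwrites each new image

every document's images land in the same directory. Instead, you can specify a separate save path for each document:

for i in range(len(files)):
    # Build a fresh per-document path; reassigning savepath itself would accumulate suffixes ("out0", "out01", ...)
    doc_savepath = savepath + str(i)
    os.makedirs(doc_savepath, exist_ok=True)  # make sure the target directory exists
    image = docx2txt.process(path + "/" + files[i], doc_savepath)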
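
A variation on the same idea, as a minimal sketch: name each output directory after the document's file name rather than its index, so the folders stay identifiable. The helper names plan_image_dirs and extract_all are hypothetical; the only docx2txt call used is docx2txt.process(docx_path, img_dir), as in the code above.

```python
import os


def plan_image_dirs(path, savepath):
    """Return (docx_path, img_dir) pairs, one image directory per document."""
    pairs = []
    for name in sorted(os.listdir(path)):
        if not name.endswith(".docx"):
            continue
        stem = os.path.splitext(name)[0]  # "report.docx" -> "report"
        pairs.append((os.path.join(path, name), os.path.join(savepath, stem)))
    return pairs


def extract_all(path, savepath):
    """Extract every document's images into its own directory."""
    import docx2txt  # imported here so plan_image_dirs works without it

    for docx_path, img_dir in plan_image_dirs(path, savepath):
        os.makedirs(img_dir, exist_ok=True)  # make sure the directory exists
        docx2txt.process(docx_path, img_dir)
```

Calling extract_all(path, savepath) would then put the images of report.docx under savepath/report, and so on for each document.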

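User

#2 · edited 2024-06-14 17:36:35

My final solution: I ended up saving each document's images into a folder named after a string that also appears in the Word document, using the following: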

import docx2txt
import os

path ="path"
savepath = "savepath"

## Collects name information from the word files

files = []
correctedfiles = []
for file in os.listdir(path):
    if file.endswith('.docx'):
        files.append(file)

## Checking above    

for x in range(len(files)):
    print(files[x])

## Creates one folder per document, named after the ID found in the document's text

for i in range(len(files)):
    textresult = docx2txt.process(path + "/" + files[i])
    print(textresult)
    # Drop the 'June 2019 ' heading so only the ID remains
    correctresult = textresult.replace('June 2019 ', '').strip()
    # os.makedirs avoids shelling out to mkdir and copes with names containing spaces
    os.makedirs(os.path.join(savepath, correctresult), exist_ok=True)

## Saves each document's images into the folder named after it

for i in range(len(files)):
    textresult = docx2txt.process(path + "/" + files[i])
    correctresult = textresult.replace('June 2019 ', '').strip()
    image = docx2txt.process(path + "/" + files[i], os.path.join(savepath, correctresult))

There are a few extra bits in there to deal with the superfluous words in the Word documents' heading (June 2019).
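
Because the folder name comes from the document text, it may be worth normalizing it before use. The sketch below is one hypothetical way to do that; folder_name_from_text and its rules (collapse whitespace, replace characters that are invalid in Windows file names) are assumptions for illustration, not part of the answer above or of docx2txt.

```python
import re


def folder_name_from_text(text, prefix="June 2019 "):
    """Turn a document's extracted text into a single safe folder name."""
    name = text.replace(prefix, "")
    name = re.sub(r"\s+", " ", name).strip()   # collapse newlines, tabs, repeated spaces
    name = re.sub(r'[<>:"/\\|?*]', "_", name)  # replace characters invalid on Windows
    return name or "untitled"                  # fall back if nothing is left
```

For example, folder_name_from_text("June 2019 Q17") gives "Q17", which could be used in place of correctresult.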
