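I was looking for a way to strip the images out of these file types (.docx and .xlsx), and this is the solution I came up with. It walks the given directory, copies any file with an appropriate extension, and renames the copy to filename.zip. It then navigates the zip structure, extracts every picture file with an appropriate extension, and renames each one to the original file name plus a number for uniqueness. Finally, it deletes the extraction directory tree it created.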
Extracting pictures from text documents is part of my job, so in the long run this will save my company thousands of hours.
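All of the code is below. What I really want to ask is: Is there a better way? Is there a more efficient way? Can it be extended to cover other formats? Could the text also be extracted to a .txt file, given the difference in load times between Word and Notepad?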
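On the text question: a .docx keeps its body text in word/document.xml, where each paragraph is a w:p element and the visible text sits in w:t elements. The following is a minimal standard-library sketch, assuming a simple document. Headers, footers, and footnotes live in other parts and are ignored, and tabs and line breaks inside a paragraph are dropped. `docx_to_text` is a hypothetical name.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_to_text(docx_path, txt_path):
    """Write the paragraph text of a .docx to a UTF-8 .txt file, one paragraph per line.

    Returns the number of paragraphs written. Hypothetical helper, not a library API.
    """
    with zipfile.ZipFile(docx_path) as archive:  # a .docx opens directly as a ZIP archive
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for para in root.iter(W_NS + "p"):  # every paragraph, including those inside tables
        # join the text of all runs in the paragraph; t.text is None for empty runs
        lines.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    with open(txt_path, "w", encoding="utf-8") as out:
        out.write("\n".join(lines))
    return len(lines)
```

For .xlsx, cell strings are mostly stored in xl/sharedStrings.xml, so the same approach would need a different part and namespace.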
This solution works on my Linux machine and I can extract the pictures, but I haven't tested it on a Windows system yet.
#!/usr/bin/python3
import os
import shutil
import sys
import zipfile

# Office formats handled, mapped to the top-level folder that holds their media/ directory
OFFICE_DIRS = {".docx": "word", ".xlsx": "xl"}

def zipDoc(aFile, dirPath):
    # e.g. report.docx -> report.docx.zip, so an existing report.zip is never overwritten and deleted
    zipName = os.path.join(dirPath, aFile + ".zip")
    shutil.copy2(os.path.join(dirPath, aFile), zipName)  # copies all data from the original into the .zip copy
    return zipfile.ZipFile(zipName)  # returns the opened zip file

def hasPicExtension(aFile):  # returns True if a file ends in a typical picture file extension
    picEndings = (".jpeg", ".jpg", ".png", ".bmp")  # a tuple, because str.endswith accepts one
    return aFile.lower().endswith(picEndings)  # lower() also matches .JPEG, .JPG, .PNG and .BMP

def delEvidence(somePath, topDir):  # removes the empty word/media or xl/media tree left by the extraction
    # os.path.join picks the right separator, so the same code runs on Linux and Windows
    for folder in (os.path.join(somePath, topDir, "media"), os.path.join(somePath, topDir)):
        if os.path.isdir(folder):  # the folders only exist if at least one picture was extracted
            os.rmdir(folder)  # os.rmdir only removes empty directories

def extractPicsFromDir(dirPath=""):
    # when given a directory path, extracts all images from every .docx and .xlsx file in it
    if not os.path.isdir(dirPath):
        print("Not a directory path!")
        sys.exit(1)
    for dirFileName in os.listdir(dirPath):  # loops through the entries of this directory (not subdirectories)
        shortFN, ext = os.path.splitext(dirFileName)  # splits at the last dot only
        topDir = OFFICE_DIRS.get(ext.lower())
        # skips non-Office files and Office lock files (~$name.docx), which are not ZIP archives
        if topDir is None or dirFileName.startswith("~$"):
            continue
        useZIP = zipDoc(dirFileName, dirPath)  # copies it to a .zip and opens it
        picNum = 1  # number of the next picture in this file
        with useZIP:  # closes the archive before it is deleted (Windows refuses to delete open files)
            for zippedFile in useZIP.namelist():  # loops through every member of the archive
                # only word/media/ or xl/media/, so e.g. docProps/thumbnail.jpeg is not extracted
                if zippedFile.startswith(topDir + "/media/") and hasPicExtension(zippedFile):
                    extracted = useZIP.extract(zippedFile, path=dirPath)  # returns the extracted file's path
                    picExt = os.path.splitext(zippedFile)[1]  # keeps the picture's own extension
                    newName = shortFN + " - " + str(picNum) + picExt
                    shutil.move(extracted, os.path.join(dirPath, newName))  # moves the picture out
                    picNum += 1
        delEvidence(dirPath, topDir)  # removes the extracted file structure
        os.remove(useZIP.filename)  # removes the .zip copy; no evidence left

uDir = input("Enter your directory: ")
extractPicsFromDir(uDir)
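On scaling to a whole directory tree and to other formats: os.listdir only looks at the top level of the directory, and the format check is hard-coded. Below is a sketch of a recursive search using os.walk. It assumes the usual Office Open XML layout in which PowerPoint keeps its images in ppt/media/, just as Word and Excel use word/media/ and xl/media/. `find_office_files` and `MEDIA_PREFIX` are hypothetical names.

```python
import os

# Hypothetical mapping: Office extension -> archive folder that usually holds embedded images
MEDIA_PREFIX = {".docx": "word/media/", ".xlsx": "xl/media/", ".pptx": "ppt/media/"}

def find_office_files(root):
    """Yield (path, media_prefix) for every supported Office file under root, recursively."""
    for dirpath, _dirnames, filenames in os.walk(root):  # visits every subdirectory, not just the top level
        for name in sorted(filenames):
            if name.startswith("~$"):  # skip Office lock files, which are not ZIP archives
                continue
            prefix = MEDIA_PREFIX.get(os.path.splitext(name)[1].lower())
            if prefix is not None:
                yield os.path.join(dirpath, name), prefix
```

With this design, supporting another ZIP-based Office format is a one-line change to the dictionary rather than another copy of the extraction loop.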
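Excel files are ZIP-format files, so it is easy to extract the images from an Excel or .docx file:
filename: location of the input Excel file
filepath: location where the extracted images are saved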
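The answer's code itself is not included here. Below is a minimal sketch of such a function, assuming the two parameters described (filename for the input file, filepath for the output folder) and that images sit under a media/ folder in the archive. `extract_images` is a hypothetical name, and this is not the answerer's original code. Because it reads each image's bytes straight from the archive, it needs neither a .zip copy nor a temporary directory tree.

```python
import os
import zipfile

def extract_images(filename, filepath):
    """Copy every file stored under a media/ folder of filename into filepath.

    Output files are named "<original name> - <n><original extension>".
    Returns the list of paths written. Hypothetical helper, not a library API.
    """
    os.makedirs(filepath, exist_ok=True)
    stem = os.path.splitext(os.path.basename(filename))[0]
    written = []
    with zipfile.ZipFile(filename) as archive:  # .docx/.xlsx open directly; no .zip copy required
        # every media file, including formats such as .emf that a fixed extension list would skip
        media = [n for n in archive.namelist() if "/media/" in n and not n.endswith("/")]
        for num, member in enumerate(media, start=1):
            ext = os.path.splitext(member)[1]  # keep the image's own extension
            target = os.path.join(filepath, f"{stem} - {num}{ext}")
            with open(target, "wb") as out:
                out.write(archive.read(member))  # bytes go straight from the archive to disk
            written.append(target)
    return written
```

For example, `extract_images("report.xlsx", "images")` (file names hypothetical) would write report - 1.png and so on into the images folder.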