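I'm new to multithreading, so I apologize if this is basic. I have a function that OCRs image files, and I'd like to multithread the task. The function doesn't return anything; it only saves the OCR text of the dataset. The code is as follows: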
```python
start_time = time.time()
path = 'C:\\Users\\RNCZF01\\Documents\\Cameron-Fen\\Economics-Projects\\Patent-project\\similarity\\Patents\\OCR-test'
listfiles = os.listdir(path)
filterfiles = [p for p in listfiles if p[-4:] == '.tif']
pool = Pool(processes=2)
result = pool.map(OCRimage, filterfiles)
pool.close()
pool.join()
print("--- %s seconds ---" % (time.time() - start_time))
```
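When I run the code, it seems to get stuck at `pool.map()`. I let it run for 30 minutes, which is much longer than the test run took, and it did not produce a single output. I tested my function OCRimage, and it seems the pool never enters the function even once (I used `print(1)` as the first line of OCRimage). I was wondering if anyone could help me. Thanks,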
Cameron
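One common reason `pool.map()` never returns on Windows is that `multiprocessing` starts workers with the "spawn" method there: each worker re-imports the main module, so the code that creates the pool has to sit under `if __name__ == "__main__":`, and the worker function has to be defined at module level in a file the workers can import. Without the guard, a script typically fails with a RuntimeError about the bootstrapping phase; the `multiprocessing` docs also note that pool examples do not work in the interactive interpreter, and in some IDE consoles the workers cannot find a function defined there, which can make `pool.map()` look like it hangs. Below is a minimal sketch of the driver restructured this way, assuming it is saved as a `.py` file and run as a script. The names `list_tifs`, `report` and `main` are hypothetical; `report` is a stand-in worker that only returns the worker's process ID and the file name, so you can confirm the pool actually calls the function before swapping in OCRimage.

```python
import os
import sys
import time
from multiprocessing import Pool


def list_tifs(folder):
    """Return the names of the .tif files in folder (same filter as the question)."""
    return sorted(name for name in os.listdir(folder) if name.endswith(".tif"))


def report(name):
    """Hypothetical stand-in worker: returns (worker PID, file name)."""
    return os.getpid(), name


def main(folder, worker, processes=2):
    start_time = time.time()
    # Leaving the with-block terminates the pool; map() has already
    # collected every result by then, so nothing is lost.
    with Pool(processes=processes) as pool:
        results = pool.map(worker, list_tifs(folder))
    print("--- %s seconds ---" % (time.time() - start_time))
    return results


if __name__ == "__main__":
    # Under "spawn" each worker re-imports this file, so only the parent
    # process reaches this point and creates the Pool.
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    for pid, name in main(folder, report):
        print(pid, name)
```

To use it for the real job, define OCRimage at module level in the same file and call `main(path, OCRimage)` inside the guard. Any global that OCRimage reads, such as `path`, has to be assigned at module level rather than inside the guard, because a spawned worker only sees what importing the file defines.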
Edit (added the OCRimage function):
The OCRimage function looks like this:
```python
def OCRimage(f):
    # This runs the ImageMagick "magick" command, which splits a multi-image tif into multiple single-image tiffs
    process = subprocess.Popen(["magick", path + "\\" + f, path + "\\temp\\%d.tif"], shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    print(process.communicate()[0])
    # Finds the number of pages for each tiff file (this might not be necessary, but listing all files in a directory from Python can return them in arbitrary order)
    max1 = -1
    for filename in os.listdir(path + '\\temp'):
        if max1 < int(filename[0:-4]):
            max1 = int(filename[0:-4])
    max1 = max1 + 1
    text = ""
    for each in range(0, max1):
        im = Image.open(path + "\\temp\\" + str(each) + ".tif")
        text = text + pytesseract.image_to_string(im)
    with open(path + "\\result\\OCR-" + f[0:-4] + ".txt", 'w') as file:
        file.write(text)
    for f in os.listdir(path + '\\temp'):
        os.remove(path + '\\temp\\' + f)
```
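Separately from the hang: once the pool does run, two workers executing OCRimage at the same time would both split their TIFFs into the same shared `temp` folder, count and OCR each other's pages, and delete files the other worker is still reading. A minimal sketch of one way around that, swapping in a private `tempfile.TemporaryDirectory()` per call instead of the shared folder. The names `split_with_magick`, `ocr_with_tesseract` and `ocr_tif` are hypothetical; it assumes `magick` is on the PATH and writes pages as `0.tif`, `1.tif`, ..., and (unlike the question) it calls `subprocess.run` without `shell=True` and with `check=True` so a failed split raises instead of being printed.

```python
import os
import subprocess
import tempfile


def split_with_magick(src, out_dir):
    """Split a multi-page TIFF into out_dir/0.tif, 1.tif, ... (assumes magick is on PATH)."""
    subprocess.run(["magick", src, os.path.join(out_dir, "%d.tif")], check=True)


def ocr_with_tesseract(page_path):
    """OCR one page image; imported lazily so this module loads without these packages."""
    import pytesseract
    from PIL import Image
    with Image.open(page_path) as im:
        return pytesseract.image_to_string(im)


def ocr_tif(folder, name, split_pages=split_with_magick, ocr_page=ocr_with_tesseract):
    """OCR folder/name into folder/result/OCR-<stem>.txt using a private temp directory."""
    # Each call gets its own directory, so parallel workers never see each other's pages;
    # the directory and its contents are deleted when the with-block ends.
    with tempfile.TemporaryDirectory() as tmp:
        split_pages(os.path.join(folder, name), tmp)
        # Sort numerically (0, 1, ..., 9, 10) rather than as strings (0, 1, 10, 2, ...).
        pages = sorted(os.listdir(tmp), key=lambda p: int(os.path.splitext(p)[0]))
        text = "".join(ocr_page(os.path.join(tmp, p)) for p in pages)
    stem = os.path.splitext(name)[0]
    out_path = os.path.join(folder, "result", "OCR-" + stem + ".txt")
    with open(out_path, "w") as out:
        out.write(text)
    return out_path
```

Because `pool.map` passes a single argument, the folder can be bound with `functools.partial`, e.g. `pool.map(functools.partial(ocr_tif, path), filterfiles)`; a partial of a module-level function can be pickled and sent to the workers. The `split_pages` and `ocr_page` parameters exist only so the page handling can be checked without ImageMagick or Tesseract installed.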
Edit 2: here are all the imports:
```python
import time
import subprocess
import os
import pytesseract
from PIL import Image
from multiprocessing import Pool
import multiprocessing
countcpus = multiprocessing.cpu_count()
```
Edit 3:
Running OCRimage(f) on its own works fine. Instead of the multiprocessing code, I just use the following:
```python
path = 'C:\\Users\\RNCZF01\\Documents\\Cameron-Fen\\Economics-Projects\\Patent-project\\similarity\\Patents\\OCR-test'
for p in os.listdir(path):
    OCRimage(p)
```
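Since the serial loop works, one way to tell whether the pool is stuck or merely slow is to replace `pool.map(...)` with `pool.map_async(...).get(timeout=...)`, which raises `multiprocessing.TimeoutError` instead of blocking forever and re-raises an exception that occurred in a worker. A minimal sketch, where `run_with_timeout` is a hypothetical helper and the builtin `abs` with a 10-second timeout is an arbitrary smoke test: if even that times out, the problem lies in how the pool is started rather than in OCRimage.

```python
import multiprocessing
from multiprocessing import Pool


def run_with_timeout(func, items, processes=2, timeout=60.0):
    """Like pool.map, but raise multiprocessing.TimeoutError instead of blocking forever."""
    with Pool(processes=processes) as pool:
        async_result = pool.map_async(func, items)
        # get() returns the results in input order, re-raises an exception from a
        # worker, or raises TimeoutError if the batch has not finished in time.
        return async_result.get(timeout=timeout)


if __name__ == "__main__":
    try:
        print(run_with_timeout(abs, [-1, -2, 3], timeout=10))
    except multiprocessing.TimeoutError:
        print("pool did not finish within the timeout")
```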