I am trying to write a program's output to a file using Python 3.5 on Ubuntu. Here is what I tried first, before attempting multithreading:
from fuzzywuzzy import process, fuzz
import ast

def people(email):
    # Check the names of people with the fuzzywuzzy library
    return([returns result])

writel = open(r'output.csv', 'w', encoding='utf-8', errors='ignore')
count = 0  # number of lines read so far
with open('emailfile.txt', 'r', encoding='ascii', errors='ignore') as Filepointer:
    result = []
    for line in Filepointer.readlines():
        count += 1
        data = people(line.strip())
        # Compare by value; "is not" tests object identity
        if data != "":
            result.append(data)
for data in result:
    writel.write(str(data) + "\n")
writel.close()
I then tried multithreading in Python 3 with the following code:
from fuzzywuzzy import process, fuzz
import ast
from concurrent.futures import ThreadPoolExecutor
import threading

FinalOutput = []  # module-level list that the worker threads append to

def people(email):
    # Check the names of people with the fuzzywuzzy library
    FinalOutput.append([appends returned result])
    print(FinalOutput)
    return

threads = []
writel = open(r'output.csv', 'w', encoding='utf-8', errors='ignore')
count = 0
pool = ThreadPoolExecutor(max_workers=10)
with open('emailfile.txt', 'r', encoding='ascii', errors='ignore') as Filepointer:
    for line in Filepointer.readlines():
        pool.submit(people, line.strip())
# Wait for all submitted tasks to finish before writing the results
pool.shutdown(wait=True)
for data in FinalOutput:
    writel.write(str(data) + "\n")
writel.close()
The code above produces the following error:
Segmentation fault (core dumped)
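I looked through the Stack Overflow threads related to this problem but did not find a solution; I still get the same error.
Please let me know what I need to do to get this code to run.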
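Python has a great parallelization tool: the multiprocessing Pool. It is not multithreading, since the work runs in separate processes, but it does give you parallelism, which seems to be your intent. The first step is to make people return a value instead of appending its result to a global variable.

From there, create a Pool and call its map function, which distributes the items of the iterable across the worker processes and returns the results in a list, in the same order as the items appeared in the iterable.

You can also look into a package called joblib, which has a function that does this in a neater, more flexible way.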
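A minimal sketch of the two steps described above, under stated assumptions. The original matching logic was not shown, so people is filled in here with a stand-in that uses difflib.SequenceMatcher from the standard library rather than fuzzywuzzy. KNOWN_NAMES, match_all, the 0.6 threshold, and the sample addresses are all hypothetical and only serve to make the example runnable.

```python
from difflib import SequenceMatcher
from multiprocessing import Pool

# Hypothetical reference names; the question did not show the real data.
KNOWN_NAMES = ["alice smith", "bob jones", "carol white"]


def people(email):
    """Return the closest known name for an address, or "" if none is close.

    Stand-in for the fuzzywuzzy-based matching in the question. The key
    change is that the result is returned, not appended to a global list.
    """
    local_part = email.split("@")[0].replace(".", " ").lower()
    best_name, best_score = "", 0.0
    for name in KNOWN_NAMES:
        score = SequenceMatcher(None, local_part, name).ratio()
        if score > best_score:
            best_name, best_score = name, score
    # 0.6 is an arbitrary cut-off chosen for this illustration.
    return best_name if best_score >= 0.6 else ""


def match_all(emails, workers=4):
    """Apply people() to every address in a pool of worker processes.

    Pool.map returns the results in the same order as the input items.
    """
    with Pool(processes=workers) as pool:
        return pool.map(people, emails)


if __name__ == "__main__":
    # In the question these lines would be read from emailfile.txt.
    emails = [
        "alice.smith@example.com",
        "bob.jones@example.com",
        "zzz@example.com",
    ]
    # Drop empty results, as the single-threaded version did.
    results = [name for name in match_all(emails) if name != ""]
    for name in results:
        print(name)
```

Because every worker process only returns a value and the parent process collects the list, no state is shared between workers, and the file can be written once after map has finished. The `if __name__ == "__main__":` guard matters on platforms that start worker processes by re-importing the main module.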