My DataFrame df contains more than 600 URLs, and I want to fetch a specific value from an element on each page. The following code works:
from bs4 import BeautifulSoup
from tqdm import tqdm

# s (presumably a requests session) and cookies are defined earlier
ownerlist = []
for links in tqdm(df['Link'], leave=False, position=0):
    ownersite = s.get(links, cookies=cookies)
    owsoup = BeautifulSoup(ownersite.content, 'lxml')
    owner = owsoup.find('input', {'id': 'GlobalBodyContent_InternalBodyContent_BodyContent_Owner'}).get('value')
    ownerlist.append(owner)
    #print(len(ownerlist),owner)
df['Owner'] = ownerlist
print(df)
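But it takes 40 minutes to complete all the requests. I tried a multithreaded approach but could not get it to work. It runs faster, but instead of 600+ items, my list ends up with only 2 or 3 items. This is what I tried: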
import threading

owner = []

def mt(links):
    ap = s.get(links, cookies=cookies)
    apsoup = BeautifulSoup(ap.content, 'lxml')
    ap1 = apsoup.find('input', {'id': 'GlobalBodyContent_InternalBodyContent_BodyContent_Owner'}).get('value')
    #print(ap1)
    owner.append(ap1)

def main():
    for links in tqdm(df['Link']):
        threadProcess = threading.Thread(name='simplethread', target=mt, args=[links])
        threadProcess.daemon = True
        threadProcess.start()

main()
How can I make the loop finish in less than 40 minutes? Thanks.
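A likely reason for ending up with only 2 or 3 items: main() starts the threads but never waits for them (there is no join()), so the list is read while most requests are still in flight. The threads also append in completion order, so even a complete list would not necessarily line up with the rows of df. Below is a minimal sketch of one way around both issues, using ThreadPoolExecutor from the standard library. The names fetch_all and fake_fetch are hypothetical and introduced only for illustration; fake_fetch just sleeps in place of a real HTTP request so the example runs without network access, and max_workers=16 is an arbitrary starting value, not a recommendation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(links, fetch_one, max_workers=16):
    """Call fetch_one(link) for every link using a pool of worker threads.

    executor.map yields results in the same order as `links`, so the
    returned list lines up with the input even if requests finish
    out of order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Leaving the `with` block waits for all workers to finish,
        # which is the step missing from the Thread-based attempt.
        return list(executor.map(fetch_one, links))


def fake_fetch(link):
    """Hypothetical stand-in for the real request: sleep, then return a value."""
    time.sleep(0.05)
    return "owner-of-" + link


if __name__ == "__main__":
    demo_links = ["page%d" % i for i in range(20)]
    owners = fetch_all(demo_links, fake_fetch, max_workers=10)
    print(len(owners), owners[:2])
```

To apply this to the code in the question, fetch_one would be a function containing the body of mt, ending in return ap1 instead of owner.append(ap1), and the result would be assigned with df['Owner'] = fetch_all(df['Link'], fetch_one). Two caveats: sharing one session object across threads usually works for simple GET requests, but I am not certain it is guaranteed to be thread-safe, and a large worker count may get you rate-limited by the site, so it is worth starting small.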