Need to write scraped data to a CSV file (threading)

    from download1 import download
    import threading
    import lxml.html

    Fields = ['country', 'area', 'population', 'iso', 'capital', 'continent',
              'tld', 'currency_code', 'currency_name', 'phone',
              'postal_code_format', 'postal_code_regex', 'languages',
              'neighbours']

    def getInfo(initial, ending):
        # scrape pages initial .. ending-1 (range() excludes the end value)
        for Number in range(initial, ending):
            url = 'http://example.webscraping.com/places/default/view/%d' % Number
            html = download(url)
            tree = lxml.html.fromstring(html)
            results = []
            for field in Fields:
                x = tree.cssselect('table > tr#places_%s__row > td.w2p_fw' % field)[0].text_content()
                results.append(x)
            # should I start writing here?

    downloadthreads = []
    for i in range(1, 252, 63):  # create 4 threads
        # i + 63 rather than i + 62: with an exclusive end, i + 62 skips
        # the last page of every chunk (63, 126, 189, 252)
        downloadThread = threading.Thread(target=getInfo, args=(i, i + 63))
        downloadthreads.append(downloadThread)
        downloadThread.start()

    for threadobj in downloadthreads:
        threadobj.join()  # wait for each thread to finish

    print("Done")

1 Answer

User

#1 · Posted 2024-09-28 22:24:46

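I think you should consider using some kind of queuing or a thread pool. Thread pools are really useful if you want to create several threads (not just 4; I assume you would use more than 4 threads overall, but only 4 at a time).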
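To make the thread pool idea concrete, here is a minimal sketch, assuming Python 3 and the standard `concurrent.futures` module. `fetch_row` is a hypothetical stand-in for the `download()` plus `cssselect` parsing in the question, and `scrape_to_csv` and the shortened `FIELDS` list are names made up for illustration. The pool runs at most 4 downloads at a time, and only the calling thread touches the CSV file, so no locking is needed.

```python
import csv
from concurrent.futures import ThreadPoolExecutor

FIELDS = ['country', 'area', 'population']  # shortened list, for the sketch only


def fetch_row(number):
    """Hypothetical stand-in for downloading and parsing one page."""
    return ['country-%d' % number, str(number * 10), str(number * 1000)]


def scrape_to_csv(path, first, last, fetch=fetch_row, workers=4):
    """Fetch pages first..last with at most `workers` threads and write one CSV."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        with open(path, 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(FIELDS)
            # map() yields results in input order, so rows stay sorted by page number
            for row in pool.map(fetch, range(first, last + 1)):
                writer.writerow(row)
```

Calling `scrape_to_csv('countries.csv', 1, 252)` would cover the same pages as the question; swapping in the real download and parsing code for `fetch_row` is left to you.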

An example of the queue technique can be found here.
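In case the linked example is unavailable, the queue approach could look roughly like the sketch below (Python 3, standard `queue` module). This is not the linked example, just one possible arrangement: the download threads put finished rows on a `queue.Queue`, and a single writer thread is the only one that writes to the file. `fetch_row`, `worker`, `writer` and `run` are hypothetical names, and `fetch_row` stands in for the real download and parsing.

```python
import csv
import queue
import threading


def fetch_row(number):
    """Hypothetical stand-in for downloading and parsing one page."""
    return [number, 'country-%d' % number]


def worker(first, last, out_queue):
    """Scrape pages first..last-1 and hand each row to the writer thread."""
    for number in range(first, last):
        out_queue.put(fetch_row(number))


def writer(path, out_queue):
    """Only this thread writes to the file; None is the stop signal."""
    with open(path, 'w', newline='') as f:
        csv_writer = csv.writer(f)
        while True:
            row = out_queue.get()
            if row is None:
                break
            csv_writer.writerow(row)


def run(path, first=1, last=253, chunk=63):
    out_queue = queue.Queue()
    writer_thread = threading.Thread(target=writer, args=(path, out_queue))
    writer_thread.start()
    workers = [threading.Thread(target=worker,
                                args=(i, min(i + chunk, last), out_queue))
               for i in range(first, last, chunk)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    out_queue.put(None)  # all rows are queued; tell the writer to finish
    writer_thread.join()
```

Rows arrive in whatever order the threads finish, so sort the file afterwards if order matters.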

Of course, you could also tag the output files with the thread id, for example "results_1.txt", "results_2.txt", and so on. You can then merge them once all threads have finished.
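A rough sketch of the one-file-per-thread idea, assuming Python 3. For simplicity it numbers the files with a chunk index (1, 2, 3, 4) rather than the real thread id, and `fetch_row`, `scrape_chunk`, `merge` and `run` are hypothetical names; `fetch_row` again stands in for the real download and parsing.

```python
import csv
import os
import threading


def fetch_row(number):
    """Hypothetical stand-in for downloading and parsing one page."""
    return [number, 'country-%d' % number]


def scrape_chunk(index, first, last, out_dir):
    """Each thread writes pages first..last-1 to its own results_<index>.txt."""
    path = os.path.join(out_dir, 'results_%d.txt' % index)
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        for number in range(first, last):
            writer.writerow(fetch_row(number))


def merge(out_dir, count, merged_path):
    """Concatenate results_1.txt .. results_<count>.txt in order."""
    with open(merged_path, 'w', newline='') as out:
        for index in range(1, count + 1):
            part_path = os.path.join(out_dir, 'results_%d.txt' % index)
            with open(part_path, newline='') as part:
                out.write(part.read())


def run(out_dir, merged_path, first=1, last=253, chunk=63):
    threads = []
    for index, start in enumerate(range(first, last, chunk), 1):
        t = threading.Thread(target=scrape_chunk,
                             args=(index, start, min(start + chunk, last), out_dir))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    # merge only after every thread has finished writing its file
    merge(out_dir, len(threads), merged_path)
```

Because no two threads ever share a file, no synchronization is needed, and the merged file comes out ordered by page number.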

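You could also use basic primitives such as locks, monitors, and so on, but I'm not a big fan of them. An example of locking can be found here.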
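For completeness, the lock-based variant might look like the following sketch (Python 3, `threading.Lock`). It is not the linked example, only an illustration; `fetch_row`, `get_info` and `run` are hypothetical names. All threads share one `csv.writer`, and the lock ensures that only one of them writes a row at a time, while the slow download stays outside the lock.

```python
import csv
import threading

write_lock = threading.Lock()


def fetch_row(number):
    """Hypothetical stand-in for downloading and parsing one page."""
    return [number, 'country-%d' % number]


def get_info(first, last, writer):
    """Scrape pages first..last-1 and append each row to the shared writer."""
    for number in range(first, last):
        row = fetch_row(number)  # slow part, done without holding the lock
        with write_lock:         # only one thread writes at a time
            writer.writerow(row)


def run(path, first=1, last=253, chunk=63):
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        threads = [threading.Thread(target=get_info,
                                    args=(i, min(i + chunk, last), writer))
                   for i in range(first, last, chunk)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # the file must stay open until every thread is done
```

This is the smallest change to the code in the question: the write goes right after the inner `for field in Fields` loop, wrapped in the lock.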
