Is there a way to parse with BeautifulSoup while using threads?

Posted 2024-09-26 18:07:20


How can I parse these links using multiple threads?

Basically, I scrape for links first and then parse each of those links one by one.

It currently works like this:

for link in links:
    scrape_for_info(link)

Where links contains:

https://www.xtip.co.uk/en/?r=bets/xtra&group=476641&game=312053910
https://www.xtip.co.uk/en/?r=bets/xtra&group=476381&game=312057618
...
https://www.xtip.co.uk/en/bets/xtra.html?group=477374&game=312057263

And scrape_for_info(url) looks like this:

def scrape_for_info(url):

    scrape = CP_GetOdds(url)

    for x in range(scrape.GameRange()):
        sql_str = "INSERT INTO Scraped_Odds VALUES ('"
        sql_str += str(scrape.Time()) + "', '"
        sql_str += str(scrape.Text(x)) + "', '"
        sql_str += str(scrape.HomeTeam()) + "', '"
        sql_str += str(scrape.Odds1(x)) + "', '"
        sql_str += str(scrape.Odds2(x)) + "', '"
        sql_str += str(scrape.AwayTeam()) + "')"

        cursor.execute(sql_str)
    conn.commit()

I've seen threading used when scraping websites, but mostly for fetching the pages rather than for parsing them.

I'm hoping someone can show me how to parse faster than I do now. Since these are live odds, I need them updated as quickly as possible.


3 Answers

With multiprocessing you might consider using a Queue.

Usually you create two kinds of jobs: one that produces URLs and one that consumes them. Let's call them creator and consumer. I'll assume the shared stop flag is called closing_condition (e.g. a Value), and that the methods that produce URLs and store them are called create_url_method and store_url, respectively.

from multiprocessing import Queue, Value, Process
import queue


def creator(urls, closing_condition):
    """Parse page and put urls in given Queue."""
    while not closing_condition.value:
        created_urls = create_url_method()
        [urls.put(url) for url in created_urls]


def consumer(urls, closing_condition):
    """Consume urls in given Queue."""
    while not closing_condition.value:
        try:
            store_url(urls.get(timeout=1))
        except queue.Empty:
            pass


urls = Queue()
semaphore = Value('d', 0)

creators_number = 2
consumers_number = 2

creators = [
    Process(target=creator, args=(urls, semaphore))
    for i in range(creators_number)
]

consumers = [
    Process(target=consumer, args=(urls, semaphore))
    for i in range(consumers_number)
]

[p.start() for p in creators + consumers]
[p.join() for p in creators + consumers]
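Note that the final join above only returns once something sets the shared flag. A minimal sketch of the shutdown step you might slot in between the start and the join, reusing the semaphore Value from above; the one-minute cutoff is purely a placeholder assumption, not part of the answer:

import time

# Placeholder stop condition: give the workers one minute.
# In practice you would flip the flag once every link has been enqueued and stored.
time.sleep(60)

# Both the creator and consumer loops read this as closing_condition and exit.
semaphore.value = 1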

Thanks for the answers!

Here's what did the trick:

from multiprocessing import Pool

with Pool(10) as p:
    p.map(scrape_for_info, links)
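One caveat with the Pool version: the worker processes don't share the parent's cursor and conn, so scrape_for_info either has to open its own database connection or hand the rows back to the parent for writing. Below is a minimal sketch of the second option; scrape_rows is a hypothetical variant of scrape_for_info that returns rows instead of inserting them, and sqlite3 with ? placeholders is only an assumption for illustration:

import sqlite3
from multiprocessing import Pool

def scrape_rows(url):
    """Hypothetical: parse one page and return its rows instead of writing them."""
    scrape = CP_GetOdds(url)
    return [
        (scrape.Time(), scrape.Text(x), scrape.HomeTeam(),
         scrape.Odds1(x), scrape.Odds2(x), scrape.AwayTeam())
        for x in range(scrape.GameRange())
    ]

if __name__ == '__main__':
    with Pool(10) as p:
        results = p.map(scrape_rows, links)   # one list of rows per link

    # Single writer in the parent process; placeholders replace the manual quoting.
    conn = sqlite3.connect("odds.db")
    cursor = conn.cursor()
    for rows in results:
        cursor.executemany(
            "INSERT INTO Scraped_Odds VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()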

There's a good example of this in Automate the Boring Stuff with Python:

https://automatetheboringstuff.com/chapter15/

Basically, you use the threading module to create a separate thread for each URL and then wait for all of them to finish.

import threading

def scrape_for_info(url):
    scrape = CP_GetOdds(url)

    for x in range(scrape.GameRange()):
        sql_str = "INSERT INTO Scraped_Odds ('"
        sql_str += str(scrape.Time()) + "', '"
        sql_str += str(scrape.Text(x)) + "', '"
        sql_str += str(scrape.HomeTeam()) + "', '"
        sql_str += str(scrape.Odds1(x)) + "', '"
        sql_str += str(scrape.Odds2(x)) + "', '"
        sql_str += str(scrape.AwayTeam()) + "')"

        cursor.execute(sql_str)
    conn.commit()

# Create and start the Thread objects.
threads = []
for link in links:
    thread = threading.Thread(target=scrape_for_info, args=(link,))
    threads.append(thread)
    thread.start()

# Wait for all threads to end.
for thread in threads:
    thread.join()
print('Done.')
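If the list of links is long, one thread per URL can overwhelm both the site and the database. A ThreadPoolExecutor from concurrent.futures (a different route from the per-URL threads above, shown only as an assumed alternative) caps the number of workers while keeping the same call-and-wait shape:

from concurrent.futures import ThreadPoolExecutor

# Cap the number of concurrent scrapes instead of starting one thread per link.
with ThreadPoolExecutor(max_workers=10) as executor:
    executor.map(scrape_for_info, links)

print('Done.')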
