Converting a simple function to multithreading

Published 2024-05-20 15:28:20


I have the following function, which I would like to convert to multithreading:

from __future__ import print_function
import os
import json


def url_searcher(value):
    url_file_path = "C:\\Users\\Link\\Desktop\\"

    for filename in os.listdir(url_file_path):
        if filename == "url_list.json":
            with open(url_file_path + filename) as f:
                for line in f:
                    returnJson = json.loads(line)
                    if value in returnJson["url"]:
                        return returnJson


print(url_searcher("http://zadkay.com/blog/wwp/51065983.jpg"))

Basically, it searches a JSON file for a value. The main problem is that the file is large (over 50 MB) and I need the result as fast as possible (anything over 7 seconds is too slow).

Here is an extract from the file so you can see its structure:

{"dateadded": "2019-11-04 12:33:27", "url_status": "online", "tags": "elf", "url": "http://2.56.8.16/bins/arm7", "reporter": "Gandylyan1", "threat": "malware_download", "id": "251402"}
{"dateadded": "2019-11-04 12:33:25", "url_status": "online", "tags": "elf", "url": "http://2.56.8.16/bins/arm6", "reporter": "Gandylyan1", "threat": "malware_download", "id": "251401"}

Questions:

What would the steps be? Is this a good idea? Will it actually speed up getting the result?

Here you can see a sample of the JSON file with more lines: https://pastebin.com/MXYTg1CV

Thanks


3 Answers

Creating threads in Python is straightforward:

import threading

thread_one = threading.Thread(name='searcher', target=url_searcher, args=(value,))  # args must be a tuple
thread_one.start()

https://docs.python.org/3/library/threading.html
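Note that a bare `threading.Thread` discards `url_searcher`'s return value. If you need the result back in the main thread, `concurrent.futures` is the simplest route; a minimal sketch with a stand-in function (in the real script, the actual `url_searcher` would take its place):

```python
from concurrent.futures import ThreadPoolExecutor


def url_searcher(value):
    # stand-in for the real file-scanning function
    return {"url": value, "found": True}


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(url_searcher, "http://example.com/a.jpg")
    result = future.result()  # blocks until the worker thread finishes

print(result["url"])
```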

But I don't think this will reduce processing time: it only lets you read several files at once, and may even make each individual file take longer to process.

Is the file's structure always the same? Have you tried treating it as a plain text file and searching for the keyword directly?

First, measure, don't guess. Get familiar with SnakeViz: https://jiffyclub.github.io/snakeviz/

pip install snakeviz
python -m cProfile -o program.prof my_program.py
snakeviz program.prof
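If you'd rather profile from inside the script instead of the command line, the standard library alone works; a quick sketch (the profiled function here is only a placeholder, you would profile `url_searcher` instead):

```python
import cProfile
import io
import pstats


def work():
    # placeholder workload; profile url_searcher in the real script
    return sum(i * i for i in range(100000))


pr = cProfile.Profile()
pr.enable()
work()
pr.disable()

# print the 5 most expensive entries by cumulative time
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```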

On my machine, 65% of the time is spent in the json library's decode function.

Let's try a simple improvement:

with open(url_file_path + filename) as f:
    for line in f:
        # returnJson = json.loads(line)
        # if value in returnJson["url"]:
        #     return returnJson
        if value in line:
            returnJson = json.loads(line)
            return returnJson

On my machine this gives roughly a 10x speedup.
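To make the trick concrete, here is the same pre-filter as a standalone function, run against a tiny in-memory sample (the inline data is only for illustration; in the real script the lines come from the file):

```python
import io
import json


def fast_search(lines, value):
    for line in lines:
        # cheap substring test first; pay the json.loads cost only on candidates
        if value in line:
            record = json.loads(line)
            if value in record["url"]:  # confirm the match is in the url field
                return record


sample = io.StringIO(
    '{"url": "http://2.56.8.16/bins/arm7", "id": "251402"}\n'
    '{"url": "http://2.56.8.16/bins/arm6", "id": "251401"}\n'
)
match = fast_search(sample, "arm6")
print(match["id"])  # -> 251401
```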

This looks like an I/O-bound operation:

from __future__ import print_function
from multiprocessing.dummy import Pool  # a thread pool with the Pool interface
from functools import partial
import json
import glob


def url_searcher(filename, value):
    # search a single file; Pool.map hands each worker one path
    with open(filename) as f:
        for line in f:
            returnJson = json.loads(line)
            if value in returnJson["url"]:
                return returnJson


no_of_threads = 4  # can be set manually or allocated automatically by Pool (see docs)

url_file_path = "C:\\Users\\Link\\Desktop\\"
value = "very_important_stuff"  # adjust to your liking!

workers = Pool(no_of_threads)

# recursive=True also descends into subdirectories
pathlist = [f for f in glob.iglob(url_file_path + "**", recursive=True)
            if "url_list.json" in f]

result = workers.map(partial(url_searcher, value=value), pathlist)

workers.close()
workers.join()
