Python parallelpipe package - PyPI

Pipeline parallelization library

Detailed description of the parallelpipe Python project

ParallelPipe is a pipeline parallelization library for Python.

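A pipeline is composed of one or more stages. Each stage takes the output of the previous stage as its input and performs some operation on it, such as map, filter or reduce. This is an extension of the normal producer/consumer pattern in which there can be multiple stages. Each stage receives its input data from a queue and pushes its results to another queue, which connects it to the next stage.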
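To make the queue-based design concrete, here is a minimal sketch of two stages linked by queues, written with only the standard library `threading` and `queue` modules. It illustrates the producer/consumer idea described above and is not parallelpipe's actual implementation; the names `run_stage`, `run_pipeline`, `_DONE`, `double` and `keep_multiples_of_four` are invented for this example.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a stream (hypothetical convention)


def run_stage(func, in_q, out_q):
    """Feed items from in_q to the generator function func, push results to out_q."""
    def items():
        while True:
            item = in_q.get()
            if item is _DONE:
                return
            yield item

    for result in func(items()):
        out_q.put(result)
    out_q.put(_DONE)  # tell the next stage that no more data is coming


def double(numbers):
    for n in numbers:
        yield n * 2


def keep_multiples_of_four(numbers):
    for n in numbers:
        if n % 4 == 0:
            yield n


def run_pipeline(data):
    """Run data through double, then keep_multiples_of_four, one thread per stage."""
    q0, q1, q2 = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=run_stage, args=(double, q0, q1)),
        threading.Thread(target=run_stage, args=(keep_multiples_of_four, q1, q2)),
    ]
    for t in threads:
        t.start()
    for item in data:
        q0.put(item)
    q0.put(_DONE)

    results = []
    while True:
        item = q2.get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Calling `run_pipeline(range(6))` doubles each number and then keeps only the multiples of four. Each stage knows nothing about its neighbours beyond the two queues it is handed, which is what allows stages to be chained freely.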

In the following example we define a stage function that takes an iterator of URLs and downloads each one:

from parallelpipe import stage
import requests

@stage(workers=4)
def fetch_urls(urls):
    for url in urls:
        result = requests.get(url)
        yield result.content

To use this stage, run:

urls = ['http://test.com', ...]
pipe = urls | fetch_urls
for content in pipe.results():
    print(len(content))

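We have built a basic pipeline with a single stage. This stage has 4 workers, which start processing the input URLs in parallel. The main process receives each downloaded content as soon as it becomes available and prints its length. Note that the pipeline input can be any iterable; it is automatically wrapped into a stage.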
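The `urls | fetch_urls` syntax relies on Python operator overloading. The following toy, single-process sketch shows one way a plain iterable on the left of `|` could be accepted. It is an assumption about the general mechanism, not parallelpipe's source code, and `ToyPipe`, `ToyStage`, `lengths` and `squares` are hypothetical names.

```python
class ToyPipe:
    """A chain of generator functions applied lazily to a source iterable."""

    def __init__(self, source, funcs):
        self.source = source
        self.funcs = funcs

    def __or__(self, stage):
        # pipe | stage: append one more stage to the chain
        return ToyPipe(self.source, self.funcs + [stage.func])

    def results(self):
        stream = iter(self.source)
        for func in self.funcs:
            stream = func(stream)
        return stream


class ToyStage:
    def __init__(self, func):
        self.func = func

    def __ror__(self, iterable):
        # iterable | stage: Python falls back to the right operand's __ror__
        # when the left operand (a list, a generator, ...) cannot handle it.
        return ToyPipe(iterable, [self.func])


@ToyStage
def lengths(words):
    for word in words:
        yield len(word)


@ToyStage
def squares(numbers):
    for n in numbers:
        yield n * n


pipe = ["a", "bcd"] | lengths | squares
output = list(pipe.results())
```

Here the list never needs to know about stages: the first `|` is resolved by `ToyStage.__ror__`, and every later `|` by `ToyPipe.__or__`.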

Suppose we are interested in the title string contained in the HTML content. We can add another stage to extract it:

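import re

RE_TITLE = re.compile("<title>(.*?)</title>", re.M)

@stage(workers=2)
def get_titles(contents):
    for content in contents:
        # On Python 3, requests returns bytes in .content, so decode
        # before applying the str pattern.
        text = content.decode("utf-8", errors="replace")
        match = RE_TITLE.search(text)
        if match is not None:
            yield match.group(1)

pipe = urls | fetch_urls | get_titles
for title in pipe.results():
    print(title)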

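Again, the second stage starts processing contents as soon as they are available and yields its own output. Note that this task is also parallelized, because we set workers to 2. You can see that this stage is not a strict map: the number of titles returned can be lower than the number of documents, since we check whether the title tag is present.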
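Because a stage body is an ordinary generator function, its filtering behaviour can be checked without any workers. The sketch below runs the same title-extraction logic as a plain, undecorated generator over a few hand-written HTML strings to show that fewer items may come out than went in. The name `extract_titles` and the sample `pages` are invented for this illustration, and the samples are `str` rather than downloaded bytes.

```python
import re

RE_TITLE = re.compile(r"<title>(.*?)</title>", re.M)


def extract_titles(contents):
    """Yield the title of each document that has one; skip the others."""
    for content in contents:
        match = RE_TITLE.search(content)
        if match is not None:
            yield match.group(1)


# Invented sample data: the second page has no <title> tag.
pages = [
    "<html><head><title>Home</title></head></html>",
    "<html><body>no title here</body></html>",
    "<html><head><title>About</title></head></html>",
]
titles = list(extract_titles(pages))
```

Three documents go in and two titles come out, which is why this stage behaves like a combined map and filter rather than a strict map.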

Now let's add one more stage to return the most common title:

from collections import Counter

@stage()
def most_common(titles):
    commons = Counter(titles).most_common(1)
    yield commons[0]

pipe = urls | fetch_urls | get_titles | most_common
print(pipe.execute())

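To compute the most common title we need to aggregate all the results, so we can use only one worker. We also use pipe.execute() instead of pipe.results(), because we know that only one result will be returned.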
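To see why this aggregation is limited to one worker, consider what would happen if the titles were split between two workers, each running its own `Counter`. The sketch below simulates that split with plain lists; the sample data and the way it is divided are invented for the illustration.

```python
from collections import Counter

titles = ["a", "b", "b", "a", "a", "c"]

# Hypothetical split of the stream between two workers.
worker_1, worker_2 = titles[:3], titles[3:]

# Each worker only sees its own share and yields its own partial answer.
partial_1 = Counter(worker_1).most_common(1)[0]
partial_2 = Counter(worker_2).most_common(1)[0]

# A single worker sees everything and yields one correct answer.
overall = Counter(titles).most_common(1)[0]
```

With two workers the pipeline would emit two partial results, and the first of them names the wrong title. A single worker sees the whole stream and emits exactly one result.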

Stage parameters

@stage(workers=4)
def add_n(input, n):
    for number in input:
        yield number + n

pipe = range(100) | add_n(7)
for result in pipe.results():
    print(result)

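In this example the stage function requires not only the input iterator but also one or more extra parameters to perform its computation. When we build the pipeline, we can configure these extra parameters simply by calling the stage with them as arguments. Remember that every parameter can be passed this way except the first one, which is the mandatory input iterator.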
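Conceptually, calling the stage with extra arguments fixes those arguments and leaves the input iterator to be supplied later by the pipeline. A rough standard-library analogy, which is not necessarily how parallelpipe implements it, is `functools.partial`; the name `add_7` is made up for this sketch.

```python
from functools import partial


def add_n(input, n):
    for number in input:
        yield number + n


# Bind the extra parameter first; the input iterator is supplied afterwards.
add_7 = partial(add_n, n=7)
result = list(add_7(range(3)))
```

The extra parameter is bound by keyword because the first positional slot is reserved for the input iterator.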

Map stage

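If your stage performs a pure map, that is, it returns exactly one result for each input element, you can simplify your code by using the map_stage decorator: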

from parallelpipe import map_stage

@map_stage(workers=4)
def add_n(number, n):
    return number + n
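The relationship between a map-style function and a generator-style stage can be sketched in a few lines. The helper `as_generator_stage` below is hypothetical and only shows the idea of looping over the input on the function's behalf; it makes no claim about how `map_stage` is written internally.

```python
def as_generator_stage(func):
    """Turn a one-in, one-out function into a generator function over an iterator."""
    def generator(input, *args, **kwargs):
        for item in input:
            yield func(item, *args, **kwargs)
    return generator


def add_n(number, n):
    return number + n


add_n_stage = as_generator_stage(add_n)
result = list(add_n_stage([1, 2, 3], 10))
```

The map function only ever sees one element at a time, so it cannot drop or add elements, which is exactly the "pure map" restriction.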

Queue size

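When building a stage you can define the size of its output queue. This is useful when the current stage can produce results much faster than the next stage can consume them. In that case, once the queue size is reached, the stage stops processing its input and waits for the consumer to free a slot.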

# only 30 elements can queue in output before blocking this stage
@stage(workers=4, qsize=30)
def add_n(input, n):
    for number in input:
        yield number + n

By default qsize=0, which means the queue has no size limit.
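The blocking behaviour of a bounded queue can be observed with the standard library alone. In this sketch `queue.Queue` is used as a stand-in for a stage's output queue (an assumption made for illustration), and `put_nowait` is used so that a full queue raises `queue.Full` instead of blocking the example.

```python
import queue

q = queue.Queue(maxsize=2)  # plays the role of qsize=2
q.put("a")
q.put("b")

try:
    q.put_nowait("c")  # a blocking put() would wait here instead
    blocked = False
except queue.Full:
    blocked = True

first = q.get()      # the consumer frees one slot...
q.put_nowait("c")    # ...and the producer can continue

# maxsize=0 means "no limit", mirroring the default qsize=0.
unbounded = queue.Queue(maxsize=0)
for i in range(1000):
    unbounded.put_nowait(i)
```

A small queue keeps memory use bounded when a fast producer feeds a slow consumer, at the cost of the producer sometimes sitting idle.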

Setting up a stage

To configure a stage, for example its queue size, you can also call the setup() method after the stage has been defined.

add_n.setup(workers=2, qsize=0)

Using the Stage class directly

So far we have built stages by applying the decorator to functions, but we can also use the Stage class directly:

from parallelpipe import Stage

def add_n(input, n):
    for number in input:
        yield number + n

pipe = Stage(range, 10) | Stage(add_n, 5)

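As you can see in the previous example, the Stage class takes the iterator function and any extra parameters it needs. The first stage is the producer, so it is not called with an input iterator. When we use the Stage class explicitly, we can use setup() to configure the workers and the queue size we need: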

pipe = Stage(range, 10).setup(qsize=5) | Stage(add_n, 5).setup(workers=2)

The setup() method returns the stage itself, so we can configure the stage inline while defining the pipeline.
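Returning the object from a configuration method is the usual way to allow this kind of chaining. The class below is a hypothetical stand-in that only stores its settings; its name, its default values (`workers=1`, `qsize=0`) and its attributes are assumptions for the sketch, not parallelpipe's real `Stage`.

```python
class ConfigurableStage:
    """Hypothetical stand-in that only records its configuration."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args
        self.workers = 1  # assumed default, for illustration only
        self.qsize = 0    # assumed default, for illustration only

    def setup(self, workers=None, qsize=None):
        if workers is not None:
            self.workers = workers
        if qsize is not None:
            self.qsize = qsize
        return self  # returning self is what makes inline chaining possible


producer = ConfigurableStage(range, 10).setup(qsize=5)
same_object = producer.setup(workers=2)
```

If `setup()` returned `None` instead, the expression `ConfigurableStage(range, 10).setup(qsize=5)` could not be used directly as an operand in a pipeline definition.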

Exception handling

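An exception may be raised while a stage function is executing. When a stage detects one, it automatically consumes and ignores all remaining input from the previous stage, and a TaskException is then raised in the main process.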

@stage(workers=2)
def add_one(numbers):
    for number in numbers:
        yield number + 1

>>> pipe = [2, 3, "ops", 7] | add_one
>>> print(sum(pipe.results()))
Process add_one-0:
Traceback (most recent call last):
  File "/Users/gt/miniconda2/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/Users/gt/Desktop/code/parallelpipe/parallelpipe.py", line 67, in run
    for item in res:
  File "example.py", line 7, in add_one
    yield number + 1
TypeError: cannot concatenate 'str' and 'int' objects
Traceback (most recent call last):
  File "example.py", line 10, in <module>
    print(sum(pipe.results()))
  File "/Users/gt/Desktop/code/parallelpipe/parallelpipe.py", line 249, in results
    raise TaskException(msg)
parallelpipe.TaskException: The task "add_one-0" raised TypeError("cannot concatenate 'str' and 'int' objects",)

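If you want to prevent a single bad input from blocking your pipeline, you can of course catch any exception inside the stage function, so that the pipeline can continue and produce the remaining results.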
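A minimal sketch of that approach is shown here as a plain, undecorated generator so that it can be run on its own; the name `add_one_safe` is invented, and in a real pipeline the same body could be placed under the stage decorator. Silently skipping the bad element is one possible policy chosen for the example.

```python
def add_one_safe(numbers):
    """Like add_one, but skip inputs that cannot be incremented."""
    for number in numbers:
        try:
            result = number + 1
        except TypeError:
            # Bad input such as "ops": ignore it and keep going.
            continue
        yield result


values = list(add_one_safe([2, 3, "ops", 7]))
total = sum(values)
```

The `yield` is kept outside the `try` block so that only errors from the computation itself are swallowed.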

parallelpipe 0.2.6
