How do I recursively chain a Celery task that returns a list into a group?

Posted 2024-10-02 20:32:48


I started from this question: How to chain a Celery task that returns a list into a group?

But I want to fan out twice. In my use case I have:

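  • Task A: determine the total number of items for a given date
  • Task B: download 1000 metadata entries for that date
  • Task C: download the content of a single item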

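So at every step I fan out to a larger number of items for the next step. I could do that by looping over the results inside my task and calling .delay() on the next task function. But I would rather not have my main tasks do that. Instead, they would return a list of tuples, and each tuple would then be expanded into the arguments for a call to the next function.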
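The intended data flow can be sketched in plain, synchronous Python before involving Celery at all. Everything here is a hypothetical stand-in for illustration: `fan_out`, `count_items`, `fetch_metadata`, `fetch_content`, the total of 2500 items and the two entries per page are not from the original code.

```python
def fan_out(args_list, func):
    """Call func(*args) once per tuple in args_list and collect the results."""
    return [func(*args) for args in args_list]


def count_items(date):
    """Stand-in for task A: one (date, offset) tuple per page of 1000 items."""
    total = 2500      # assumed total, for illustration only
    page_size = 1000
    return [(date, offset) for offset in range(0, total, page_size)]


def fetch_metadata(date, offset):
    """Stand-in for task B: one (item_id,) tuple per metadata entry."""
    # Only two entries per page to keep the example small.
    return [(f"{date}-{offset + i}",) for i in range(2)]


def fetch_content(item_id):
    """Stand-in for task C: download the content of one item."""
    return f"content of {item_id}"


pages = count_items("2019-11-23")                  # first level
items_per_page = fan_out(pages, fetch_metadata)    # fan out over pages
contents = [fan_out(items, fetch_content) for items in items_per_page]  # fan out over items
```

Each stage only returns argument tuples; the fan-out itself lives in one generic helper, which is the role dmap() plays in the Celery version.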

The question above has an answer that seems to fit my needs, but I can't work out the right way to chain it for two levels of fan-out.

Here is a very stripped-down code example:

from celery import group
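# Celery 4.x import; celery.task was removed in Celery 5, where
# "from celery import signature as subtask" can be used instead.
from celery.task import subtask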
from celery.utils.log import get_task_logger

from .celery import app

logger = get_task_logger(__name__)

@app.task
def task_range(upper=10):
    # wrap in list to make JSON serializer work
    return list(zip(range(upper), range(upper)))

@app.task
def add(x, y):
    logger.info(f'x is {x} and y is {y}')
    char = chr(ord('a') + x)
    char2 = chr(ord('a') + x*2)
    result = x + y
    logger.info(f'result is {result}')
    return list(zip(char * result, char2 * result))

@app.task
def combine_log(c1, c2):
    logger.info(f'combine log is {c1}{c2}')

@app.task
def dmap(args_iter, celery_task):
    """
    Takes an iterator of argument tuples and queues them up for celery to run with the function.
    """
    logger.info(f'in dmap, len iter: {len(args_iter)}')
    callback = subtask(celery_task)
    run_in_parallel = group(callback.clone(args) for args in args_iter)
    return run_in_parallel.delay()

I then tried various ways of getting the nested mapping to work. First, a single level of mapping works fine, so:

pp = (task_range.s() | dmap.s(add.s()))
pp(2)

produces the results I expect, so I'm not completely off track.
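To see what data moves between the levels, the task bodies can be run as ordinary functions. This is only a sketch: it copies the bodies of task_range() and add() from above, without the @app.task decorator and without logging, so no Celery worker or broker is involved.

```python
def task_range(upper=10):
    # Same body as the task above: a list of (x, y) argument tuples.
    return list(zip(range(upper), range(upper)))


def add(x, y):
    # Same logic as the task above, minus the logging.
    char = chr(ord('a') + x)
    char2 = chr(ord('a') + x * 2)
    result = x + y
    return list(zip(char * result, char2 * result))


first_level = task_range(2)
# Each tuple is expanded into the arguments of add(), as dmap() does.
second_level = [add(*args) for args in first_level]

print(first_level)   # [(0, 0), (1, 1)]
print(second_level)  # [[], [('b', 'c'), ('b', 'c')]]
```

So with an argument of 2, a working second level of dmap() would queue combine_log('b', 'c') twice and nothing for the empty list.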

But when I try to add another level:

ppp = (task_range.s() | dmap.s(add.s() | dmap.s(combine_log.s())))

then I see this error in the worker:

[2019-11-23 22:34:12,024: ERROR/ForkPoolWorker-2] Task proj.tasks.dmap[e92877a9-85ce-4f16-88e3-d6889bc27867] raised unexpected: TypeError("add() missing 2 required positional arguments: 'x' and 'y'",)
Traceback (most recent call last):
  File "/home/hdowner/.venv/play_celery/lib/python3.6/site-packages/celery/app/trace.py", line 385, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/hdowner/.venv/play_celery/lib/python3.6/site-packages/celery/app/trace.py", line 648, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/hdowner/dev/playground/celery/proj/tasks.py", line 44, in dmap
    return run_in_parallel.delay()
  File "/home/hdowner/.venv/play_celery/lib/python3.6/site-packages/celery/canvas.py", line 186, in delay
    return self.apply_async(partial_args, partial_kwargs)
  File "/home/hdowner/.venv/play_celery/lib/python3.6/site-packages/celery/canvas.py", line 1008, in apply_async
    args=args, kwargs=kwargs, **options))
  File "/home/hdowner/.venv/play_celery/lib/python3.6/site-packages/celery/canvas.py", line 1092, in _apply_tasks
    **options)
  File "/home/hdowner/.venv/play_celery/lib/python3.6/site-packages/celery/canvas.py", line 578, in apply_async
    dict(self.options, **options) if options else self.options))
  File "/home/hdowner/.venv/play_celery/lib/python3.6/site-packages/celery/canvas.py", line 607, in run
    first_task.apply_async(**options)
  File "/home/hdowner/.venv/play_celery/lib/python3.6/site-packages/celery/canvas.py", line 229, in apply_async
    return _apply(args, kwargs, **options)
  File "/home/hdowner/.venv/play_celery/lib/python3.6/site-packages/celery/app/task.py", line 532, in apply_async
    check_arguments(*(args or ()), **(kwargs or {}))
TypeError: add() missing 2 required positional arguments: 'x' and 'y'

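I don't understand why changing the argument from a plain task signature to a chain changes how arguments are passed to add(). My impression was that it shouldn't; it should just mean that the return value of add() gets passed along. But apparently that is not the case...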


1 answer
User
#1 · Posted 2024-10-02 20:32:48

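The problem is that the clone() method on a chain instance does not pass the arguments through at some point; see https://stackoverflow.com/a/53442344/3189 for details. Using the approach from that answer, my dmap() code becomes: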

from copy import deepcopy  # needed by clone_signature() below

@app.task
def dmap(args_iter, celery_task):
    """
    Takes an iterator of argument tuples and queues them up for celery to run with the function.
    """
    callback = subtask(celery_task)
    run_in_parallel = group(clone_signature(callback, args) for args in args_iter)
    return run_in_parallel.delay()


def clone_signature(sig, args=(), kwargs=(), **opts):
    """
    Turns out that a chain clone() does not copy the arguments properly - this
    clone does.
    From: https://stackoverflow.com/a/53442344/3189
    """
    if sig.subtask_type and sig.subtask_type != "chain":
        raise NotImplementedError(
            "Cloning only supported for Tasks and chains, not {}".format(sig.subtask_type)
        )
    clone = sig.clone()
    if hasattr(clone, "tasks"):
        task_to_apply_args_to = clone.tasks[0]
    else:
        task_to_apply_args_to = clone
    args, kwargs, opts = task_to_apply_args_to._merge(args=args, kwargs=kwargs, options=opts)
    task_to_apply_args_to.update(args=args, kwargs=kwargs, options=deepcopy(opts))
    return clone

Now when I do this:

ppp = (task_range.s() | dmap.s(add.s() | dmap.s(combine_log.s())))

everything works.
