Using Python infix syntax to "pipe" output from one function to another

3 answers

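User

Answer 1 · edited 2024-05-09 18:26:28

It is hard to implement this with the bitwise or operator, because pandas.DataFrame already implements it. If you don't mind replacing | with >>, you can try this: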

import pandas as pd

def select(df, *args):
    cols = [x for x in args]
    return df[cols]


def rename(df, **kwargs):
    for name, value in kwargs.items():
        df = df.rename(columns={'%s' % name: '%s' % value})
    return df


class SinkInto(object):
    def __init__(self, function, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs
        self.function = function

    def __rrshift__(self, other):
        return self.function(other, *self.args, **self.kwargs)

    def __repr__(self):
        return "<SinkInto {} args={} kwargs={}>".format(
            self.function, 
            self.args, 
            self.kwargs
        )

df = pd.DataFrame({'one' : [1., 2., 3., 4., 4.],
                   'two' : [4., 3., 2., 1., 3.]})

Then you can do:

>>> df
   one  two
0    1    4
1    2    3
2    3    2
3    4    1
4    4    3

>>> df = df >> SinkInto(select, 'one') \
            >> SinkInto(rename, one='new_one')
>>> df
   new_one
0        1
1        2
2        3
3        4
4        4
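
Why >> works where | does not comes down to how Python dispatches binary operators: the left operand's method is tried first, and the right operand's reflected method (__ror__, __rrshift__) is consulted only if the left one is missing or returns NotImplemented. Below is a minimal sketch of that rule without pandas. The classes Greedy, Plain and Sink are hypothetical stand-ins; Greedy plays the role of a type that, like DataFrame, handles | itself but leaves >> alone.

```python
class Greedy:
    """Stand-in for a type that defines its own `|` but not `>>`."""

    def __or__(self, other):
        # Handles every right operand itself, so Python never falls back
        # to the right operand's __ror__.
        return "Greedy.__or__ handled it"


class Plain:
    """Defines neither `|` nor `>>`."""


class Sink:
    """Hypothetical right-hand operand that wants to receive the left one."""

    def __ror__(self, other):
        return "Sink.__ror__ received " + type(other).__name__

    def __rrshift__(self, other):
        return "Sink.__rrshift__ received " + type(other).__name__


or_greedy = Greedy() | Sink()      # left operand wins
or_plain = Plain() | Sink()        # falls back to Sink.__ror__
shift_greedy = Greedy() >> Sink()  # falls back to Sink.__rrshift__

print(or_greedy)
print(or_plain)
print(shift_greedy)
```

Only the last two expressions reach Sink, which is why the answer above hooks into >> instead of |.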

In Python 3 you can abuse Unicode:

>>> print('\u01c1')
ǁ
>>> ǁ = SinkInto
>>> df >> ǁ(select, 'one') >> ǁ(rename, one='new_one')
   new_one
0        1
1        2
2        3
3        4
4        4

[Update]

Thanks for your response. Would it be possible to make a separate class (like SinkInto) for each function to avoid having to pass the functions as an argument?

How about a decorator?

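def pipe(original):
    class PipeInto(object):
        # Shared by all instances: only the wrapped function lives here.
        data = {'function': original}

        def __init__(self, *args, **kwargs):
            # Keep the arguments on the instance so that separate stages
            # created from the same function do not overwrite each other.
            self.args = args
            self.kwargs = kwargs

        def __rrshift__(self, other):
            return self.data['function'](
                other,
                *self.args,
                **self.kwargs
            )

    return PipeInto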


@pipe
def select(df, *args):
    cols = [x for x in args]
    return df[cols]


@pipe
def rename(df, **kwargs):
    for name, value in kwargs.items():
        df = df.rename(columns={'%s' % name: '%s' % value})
    return df

Now you can decorate any function that takes a DataFrame as its first argument:

>>> df >> select('one') >> rename(one='first')
   first
0      1
1      2
2      3
3      4
4      4
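
The decorator idea is not tied to pandas: anything whose type does not handle >> itself can sit on the left. Here is a minimal sketch with plain lists, assuming two hypothetical helpers, take and scale. Arguments are kept on each stage instance, so stages can be built ahead of time and reused.

```python
def pipe(func):
    """Turn func(data, *args, **kwargs) into a factory of pipe stages."""

    class Stage:
        def __init__(self, *args, **kwargs):
            # Stored per instance, so each stage keeps its own arguments.
            self.args = args
            self.kwargs = kwargs

        def __rrshift__(self, data):
            # Called for `data >> stage` when type(data) does not handle >>.
            return func(data, *self.args, **self.kwargs)

    # Carry over the name and docstring to make the factory easier to inspect.
    Stage.__name__ = func.__name__
    Stage.__doc__ = func.__doc__
    return Stage


@pipe
def take(items, n):
    """Keep the first n items."""
    return items[:n]


@pipe
def scale(items, factor=1):
    """Multiply every item by factor."""
    return [x * factor for x in items]


first_two = take(2)    # built first
first_three = take(3)  # building another stage leaves first_two untouched

result = [1, 2, 3, 4] >> first_two >> scale(factor=10)
print(result)  # [10, 20]
```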

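Python is awesome!

I know that languages like Ruby are "so expressive" that they encourage people to write every program as a new DSL, but this is somewhat frowned upon in Python. Many Pythonistas consider overloading operators for a different purpose to be sinful blasphemy.

[Update]

User OHLÁLÁ is not impressed: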

The problem with this solution is when you are trying to call the function instead of piping. – OHLÁLÁ

You can implement the dunder call method (__call__) on the PipeInto class:

def __call__(self, df):
    return df >> self

Then:

>>> select('one')(df)
   one
0  1.0
1  2.0
2  3.0
3  4.0
4  4.0

It looks like OHLÁLÁ is not easy to please:

In that case you need to call the object explicitly:
select('one')(df) Is there a way to avoid that? – OHLÁLÁ

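Well, I can think of a solution, but there is a caveat: your original function must not take a second positional argument that is a pandas DataFrame (keyword arguments are fine). Let's add a __new__ method to the PipeInto class inside the decorator. It tests whether the first argument is a DataFrame and, if so, simply calls the original function with the arguments: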

def __new__(cls, *args, **kwargs):
    if args and isinstance(args[0], pd.DataFrame):
        return cls.data['function'](*args, **kwargs)
    return super().__new__(cls)

It seems to work, but there is probably some downside that I was unable to spot.

>>> select(df, 'one')
   one
0  1.0
1  2.0
2  3.0
3  4.0
4  4.0

>>> df >> select('one')
   one
0  1.0
1  2.0
2  3.0
3  4.0
4  4.0
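
To make the caveat concrete, here is a self-contained sketch that combines __rrshift__, __call__ and __new__ in one class. It is an illustration under stated assumptions, not the answer's exact code: pipe_for is a hypothetical decorator factory that takes the data type as a parameter (list here, standing in for pd.DataFrame), and take and extend_with are made-up helpers.

```python
def pipe_for(data_type):
    """Hypothetical factory: stages treat data_type as 'the data'."""

    def decorator(func):
        class Stage:
            def __new__(cls, *args, **kwargs):
                # Direct call such as take([1, 2, 3], 2): run func right away.
                # The result is not a Stage, so __init__ is skipped.
                if args and isinstance(args[0], data_type):
                    return func(*args, **kwargs)
                return super().__new__(cls)

            def __init__(self, *args, **kwargs):
                self.args = args
                self.kwargs = kwargs

            def __rrshift__(self, data):
                return func(data, *self.args, **self.kwargs)

            def __call__(self, data):
                return data >> self

        return Stage

    return decorator


@pipe_for(list)
def take(items, n):
    return items[:n]


@pipe_for(list)
def extend_with(items, extra):
    return items + extra


direct = take([1, 2, 3], 2)   # plain function call
piped = [1, 2, 3] >> take(2)  # pipe syntax
called = take(2)([1, 2, 3])   # explicit call on a stage

# The caveat: a positional argument of the data type looks like a direct call,
# so extend_with([3]) runs the function with `extra` missing.
try:
    [1, 2] >> extend_with([3])
    misread = False
except TypeError:
    misread = True

# Passing the same value by keyword avoids the ambiguity.
by_keyword = [1, 2] >> extend_with(extra=[3])
print(direct, piped, called, misread, by_keyword)
```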

User

Answer 2 · edited 2024-05-09 18:26:28

You can use the sspipe library, with the following syntax:

from sspipe import p
df = df | p(select, 'one') \
        | p(rename, one = 'new_one')

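User

Answer 3 · edited 2024-05-09 18:26:28

While I can't help mentioning that "dplyr in Python" may be the closest thing to having dplyr in Python (it has the rshift operator, but as a gimmick), I'd also like to point out that the pipe operator may only be necessary in R because R uses generic functions rather than methods as object attributes. Method chaining gives you essentially the same thing without having to override operators: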

dataf = (DataFrame(mtcars).
         filter('gear>=3').
         mutate(powertoweight='hp*36/wt').
         group_by('gear').
         summarize(mean_ptw='mean(powertoweight)'))

Note that wrapping the chain in a pair of parentheses lets you split it over multiple lines without a trailing \ on each line.
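
For readers without the R bridge at hand, the same chaining style can be sketched in plain Python. Table below is a hypothetical, minimal container written only for this illustration, and column is a made-up free function. The pipe method shows how a free function can join a chain without any operator overloading; pandas offers the same idea as DataFrame.pipe(func, *args, **kwargs).

```python
class Table:
    """Hypothetical minimal container of row dicts, for illustration only."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, predicate):
        # Keep the rows for which predicate(row) is true.
        return Table(r for r in self.rows if predicate(r))

    def mutate(self, **new_columns):
        # Each keyword maps a new column name to a function of the row.
        return Table(
            {**r, **{name: f(r) for name, f in new_columns.items()}}
            for r in self.rows
        )

    def pipe(self, func, *args, **kwargs):
        # Lets a free function take part in the chain.
        return func(self, *args, **kwargs)


def column(table, name):
    """Free function: extract one column as a list."""
    return [r[name] for r in table.rows]


ratios = (
    Table([{"hp": 110, "wt": 2.0}, {"hp": 90, "wt": 3.0}])
    .filter(lambda r: r["hp"] >= 100)
    .mutate(ratio=lambda r: r["hp"] / r["wt"])
    .pipe(column, "ratio")
)
print(ratios)  # [55.0]
```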
