<p>This is hard to do with the bitwise <code>or</code> operator, because <code>pandas.DataFrame</code> already implements it. If you don't mind replacing <code>|</code> with <code>>></code>, you can try:</p>
<pre><code>import pandas as pd


def select(df, *args):
    # Keep only the named columns.
    cols = list(args)
    return df[cols]


def rename(df, **kwargs):
    # Each keyword is an old column name, its value is the new name.
    for name, value in kwargs.items():
        df = df.rename(columns={'%s' % name: '%s' % value})
    return df


class SinkInto(object):
    def __init__(self, function, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs
        self.function = function

    def __rrshift__(self, other):
        # Runs for `other >> self` when `other` does not handle `>>` itself.
        return self.function(other, *self.args, **self.kwargs)

    def __repr__(self):
        return "<SinkInto {} args={} kwargs={}>".format(
            self.function,
            self.args,
            self.kwargs
        )


df = pd.DataFrame({'one' : [1., 2., 3., 4., 4.],
                   'two' : [4., 3., 2., 1., 3.]})
</code></pre>
<p>Then you can do:</p>
<pre><code>>>> df
   one  two
0    1    4
1    2    3
2    3    2
3    4    1
4    4    3
>>> df = df >> SinkInto(select, 'one') \
...         >> SinkInto(rename, one='new_one')
>>> df
   new_one
0        1
1        2
2        3
3        4
4        4
</code></pre>
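<p>To see why <code>>></code> works where <code>|</code> does not, here is a minimal sketch of how Python picks the method for a binary operator. <code>Table</code> and <code>Stage</code> are hypothetical stand-ins, not pandas classes: <code>Table</code> only mimics the fact that the left operand claims <code>|</code> for itself and has no <code>>></code> of its own. What a real <code>DataFrame</code> does with <code>|</code> and an arbitrary object may differ in detail.</p>

```python
class Table:
    """Hypothetical stand-in for a DataFrame: defines `|` but not `>>`."""

    def __init__(self, rows):
        self.rows = rows

    def __or__(self, other):
        # The left operand handles `|` itself, so Python never asks
        # the right operand for its reflected method.
        return "Table.__or__ handled it"


class Stage:
    """Hypothetical pipeline stage that defines both reflected operators."""

    def __init__(self, function):
        self.function = function

    def __ror__(self, other):
        return self.function(other)

    def __rrshift__(self, other):
        return self.function(other)


count_rows = Stage(lambda table: len(table.rows))
table = Table([1, 2, 3])

via_or = table | count_rows       # Table.__or__ wins; Stage.__ror__ is not tried
via_rshift = table >> count_rows  # Table has no __rshift__; Stage.__rrshift__ runs

print(via_or)
print(via_rshift)
```

<p>The reflected method on the right operand is only consulted when the left operand does not implement the operator (or returns <code>NotImplemented</code>), which is why an operator the left-hand type leaves alone is the easier one to hijack.</p>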
<p>In Python 3 you can abuse Unicode:</p>
<pre><code>>>> print('\u01c1')
ǁ
>>> ǁ = SinkInto
>>> df >> ǁ(select, 'one') >> ǁ(rename, one='new_one')
   new_one
0        1
1        2
2        3
3        4
4        4
</code></pre>
<p>[update]</p>
<blockquote>
<p>Thanks for your response. Would it be possible to make a separate class (like SinkInto) for each function to avoid having to pass the functions as an argument?</p>
</blockquote>
<h2>How about a decorator?</h2>
<pre><code>def pipe(original):
    class PipeInto(object):
        data = {'function': original}

        def __init__(self, *args, **kwargs):
            # Store the arguments on the instance: the class-level dict is
            # shared, so two stages built from one function would overwrite
            # each other's arguments there.
            self.args = args
            self.kwargs = kwargs

        def __rrshift__(self, other):
            return self.data['function'](
                other,
                *self.args,
                **self.kwargs
            )

    return PipeInto


@pipe
def select(df, *args):
    cols = list(args)
    return df[cols]


@pipe
def rename(df, **kwargs):
    for name, value in kwargs.items():
        df = df.rename(columns={'%s' % name: '%s' % value})
    return df
</code></pre>
<p>Now you can decorate any function that takes a <code>DataFrame</code> as its first argument:</p>
<pre><code>>>> df >> select('one') >> rename(one='first')
   first
0      1
1      2
2      3
3      4
4      4
</code></pre>
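<p>A detail worth checking in any decorator of this shape is where the call arguments live. The sketch below uses two hypothetical classes, <code>SharedArgs</code> and <code>OwnArgs</code>, to contrast a dict defined at class level, which every instance shares, with plain instance attributes. It only matters once you build several stages from the same function before running them.</p>

```python
class SharedArgs:
    data = {}  # one dict for the whole class, shared by every instance

    def __init__(self, *args):
        self.data['args'] = args  # mutates the shared dict


class OwnArgs:
    def __init__(self, *args):
        self.args = args  # a fresh attribute on each instance


first_shared = SharedArgs('one')
second_shared = SharedArgs('two')  # silently replaces the arguments of first_shared

first_own = OwnArgs('one')
second_own = OwnArgs('two')

print(first_shared.data['args'])  # ('two',)
print(first_own.args)             # ('one',)
```

<p>In a one-line pipeline each stage is created and consumed immediately, so shared storage goes unnoticed; it shows up when stages are kept in variables and reused.</p>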
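<h2>Python is awesome!</h2>
<p>I know that languages like Ruby are "so expressive" that they encourage people to write every program as a new DSL, but this is somewhat frowned upon in Python. Many Pythonistas consider overloading an operator for an unrelated purpose a sinful blasphemy.</p>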
<h2>[update]</h2>
<p>User OHLÁLÁ is not impressed:</p>
<blockquote>
<p>The problem with this solution is when you are trying to call the function instead of piping. – OHLÁLÁ</p>
</blockquote>
<p>You can implement the dunder call method on the <code>PipeInto</code> class:</p>
<pre><code>def __call__(self, df):
    return df >> self
</code></pre>
<p>And then:</p>
<pre><code>>>> select('one')(df)
   one
0  1.0
1  2.0
2  3.0
3  4.0
4  4.0
</code></pre>
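<p>For context, here is a minimal sketch of where <code>__call__</code> sits inside the decorator. To stay runnable without pandas it pipes a plain list, and <code>take</code> is a hypothetical function invented for the example; the class body is a simplified variant of the decorator idea, not a copy of the code above.</p>

```python
def pipe(original):
    class PipeInto(object):
        def __init__(self, *args, **kwargs):
            self.args = args
            self.kwargs = kwargs

        def __rrshift__(self, other):
            # data >> stage
            return original(other, *self.args, **self.kwargs)

        def __call__(self, data):
            # stage(data) simply reuses the pipe path
            return data >> self

    return PipeInto


@pipe
def take(rows, count):
    # Hypothetical example function: keep the first `count` items.
    return rows[:count]


rows = [1, 2, 3, 4]
piped = rows >> take(2)
called = take(2)(rows)

print(piped)
print(called)
```

<p>Both spellings go through <code>__rrshift__</code>, so the piping logic lives in one place.</p>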
<p>It looks like OHLÁLÁ is not easy to please:</p>
<blockquote>
<p>In that case you need to call the object explicitly:<br/>
<code>select('one')(df)</code> Is there a way to avoid that? – OHLÁLÁ</p>
</blockquote>
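<p>Well, I can think of a solution, but there is a caveat: your original function must not take a second positional argument that is a pandas DataFrame (keyword arguments are fine). Let's add a <code>__new__</code> method to the <code>PipeInto</code> class inside the decorator. It tests whether the first argument is a DataFrame and, if so, calls the original function directly with the given arguments:</p>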
<pre><code>def __new__(cls, *args, **kwargs):
    if args and isinstance(args[0], pd.DataFrame):
        return cls.data['function'](*args, **kwargs)
    return super().__new__(cls)
</code></pre>
<p>It seems to work, but there may be some downside I was unable to spot.</p>
<pre><code>>>> select(df, 'one')
   one
0  1.0
1  2.0
2  3.0
3  4.0
4  4.0
>>> df >> select('one')
   one
0  1.0
1  2.0
2  3.0
3  4.0
4  4.0
</code></pre>
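<p>The caveat about a second positional DataFrame can be made concrete with a small sketch. To keep it runnable without pandas, <code>Frame</code> is a hypothetical stand-in for <code>pandas.DataFrame</code>, and <code>head</code> and <code>append</code> are made-up example functions; the <code>__new__</code> logic mirrors the idea above but is an illustration under those assumptions.</p>

```python
class Frame(list):
    """Hypothetical stand-in for pandas.DataFrame."""


def pipe(original):
    class PipeInto(object):
        def __new__(cls, *args, **kwargs):
            # Direct call: the first argument is already the data.
            if args and isinstance(args[0], Frame):
                return original(*args, **kwargs)
            # Otherwise build a stage; __init__ then receives the arguments.
            return super().__new__(cls)

        def __init__(self, *args, **kwargs):
            self.args = args
            self.kwargs = kwargs

        def __rrshift__(self, other):
            return original(other, *self.args, **self.kwargs)

    return PipeInto


@pipe
def head(frame, count):
    return Frame(frame[:count])


@pipe
def append(frame, other):
    return Frame(frame + other)


frame = Frame([1, 2, 3])

direct = head(frame, 2)                         # plain function call
piped = frame >> head(2)                        # pipe syntax
keyword_ok = frame >> append(other=Frame([4]))  # keyword argument is safe

caveat = None
try:
    # The positional Frame is mistaken for the piped data, so `append`
    # is called immediately with one argument and fails.
    frame >> append(Frame([4]))
except TypeError as error:
    caveat = str(error)

print(direct, piped, keyword_ok)
print(caveat)
```

<p>When <code>__new__</code> returns something that is not an instance of the class, Python skips <code>__init__</code>, which is what lets the direct call return the function's result unchanged.</p>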