<p>You can simply use dplyr from Python.</p>
<p>rpy2 has an interface to <code>dplyr</code> (introduced with rpy2-2.7.0) that lets you write things like:</p>
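<pre><code># DataFrame comes from rpy2's dplyr interface; mtcars is assumed to be
# an R data frame that has already been loaded through rpy2.
from rpy2.robjects.lib.dplyr import DataFrame

dataf = (DataFrame(mtcars).
         filter('gear>3').
         mutate(powertoweight='hp*36/wt').
         group_by('gear').
         summarize(mean_ptw='mean(powertoweight)'))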
</code></pre>
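<p>To make the meaning of each verb in that chain concrete, here is a minimal pure-Python sketch of the same four steps. It does not use rpy2 or dplyr at all, and the rows are made-up values for illustration, not the real <code>mtcars</code> data.</p>

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows, made up for illustration (not the real mtcars values).
rows = [
    {"gear": 3, "hp": 100, "wt": 2.0},
    {"gear": 4, "hp": 100, "wt": 2.0},
    {"gear": 4, "hp": 200, "wt": 2.0},
    {"gear": 5, "hp": 300, "wt": 3.0},
]

# filter('gear>3'): keep only the rows with more than 3 gears
kept = [r for r in rows if r["gear"] > 3]

# mutate(powertoweight='hp*36/wt'): add a derived column to every row
mutated = [{**r, "powertoweight": r["hp"] * 36 / r["wt"]} for r in kept]

# group_by('gear'): collect the derived values per gear
groups = defaultdict(list)
for r in mutated:
    groups[r["gear"]].append(r["powertoweight"])

# summarize(mean_ptw='mean(powertoweight)'): one value per group
summary = {gear: mean(values) for gear, values in sorted(groups.items())}
print(summary)
```

<p>In the rpy2 version the string arguments are R expressions that are evaluated by R, so the actual work happens in dplyr rather than in Python loops like these.</p>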
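<p>There is an <a href="http://rpy2.readthedocs.org/en/version_2.7.x/lib_dplyr.html" rel="noreferrer">example in the documentation</a>. That part of the documentation is also available as a notebook; look for the link near the top of the page.</p>
<p>Another answer to this question compares R's dplyr with pandas (see @lgallen). With rpy2's interface to dplyr, the same chained dplyr one-liner from R is written in essentially the same way.</p>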
<p>R:</p>
<pre><code>flights %>%
  group_by(year, month, day) %>%
  select(arr_delay, dep_delay) %>%
  summarise(
    arr = mean(arr_delay, na.rm = TRUE),
    dep = mean(dep_delay, na.rm = TRUE)
  ) %>%
  filter(arr > 30 | dep > 30)
</code></pre>
<p>Python+rpy2:</p>
<pre><code>(DataFrame(flights).
 group_by('year', 'month', 'day').
 select('arr_delay', 'dep_delay').
 summarize(arr='mean(arr_delay, na.rm=TRUE)',
           dep='mean(dep_delay, na.rm=TRUE)').
 filter('arr > 30 | dep > 30'))
</code></pre>