<p>I wrote the output above to two tab-separated files, read them back in below, and added a column indicating which dataframe or table each row came from:</p>
<pre><code>import pandas as pd
from scipy.stats import ttest_ind

# Read each tab-separated file (no header row) and tag rows with their source
t1 = pd.read_csv("../t1.csv", names=['V1', 'V2', 'V3'], sep="\t")
t1['data'] = 'data1'
t2 = pd.read_csv("../t2.csv", names=['V1', 'V2', 'V3'], sep="\t")
t2['data'] = 'data2'

# First rows of t1:
#    V1  V2    V3   data
# 0  T1  X1  0.93  data1
# 1  T1  X2  0.30  data1
# 2  T1  X3 -2.90  data1
# 3  T2  X1  1.30  data1
</code></pre>
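<p>Since <code>../t1.csv</code> and <code>../t2.csv</code> are not included here, the loading step can be sketched with in-memory text instead. The first three rows of <code>raw1</code> are the rows shown above; the contents of <code>raw2</code> and the helper name <code>load</code> are assumptions made up for illustration.</p>

```python
import io

import pandas as pd

# Tab-separated text standing in for the two files.
# raw1 reuses the rows shown above; raw2 holds hypothetical values.
raw1 = "T1\tX1\t0.93\nT1\tX2\t0.30\nT1\tX3\t-2.90\n"
raw2 = "T1\tX1\t1.50\nT1\tX2\t-0.40\n"


def load(buf, label):
    """Read a headerless tab-separated table and tag each row with its source."""
    frame = pd.read_csv(buf, names=["V1", "V2", "V3"], sep="\t")
    frame["data"] = label
    return frame


t1 = load(io.StringIO(raw1), "data1")
t2 = load(io.StringIO(raw2), "data2")

# ignore_index=True gives the stacked frame a fresh 0..n-1 index
df = pd.concat([t1, t2], ignore_index=True)
print(df)
```

<p>Passing a real file path to <code>load</code> instead of a <code>StringIO</code> buffer should behave the same way.</p>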
<p>Then we concatenate them and compute the means directly:</p>
<pre><code># Stack both frames, then take the mean of V3 per V2 group and per source
df = pd.concat([t1, t2])
res = df.groupby("V2").apply(lambda x: x['V3'].groupby(x['data']).mean())

# res:
# data  data1  data2
# V2
# X1    1.026  1.700
# X2    0.180 -0.784
# X3    0.340  0.836
</code></pre>
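<p>The same table of group means can probably also be produced with <code>pivot_table</code>, which some readers find easier to follow than a nested <code>groupby</code>. This is an alternative sketch, not the approach used above, and the small frame below is made-up data.</p>

```python
import pandas as pd

# Hypothetical long-format data: two V2 groups, two sources
df = pd.DataFrame({
    "V2": ["X1", "X2", "X1", "X2", "X1", "X2", "X1", "X2"],
    "V3": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "data": ["data1"] * 4 + ["data2"] * 4,
})

# One row per V2 value, one column per source, cells hold the mean of V3
means = df.pivot_table(index="V2", columns="data", values="V3", aggfunc="mean")
print(means)
```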
<p>The p-value takes a little more code inside the <code>apply</code>:</p>
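<pre><code># For each V2 group, run a two-sample t-test of V3 between data1 and data2;
# element [1] of the ttest_ind result is the p-value
res['pvalue'] = df.groupby("V2").apply(lambda x:
    ttest_ind(x[x['data'] == "data1"]["V3"], x[x['data'] == "data2"]["V3"])[1])

# res:
# data  data1  data2    pvalue
# V2
# X1    1.026  1.700  0.316575
# X2    0.180 -0.784  0.521615
# X3    0.340  0.836  0.657752
</code></pre>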
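<p>The per-group test can also be written as an explicit loop over the groups, which keeps the comparison readable. This is a minimal sketch on made-up numbers; the helper name <code>group_pvalue</code> is hypothetical.</p>

```python
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical data: X1 has identical samples in both sources,
# X2 has clearly shifted samples
df = pd.DataFrame({
    "V2": ["X1"] * 6 + ["X2"] * 6,
    "V3": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0,
           1.0, 2.0, 3.0, 11.0, 12.0, 13.0],
    "data": (["data1"] * 3 + ["data2"] * 3) * 2,
})


def group_pvalue(group):
    """Two-sample t-test of V3 between the data1 and data2 rows of one group."""
    a = group.loc[group["data"] == "data1", "V3"]
    b = group.loc[group["data"] == "data2", "V3"]
    return ttest_ind(a, b).pvalue


# Build a Series of p-values indexed by the V2 group label
pvalues = pd.Series({name: group_pvalue(g) for name, g in df.groupby("V2")})
print(pvalues)
```

<p>Note that <code>ttest_ind</code> assumes equal variances by default; passing <code>equal_var=False</code> runs Welch's t-test instead.</p>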
<p>You can always call <code>res.reset_index()</code> to turn the <code>V2</code> index into a regular column and get a flat table.</p>