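<p>I have a DataFrame (df) and want to get the largest "NCT_ID" counts with respect to the columns "COUNTRY" and "CONDITION" (counting every occurrence, not only the unique values). That is, for each country in "COUNTRY", I want the n (n=2 for simplicity) most common conditions in "CONDITION", sorted by the largest count.
df has the following structure (the values in all columns vary, including "COUNTRY"; this is only a small sample):</p>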
<pre><code>        NCT_ID                    CONDITION        COUNTRY
0  NCT00000261  Substance-Related Disorders  United States
1  NCT00000262     Opioid-Related Disorders  United States
2  NCT00000263  Substance-Related Disorders  United States
3  NCT00000263  Substance-Related Disorders  United States
4  NCT00000264                Heart disease         Canada
5  NCT00000264                Heart disease         Canada
6  NCT00000267                Heart disease         Canada
7  NCT00000264                       Cancer         Canada
8  NCT00000268                       Cancer         Canada
</code></pre>
<p>You can load it as follows:</p>
<pre><code>import pandas as pd

df = pd.DataFrame(
    [
        ["NCT00000261", "Substance-Related Disorders", "United States"],
        ["NCT00000262", "Opioid-Related Disorders", "United States"],
        ["NCT00000263", "Substance-Related Disorders", "United States"],
        ["NCT00000263", "Substance-Related Disorders", "United States"],
        ["NCT00000264", "Heart disease", "Canada"],
        ["NCT00000264", "Heart disease", "Canada"],
        ["NCT00000267", "Heart disease", "Canada"],
        ["NCT00000264", "Cancer", "Canada"],
        ["NCT00000268", "Cancer", "Canada"],
    ],
    columns=["NCT_ID", "CONDITION", "COUNTRY"],
)
</code></pre>
<p>I would therefore like the final result to look like this:</p>
<pre><code>   COUNTS                    CONDITION        COUNTRY
0       3  Substance-Related Disorders  United States
0       1     Opioid-Related Disorders  United States
1       3                Heart disease         Canada
1       2                       Cancer         Canada
</code></pre>
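<p>The final df should show the n most common conditions in each of the n countries with the largest total counts (total number of conditions).
What I have done so far:
Following <a href="https://stackoverflow.com/a/17679517/7445528">https://stackoverflow.com/a/17679517/7445528</a>,
I tried:</p>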
<pre><code># df_combined = df_combined.groupby(['COUNTRY', 'CONDITION']).size()
# df_combined = df_combined.groupby(['COUNTRY', 'CONDITION']).size().groupby(level=0).max()
# df_combined = df_combined.groupby(['COUNTRY', 'CONDITION']).size().reset_index().groupby('COUNTRY')[[0]].max()
</code></pre>
<p>But this did not produce the correct DataFrame.
To see the whole project so far, go to:
<a href="https://github.com/Gustav-Rasmussen/AACT-Analysis/tree/master" rel="nofollow noreferrer">https://github.com/Gustav-Rasmussen/AACT-Analysis/tree/master</a></p>
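<p>For reference, here is a minimal sketch of one way the result described above might be computed. It is only an interpretation of the requirement, not a confirmed solution. The names <code>counts</code>, <code>top_countries</code>, <code>result</code> and the helper column <code>TOTAL</code> are assumptions introduced for illustration. It counts rows per (COUNTRY, CONDITION) pair, keeps the n countries with the largest total number of rows, and then keeps the n most frequent conditions within each of them. Because countries are ordered by total count, Canada (5 rows) comes before United States (4 rows) on the sample data, which differs from the row order in the expected output shown earlier.</p>

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["NCT00000261", "Substance-Related Disorders", "United States"],
        ["NCT00000262", "Opioid-Related Disorders", "United States"],
        ["NCT00000263", "Substance-Related Disorders", "United States"],
        ["NCT00000263", "Substance-Related Disorders", "United States"],
        ["NCT00000264", "Heart disease", "Canada"],
        ["NCT00000264", "Heart disease", "Canada"],
        ["NCT00000267", "Heart disease", "Canada"],
        ["NCT00000264", "Cancer", "Canada"],
        ["NCT00000268", "Cancer", "Canada"],
    ],
    columns=["NCT_ID", "CONDITION", "COUNTRY"],
)

n = 2  # assumed: same n for the number of countries and conditions per country

# Count every row (repeated NCT_ID values included) per (COUNTRY, CONDITION).
counts = (
    df.groupby(["COUNTRY", "CONDITION"])
    .size()
    .reset_index(name="COUNTS")
)

# Helper column (hypothetical name): total number of rows per country.
counts["TOTAL"] = counts.groupby("COUNTRY")["COUNTS"].transform("sum")

# The n countries with the largest totals.
top_countries = counts.groupby("COUNTRY")["COUNTS"].sum().nlargest(n).index

result = (
    counts[counts["COUNTRY"].isin(top_countries)]
    # Largest countries first, then most frequent conditions within each.
    .sort_values(["TOTAL", "COUNTRY", "COUNTS"], ascending=[False, True, False])
    .groupby("COUNTRY")
    .head(n)  # keep the first n rows of each country in the sorted order
    .drop(columns="TOTAL")
    .reset_index(drop=True)
)[["COUNTS", "CONDITION", "COUNTRY"]]

print(result)
```

<p>The attempts with <code>.groupby(level=0).max()</code> return a single number per country, so the CONDITION label is lost; sorting the per-pair counts and taking <code>head(n)</code> per group keeps it.</p>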