Can more than one variable be used as a condition when splitting a DataFrame into two groups? (Pandas)

Posted 2024-10-01 15:33:03


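I am looking at the number of COVID-19 deaths by state, and investigating whether a high state population makes it more likely that a person infected with COVID-19 dies.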

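I am currently splitting my DataFrame into two groups, but the way I have set it up, each group depends on two factors rather than one. For example, one group is high population / high death rate (state population above the median and death rate above the median), and the other is high population / low death rate (state population above the median, death rate below the median). My current code is below, but I keep getting an invalid syntax error, so I am wondering whether a DataFrame simply cannot be split into two groups based on two variables.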

Splitting the death-case dataset into two groups:

highpop_highdeath = df.iloc[(df'StatePopulation' > 4342705.0), (df'deaths_to_cases' > 0.012143070253953211).values]
highpop_highdeath.name = 'States with a high population and high death rate'
highpop_lowdeath = df.iloc[(df'StatePopulation'> 4342705.0), (df'deaths_to_cases' <= 0.012143070253953211).values]
highpop_lowdeath.name = 'States with a high population and low death rate'

3 Answers

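To combine several conditions in a filter, join them with the boolean operator &, and wrap each condition in its own parentheses.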

highpop_highdeath = df.loc[(df['StatePopulation'] > 4342705.0) & (df['deaths_to_cases'] > 0.012143070253953211), :]
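
As a minimal sketch of the filter above, the following builds a small made-up DataFrame and produces both groups. The column names and thresholds come from the question; the rows and the `name` values are hypothetical. Note that it uses the label-based `.loc` rather than the position-based `.iloc` from the question.

```python
import pandas as pd

# Hypothetical toy data: column names follow the question, values are invented.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "StatePopulation": [10_000_000, 8_000_000, 1_000_000, 500_000],
    "deaths_to_cases": [0.020, 0.005, 0.030, 0.001],
})

# Thresholds copied from the question.
pop_cut = 4342705.0
rate_cut = 0.012143070253953211

# Each parenthesised comparison is a boolean Series; & combines them row by row.
# The parentheses matter because & binds more tightly than > and <=.
highpop_highdeath = df.loc[(df["StatePopulation"] > pop_cut) & (df["deaths_to_cases"] > rate_cut)]
highpop_lowdeath = df.loc[(df["StatePopulation"] > pop_cut) & (df["deaths_to_cases"] <= rate_cut)]

print(highpop_highdeath)
print(highpop_lowdeath)
```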

Yes, you can use two variables. By the way, could you share the error message? Also, try the following:

highpop_highdeath = df.loc[(df['StatePopulation'] > 4342705.0) &  (df['deaths_to_cases'] > 0.012143070253953211)]
highpop_highdeath.name = 'States with a high population and high death rate'
highpop_lowdeath = df.loc[(df['StatePopulation']> 4342705.0) & (df['deaths_to_cases'] <= 0.012143070253953211)]
highpop_lowdeath.name = 'States with a high population and low death rate'
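
The hard-coded thresholds above appear to be the medians mentioned in the question. If that is the case, one option is to compute them from the data instead of pasting the numbers in. This is only a sketch on invented rows; the variable names `pop_median` and `rate_median` are illustrative.

```python
import pandas as pd

# Hypothetical toy data: column names follow the question, values are invented.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "StatePopulation": [10_000_000, 8_000_000, 1_000_000, 500_000],
    "deaths_to_cases": [0.020, 0.005, 0.030, 0.001],
})

# Compute the cut-offs from the data rather than hard-coding them.
pop_median = df["StatePopulation"].median()
rate_median = df["deaths_to_cases"].median()

# Reuse the population mask, since both groups share that condition.
high_pop = df["StatePopulation"] > pop_median
highpop_highdeath = df.loc[high_pop & (df["deaths_to_cases"] > rate_median)]
highpop_lowdeath = df.loc[high_pop & (df["deaths_to_cases"] <= rate_median)]
```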

You want to combine the two boolean vectors. That way, for each row of the DataFrame, pandas evaluates both conditions and keeps the row only when both are true.

highpop_highdeath = df.loc[(df['StatePopulation'] > 4342705.0) & (df['deaths_to_cases'] > 0.012143070253953211)]

highpop_lowdeath = df.loc[(df['StatePopulation'] > 4342705.0) & (df['deaths_to_cases'] <= 0.012143070253953211)]
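
To see what "combining two boolean vectors" means in practice, here is a small sketch that looks at the masks themselves. The three rows are invented for illustration; the thresholds are the ones from the question.

```python
import pandas as pd

# Hypothetical values for three states.
pop = pd.Series([10_000_000, 8_000_000, 1_000_000])
rate = pd.Series([0.020, 0.005, 0.030])

high_pop = pop > 4342705.0                  # one True/False per row
high_rate = rate > 0.012143070253953211     # one True/False per row

# Element-wise AND: a row is True only if it is True in both masks.
both = high_pop & high_rate

print(high_pop.tolist())
print(high_rate.tolist())
print(both.tolist())
```

Only the first row passes both conditions, so only that row would be kept by `df.loc[both]`.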

More concisely, if you only need the 'name' column:

highpop_highdeath_names = df.loc[(df['StatePopulation'] > 4342705.0) & (df['deaths_to_cases'] > 0.012143070253953211), 'name']
