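I would like to know how to find an estimate based on several different categories. Two of the columns are categorical, another column contains the two strings I am interested in, and the last column contains numeric values. I have a CSV file called sports.csv.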
import pandas as pd
import numpy as np
#loading the data into data frame
df = pd.read_csv('sports.csv')
I am trying to find a suggested price for a Gym that has both Baseball and Basketball and an enrollment from 240 to 260, given that it is in Region 4 and of Type 1.
Region Type enroll estimates price Gym
2 1 377 0.43 40 Football|Baseball|Hockey|Running|Basketball|Swimming|Cycling|Volleyball|Tennis|Ballet
4 2 100 0.26 37 Baseball|Tennis
4 1 347 0.65 61 Basketball|Baseball|Ballet
4 1 264 0.17 12 Swimming|Ballet|Cycling|Basketball|Volleyball|Hockey|Running|Tennis|Baseball|Football
1 1 286 0.74 78 Swimming|Basketball
0 1 210 0.13 29 Baseball|Tennis|Ballet|Cycling|Basketball|Football|Volleyball|Swimming
0 1 263 0.91 31 Tennis
2 2 271 0.39 54 Tennis|Football|Ballet|Cycling|Running|Swimming|Baseball|Basketball|Volleyball
3 3 247 0.51 33 Baseball|Hockey|Swimming|Cycling
0 1 109 0.12 17 Football|Hockey|Volleyball
I don't know how to put everything together. Apologies if the syntax is incorrect; I have only just started using Python. So far I have:
import pandas as pd
import numpy as np
#loading the data into data frame
df = pd.read_csv('sports.csv')
#group 4th region and type 1 together where enrollment is in between 240 and 260
group = df[df['Region'] == 4] df[df['Type'] == 1] df[240>=df['Enrollment'] <=260 ]
#split by pipe chars to find gyms that contain both Baseball and Basketball
df['Gym'] = df['Gym'].str.split('|')
df['Gym'] = df['Gym'].str.contains('Baseball'& 'Basketball')
price = df.loc[df['Gym'], 'Price']
Should I use a groupby instead? If so, how would I include the conditions Type == 1, Region == 4, and an enrollment from 240 to 260?
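I had to add a row that actually meets your conditions; otherwise you would get an empty result. You want to use df.loc with all of the conditions combined. Note that I used a regex pattern with contains, which effectively acts as a regex AND operator. Alternatively, you could simply combine two separate .contains conditions, one for Basketball and one for Baseball.

You can build a mask from all of the specified conditions and then use that mask to subset the DataFrame. With the data shown in the question, it returns an empty result because no record satisfies all of the conditions.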
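For illustration, here is a minimal sketch of the mask approach described above. It is not the answerers' original code. It assumes the column names used in the asker's attempt ('Enrollment', 'Price'), whereas the table header shows 'enroll' and 'price', so adjust them to match the real CSV. The last row is hypothetical and was added only so that something matches, and taking the mean as the "suggested price" is likewise an assumption.

```python
import pandas as pd

# A few rows copied from the question's table, plus one HYPOTHETICAL row
# (the last one) so that at least one record meets every condition.
# Column names follow the asker's code; the real CSV may use 'enroll'/'price'.
df = pd.DataFrame(
    {
        "Region": [4, 4, 4, 3, 4],
        "Type": [2, 1, 1, 3, 1],
        "Enrollment": [100, 347, 264, 247, 250],
        "Price": [37, 61, 12, 33, 45],
        "Gym": [
            "Baseball|Tennis",
            "Basketball|Baseball|Ballet",
            "Swimming|Ballet|Cycling|Basketball|Volleyball|Hockey|Running|Tennis|Baseball|Football",
            "Baseball|Hockey|Swimming|Cycling",
            "Basketball|Baseball",
        ],
    }
)

# One boolean mask: each comparison is parenthesised and joined with &.
mask = (
    (df["Region"] == 4)
    & (df["Type"] == 1)
    & df["Enrollment"].between(240, 260)  # inclusive on both ends
    & df["Gym"].str.contains("Baseball", regex=False)
    & df["Gym"].str.contains("Basketball", regex=False)
)

matches = df.loc[mask, "Price"]
suggested_price = matches.mean()  # assumption: mean price; NaN if no match

# Alternative Gym test: two lookaheads make a single regex act as an AND.
both = df["Gym"].str.contains(r"^(?=.*Baseball)(?=.*Basketball)")
```

There is no need to split 'Gym' on the pipe character first: str.contains works on the raw string, and after str.split the column holds lists, on which str.contains no longer does what you want. A groupby is also unnecessary here, since you are filtering to one specific combination rather than aggregating over every Region/Type pair.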