Finding a value based on specific categories

import pandas as pd
import numpy as np

#loading the data into data frame
df = pd.read_csv('sports.csv')

#group 4th region and type 1 together where enrollment is in between 240 and 260
group = df[df['Region'] == 4]
df[df['Type'] == 1]
df[240>=df['Enrollment'] <=260 ]

#split by pipe chars to find gyms that contain both Baseball and Basketball
df['Gym'] = df['Gym'].str.split('|')
df['Gym'] = df['Gym'].str.contains('Baseball'& 'Basketball')
price = df.loc[df['Gym'], 'Price']

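2 Answers

User

Answer 1 · edited 2024-09-27 21:26:32

I had to add a row that actually satisfies your conditions, otherwise you would get an empty result. You want to use df.loc with the following conditions: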

In [1]: import pandas as pd, numpy as np, io
In [2]: in_string = io.StringIO("""Region  Type    enroll  estimates   price   Gym
    ...: 2   1   377 0.43    40  Football|Baseball|Hockey|Running|Basketball|Swimming|Cycling|Volleyball|Tennis|Ballet
    ...: 4   2   100 0.26    37  Baseball|Tennis
    ...: 4   1   247 0.65    61  Basketball|Baseball|Ballet
    ...: 4   1   264 0.17    12  Swimming|Ballet|Cycling|Basketball|Volleyball|Hockey|Running|Tennis|Baseball|Football
    ...: 1   1   286 0.74    78  Swimming|Basketball
    ...: 0   1   210 0.13    29  Baseball|Tennis|Ballet|Cycling|Basketball|Football|Volleyball|Swimming
    ...: 0   1   263 0.91    31  Tennis
    ...: 2   2   271 0.39    54  Tennis|Football|Ballet|Cycling|Running|Swimming|Baseball|Basketball|Volleyball
    ...: 3   3   247 0.51    33  Baseball|Hockey|Swimming|Cycling
    ...: 0   1   109 0.12    17  Football|Hockey|Volleyball""")

In [3]: df = pd.read_csv(in_string,delimiter=r"\s+")

In [4]: df.loc[df.Gym.str.contains(r"(?=.*Baseball)(?=.*Basketball)") 
    ...:        & (df.enroll <= 260) & (df.enroll >= 240) 
    ...:        & (df.Region == 4) & (df.Type == 1), 'price']
Out[4]: 
2    61
Name: price, dtype: int64

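Note that I passed a regex pattern to contains: the two lookaheads effectively act as a regex AND operator. You could just as well combine two separate .contains conditions with &, one for Basketball and one for Baseball.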
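As a minimal sketch of that equivalence, the snippet below compares the lookahead pattern with two plain substring checks joined by `&`. The sample strings are made up for illustration and are not taken from the question's sports.csv.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
gyms = pd.Series([
    "Basketball|Baseball|Ballet",
    "Baseball|Tennis",
    "Swimming|Basketball",
    "Tennis",
])

# Lookahead pattern: both words must appear somewhere in the string
both_regex = gyms.str.contains(r"(?=.*Baseball)(?=.*Basketball)")

# Two literal substring checks combined with element-wise AND
both_and = (
    gyms.str.contains("Baseball", regex=False)
    & gyms.str.contains("Basketball", regex=False)
)

print(both_regex.tolist())
print(both_and.tolist())
```

Both masks select only the first row here. The two-condition form is arguably easier to read, while the regex form keeps everything in a single call.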

User

Answer 2 · edited 2024-09-27 21:26:32

You can build a mask from all of the specified conditions and then use that mask to subset the data:

mask = (df['Region'] == 4) & (df['Type'] == 1) & \
       (df['enroll'] <= 260) & (df['enroll'] >= 240) & \
        df['Gym'].str.contains('Baseball') & df['Gym'].str.contains('Basketball')

df.loc[mask, 'price']
# Series([], Name: price, dtype: int64)

It returns an empty Series because no record satisfies all of the above conditions.
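A variant of the mask approach, sketched under a few assumptions: the rows below are hypothetical (one of them is constructed to match), `Series.between` replaces the two enrollment comparisons, and the Gym column is split on the pipe so that each sport is compared as a whole item rather than as a substring.

```python
import pandas as pd

# Hypothetical rows that reuse the column names from the answers above
df = pd.DataFrame({
    "Region": [4, 4, 4, 2],
    "Type":   [1, 1, 2, 1],
    "enroll": [247, 264, 100, 250],
    "price":  [61, 12, 37, 40],
    "Gym": [
        "Basketball|Baseball|Ballet",
        "Swimming|Basketball|Baseball",
        "Baseball|Tennis",
        "Football|Baseball|Basketball",
    ],
})

# Split on the pipe, then check that both sports are present as whole items
wanted = {"Baseball", "Basketball"}
has_both = df["Gym"].str.split("|").apply(lambda items: wanted.issubset(items))

mask = (
    (df["Region"] == 4)
    & (df["Type"] == 1)
    & df["enroll"].between(240, 260)  # inclusive on both ends by default
    & has_both
)

result = df.loc[mask, "price"]
print(result)
```

Only the first row passes every condition in this made-up data. Wrapping each comparison in parentheses matters because `&` binds more tightly than `==` in Python.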
