Find a specific string in a column and the maximum value corresponding to that string

Posted 2024-09-27 21:34:18


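I would like to know:

1.) How to find a specific string in a column
2.) Given that string, how to find the max value that corresponds to it
3.) How to count the number of strings in each row of that column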

I have a CSV file called sports.csv:

 import pandas as pd
 import numpy as np

#loading the data into data frame
X = pd.read_csv('sports.csv')

The two columns of interest are the Total and Gym columns:

 Total  Gym
40  Football|Baseball|Hockey|Running|Basketball|Swimming|Cycling|Volleyball|Tennis|Ballet
37  Baseball|Tennis
61  Basketball|Baseball|Ballet
12  Swimming|Ballet|Cycling|Basketball|Volleyball|Hockey|Running|Tennis|Baseball|Football
78  Swimming|Basketball
29  Baseball|Tennis|Ballet|Cycling|Basketball|Football|Volleyball|Swimming
31  Tennis
54  Tennis|Football|Ballet|Cycling|Running|Swimming|Baseball|Basketball|Volleyball
33  Baseball|Hockey|Swimming|Cycling
17  Football|Hockey|Volleyball

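Note that each entry in the Gym column holds several strings, one per sport. I am trying to find all gyms that have Baseball and then pick the one with the largest Total. However, I am only interested in gyms that have at least two other sports, i.e. I do not want to consider: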

  Total   Gym
  37    Baseball|Tennis

3 answers

You can do it in a single pass while reading the file:

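import csv

with open("sports.csv") as f:
    mx, best = float("-inf"), None
    for row in csv.reader(f, delimiter=" ", skipinitialspace=True):
        # Replace the pipe-delimited field with one list entry per sport.
        row[1:] = row[1].split("|")
        # Baseball plus at least two other sports means more than two entries.
        if "Baseball" in row and len(row[1:]) > 2 and int(row[0]) > mx:
            mx = int(row[0])
            best = row
    if best:
        # Report the count for the best row, not for the last row read.
        print(best, mx, len(best[1:]))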

This gives you:

['61', 'Basketball', 'Baseball', 'Ballet'] 61 3

Another approach, without splitting, is to count the pipe characters:

import csv

with open("sports.csv") as f:
    mx, best = float("-inf"), None
    for row in csv.reader(f, delimiter=" ", skipinitialspace=True):
        # More than one pipe means at least three sports in the field.
        if "Baseball" in row[1] and row[1].count("|") > 1 and int(row[0]) > mx:
            mx = int(row[0])
            best = row
    if best:
        # Report the pipe count for the best row, not for the last row read.
        print(best, mx, best[1].count("|"))

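Note, though, that this version matches a substring rather than an exact word, so any sport whose name merely contains "Baseball" would also match.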
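To illustrate the difference between the substring test and the exact-entry test, here is a minimal sketch. It reads from an in-memory string instead of sports.csv, and the sample rows, the made-up sport name `Baseball5`, and the helper name `best_gym` are all assumptions introduced for this example.

```python
import csv
import io

# Hypothetical sample in the same layout as sports.csv. "Baseball5" is a
# made-up sport name that contains "Baseball" as a substring.
SAMPLE = """Total Gym
40 Football|Baseball|Hockey
90 Baseball5|Tennis|Ballet
61 Basketball|Baseball|Ballet
37 Baseball|Tennis
"""

def best_gym(lines, sport="Baseball", min_sports=3):
    """Return (total, sports) for the row with the highest Total whose sports
    list contains `sport` as an exact entry and has at least `min_sports`."""
    best = None
    reader = csv.reader(lines, delimiter=" ", skipinitialspace=True)
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        total, sports = int(row[0]), row[1].split("|")
        # Membership in the split list is an exact match on each entry.
        if sport in sports and len(sports) >= min_sports:
            if best is None or total > best[0]:
                best = (total, sports)
    return best

result = best_gym(io.StringIO(SAMPLE))
print(result)
```

With this sample, a plain substring test would select the row with Total 90, whereas the list membership test skips it because `Baseball5` is not the same entry as `Baseball`.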

You can do this easily with pandas.

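First, split each string into a list on the pipe delimiter, then keep only the lists with more than two entries, since the criterion is Baseball plus at least two other sports. Rows that fail the check end up with an empty string.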

In [4]: df['Gym'] = df['Gym'].str.split('|').apply(lambda x: ' '.join([i for i in x if len(x)>2]))

In [5]: df
Out[5]: 
   Total                                                Gym
0     40  Football Baseball Hockey Running Basketball Sw...
1     37                                                   
2     61                         Basketball Baseball Ballet
3     12  Swimming Ballet Cycling Basketball Volleyball ...
4     78                                                   
5     29  Baseball Tennis Ballet Cycling Basketball Foot...
6     31                                                   
7     54  Tennis Football Ballet Cycling Running Swimmin...
8     33                   Baseball Hockey Swimming Cycling
9     17                         Football Hockey Volleyball

Use str.contains to search the Gym column for the string Baseball.

In [6]: df = df.loc[df['Gym'].str.contains('Baseball')]

In [7]: df
Out[7]: 
   Total                                                Gym
0     40  Football Baseball Hockey Running Basketball Sw...
2     61                         Basketball Baseball Ballet
3     12  Swimming Ballet Cycling Basketball Volleyball ...
5     29  Baseball Tennis Ballet Cycling Basketball Foot...
7     54  Tennis Football Ballet Cycling Running Swimmin...
8     33                   Baseball Hockey Swimming Cycling

Compute the number of strings in each row.

In [8]: df['Count'] = df['Gym'].str.split().apply(lambda x: len([i for i in x]))

Then select the row of the DataFrame that corresponds to the maximum value in the Total column.

In [9]: df.loc[df['Total'].idxmax()]
Out[9]: 
Total                            61
Gym      Basketball Baseball Ballet
Count                             3
Name: 2, dtype: object
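The same steps can also be sketched without overwriting the Gym column, which keeps the original pipe-delimited values available. This is a minimal sketch under assumptions: the DataFrame is built inline from a few rows of the question's data rather than read from sports.csv, and the names `sports`, `has_baseball`, and `candidates` are introduced here for illustration.

```python
import pandas as pd

# A few rows copied from the question's data, built inline for the example.
df = pd.DataFrame({
    "Total": [40, 37, 61, 78, 33, 17],
    "Gym": [
        "Football|Baseball|Hockey|Running|Basketball|Swimming|Cycling|Volleyball|Tennis|Ballet",
        "Baseball|Tennis",
        "Basketball|Baseball|Ballet",
        "Swimming|Basketball",
        "Baseball|Hockey|Swimming|Cycling",
        "Football|Hockey|Volleyball",
    ],
})

sports = df["Gym"].str.split("|")   # one list of sports per row
df["Count"] = sports.str.len()      # number of sports per row

# Exact-entry match: "Baseball" must be one of the list items.
has_baseball = sports.apply(lambda s: "Baseball" in s)

# Baseball plus at least two other sports means at least three entries.
candidates = df[has_baseball & (df["Count"] >= 3)]
best = candidates.loc[candidates["Total"].idxmax()]
print(best["Total"], best["Gym"], best["Count"])
```

Keeping the split lists in a separate Series means the filter and the count come from the same data, and the membership test avoids matching a sport name that only contains "Baseball" as a substring.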

Try this:

df3.loc[df3['Gym'].str.contains('Hockey') & (df3['Gym'].str.count(r'\|') > 1)].sort_values('Total').tail(1)

   Total                                                Gym
0     40  Football|Baseball|Hockey|Running|Basketball|Sw...


df3.loc[df3['Gym'].str.contains('Baseball') & (df3['Gym'].str.count(r'\|') > 1)].sort_values('Total').tail(1)

   Total                         Gym
2     61  Basketball|Baseball|Ballet
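The pipe is escaped in the calls above because `str.count` treats its pattern as a regular expression, where an unescaped `|` means alternation and would match the empty string at every position. The following minimal sketch shows the escaped count used to get the number of sports per row, under the assumption that every row has at least one sport; the Series `gyms` and the variable names are introduced here for illustration.

```python
import pandas as pd

gyms = pd.Series(["Baseball|Tennis", "Basketball|Baseball|Ballet", "Tennis"])

# str.count interprets the pattern as a regex, so the pipe is escaped
# (in a raw string) to count literal "|" characters.
pipes = gyms.str.count(r"\|")

# Number of sports is one more than the number of separators.
n_sports = pipes + 1

# For a literal substring test, regex=False avoids the need to escape.
has_pipe = gyms.str.contains("|", regex=False)

print(n_sports.tolist(), has_pipe.tolist())
```

The condition `str.count(r"\|") > 1` used in this answer is therefore equivalent to requiring at least three sports in the row.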
