Pandas: Remove rows with errors

Posted 2024-09-27 07:19:11


I have the following DataFrame, df_melt:

    MatchID GameWeek        Date              Team  Home        AgainstTeam
0     46605        1  2019-08-09         Liverpool  Home       Norwich City
1     46605        1  2019-08-09      Norwich City  Away          Liverpool
2     46606        1  2019-08-10   AFC Bournemouth  Home   Sheffield United
3     46606        1  2019-08-10  Sheffield United  Away    AFC Bournemouth
4     46607        1  2019-08-10           Burnley  Home        Southampton
..      ...      ...         ...               ...   ...                ...
540   46875       28         TBC       Aston Villa  Home
541   46875       28         TBC  Sheffield United  Away

There is clearly a problem: several rows have the value "TBC" in the Date column.

How can I remove these defective rows, or fix them some other way?


2 Answers

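I assume "TBC" means the game will take place at some point in the future ("to be confirmed"). So if you want to use the dates in your analysis, I suggest filtering out the rows whose Date is "TBC":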

df_melt_no_tbc = df_melt[df_melt.Date != "TBC"]

You can also do this in several other ways! See this post for some alternatives. Here is a complete example with its output:

>>> import pandas as pd
>>> 
>>> columns =["MatchID", "GameWeek", "Date", "Team", "Home", "AgainstTeam"]
>>> data = [["1", "1", "01-02-2020", "TeamA", "Here", "TeamB"],
...         ["1", "1", "TBC", "TeamB", "Here", "TeamA"]]
>>> df_melt = pd.DataFrame(data, columns=columns)
>>> print(df_melt)
  MatchID GameWeek        Date   Team  Home AgainstTeam
0       1        1  01-02-2020  TeamA  Here       TeamB
1       1        1         TBC  TeamB  Here       TeamA
>>> df_melt_no_tbc = df_melt[df_melt.Date != "TBC"]
>>> print(df_melt_no_tbc)
  MatchID GameWeek        Date   Team  Home AgainstTeam
0       1        1  01-02-2020  TeamA  Here       TeamB
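
If the rows should be kept rather than dropped (the "fix them some other way" part of the question), one option is to convert the column to real datetimes and let pandas mark "TBC" as missing. This is only a sketch: the small frame below is made up for illustration, and it assumes the valid dates are in ISO `YYYY-MM-DD` form as in the question's df_melt.

```python
import pandas as pd

# Hypothetical sample shaped like the question's df_melt (not the real data)
df_melt = pd.DataFrame({
    "MatchID": [46605, 46605, 46875, 46875],
    "Date": ["2019-08-09", "2019-08-09", "TBC", "TBC"],
    "Team": ["Liverpool", "Norwich City", "Aston Villa", "Sheffield United"],
})

# errors="coerce" turns anything that does not match the format
# (such as "TBC") into NaT instead of raising an error
df_fixed = df_melt.assign(
    Date=pd.to_datetime(df_melt["Date"], format="%Y-%m-%d", errors="coerce")
)

# Keep all rows in df_fixed; drop the unconfirmed ones only where needed
df_confirmed = df_fixed.dropna(subset=["Date"])
```

With this approach the unconfirmed fixtures stay in df_fixed with a missing date, and date arithmetic or sorting works on the rest because the column has a datetime dtype.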

You can use dateutil to test whether a date is valid:

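from dateutil.parser import parse

def is_valid_date(s):
    """Return True if dateutil can parse s as a date."""
    try:
        parse(s)
        return True
    except (ValueError, OverflowError, TypeError):
        # "TBC" and other unparseable or non-string values end up here
        return False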

df_melt = df_melt[df_melt.Date.apply(is_valid_date)]
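
Before overwriting df_melt, it may be worth looking at which rows the check rejects. The following is a self-contained sketch of the same idea on a made-up three-row frame; the names `valid`, `rejected` and `df_clean` are introduced here for illustration only. Note that dateutil is fairly lenient about what it accepts as a date, so the rejected rows are worth a glance.

```python
import pandas as pd
from dateutil.parser import parse

def is_valid_date(s):
    """Return True if dateutil can parse s as a date."""
    try:
        parse(s)
        return True
    except (ValueError, OverflowError, TypeError):
        return False

# Hypothetical sample data, not the asker's real frame
df_melt = pd.DataFrame({
    "Date": ["2019-08-09", "TBC", "2019-08-10"],
    "Team": ["Liverpool", "Aston Villa", "Burnley"],
})

# Boolean mask: True where the Date string parses as a date
valid = df_melt["Date"].apply(is_valid_date)

rejected = df_melt[~valid]  # rows that would be dropped, for inspection
df_clean = df_melt[valid]   # rows with a parseable date
```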
