Adding new rows without index numbers

Posted 2024-06-28 19:37:10


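This is what I am trying to achieve with my code: I have an existing CSV file containing the names of tennis players, and I want to add new players to that file once they appear in the rankings. My script goes through the rankings and builds an array, then imports the names from the CSV file. It should work out which names are not in the latter and then pull the online data for those names. After that, I just want to append the new rows to the end of the old CSV file. My problem is that the new rows are indexed by the player's name instead of following the old file's index. Any idea why this happens? Also, why is an unnamed column being added?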


def get_all_players():

    # imports names of players currently in the atp rankings
    current_atp_ranking = check_atp_rankings()
    current_player_list = current_atp_ranking['Player']

    # clean up names in case of white spaces
    for i in range(0, len(current_player_list)):
        current_player_list[i] = current_player_list[i].strip()

    # reads the main file and makes a dataframe out of it
    current_file = 'ATP_stats_new.csv'
    df = pd.read_csv(current_file)

    # gets all the names within the main file to see which current ones aren't there
    names_on_file = list(df['Player'])
    # cleans up in case of any white spaces
    for i in range(0, len(names_on_file)):
        names_on_file[i] = names_on_file[i].strip()

    # Removing Nadal for testing purposes
    names_on_file.remove("Rafael Nadal")

    # creating a list of players in current_players_list but not in names_on_file
    new_player_list = [x for x in current_player_list if x not in names_on_file]

    # loop through new_player_list
    for player in new_player_list:

        # delay to avoid stopping
        time.sleep(2)

        # finding the player's atp link for profile based on their name
        atp_link = current_atp_ranking.loc[current_atp_ranking['Player'] == player, 'ATP_Link']
        atp_link = atp_link.iloc[0]

        # make a basic dictionary with just the player's name and link
        player_dict = [{'Name': player, 'ATP_Link': atp_link}]

        # enter the new dictionary into the existing main file
        df.append(player_dict, ignore_index=True)

    # print dataframe to see how it looks before exporting
    print(df)

    # export dataframe into current file
    df.to_csv(current_file)

This is what the file looks like originally:

      Unnamed: 0            Player  ...                         Coach Turned_Pro
0              0    Novak Djokovic  ...                           NaN        NaN
1              1      Rafael Nadal  ...   Carlos Moya, Francisco Roig     2001.0
2              2     Roger Federer  ...  Ivan Ljubicic, Severin Luthi     1998.0
3              3   Daniil Medvedev  ...                           NaN        NaN
4              4     Dominic Thiem  ...                           NaN        NaN
...          ...               ...  ...                           ...        ...
1976        1976      Brian Bencic  ...                           NaN        NaN
1977        1977  Boruch Skierkier  ...                           NaN        NaN
1978        1978      Majed Kilani  ...                           NaN        NaN
1979        1979   Quentin Gueydan  ...                           NaN        NaN
1980        1980     Preston Brown  ...                           NaN        NaN

This is what the new rows look like:

1977              1977.0  ...        NaN
1978              1978.0  ...        NaN
1979              1979.0  ...        NaN
1980              1980.0  ...        NaN
Rafael Nadal         NaN  ...       2001

1 Answer

Answer 1 · posted 2024-06-28 19:37:10

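Some key parts of your code are missing, and they would be needed to answer the question precisely. Based on what you posted, here are two thoughts: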

Importing your CSV file

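Your previous CSV file was probably saved together with its index. Make sure that the last time the file was written, the DataFrame index did not end up as the first CSV column. When saving, do the following: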

file.to_csv('file.csv', index=False)

When you then load the file like this:

pandas.read_csv('file.csv')

pandas assigns the index numbers automatically, and there will be no duplicated index column.
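As a rough illustration of the point above, the sketch below writes a small DataFrame to CSV twice, once with the default index and once with index=False, and reads both back. The two-row frame is made-up sample data, and io.StringIO stands in for the real ATP_stats_new.csv so that nothing is written to disk.

```python
import io

import pandas as pd

# Made-up sample data standing in for the real stats file.
df = pd.DataFrame({
    "Player": ["Rafael Nadal", "Roger Federer"],
    "Turned_Pro": [2001, 1998],
})

# Default behaviour (index=True): the row index is written as a first
# column with an empty header.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False: only the real columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))  # ['Unnamed: 0', 'Player', 'Turned_Pro']
print(list(reloaded_clean.columns))       # ['Player', 'Turned_Pro']
```

If the existing file already contains an "Unnamed: 0" column, one option is to drop that column once after loading and then always save with index=False.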

Wrong column order

I am not sure what information atp_link receives, or in what order. From what you provided, it seems to return two columns: "Coach" and "Turned_Pro".

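I suggest that, after extracting the information from atp_link, you build a list (rather than a dict) for each new player you want to add, with one value per column in the same order as the DataFrame's columns. So if you are adding Nadal, his information list would look like this: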

info_list = ['Rafael Nadal', '','2001']

Then append the list to the DataFrame like this:

df.loc[len(df),:] = info_list
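A minimal sketch of that approach, assuming a three-column frame (Player, Coach, Turned_Pro) in place of the real file. It also shows pd.concat as an alternative for dict-shaped rows; that alternative is not part of the suggestion above. Two details from the question are worth keeping in mind: the dict there uses the key 'Name' while the file's column is 'Player', and DataFrame.append returned a new frame instead of modifying df in place (it was removed in pandas 2.0), so the result has to be assigned.

```python
import pandas as pd

# Hypothetical three-column frame standing in for the real stats file.
df = pd.DataFrame({
    "Player": ["Roger Federer"],
    "Coach": ["Ivan Ljubicic, Severin Luthi"],
    "Turned_Pro": [1998.0],
})

# Option 1: a list with one value per column, in column order.
# len(df) is the next free integer label, so the new row continues
# the existing 0, 1, 2, ... index.
info_list = ["Rafael Nadal", "Carlos Moya, Francisco Roig", 2001.0]
df.loc[len(df)] = info_list

# Option 2 (an alternative, not from the answer above): build a one-row
# frame from a dict whose keys match the column names, then concatenate.
# Missing columns are filled with NaN, and ignore_index=True renumbers
# the rows. The result must be assigned back to df.
new_row = pd.DataFrame([{"Player": "Novak Djokovic"}])
df = pd.concat([df, new_row], ignore_index=True)

print(df)
```

With either option the new rows get integer labels that follow the existing index, rather than being labelled by the player's name.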

Hope this helps.
