Conditionally appending multiple rows to a DataFrame

Posted 2024-09-30 18:18:53


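I'm iterating over a DataFrame and, whenever a row in that first DataFrame has a certain value, trying to append new values from another DataFrame.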

Consider the following two DataFrames:

print(full_df)
                                             AccessName                                          PolicyArn
0                                   arn:aws:glue:sample  arn:aws:iam::971340810992:policy/service-role/...
1                                  arn:aws:glue:sample2  arn:aws:iam::971340810992:policy/service-role/...
2                                                   ---  arn:aws:iam::971340810992:policy/service-role/...
3                                  arn:aws:s3:::sample3  arn:aws:iam::971340810992:policy/service-role/...


print(side_df)
                                            AccessName
0                                          sample-test
1                                         query_sample
2                                     us-east-1-sample

If AccessName in full_df is a certain value, side_df should be appended to full_df, with the second column (PolicyArn) always set to arn for every appended row.

arn = 'fixed_value'
for index, row in full_df.iterrows():
    if row['AccessName'] == '---':
        # Here I don't know how to write the code that appends the side_df values:
        # full_df['AccessName'] = side_df['AccessName']
        # full_df['PolicyArn'] = arn
        pass  # placeholder so the loop is valid Python

Preferably with another if statement at that point, iterating over the side_df values and appending them row by row.

In my actual code this for loop is nested inside other code, and arn is generated dynamically.

Expected output:

                                             AccessName                                          PolicyArn
0                                   arn:aws:glue:sample  arn:aws:iam::971340810992:policy/service-role/...
1                                  arn:aws:glue:sample2  arn:aws:iam::971340810992:policy/service-role/...
2                                                   ---  arn:aws:iam::971340810992:policy/service-role/...
3                                           sample-test  fixed_value
4                                          query_sample  fixed_value
5                                      us-east-1-sample  fixed_value
6                                  arn:aws:s3:::sample3  arn:aws:iam::971340810992:policy/service-role/...

What is the best way to write this?


2 Answers

You can reindex the DataFrames, combine them, and then call fillna(), as follows:

Setup:

import io

import numpy as np
import pandas as pd

df = pd.read_csv(io.StringIO(''' AccessName     PolicyArn
0                                   arn:aws:glue:sample  arn:aws:iam::971340810992:policy/service-role/...
1                                  arn:aws:glue:sample2  arn:aws:iam::971340810992:policy/service-role/...
2                                                    -  arn:aws:iam::971340810992:policy/service-role/...
3                                  arn:aws:s3:::sample3  arn:aws:iam::971340810992:policy/service-role/...
'''), sep=r'\s+')

df2 = pd.read_csv(io.StringIO('''   AccessName
0                                          sample-test
1                                         query_sample
2                                     us-east-1-sample'''), sep=r'\s+')

Solution:

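# Position right after the marker row; with whitespace-separated parsing the marker is read as '-'
pos = df.index[df['AccessName'] == '-'][0] + 1
start, end = pos + df2.shape[0], df2.shape[0] + df.shape[0]
df.index = np.append(df.index[:pos], np.arange(start, end))  # index: [0, 1, 2, 6]
df2.index = np.arange(pos, pos + df2.shape[0])  # index: [3, 4, 5]
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
solution_df = pd.concat([df, df2]).sort_index().fillna('fixed_value')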
solution_df
    AccessName              PolicyArn
0   arn:aws:glue:sample     arn:aws:iam::971340810992:policy/service-role/...
1   arn:aws:glue:sample2    arn:aws:iam::971340810992:policy/service-role/...
2   -                       arn:aws:iam::971340810992:policy/service-role/...
3   sample-test             fixed_value
4   query_sample            fixed_value
5   us-east-1-sample        fixed_value
6   arn:aws:s3:::sample3    arn:aws:iam::971340810992:policy/service-role/...

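Important note: for large datasets, solutions that iterate row by row are generally not recommended. Try to find a vectorized solution and avoid methods such as .iterrows() and .apply(). Good luck.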
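As a rough sketch of what avoiding .iterrows() could look like when the marker may occur more than once: find the marker positions with a boolean mask, then slice and concatenate around each one. The frames below are small stand-ins with placeholder PolicyArn strings, and the names marker_positions, block and pieces are made up for this illustration. The loop runs once per match, not once per row.

```python
import numpy as np
import pandas as pd

# Stand-in data; the PolicyArn strings are placeholders, not real ARNs.
full_df = pd.DataFrame({
    "AccessName": ["arn:aws:glue:sample", "---", "arn:aws:glue:sample2", "---"],
    "PolicyArn": ["policy-a", "policy-b", "policy-c", "policy-d"],
})
side_df = pd.DataFrame({"AccessName": ["sample-test", "query_sample"]})
arn = "fixed_value"

# Integer positions (not index labels) of the marker rows, found via a mask.
marker_positions = np.flatnonzero(full_df["AccessName"].eq("---").to_numpy())

# side_df with the constant PolicyArn column attached.
block = side_df.assign(PolicyArn=arn)

pieces = []
start = 0
for pos in marker_positions:
    pieces.append(full_df.iloc[start:pos + 1])  # rows up to and including the marker
    pieces.append(block)                        # rows inserted after the marker
    start = pos + 1
if start < len(full_df):
    pieces.append(full_df.iloc[start:])         # rows after the last marker, if any

result = pd.concat(pieces, ignore_index=True)
print(result)
```

If arn differs per match, as in the question's real code, block could be built inside the loop instead.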

First, let's construct your side_df:

import pandas as pd

side_df = pd.DataFrame([['sample-test'], ['query_sample'], ['us-east-1-sample']],
                       columns=['AccessName'])
fixed_series = pd.Series(['fixed_value'] * len(side_df), name='PolicyArn').to_frame()
side_df_extended = pd.concat([side_df, fixed_series], axis=1)
print(side_df_extended)

       AccessName    PolicyArn
0       sample-test  fixed_value
1      query_sample  fixed_value
2  us-east-1-sample  fixed_value
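For what it's worth, the same extended frame can probably be built in one step with DataFrame.assign, which broadcasts a scalar to every row and returns a new frame. This is only an alternative sketch, not what the answer above uses:

```python
import pandas as pd

side_df = pd.DataFrame({"AccessName": ["sample-test", "query_sample", "us-east-1-sample"]})

# assign() returns a new DataFrame; the scalar is repeated for every row.
side_df_extended = side_df.assign(PolicyArn="fixed_value")
print(side_df_extended)
```

side_df itself is left unchanged, which matters if it is reused for several insertions.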

Suppose full_df looks like this:

    AccessName              PolicyArn
0   arn:aws:glue:sample     arn:aws:iam::971340810992:policy/service-role/
1   arn:aws:glue:sample2    arn:aws:iam::971340810992:policy/service-role/
2   arn:aws:s3:::sample3    arn:aws:iam::971340810992:policy/service-role/
3   arn:aws:glue:sample4    arn:aws:iam::971340810992:policy/service-role/
4   arn:aws:s3:::sample5    arn:aws:iam::971340810992:policy/service-role/

Now, let's get the index of the rows that match your condition, for example:

indices = full_df['AccessName'] == 'arn:aws:glue:sample2'
rows = full_df[indices].index.tolist()
rows

[1]

Now you want to insert side_df right after the row where your condition is met:

final_df = pd.concat([full_df.iloc[:(rows[0] + 1)], side_df_extended, full_df.iloc[(rows[0] + 1):]], ignore_index=True)
final_df

    AccessName              PolicyArn
0   arn:aws:glue:sample     arn:aws:iam::971340810992:policy/service-role/
1   arn:aws:glue:sample2    arn:aws:iam::971340810992:policy/service-role/
2   sample-test             fixed_value
3   query_sample            fixed_value
4   us-east-1-sample        fixed_value
5   arn:aws:s3:::sample3    arn:aws:iam::971340810992:policy/service-role/
6   arn:aws:glue:sample4    arn:aws:iam::971340810992:policy/service-role/
7   arn:aws:s3:::sample5    arn:aws:iam::971340810992:policy/service-role/
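One caveat: the slicing above passes an index label (rows[0]) to iloc, which expects a position. That only lines up when full_df has the default 0..n-1 index. A minimal sketch of a position-based variant, assuming a hypothetical frame whose labels are 10, 20, 30 (for example after filtering), with made-up short values:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels are not 0..n-1.
full_df = pd.DataFrame(
    {"AccessName": ["a", "b", "c"], "PolicyArn": ["p1", "p2", "p3"]},
    index=[10, 20, 30],
)
side_df_extended = pd.DataFrame(
    {"AccessName": ["x", "y"], "PolicyArn": ["fixed_value", "fixed_value"]}
)

# Integer position of the first matching row (1 here), not its label (20).
mask = (full_df["AccessName"] == "b").to_numpy()
pos = int(np.flatnonzero(mask)[0])

final_df = pd.concat(
    [full_df.iloc[:pos + 1], side_df_extended, full_df.iloc[pos + 1:]],
    ignore_index=True,
)
print(final_df)
```

Note that np.flatnonzero(mask)[0] raises an IndexError when nothing matches, so real code would want to check for an empty result first.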
