Duplicate rows by splitting comma-separated values in another column

original_df = DataFrame([{'country': 'a', 'title': 'title1'},
                         {'country': 'a,b,c', 'title': 'title2'},
                         {'country': 'd,e,f', 'title': 'title3'},
                         {'country': 'e', 'title': 'title4'}])

desired_df = DataFrame([{'country': 'a', 'title': 'title1'},
                        {'country': 'a', 'title': 'title2'},
                        {'country': 'b', 'title': 'title2'},
                        {'country': 'c', 'title': 'title2'},
                        {'country': 'd', 'title': 'title3'},
                        {'country': 'e', 'title': 'title3'},
                        {'country': 'f', 'title': 'title3'},
                        {'country': 'e', 'title': 'title4'}])

# Code I used:
desired_df = pd.concat(
    [
        Series(row["title"], row["country"].split(","))
        for _, row in original_df.iterrows()
    ]
).reset_index()

2 answers

User

Answer 1 · edited 2024-10-06 10:31:15

You can use Series.str.split together with DataFrame.explode here:

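df = original_df.copy()  # work on a copy of the question's frame
df['country'] = df['country'].str.split(',')  # 'a,b,c' -> ['a', 'b', 'c']
df.explode('country').reset_index(drop=True)  # one row per list element, renumbered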

  country   title
0       a  title1
1       a  title2
2       b  title2
3       c  title2
4       d  title3
5       e  title3
6       f  title3
7       e  title4
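A self-contained version of the approach above, as a sketch assuming pandas 1.1 or later (needed for the ignore_index argument of DataFrame.explode). It rebuilds the question's original_df with pd.DataFrame so it runs on its own, and uses assign so the input frame is not modified:

```python
import pandas as pd

# The question's input frame, built with the pd. prefix so no bare names are needed
original_df = pd.DataFrame([
    {'country': 'a', 'title': 'title1'},
    {'country': 'a,b,c', 'title': 'title2'},
    {'country': 'd,e,f', 'title': 'title3'},
    {'country': 'e', 'title': 'title4'},
])

# Turn each comma-separated string into a list, leaving original_df untouched
split_df = original_df.assign(country=original_df['country'].str.split(','))

# One output row per list element; ignore_index=True renumbers the rows 0..n-1,
# which replaces the separate reset_index(drop=True) call
desired_df = split_df.explode('country', ignore_index=True)
print(desired_df)
```

On pandas versions older than 1.1, drop ignore_index=True and chain .reset_index(drop=True) instead, as in the answer.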

For the NameError, you can import the names this way:

from pandas import DataFrame, Series

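Note: this import statement only brings the DataFrame and Series classes into scope. It does not define the name pd, so pd.concat would still need import pandas as pd.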
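A small sketch of what that note means in practice, assuming the snippet runs as a fresh script where nothing else has defined pd: the bare class names work, while any reference to pd still raises NameError.

```python
from pandas import DataFrame, Series

# DataFrame and Series were bound by the import, so these calls work
df = DataFrame({'country': ['a,b'], 'title': ['title1']})
s = Series(['x', 'y'])

# The name `pd` was never bound by this import, so looking it up raises NameError
try:
    pd.concat([s, s])
    pd_defined = True
except NameError:
    pd_defined = False

print(pd_defined)  # False when run as a fresh script
```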

User

Answer 2 · edited 2024-10-06 10:31:15

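First split the column on commas to get lists, then explode that Series of lists. Moving 'title' into the index means it is repeated for each element of 'country'. The last two steps just tidy up the Series name and move 'title' out of the index back into a column: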

(df.set_index('title')['country']
   .str.split(',')
   .explode()
   .rename('country')
   .reset_index())

    title country
0  title1       a
1  title2       a
2  title2       b
3  title2       c
4  title3       d
5  title3       e
6  title3       f
7  title4       e
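A runnable sketch of the index-based approach, rebuilding the question's data with pd.DataFrame and reordering the columns at the end so they match the question's desired_df (country first). The column reorder is an addition, not part of the answer:

```python
import pandas as pd

original_df = pd.DataFrame({
    'country': ['a', 'a,b,c', 'd,e,f', 'e'],
    'title': ['title1', 'title2', 'title3', 'title4'],
})

# With 'title' as the index, explode repeats each title for every list element
result = (original_df.set_index('title')['country']
          .str.split(',')       # 'a,b,c' -> ['a', 'b', 'c']
          .explode()            # one entry per country, index label repeated
          .rename('country')    # make sure the Series is named 'country'
          .reset_index())       # 'title' becomes a regular column again

# reset_index puts 'title' first; reorder to match the question's layout
result = result[['country', 'title']]
print(result)
```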

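Also, your original code is logically fine; you just need to create the objects correctly. I'd recommend importing the module rather than individual classes/functions, so that you create a Series with pd.Series instead of Series: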

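import pandas as pd

# Each row becomes a Series: the title repeated, indexed by its split countries
desired_df = pd.concat([pd.Series(row['title'], row['country'].split(','))
                        for _, row in original_df.iterrows()]).reset_index()
# reset_index() names the columns 'index' and 0; rename them to match desired_df
desired_df.columns = ['country', 'title']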
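If you want to keep the row-by-row logic of the original code but skip iterrows and the index handling, a plain list comprehension over the two columns is one alternative. This is a sketch of a swapped-in technique, not something from the answers above:

```python
import pandas as pd

original_df = pd.DataFrame({
    'country': ['a', 'a,b,c', 'd,e,f', 'e'],
    'title': ['title1', 'title2', 'title3', 'title4'],
})

# Emit one (country, title) pair per comma-separated value, row by row
pairs = [
    (country, title)
    for countries, title in zip(original_df['country'], original_df['title'])
    for country in countries.split(',')
]

# Name the columns directly, so no renaming is needed afterwards
desired_df = pd.DataFrame(pairs, columns=['country', 'title'])
print(desired_df)
```

Because the column names are given to the constructor, the result already has the 'country' and 'title' columns the question asks for.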
