Splitting selected items out of a column into separate columns

Posted 2024-10-06 09:55:08


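Currently the data holds states and regions in a single column, and I want to separate them. I want to convert and clean the data so that the state and the region end up in separate columns. This is the data I want to convert and clean: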

    Alabama[edit]
0   Auburn (Auburn University)[1]
1   Florence (University of North Alabama)
2   Jacksonville (Jacksonville State University)[2]
3   Livingston (University of West Alabama)[2]
4   Montevallo (University of Montevallo)[2]
5   Troy (Troy University)[2]
6   Tuscaloosa (University of Alabama, Stillman Co...
7   Tuskegee (Tuskegee University)[5]
8   Alaska[edit]
9   Fairbanks (University of Alaska Fairbanks)[2]
10  Arizona[edit]
11  Flagstaff (Northern Arizona University)[6]
12  Tempe (Arizona State University)
13  Tucson (University of Arizona)
14  Arkansas[edit]
15  Arkadelphia (Henderson State University, Ouach...
16  Conway (Central Baptist College, Hendrix Colle...
17  Fayetteville (University of Arkansas)[7]
18  Jonesboro (Arkansas State University)[8]
19  Magnolia (Southern Arkansas University)[2]

This is how I want the data to look:

    State   RegionName
0   Alabama     Auburn
1   Alabama     Florence
2   Alabama     Jacksonville
3   Alabama     Livingston
4   Alabama     Montevallo
5   Alabama     Troy
6   Alabama     Tuscaloosa
7   Alabama     Tuskegee
8   Alaska  Fairbanks
9   Arizona     Flagstaff
10  Arizona     Tempe
11  Arizona     Tucson
12  Arkansas    Arkadelphia
13  Arkansas    Conway
14  Arkansas    Fayetteville
15  Arkansas    Jonesboro
16  Arkansas    Magnolia
17  Arkansas    Monticello
18  Arkansas    Russellville
19  Arkansas    Searcy

1 Answer
User
#1 · Posted 2024-10-06 09:55:08
import pandas as pd

series = pd.Series(['Alabama[edit]',
'Auburn (Auburn University)[1]',
'Florence (University of North Alabama)',
'Jacksonville (Jacksonville State University)[2]',
'Livingston (University of West Alabama)[2]',
'Montevallo (University of Montevallo)[2]',
'Troy (Troy University)[2]',
'Tuscaloosa (University of Alabama, Stillman Co...)',
'Tuskegee (Tuskegee University)[5]',
'Alaska[edit]',
'Fairbanks (University of Alaska Fairbanks)[2]',
'Arizona[edit]',
'Flagstaff (Northern Arizona University)[6]',
'Tempe (Arizona State University)',
'Tucson (University of Arizona)',
'Arkansas[edit]',
'Arkadelphia (Henderson State University, Ouach...)',
'Conway (Central Baptist College, Hendrix Colle...)',
'Fayetteville (University of Arkansas)[7]',
'Jonesboro (Arkansas State University)[8]',
'Magnolia (Southern Arkansas University)[2]'])

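# Keep only the text before the first '(' and strip the trailing space it leaves
cleaned = series.apply(lambda x: str(x).split('(')[0].strip())
# Rows containing the literal tag '[edit]' are state headers
is_state = cleaned.str.contains('[edit]', regex=False)
# A running count of headers seen so far gives each row a 0-based state id
id_state = is_state.cumsum() - 1
# Map state id -> state name with the '[edit]' suffix removed
map_state = dict(enumerate(cleaned.loc[is_state].apply(lambda x: str(x).split('[')[0])))
id_state = [map_state[i] for i in id_state]

df = pd.DataFrame({'State': id_state, 'RegionName': cleaned})
# Drop the header rows themselves, then renumber from 0
df = df.loc[~is_state].reset_index(drop=True)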
df

Result:

[Image: screenshot of the resulting DataFrame]
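
As a possible alternative to the cumsum-and-map step above, the same split can be sketched with `Series.where` and `ffill`. This is only a sketch, assuming every state row carries the literal tag `[edit]` and every region name sits before the first `(`. The function name `split_states_regions` and the short `raw` sample are made up for illustration and are not part of the original answer.

```python
import pandas as pd


def split_states_regions(series: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: turn a mixed state/region column into two columns."""
    # Assumption: state header rows are marked with a literal '[edit]' tag.
    is_state = series.str.contains("[edit]", regex=False)
    # Keep state names on header rows only (NaN elsewhere), then fill downward.
    state = series.where(is_state).str.replace("[edit]", "", regex=False).ffill()
    # Assumption: the region name is everything before the first '('.
    region = series.str.split("(").str[0].str.strip()
    out = pd.DataFrame({"State": state, "RegionName": region})
    # Drop the header rows and renumber from 0.
    return out.loc[~is_state].reset_index(drop=True)


# Shortened sample taken from the question's data.
raw = pd.Series([
    "Alabama[edit]",
    "Auburn (Auburn University)[1]",
    "Florence (University of North Alabama)",
    "Alaska[edit]",
    "Fairbanks (University of Alaska Fairbanks)[2]",
])
result = split_states_regions(raw)
print(result)
```

On this five-row sample, Auburn and Florence are paired with Alabama and Fairbanks with Alaska. Forward-filling avoids building the intermediate id-to-name dictionary, at the cost of relying on the header row always preceding its regions.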
