Using iterrows to change all values below a row

Posted 2024-10-06 08:50:59


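I am working with a df that lists US regions and also contains the states. State rows are marked with [edit]. Every region between two state rows belongs to the state above it. I thought my code should work, but for some reason it does not change the values in df. Any idea what is happening here? How would you do it?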

Here is the df:

0                      Alabama[edit]
1                            Auburn 
2                          Florence 
3                      Jacksonville 
4                        Livingston 
5                        Montevallo 
6                              Troy 
7                        Tuscaloosa 
8                          Tuskegee 
9                       Alaska[edit]
10                        Fairbanks 
11                     Arizona[edit]
12                        Flagstaff 
13                            Tempe 
14                           Tucson 
15                    Arkansas[edit]
16                      Arkadelphia 
17                           Conway 
18                     Fayetteville 
19                        Jonesboro 
20                         Magnolia 
21                       Monticello 
22                     Russellville 
23                           Searcy 
24                  California[edit]
25                           Angwin 
26                           Arcata 
27                         Berkeley 
28                            Chico 
29                        Claremont 

My solution, which does not change the df:

[code not preserved in the source]

3 answers

Try the following code:

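import pandas as pd
import numpy as np

# Copy the region names, then blank out every row that is not a state header.
# regex=False matches the literal text "[edit]" instead of the character class [edit].
df['State'] = df['RegionName']
df.loc[~df['RegionName'].str.contains('[edit]', regex=False), 'State'] = np.nan
# Strip the literal "[edit]" marker and forward-fill the state name downwards
df['State'] = df['State'].str.replace('[edit]', '', regex=False).ffill()
print(df)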
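
A self-contained sketch of the same idea on a few sample rows, so it can be run as-is. The column name RegionName comes from the answer above; the sample frame and the variable names are assumptions for illustration only. It also shows why regex=False matters: read as a regular expression, '[edit]' is a character class that matches any single one of the letters e, d, i, t.

```python
import pandas as pd

# Hypothetical sample data in the shape shown in the question
df = pd.DataFrame({'RegionName': [
    'Alabama[edit]', 'Auburn', 'Florence',
    'Alaska[edit]', 'Fairbanks',
]})

# Literal match: True only for the state header rows
is_state = df['RegionName'].str.contains('[edit]', regex=False)

# Regex match: '[edit]' is a character class, so 'Florence' (it contains an 'e') matches too
is_state_regex = df['RegionName'].str.contains('[edit]', regex=True)

# Keep the name on header rows only, strip the marker, then forward-fill
df['State'] = (
    df['RegionName']
    .where(is_state)
    .str.replace('[edit]', '', regex=False)
    .ffill()
)
print(df)
```

The state header rows are still present in this result; filter them out with df[~is_state] if only the regions are wanted.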

Assuming your column is named region, you can use str.extract:

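# Pull the state name out of the "[edit]" rows, forward-fill it,
# then blank out and drop the state header rows themselves
df.assign(
    state=df.region.str.extract(r'(.*?)\[edit\]', expand=False).ffill()
).mask(df.region.str.endswith('[edit]')).dropna()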

          region       state
1         Auburn     Alabama
2       Florence     Alabama
3   Jacksonville     Alabama
4     Livingston     Alabama
5     Montevallo     Alabama
6           Troy     Alabama
7     Tuscaloosa     Alabama
8       Tuskegee     Alabama
10     Fairbanks      Alaska
12     Flagstaff     Arizona
13         Tempe     Arizona
14        Tucson     Arizona
16   Arkadelphia    Arkansas
17        Conway    Arkansas
18  Fayetteville    Arkansas
19     Jonesboro    Arkansas
20      Magnolia    Arkansas
21    Monticello    Arkansas
22  Russellville    Arkansas
23        Searcy    Arkansas
25        Angwin  California
26        Arcata  California
27      Berkeley  California
28         Chico  California
29     Claremont  California

If you want to keep the state rows in the region column, just remove the mask:

[code not preserved in the source]
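
A minimal sketch of that variant, assuming the column is named region as in the answer above. The sample data is made up, and expand=False is an addition here so that str.extract returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Hypothetical sample data; the column name `region` follows the answer above
df = pd.DataFrame({'region': [
    'Alabama[edit]', 'Auburn', 'Florence',
    'Alaska[edit]', 'Fairbanks',
]})

# Extract the state name from the header rows (NaN elsewhere) and forward-fill it.
# Without .mask(...).dropna(), the header rows stay in the result.
out = df.assign(
    state=df.region.str.extract(r'(.*?)\[edit\]', expand=False).ffill()
)
print(out)
```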

If I understand correctly, here is a solution that avoids an explicit loop.

# Create a new column of state names with NaN in any
# row that did not contain a state name flagged with "edit"
df['state'] = df[df['RegionName'].str.contains('[edit]', regex=False)]['RegionName']

# Forward-fill the NaNs in the state column
df = df.ffill()

# Delete rows where RegionName == state and
# reset index to default integers
df = df[df.iloc[:, 0] != df.iloc[:, 1]].reset_index(drop=True)

# Delete "[edit]" flag from strings
df['state'] = df['state'].str.replace('[edit]', '', regex=False)

# Result:
df
      RegionName       state
0         Auburn     Alabama
1       Florence     Alabama
2   Jacksonville     Alabama
3     Livingston     Alabama
4     Montevallo     Alabama
5           Troy     Alabama
6     Tuscaloosa     Alabama
7       Tuskegee     Alabama
8      Fairbanks      Alaska
9      Flagstaff     Arizona
10         Tempe     Arizona
11        Tucson     Arizona
12   Arkadelphia    Arkansas
13        Conway    Arkansas
14  Fayetteville    Arkansas
15     Jonesboro    Arkansas
16      Magnolia    Arkansas
17    Monticello    Arkansas
18  Russellville    Arkansas
19        Searcy    Arkansas
20        Angwin  California
21        Arcata  California
22      Berkeley  California
23         Chico  California
24     Claremont  California
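
On the original question of why an iterrows loop might leave df unchanged: the asker's code is not shown, so this is only a guess, but the pandas documentation notes that iterrows may yield a copy of each row, in which case assigning to row has no effect on df. Below is a minimal sketch of a loop that does work, collecting the values in a list and assigning the column after the loop. The sample data and the names states, current_state and regions are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; `RegionName` follows the answers above
df = pd.DataFrame({'RegionName': [
    'Alabama[edit]', 'Auburn', 'Florence',
    'Alaska[edit]', 'Fairbanks',
]})

marker = '[edit]'
states = []
current_state = None
for idx, row in df.iterrows():
    name = row['RegionName']
    if name.endswith(marker):
        # A state header row: remember its name without the marker
        current_state = name[:-len(marker)]
    # Record the state for this row instead of writing to `row`
    states.append(current_state)

# Assign the whole column at once, after the loop has finished
df['State'] = states

# Drop the state header rows, keeping only the regions
regions = df[~df['RegionName'].str.endswith(marker)].reset_index(drop=True)
print(regions)
```

The vectorized answers above will usually be faster on large frames; the loop is mainly useful for seeing where the assignment has to happen.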
