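Please forgive my pandas newbie question. I have a column of US towns and states, such as the truncated version shown below (for some strange reason the column itself is named "Alabama[edit]", which is the state that the first town values, rows 0-7, belong to):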
0 Auburn (Auburn University)[1]
1 Florence (University of North Alabama)
2 Jacksonville (Jacksonville State University)[2]
3 Livingston (University of West Alabama)[2]
4 Montevallo (University of Montevallo)[2]
5 Troy (Troy University)[2]
6 Tuscaloosa (University of Alabama, Stillman Co...
7 Tuskegee (Tuskegee University)[5]
8 Alaska[edit]
9 Fairbanks (University of Alaska Fairbanks)[2]
10 Arizona[edit]
11 Flagstaff (Northern Arizona University)[6]
12 Tempe (Arizona State University)
13 Tucson (University of Arizona)
14 Arkansas[edit]
15 Arkadelphia (Henderson State University, Ouach...
16 Conway (Central Baptist College, Hendrix Colle...
17 Fayetteville (University of Arkansas)[7]
18 Jonesboro (Arkansas State University)[8]
19 Magnolia (Southern Arkansas University)[2]
20 Monticello (University of Arkansas at Monticel...
21 Russellville (Arkansas Tech University)[2]
22 Searcy (Harding University)[5]
23 California[edit]
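The towns of each state sit directly below that state's name; for example, Fairbanks (row 9) is a town in Alaska.
What I want to do is split the town names by state name, so that I end up with two columns, "State" and "RegionName", where each town name is paired with its state, like this: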
RegionName State
0 Auburn (Auburn University)[1] Alabama
1 Florence (University of North Alabama) Alabama
2 Jacksonville (Jacksonville State University)[2] Alabama
3 Livingston (University of West Alabama)[2] Alabama
4 Montevallo (University of Montevallo)[2] Alabama
5 Troy (Troy University)[2] Alabama
6 Tuscaloosa (University of Alabama, Stillman Co... Alabama
7 Tuskegee (Tuskegee University)[5] Alabama
8 Fairbanks (University of Alaska Fairbanks)[2] Alaska
9 Flagstaff (Northern Arizona University)[6] Arizona
10 Tempe (Arizona State University) Arizona
11 Tucson (University of Arizona) Arizona
12 Arkadelphia (Henderson State University, Ouach... Arkansas
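...and so on.
I know that every state name is followed by the string "[edit]", and I think I could use that to split the rows and assign the town names to states, but I don't know how to do it.
I also know there is a lot of other data cleaning still to do, such as removing the text inside the parentheses and inside the square brackets "[]". That can wait. The important thing is to separate the states from the towns and assign each town to the appropriate US state. Any suggestions would be much appreciated.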
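Without more context or access to your data, I'd suggest something along these lines. First, change the code that reads your data so that the first line, "Alabama[edit]", is not consumed as the column header: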
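The answer's original snippet is not preserved here, so the following is only a minimal sketch of one way to do the read. It assumes the data lives in a plain text file with one entry per line; the in-memory `raw` buffer, the tab separator, and the column name `RegionName` are illustrative choices, not details taken from the question.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: a few rows copied from the question.
raw = io.StringIO(
    "Alabama[edit]\n"
    "Auburn (Auburn University)[1]\n"
    "Florence (University of North Alabama)\n"
    "Alaska[edit]\n"
    "Fairbanks (University of Alaska Fairbanks)[2]\n"
)

# header=None stops pandas from treating "Alabama[edit]" as the column name,
# and names=[...] supplies an explicit one instead.
# sep="\t" is an assumption: it keeps commas inside the parentheses from
# splitting a line into several columns.
df = pd.read_csv(raw, sep="\t", header=None, names=["RegionName"])
```

With the header row kept as data, "Alabama[edit]" becomes row 0 and can be handled the same way as every other state line.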
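Now use str.extract to pull out the state name; the pattern should capture only the text that precedes the substring "[edit]". Town rows do not match the pattern and come back as NaN, so you can then use ffill to forward-fill all the NaN values with the most recent state name.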
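As a rough sketch of the extract-and-forward-fill step, assuming the column is called `RegionName` (a name chosen here for illustration) and using a handful of rows from the question as sample data:

```python
import pandas as pd

# Sample rows from the question; state rows end in "[edit]".
df = pd.DataFrame({"RegionName": [
    "Alabama[edit]",
    "Auburn (Auburn University)[1]",
    "Florence (University of North Alabama)",
    "Alaska[edit]",
    "Fairbanks (University of Alaska Fairbanks)[2]",
    "Arizona[edit]",
    "Tempe (Arizona State University)",
]})

# Capture the text before "[edit]". Town rows do not match and become NaN,
# which ffill() then replaces with the last state name seen above them.
df["State"] = (
    df["RegionName"]
    .str.extract(r"^(.*?)\[edit\]", expand=False)
    .ffill()
)

# Drop the state header rows so only town rows remain.
is_state_row = df["RegionName"].str.contains("[edit]", regex=False)
result = df[~is_state_row].reset_index(drop=True)
```

Dropping the state rows at the end is an extra step beyond what is described above; it is included because the desired output in the question lists towns only.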