I'm currently trying to extract the countries from the rows of a dataframe. Here is the code I have so far:
import pandas as pd

# Adjacent string literals are concatenated, so each Authors value is one string
l = [
    ['[Aydemir, Deniz\', \' Gunduz, Gokhan\', \' Asik, Nejla] Bartin '
     'Univ, Fac Forestry, Dept Forest Ind Engn, TR-74100 Bartin, '
     'Turkey\', \' [Wang, Alice] Lulea Univ Technol, Wood Technol, '
     'Skelleftea, Sweden', 1990],
    ['[Fang, Qun\', \' Cui, Hui-Wang] Zhejiang A&F Univ, Sch Engn, Linan '
     '311300, Peoples R China\', \' [Du, Guan-Ben] Southwest Forestry '
     'Univ, Kunming 650224, Yunnan, Peoples R China', 2005],
    ['[Blumentritt, Melanie\', \' Gardner, Douglas J.\', \' Shaler '
     'Stephen M.] Univ Maine, Sch Resources, Orono, ME USA\', \' [Cole, '
     'Barbara J. W.] Univ Maine, Dept Chem, Orono, ME 04469 USA', 2012],
    ['[Kyvelou, Pinelopi; Gardner, Leroy; Nethercot, David A.] Univ '
     'London Imperial Coll Sci Technol & Med, London SW7 2AZ, '
     'England', 1998],
]
dataf = pd.DataFrame(l, columns=['Authors', 'Year'])
That builds the dataframe. To extract the countries I use:
import geotext

# Rewrite the standalone token "USA", then let GeoText find country names
df = (dataf['Authors']
      .replace(r"\bUSA\b", "United States", regex=True)
      .apply(lambda x: geotext.GeoText(x).countries))
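For reference, a minimal sketch of what the existing .replace call does on its own. GeoText is left out so it runs with pandas alone; the first two strings are fragments of the example rows, and the third ("USAF Acad, ...") is a hypothetical string added only to show that the \b word boundaries stop USA from matching inside a longer word.

```python
import pandas as pd

s = pd.Series([
    "Univ Maine, Sch Resources, Orono, ME USA",
    "Univ Maine, Dept Chem, Orono, ME 04469 USA",
    "USAF Acad, Colorado Springs",  # hypothetical: "USA" is only a prefix here
])

# \b marks a word boundary, so only the standalone token "USA" is replaced
out = s.replace(r"\bUSA\b", "United States", regex=True)
print(out.tolist())
```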
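The problem is that GeoText does not recognize "USA", which is why I replace it with "United States". Now I also see that I need to change "England", "Scotland", "Wales" and "Northern Ireland" to "United Kingdom".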
How can I extend .replace to do this?
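A minimal sketch of one way to extend the call, assuming whole-word regex matches are what you want: Series.replace accepts a dict of pattern-to-replacement pairs together with regex=True. The name replacements is hypothetical, and the two input strings are taken from the third and fourth example rows (the first one shortened).

```python
import pandas as pd

# Hypothetical mapping: keys are regex patterns, values are replacements.
# \b keeps each match to whole words.
replacements = {
    r"\bUSA\b": "United States",
    r"\bEngland\b": "United Kingdom",
    r"\bScotland\b": "United Kingdom",
    r"\bWales\b": "United Kingdom",
    r"\bNorthern Ireland\b": "United Kingdom",
}

authors = pd.Series([
    "[Cole, Barbara J. W.] Univ Maine, Dept Chem, Orono, ME 04469 USA",
    "[Kyvelou, Pinelopi; Gardner, Leroy; Nethercot, David A.] Univ "
    "London Imperial Coll Sci Technol & Med, London SW7 2AZ, England",
])

# With a dict and regex=True, each key is used as a pattern and substituted
# inside the strings, like the single-pattern call in the question.
normalized = authors.replace(replacements, regex=True)
print(normalized.tolist())
```

In the question's chain, the dict would simply take the place of the single r"\bUSA\b" pattern before .apply.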
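Instead of a single pattern, you can pass .replace a dictionary that maps each pattern to its replacement, together with regex=True. The translate method of the Series.str accessor is sometimes suggested for this, but it maps individual characters rather than words, so it cannot turn "England" into "United Kingdom". This worked for me.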
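A small sketch of why Series.str.translate does not fit this job, assuming a recent pandas. The sample string is an excerpt from the last example row; the table names are illustrative.

```python
import pandas as pd

s = pd.Series(["London SW7 2AZ, England"])

# Series.str.translate calls str.translate on each element. The table is looked
# up one character at a time by code point, so a multi-character key such as
# "England" never matches and the text comes back unchanged.
word_table = {"England": "United Kingdom"}
unchanged = s.str.translate(word_table)
print(unchanged.tolist())

# Character-level mappings do work: str.maketrans builds {ord(","): ";"}.
char_table = str.maketrans({",": ";"})
semicolons = s.str.translate(char_table)
print(semicolons.tolist())
```

For whole-word substitutions, a regex-based Series.replace is the tool that matches multi-character patterns.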