Pandas: combining city and state strings

Posted 2024-09-22 16:34:15


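I have a DataFrame with a locations column, where each row holds a series of cities and states in a single string. I want to pair each city with its state: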

0       Seattle, WA,Portland, OR,Everett, WA,Oklahoma ...
1       Silver Spring, MD,Portland, OR,Everett, WA,Den...
2       Oklahoma City, OK,Kingston, WA,Gardner, MA,Tul...
3       Portland, OR,Oklahoma City, OK,Eugene, OR,Corv...
4       Silver Spring, MD,Seattle, WA,Everett, WA,Spok...
3241    Seattle, WA,Silver Spring, MD,Portland, OR,Okl...

From my research, I found a suggestion to split the strings apart and then recombine the pieces. The split works, but I can't get join/zip to work:

test_df['locations'].str.split(',')

Here is what I tried for the recombining step:

' '.join, zip(test_df['locations'][0::2], test_df['locations'][1::2])

Expected output:

0       ['Seattle, WA','Portland, OR', 'Everett, WA', 'Oklahoma City, OK']
1       ['Silver Spring, MD', 'Portland, OR', 'Everett, WA', 'Denver, CO']
...

1 answer
User
Answer 1 · posted 2024-09-22 16:34:15

Setup:

import pandas as pd

df = pd.DataFrame({'locations': {0: 'Seattle, WA,Portland, OR,Everett, WA',
  1: 'Silver Spring, MD,Portland, OR,Everett, WA',
  2: 'Oklahoma City, OK,Kingston, WA,Gardner, MA',
  3: 'Portland, OR,Oklahoma City, OK,Eugene, OR',
  4: 'Silver Spring, MD,Seattle, WA,Everett, WA',
  3241: 'Seattle, WA,Silver Spring, MD,Portland, OR'}})

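Solution:

If the locations follow a fixed pattern, i.e. n "city, state" pairs, you can split on every comma and then regroup the pieces two at a time: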

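import numpy as np
(
    df.locations.str.split(',')  # split every row on each comma
    .dropna()  # skip rows with no locations
    # pad odd-length lists with '' so they reshape cleanly into pairs
    .apply(lambda x: x+[''] if len(x)%2 != 0 else x)
    # regroup into (city, state) pairs and put the comma back
    .apply(lambda x: [','.join(e) for e in np.asarray(x).reshape(-1,2)])
    .tolist()
)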

[['Seattle, WA', 'Portland, OR', 'Everett, WA'],
 ['Silver Spring, MD', 'Portland, OR', 'Everett, WA'],
 ['Oklahoma City, OK', 'Kingston, WA', 'Gardner, MA'],
 ['Portland, OR', 'Oklahoma City, OK', 'Eugene, OR'],
 ['Silver Spring, MD', 'Seattle, WA', 'Everett, WA'],
 ['Seattle, WA', 'Silver Spring, MD', 'Portland, OR']]
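
Since the question tried zip with [0::2] and [1::2] slices, here is a minimal sketch of that idea applied to each row's split pieces rather than to the whole column. The helper name pair_up and the new column name pairs are made up for illustration, and the sketch assumes every row really is a sequence of "city, state" pairs. Unlike the padding approach above, zip silently drops an unpaired trailing piece.

```python
import pandas as pd


def pair_up(text):
    """Split on commas and rejoin every two pieces as 'City, ST'."""
    # hypothetical helper, not part of pandas
    parts = [p.strip() for p in text.split(',')]
    # parts[0::2] are the cities, parts[1::2] the states;
    # zip stops at the shorter slice, so a lone trailing piece is dropped
    return [f"{city}, {state}" for city, state in zip(parts[0::2], parts[1::2])]


df = pd.DataFrame({'locations': ['Seattle, WA,Portland, OR,Everett, WA',
                                 'Oklahoma City, OK,Kingston, WA']})

# dropna() skips missing rows; those rows end up as NaN in the new column
df['pairs'] = df['locations'].dropna().apply(pair_up)
print(df['pairs'].tolist())
```

The original attempt sliced the column itself, which pairs up whole rows (row 0 with row 1, and so on) instead of the pieces inside one row, so the slicing has to happen per row after the split.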
