Pandas DataFrame: split the data in the last column by position, but keep the other columns

                               Company                 Zip State City
1                                *CBRE            San Diego, CA 92101
4                          1908 Brands              Boulder, CO 80301
7   1st Infantry Division Headquarters                 Fort Riley, KS
10       21st Century Healthcare, Inc.                    Tempe 85282
15                                 AAA  Jefferson City, MO 65101-9564

1 Answer

User

#1 · Posted on 2024-10-02 16:25:22

You can use the extract() method:

In [110]: df
Out[110]:
                               Company                 Zip State City
1                                *CBRE            San Diego, CA 92101
4                          1908 Brands              Boulder, CO 80301
7   1st Infantry Division Headquarters                 Fort Riley, KS
10       21st Century Healthcare, Inc.                    Tempe 85282
15                                 AAA  Jefferson City, MO 65101-9564

In [112]: df[['City','State','ZIP']] = df['Zip State City'].str.extract(r'([^,\d]+)?[,]*\s*([A-Z]{2})?\s*([\d\-]{4,11})?', expand=True)

In [113]: df
Out[113]:
                               Company                 Zip State City            City State         ZIP
1                                *CBRE            San Diego, CA 92101       San Diego    CA       92101
4                          1908 Brands              Boulder, CO 80301         Boulder    CO       80301
7   1st Infantry Division Headquarters                 Fort Riley, KS      Fort Riley    KS         NaN
10       21st Century Healthcare, Inc.                    Tempe 85282          Tempe    NaN       85282
15                                 AAA  Jefferson City, MO 65101-9564  Jefferson City    MO  65101-9564
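For readers who want to reproduce the session, here is a minimal self-contained sketch. It rebuilds the sample frame by hand (the index values are copied from the output above), applies the same pattern, and joins the result back onto the original columns. The `parts` variable and the extra `str.strip()` step are additions for illustration, not part of the original answer: when there is no comma, as in the "Tempe 85282" row, the first group can capture a trailing space.

```python
import pandas as pd

# Sample data copied from the question; the index values match the output above.
df = pd.DataFrame(
    {
        "Company": [
            "*CBRE",
            "1908 Brands",
            "1st Infantry Division Headquarters",
            "21st Century Healthcare, Inc.",
            "AAA",
        ],
        "Zip State City": [
            "San Diego, CA 92101",
            "Boulder, CO 80301",
            "Fort Riley, KS",
            "Tempe 85282",
            "Jefferson City, MO 65101-9564",
        ],
    },
    index=[1, 4, 7, 10, 15],
)

# Same pattern as in the answer: optional city, optional two-letter state,
# optional ZIP made of 4 to 11 digits or hyphens.
pattern = r"([^,\d]+)?[,]*\s*([A-Z]{2})?\s*([\d\-]{4,11})?"

# "parts" is a hypothetical helper name for the extracted columns.
parts = df["Zip State City"].str.extract(pattern, expand=True)
parts.columns = ["City", "State", "ZIP"]

# Assumption: trailing whitespace in the city name is unwanted.
parts["City"] = parts["City"].str.strip()

# Keep every original column and append the three new ones.
df = df.join(parts)
print(df)
```

Building the pieces in a separate frame and joining them keeps the original columns untouched, which is what the question asks for.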

From the docs:


For each subject string in the Series, extract groups from the first match of regular expression pat.
New in version 0.13.0.
Parameters:
pat : string
Regular expression pattern with capturing groups
flags : int, default 0 (no flags)
re module flags, e.g. re.IGNORECASE
expand : bool, default False (new in version 0.18.0)
If True, return DataFrame.
If False, return Series/Index/DataFrame.
Returns: DataFrame with one row for each subject string, and one column for each group. Any capture group names in regular expression pat will be used for column names; otherwise capture group numbers will be used. The dtype of each result column is always object, even when no match is found. If expand=False and pat has only one capture group, then return a Series (if subject is a Series) or Index (if subject is an Index).
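The effect of expand and of named capture groups is easier to see on a small example. The sketch below uses a two-row Series and simplified patterns invented for illustration (they are not the pattern from the answer). Note that the quoted documentation describes an older release; in recent pandas versions the default is expand=True, so passing the argument explicitly avoids depending on the installed version.

```python
import pandas as pd

s = pd.Series(["San Diego, CA 92101", "Fort Riley, KS"])

# One capture group: the two-letter state after the comma.
state_pat = r",\s*([A-Z]{2})"

# expand=True always returns a DataFrame, even for a single group.
as_frame = s.str.extract(state_pat, expand=True)

# expand=False with a single group returns a Series instead.
as_series = s.str.extract(state_pat, expand=False)

# Named groups become column names, so no renaming is needed afterwards.
named_pat = r",\s*(?P<State>[A-Z]{2})\s*(?P<ZIP>[\d\-]{4,11})?"
named = s.str.extract(named_pat, expand=True)

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series
print(named)
```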
