Extracting numbers from a string column in a pandas DataFrame

Posted 2024-09-26 18:08:40


I have the following DataFrame with a string column ("Info"):

df = pd.DataFrame( {'Date': ["2014/02/02", "2014/02/03"], 'Info': ["Out of 78 shares traded during the session today, there were 54 increases, 9 without change and 15 decreases.", "Out of 76 shares traded during the session today, there were 60 increases, 4 without change and 12 decreases."]})

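I need to extract the numbers from "Info" into 4 new columns in the same df.

For the first row, the values would be [78, 54, 9, 15].

This is what I have been trying: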

df[["new1","new2","new3","new4"]]= df.Info.str.extract('(\d+(?:\.\d+)?)', expand=True).astype(int)

But I think it is more complicated than that.

Regards,


2 answers

extractall is probably a better fit for this task:

df[["new1","new2","new3","new4"]] = df['Info'].str.extractall(r'(\d+)')[0].unstack()
         Date                                               Info new1 new2 new3 new4
0  2014/02/02  Out of 78 shares traded during the session tod...   78   54    9   15
1  2014/02/03  Out of 76 shares traded during the session tod...   76   60    4   12
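
Note that extractall returns the matches as strings, so the new columns above hold text rather than numbers. A minimal sketch of converting them to integers, assuming every row contains exactly four numbers (if some row had fewer, unstack would produce NaN and astype(int) would fail):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2014/02/02", "2014/02/03"],
    "Info": [
        "Out of 78 shares traded during the session today, there were 54 increases, 9 without change and 15 decreases.",
        "Out of 76 shares traded during the session today, there were 60 increases, 4 without change and 12 decreases.",
    ],
})

# One row per match; unstack pivots the match number into columns 0..3
numbers = df["Info"].str.extractall(r"(\d+)")[0].unstack()

# The matches are strings, so convert them before doing arithmetic
df[["new1", "new2", "new3", "new4"]] = numbers.astype(int)
```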

I see, you want to avoid capturing the decimal part of a number as a separate group, right? (That is the (?:\.\d+)? part.)
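
To make the role of that group concrete, here is a small sketch using Python's re module. The sample string is made up for illustration and does not come from the question. The optional non-capturing group keeps a decimal part inside the same match instead of splitting it off:

```python
import re

text = "Price rose 3.5 points on 12 trades"  # hypothetical sample string

# With the optional non-capturing group, "3.5" stays one match
with_decimals = re.findall(r"\d+(?:\.\d+)?", text)

# With plain \d+, the decimal number is split into two matches
digits_only = re.findall(r"\d+", text)

print(with_decimals)  # ['3.5', '12']
print(digits_only)    # ['3', '5', '12']
```

For the data in the question, which contains only whole numbers, both patterns return the same matches.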

First, if you want all matches, you need to use extractall; extract stops after the first match.
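
The difference can be seen on a single shortened sentence (the Series below is an abbreviated sample, used only for illustration):

```python
import pandas as pd

# Shortened sample text, not the full sentence from the question
s = pd.Series(["Out of 78 shares traded, there were 54 increases"])

# extract returns only the first match for each row
first_only = s.str.extract(r"(\d+)", expand=True)

# extractall returns every match, indexed by (row, match number)
all_matches = s.str.extractall(r"(\d+)")

print(first_only[0].tolist())   # ['78']
print(all_matches[0].tolist())  # ['78', '54']
```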

Using your df, try the following code:

# Get a multiindexed dataframe using extractall
expanded = df.Info.str.extractall(r"(\d+(?:\.\d+)?)")

# Pivot the index labels
df_2 = expanded.unstack()

# Drop the multiindex
df_2.columns = df_2.columns.droplevel()


# Add the columns to the original dataframe (inplace or make a new df)
df_combined = pd.concat([df, df_2], axis=1)

Output df
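
The output itself is not reproduced here. As a rough, self-contained sketch of the same steps, with two additions that are not part of the answer (the column names new1..new4, taken from the question, and a numeric conversion through pd.to_numeric), one could write:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2014/02/02", "2014/02/03"],
    "Info": [
        "Out of 78 shares traded during the session today, there were 54 increases, 9 without change and 15 decreases.",
        "Out of 76 shares traded during the session today, there were 60 increases, 4 without change and 12 decreases.",
    ],
})

# Same steps as in the answer: extract all matches, pivot, flatten the columns
expanded = df["Info"].str.extractall(r"(\d+(?:\.\d+)?)")
df_2 = expanded.unstack()
df_2.columns = df_2.columns.droplevel()

# Additions (assumptions): name the columns as in the question, convert text to numbers
df_2.columns = ["new1", "new2", "new3", "new4"]
df_2 = df_2.apply(pd.to_numeric)

df_combined = pd.concat([df, df_2], axis=1)
print(df_combined.drop(columns="Info"))
```

pd.to_numeric is used here instead of astype(int) because the pattern also allows decimal numbers, which would need a float column.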
