Create new columns from splitting a string column

Posted 2024-05-20 18:46:14


What I want to do is create new columns that capture each word in a string. For example:

df

Col1        Col2  
 38       'My Name is John'
 11       'Hello friend'
 134      'My favorite city is New Orleans'

Desired df:

Col1    Col2    Col3    Col4    Col5    Col6    Col7
38      'My'   'Name'    'is'   'John'   NA      NA
11     'Hello' 'friend'   NA     NA      NA      NA
134     'My'  'favorite' 'city' 'is'    'New' 'Orleans'

Does anyone have any ideas on this? Thanks.


3 Answers

You can convert Col2 into a Series that holds a list of words per row, then convert that Series back into a DataFrame:

import pandas as pd

# OP's data
x = ['My Name is John', 'Hello friend', 'My favorite city is New Orleans']
df = pd.DataFrame(x, columns=['Col2'])

# convert to a Series of word lists
s = df['Col2'].apply(lambda x: x.split())

# list of lists to DataFrame, one row per list
df_new = pd.DataFrame(s.tolist(), index=df.index)
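One thing worth checking with this approach is what happens to rows that have fewer words than the longest one. The sketch below rebuilds the OP's data and inspects the result; it only assumes the sample strings from the question.

```python
import pandas as pd

# OP's data
df = pd.DataFrame({'Col2': ['My Name is John', 'Hello friend',
                            'My favorite city is New Orleans']})

# one list of words per row
s = df['Col2'].apply(lambda x: x.split())

# a list of lists becomes one row per list; short rows get missing values
df_new = pd.DataFrame(s.tolist(), index=df.index)

print(df_new.shape)                  # (3, 6)
print(df_new.iloc[1].isna().sum())   # 4 ('Hello friend' fills only 2 of 6 cells)
```

So the frame is as wide as the longest sentence, and the unused cells are treated as missing rather than as empty strings.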

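The str.split method splits the strings in a column into lists of words. You can then pad the lists so they all have the same length, and build a new DataFrame from them: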

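import numpy as np
import pandas as pd

# split each string in Col2 into a list of words (df as in the question)
words = df['Col2'].str.split()
maxlen = words.map(len).max()

def pad_list(l):
    # pad with None so that every list has maxlen entries
    return l + [None] * (maxlen - len(l))

# stack the padded lists into a 2-D array, one row per string
words = pd.DataFrame(np.stack(words.map(pad_list).tolist(), axis=0))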
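If you would rather not write the padding yourself, a possible alternative is `itertools.zip_longest` from the standard library, which fills the shorter lists for you. This is a different technique from the one in the answer, shown here only as a sketch on the question's sample data; `columns` and `padded` are illustrative names.

```python
from itertools import zip_longest

import pandas as pd

df = pd.DataFrame({'Col2': ['My Name is John', 'Hello friend',
                            'My favorite city is New Orleans']})
words = df['Col2'].str.split()

# zip_longest transposes the word lists and fills short ones with None;
# each tuple it yields holds one word position across all rows
columns = list(zip_longest(*words, fillvalue=None))

# each tuple is a row here, so transpose to get one row per original string
padded = pd.DataFrame(columns).T

print(padded.shape)  # (3, 6)
```

Note that the result gets a fresh 0..n-1 index, so this only lines up with `df` when `df` also has a default index.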

You can create it like this:

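import pandas as pd

df = pd.DataFrame({'Col1': [38, 11, 134],
                   'Col2': ['My Name is John', 'Hello friend', 'My favorite city is New Orleans']})

# split Col2 on whitespace; expand=True returns one column per word
df1 = df['Col2'].str.split(expand=True)

# name the word columns Col2..Col7 to match the desired output
df1.columns = ['Col2', 'Col3', 'Col4', 'Col5', 'Col6', 'Col7']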
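To get the full desired frame, with Col1 kept next to the word columns and without hard-coding how many words there are, something like the following might work. `split_into_columns` and its `prefix` and `start` parameters are hypothetical names made up for this sketch, not part of pandas.

```python
import pandas as pd

def split_into_columns(df, col, prefix='Col', start=2):
    """Hypothetical helper: replace `col` with one column per word."""
    parts = df[col].str.split(expand=True)
    # number the new columns from `start`, however many words there are
    parts.columns = [f'{prefix}{i}' for i in range(start, start + parts.shape[1])]
    # drop the original string column and attach the word columns by index
    return df.drop(columns=col).join(parts)

df = pd.DataFrame({'Col1': [38, 11, 134],
                   'Col2': ['My Name is John', 'Hello friend',
                            'My favorite city is New Orleans']})

result = split_into_columns(df, 'Col2')
print(result.columns.tolist())
# ['Col1', 'Col2', 'Col3', 'Col4', 'Col5', 'Col6', 'Col7']
```

Deriving the column names from `parts.shape[1]` avoids a length-mismatch error if a longer sentence shows up later.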
