Identifying a pattern in a list of strings and updating a DataFrame

Posted 2024-10-03 04:38:14


I have a list with a specific pattern, and I want to create and populate a DataFrame based on that format. The list is:

text =  ['chocolate1','a;b;','c;d','icecream','e;f;','g;h', 'i;j', 'cookie', 'k;l', 'm;n']

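If you look closely, the pattern is: an entry without ';' (chocolate1, icecream, cookie) starts a new group, and each entry after it that contains ';' holds that group's names separated by ';' (sometimes with a trailing ';', as in 'a;b;').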

I want to extract each chocolate name and pair it with its value in the chocolate# column. The final DataFrame should look like this:

|chocolate#|chocolateName|
|---|---|
|chocolate1|a|
|chocolate1|b|
|chocolate1|c|
|chocolate1|d|
|icecream|e|
|icecream|f|
|icecream|g|
|icecream|h|
|icecream|i|
|icecream|j|
|cookie|k|
|cookie|l|
|cookie|m|
|cookie|n|

I have tried a few things with the data I have, but nothing seems to work:

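new_text = []
# text is a list, not a string, so text.splitlines() raises AttributeError here
for line in text.splitlines():
    # skip lines with zero or one whitespace-separated tokens
    if len(line.split()) == 0 or len(line.split()) == 1:
        continue
    else:
        new_text.append(line)
# find the first entry without ';' from index 13 onward and print its position
for i in new_text[13:]:
    if ';' not in i:
        title_index = new_text.index(i)
        print(title_index)
        break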
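A loop-based approach can work if it iterates over the list directly instead of calling splitlines(). The following is a minimal sketch, assuming the first entry is always a name without ';'. The helper name parse_entries is hypothetical, chosen for illustration; the column names are taken from the desired output above.

```python
import pandas as pd

text = ['chocolate1', 'a;b;', 'c;d', 'icecream', 'e;f;', 'g;h', 'i;j', 'cookie', 'k;l', 'm;n']


def parse_entries(entries):
    """Turn the flat list into (group, name) pairs.

    Hypothetical helper. Assumes an entry without ';' starts a new group and
    every entry containing ';' holds names for the most recent group.
    """
    rows = []
    current = None
    for entry in entries:
        if ';' not in entry:
            current = entry  # a new group name such as 'chocolate1'
            continue
        if current is None:
            raise ValueError(f"entry {entry!r} appears before any group name")
        for name in entry.split(';'):
            if name:  # skip the empty string produced by a trailing ';'
                rows.append((current, name))
    return rows


df = pd.DataFrame(parse_entries(text), columns=['chocolate#', 'chocolateName'])
print(df)
```

Tracking only the most recent group name avoids the index lookups in the attempt above, and the whole list is processed in a single pass.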

1 answer
User
#1 · Posted 2024-10-03 04:38:14

Try this:

import pandas as pd

# Create a pandas DataFrame from the list
text =  ['chocolate1','a;b;','c;d','icecream','e;f;','g;h', 'i;j', 'cookie', 'k;l', 'm;n']
s = pd.Series(text)
df = s.to_frame(name='letters')

# Create a new column 'food' that holds the entries containing no ';'
df['food'] = df.loc[~df['letters'].str.contains(';'), 'letters']
# Forward-fill so every item row inherits the food name above it
df['food'] = df['food'].ffill()

# Remove the rows whose 'letters' value contains no ';' (the food-name rows)
df = df[df['letters'].str.contains(';')].copy()

# Explode letters into rows of dataframe
df['letters'] = df['letters'].str.split(';')
df_out = df.explode('letters')

# Drop the empty strings left by a trailing ';' (e.g. 'a;b;' splits into ['a', 'b', ''])
df_out = df_out[df_out['letters'] != '']

print(df_out)

Output:

  letters        food
1       a  chocolate1
1       b  chocolate1
2       c  chocolate1
2       d  chocolate1
4       e    icecream
4       f    icecream
5       g    icecream
5       h    icecream
6       i    icecream
6       j    icecream
8       k      cookie
8       l      cookie
9       m      cookie
9       n      cookie
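The answer's columns are named letters and food, and the original row labels are kept, so the index repeats. If you want the exact header from the question, here is a minimal sketch that condenses the same steps (using Series.where in place of the .loc assignment, which should be equivalent here) and then renames, reorders and resets the index:

```python
import pandas as pd

text = ['chocolate1', 'a;b;', 'c;d', 'icecream', 'e;f;', 'g;h', 'i;j', 'cookie', 'k;l', 'm;n']
df = pd.Series(text).to_frame(name='letters')

# Rows containing ';' hold item names; the other rows are food names
is_item = df['letters'].str.contains(';')
# Keep only the food names, then forward-fill them onto the item rows
df['food'] = df['letters'].where(~is_item).ffill()

# Keep the item rows, split them on ';' and explode into one row per name
df = df[is_item].copy()
df['letters'] = df['letters'].str.split(';')
df_out = df.explode('letters')
df_out = df_out[df_out['letters'] != '']  # drop empties from trailing ';'

# Match the question's header and get a clean 0..n-1 index
result = (
    df_out.rename(columns={'food': 'chocolate#', 'letters': 'chocolateName'})
    .loc[:, ['chocolate#', 'chocolateName']]
    .reset_index(drop=True)
)
print(result)
```

reset_index(drop=True) discards the repeated labels produced by explode; leave it out if you need to trace each name back to its source entry.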
