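I have a DataFrame column containing a long string that I want to parse. I am new to regex and have not used it before. What I have below returns only the first name, at best. I would like to know whether it is easier to parse this string with a regular expression or to build a dictionary and iterate over it. Here is what I have so far. The order is not always the same (C, W, D, G, UTIL), and I will write a for loop to iterate over multiple rows, like the ones below.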
import pandas as pd
import numpy as np
import re
df = pd.DataFrame(data=np.array([['C Mark Scheifele C Pierre-Luc Dubois UTIL Zach Parise W Mats Zuccarello W Oliver Bjorkstrand W Nick Foligno D Ryan Suter D Seth Jones G Devan Dubnyk'],['UTIL Kyle Connor C Pierre-Luc Dubois C Boone Jenner W Mats Zuccarello W Oliver Bjorkstrand W Nick Foligno D Ryan Suter D Seth Jones G Devan Dubnyk']]), columns=['Lineup'])
df['C1'] = re.findall(r" C \w+",str(df['Lineup']))
df['C2'] = re.findall(r'C \w+',str(df['Lineup']))
df['W1'] = re.findall(r'W \w+',str(df['Lineup']))
df['W2'] = re.findall(r'W \w+',str(df['Lineup']))
df['W3'] = re.findall(r'W \w+',str(df['Lineup']))
df['D1'] = re.findall(r'D \w+',str(df['Lineup']))
df['D2'] = re.findall(r'D \w+',str(df['Lineup']))
df['G']= re.findall(r'G \w+',str(df['Lineup']))
df['UTIL'] = re.findall(r'UTIL \w+',str(df['Lineup']))
I am looking to store these values in the DataFrame:
df['C1'] = Mark Scheifele
df['C2'] = Pierre-Luc Dubois
df['W1'] = Mats Zuccarello
df['W2'] = Oliver Bjorkstrand
df['W3'] = Nick Foligno
df['D1'] = Ryan Suter
df['D2'] = Seth Jones
df['G']= Devan Dubnyk
df['UTIL'] = Zach Parise
Resulting DataFrame:
df_result = pd.DataFrame(data=np.array([['Mark Scheifele','Pierre-Luc Dubois','Mats Zuccarello','Oliver Bjorkstrand','Nick Foligno','Ryan Suter','Seth Jones','Devan Dubnyk','Zach Parise'],['Boone Jenner','Pierre-Luc Dubois','Mats Zuccarello','Oliver Bjorkstrand','Nick Foligno','Ryan Suter','Seth Jones','Devan Dubnyk','Kyle Connor']]), columns=['C1','C2','W1','W2','W3','D1','D2','G','UTIL'])
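This version lets you handle random order, varying length (a different number of ids), and more. However, it relies on fully uppercase words being the indicators of an id.
If you want the returned DataFrames appended together, try this; the returned data should hopefully match the target.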
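The idea of treating every fully uppercase word as a position tag can be sketched roughly as follows. This is only an illustration, not the answer's original code: the helper name `parse_lineup` is hypothetical, and the choice to number a tag only when it occurs more than once in the string (so `G` and `UTIL` stay unnumbered) is an assumption taken from the column names in the question.

```python
from collections import defaultdict


def parse_lineup(text):
    """Return {column_name: player_name} for one lineup string.

    Assumption: a word made up only of uppercase letters (C, W, D, G, UTIL)
    is a position tag, and every following word up to the next tag belongs
    to the player's name.
    """
    totals = defaultdict(int)  # how often each tag occurs in this string
    players = []               # [tag, [name words]] in order of appearance
    for word in text.split():
        if word.isalpha() and word.isupper():
            totals[word] += 1
            players.append([word, []])
        elif players:
            players[-1][1].append(word)

    seen = defaultdict(int)
    result = {}
    for tag, name_words in players:
        seen[tag] += 1
        # Number a tag only if it appears more than once (C1, C2, ... but G, UTIL).
        key = f"{tag}{seen[tag]}" if totals[tag] > 1 else tag
        result[key] = " ".join(name_words)
    return result


lineup = ("C Mark Scheifele C Pierre-Luc Dubois UTIL Zach Parise "
          "W Mats Zuccarello W Oliver Bjorkstrand W Nick Foligno "
          "D Ryan Suter D Seth Jones G Devan Dubnyk")
print(parse_lineup(lineup))
```

Because the tags are found by their casing and not by their position in the string, the order of the tags does not matter. A name containing a standalone all-caps word (an initial such as "T", for example) would be misread as a tag, so this only holds under the assumption stated above.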
See the image below for the final DataFrame. This works for any number of rows.
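One way to handle any number of rows without an explicit for loop is `Series.str.extractall` followed by a pivot. The sketch below is an assumption about how this could be done, not the code from the original answer: the regex assumes the only tags are the five listed in the question, and the intermediate names (`long`, `wide`, `row`, `col`) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Lineup": [
    "C Mark Scheifele C Pierre-Luc Dubois UTIL Zach Parise W Mats Zuccarello "
    "W Oliver Bjorkstrand W Nick Foligno D Ryan Suter D Seth Jones G Devan Dubnyk",
    "UTIL Kyle Connor C Pierre-Luc Dubois C Boone Jenner W Mats Zuccarello "
    "W Oliver Bjorkstrand W Nick Foligno D Ryan Suter D Seth Jones G Devan Dubnyk",
]})

# Assumption: the tags are exactly UTIL, C, W, D and G. The lazy name group
# stops right before the next tag or at the end of the string.
tags = "UTIL|C|W|D|G"
pattern = rf"\b(?P<pos>{tags})\s+(?P<name>.+?)(?=\s+(?:{tags})\s|$)"

# One line per (row, match) with the columns "pos" and "name".
long = df["Lineup"].str.extractall(pattern)
long = long.rename_axis(["row", "match"]).reset_index()

# Number repeated positions within each row: C1, C2, W1, ... G and UTIL stay bare.
long["n"] = long.groupby(["row", "pos"]).cumcount() + 1
repeated = long.groupby(["row", "pos"])["name"].transform("size") > 1
long["col"] = long["pos"] + long["n"].astype(str).where(repeated, "")

# Back to one line per lineup, with the columns in the order the question asks for.
wide = long.pivot(index="row", columns="col", values="name")
wide = wide.reindex(columns=["C1", "C2", "W1", "W2", "W3", "D1", "D2", "G", "UTIL"])

result = df.join(wide)
print(result.drop(columns="Lineup"))
```

Players are numbered in the order they appear in the string, so for the second row this gives `C1` = Pierre-Luc Dubois and `C2` = Boone Jenner, the reverse of what the question's `df_result` shows for that row.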