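I have a list of strings that I scraped, and I want to group them and then reformat them as columnar data. However, not every group contains every variable title.
My list is named complist and looks like this: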
[u'Intake Received Date:',
u'9/11/2012',
u'Intake ID:',
u'CA00325127',
u'Allegation Category:',
u'Infection Control',
u'Investigation Finding:',
u'Substantiated',
u'Intake Received Date:',
u'5/14/2012',
u'Intake ID:',
u'CA00310421',
u'Allegation Category:',
u'Quality of Care/Treatment',
u'Investigation Finding:',
u'Substantiated',
u'Intake Received Date:',
u'8/15/2011',
u'Intake ID:',
u'CA00279396',
u'Allegation Category:',
u'Quality of Care/Treatment',
u'Sub Categories:',
u'Screening',
u'Investigation Finding:',
u'Unsubstantiated',]
My goal is to make it look like this:
'Intake Received Date', 'Intake ID', 'Allegation Category', 'Sub Categories', 'Investigation Finding'
'9/11/2012', 'CA00325127', 'Infection Control', '', 'Substantiated'
'5/14/2012', 'CA00310421', 'Quality of Care/Treatment', '', 'Substantiated'
'8/15/2011', 'CA00279396', 'Quality of Care/Treatment', 'Screening', 'Unsubstantiated'
The first thing I did was split the list into chunks based on the starting element, Intake Received Date:
import re
from itertools import groupby

compgroup = []
for k, g in groupby(complist, key=lambda x: re.search(r'Intake Received Date', x)):
    if not k:
        compgroup.append(list(g))

# Intake Received Date was removed, so insert it back at the beginning of each list:
for c in compgroup:
    c.insert(0, u'Intake Received Date')

# Create a list of dicts to map the preceding titles to their respective data elements:
dic = []
for c in compgroup:
    dic.append(dict(zip(*[iter(c)]*2)))
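The next step would be to convert the list of dicts into columnar data, but at this point my approach feels overly complicated and I suspect I am missing something more elegant. I would appreciate your guidance.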
Your approach is actually fine. I edited it a bit: you don't need a regular expression, and you don't need to re-insert Intake Received Date. Try:
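A minimal sketch of that simplification, not necessarily the code originally posted here. It assumes every record starts with the exact string 'Intake Received Date:' and that the remaining items alternate label, value. The names START, COLUMNS, records and rows are illustrative, and the trailing colons are stripped so the keys match the desired header.

```python
import csv
import sys
from itertools import groupby

# Sample data copied from the question.
complist = [
    u'Intake Received Date:', u'9/11/2012',
    u'Intake ID:', u'CA00325127',
    u'Allegation Category:', u'Infection Control',
    u'Investigation Finding:', u'Substantiated',
    u'Intake Received Date:', u'5/14/2012',
    u'Intake ID:', u'CA00310421',
    u'Allegation Category:', u'Quality of Care/Treatment',
    u'Investigation Finding:', u'Substantiated',
    u'Intake Received Date:', u'8/15/2011',
    u'Intake ID:', u'CA00279396',
    u'Allegation Category:', u'Quality of Care/Treatment',
    u'Sub Categories:', u'Screening',
    u'Investigation Finding:', u'Unsubstantiated',
]

START = u'Intake Received Date:'  # assumed to open every record
COLUMNS = ['Intake Received Date', 'Intake ID', 'Allegation Category',
           'Sub Categories', 'Investigation Finding']

records = []
# A plain equality test replaces the regex as the grouping key.
for is_start, group in groupby(complist, key=lambda item: item == START):
    if is_start:
        continue  # this group holds only the START label itself
    values = list(group)
    # values[0] is the date that followed START; the rest alternate
    # label, value, so START never has to be inserted back.
    record = {START.rstrip(':'): values[0]}
    for label, value in zip(values[1::2], values[2::2]):
        record[label.rstrip(':')] = value
    records.append(record)

# Columnar form: one row per record, '' where a title is missing.
rows = [[record.get(column, '') for column in COLUMNS] for record in records]

writer = csv.writer(sys.stdout)
writer.writerow(COLUMNS)
writer.writerows(rows)
```

Looking up each column with record.get(column, '') is what handles titles such as Sub Categories that only some groups have: the first two rows get an empty string in that column and the third gets 'Screening'.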
Prints: