我有两个类似的csv文件,如下所示:
{http://www.omg.org/XMI}id,begin,end,Character
45440,34,45,Miss Parker
45455,137,147,Farrington
48976,295,298,Mr Alleyne
45533,890,900,Mr Alleyne
49020,2147,2154,Mr Alleyne
49020,2147,2154,Mr Alleyne
48606,2689,2696,Farrington
46858,3690,3693,Farrington
48680,5280,5291,clients
46880,5373,5376,Farrington
46728,5396,5407,clients
49057,5673,5683,clients
48734,6145,6155,Mr Alleyne
48734,6145,6155,Mr Alleyne
46699,6661,6664,Miss Delacour
49094,6969,6972,Farrington
48841,8451,8461,Mr Alleyne
48849,8466,8479,Miss Delacour
我希望能够创建一个唯一字符作为键的字典,并添加它们的偏移量'begin'
和'end'
,忽略列'{http://www.omg.org/XMI}id'
到各自的唯一字符aka key,每次在这两个文件中被提及
我期望的输出应该如下所示:
print(dict_of_mentions)
输出:
{'Farrington': [(137,147),(2689,2696) #etc...],
'Mr Alleyne': [(295,298), (890,900) #etc...], #rest of characters... }
到目前为止,我的代码如下所示:
import tkinter
from tkinter import filedialog
def character_mentions():
filenames = filedialog.askopenfilenames()
for filename in filenames:
reader = csv.DictReader(open(filename))
dict_of_mentions = {}
for row in reader:
key = row.pop('Character')
if key in dict_of_mentions:
#implement duplicate row handling here
pass
dict_of_mentions[key] = row
print(dict_of_mentions)
输出如下所示:
{'Miss Parker': OrderedDict([('{http://www.omg.org/XMI}id', '45440'), ('begin', '34'), ('end', '45')]) 'Farrington': OrderedDict([('{http://www.omg.org/XMI}id', '46645'), ('begin', '22012'), ('end', '22014')]), 'Mr Alleyne': OrderedDict([('{http://www.omg.org/XMI}id', '47297'), ('begin', '13952'), ('end', '13962')]), 'clients': OrderedDict([('{http://www.omg.org/XMI}id', '49057'), ('begin', '5673'), ('end', '5683')]), 'Miss Delacour': OrderedDict([('{http://www.omg.org/XMI}id', '45867'), ('begin', '9101'), ('end', '9109')]), 'Everyone': OrderedDict([('{http://www.omg.org/XMI}id', '45836'), ('begin', '11896'), ('end', '11900')]), "Terry Kelly's clerk": OrderedDict([('{http://www.omg.org/XMI}id', '49278'), ('begin', '11971'), ('end', '11980')]), 'crowd': OrderedDict([('{http://www.omg.org/XMI}id', '49337'), ('begin', '12458'), ('end', '12471')]), 'office-girls': OrderedDict([('{http://www.omg.org/XMI}id', '49359'), ('begin', '12537'), ('end', '12549')]), 'Higgins': OrderedDict([('{http://www.omg.org/XMI}id', '45936'), ('begin', '13925'), ('end', '13927')]), 'friends': OrderedDict([('{http://www.omg.org/XMI}id', '49592'), ('begin', '17499'), ('end', '17506')]), 'boys': OrderedDict([('{http://www.omg.org/XMI}id', '47949'), ('begin', '17638'), ('end', '17649')]), 'one of the young women': OrderedDict([('{http://www.omg.org/XMI}id', '46257'), ('begin', '19945'), ('end', '19954')]), 'Weathers': OrderedDict([('{http://www.omg.org/XMI}id', '49643'), ('begin', '19881'), ('end', '19891')]), 'curate': OrderedDict([('{http://www.omg.org/XMI}id', '46142'), ('begin', '19094'), ('end', '19101')]), 'Ada': OrderedDict([('{http://www.omg.org/XMI}id', '46364'), ('begin', '20313'), ('end', '20316')]), 'Tom': OrderedDict([('{http://www.omg.org/XMI}id', '49804'), ('begin', '21852'), ('end', '21855')])}
感谢您的帮助
您可以使用
itertools.groupby
轻松地做到这一点这只会使一个键,如果一个与该名称不存在
相关问题 更多 >
编程相关推荐