Converting a text file to a dictionary (why doesn't my loop work?)

Posted 2024-10-01 09:15:34


This is the original text file (baby.txt); I only want to extract the "reviewText" entries.

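I tried converting it to a dictionary and then extracting the key "reviewText". Here is the code I used to clean up the text file before turning it into a dictionary: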

f = open('baby.txt','r')
lines=f.read().split("\n")[:30]
str_list = list(filter(None, lines))
str_list

The result looks like this:

 ['reviewerID:A1HK2FQW6KXQB2',
 'asin:097293751X',
 'reviewerName:Amanda Johnsen "Amanda E. Johnsen"',
 'helpful:[0, 0]',
 "reviewText:Perfect for new parents. We were able to keep track of baby's feeding, sleep and diaper change schedule for the first two and a half months of her life. Made life easier when the doctor would ask questions about habits because we had it all right there!",
 'overall:5.0',
 'summary:Awesine',
 'unixReviewTime:1373932800',
 'reviewTime:07 16, 2013',
 'reviewerID:A19K65VY14D13R',
 'asin:097293751X',
 'reviewerName:angela',
 'helpful:[0, 0]',
 'reviewText:This book is such a life saver.  It has been so helpful to be able to go back to track trends, answer pediatrician questions, or communicate with each other when you are up at different times of the night with a newborn.  I think it is one of those things that everyone should be required to have before they leave the hospital.  We went through all the pages of the newborn version, then moved to the infant version, and will finish up the second infant book (third total) right as our baby turns 1.  See other things that are must haves for baby at [...]',
 'overall:5.0',
 'summary:Should be required for all new parents!',
 'unixReviewTime:1372464000',
 'reviewTime:06 29, 2013',
 'reviewerID:A2LL1TGG90977E',
 'asin:097293751X',
 'reviewerName:Carter',
 'helpful:[0, 0]',
 "reviewText:Helps me know exactly how my babies day has gone with my mother in law watching him while I go to work.  It also has a section for her to write notes and let me know anything she may need.  I couldn't be happier with this book.",
 'overall:5.0',
 'summary:Grandmother watching baby',
 'unixReviewTime:1395187200',
 'reviewTime:03 19, 2014']

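The result looks fine, but when I use this code in a loop, something strange happens. The output does not contain all the lines, only the lines of the last review: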

dict_temp ={}
f = open('baby.txt','r')
lines=f.read().split("\n")[:30]
str_list = list(filter(None, lines))
str_list
for one in str_list: 
    k = one.split(':')[0]
    v = one.split(':')[1]
    dict_temp[k] = v

print(dict_temp)
{'reviewerID': 'A2LL1TGG90977E', 'asin': '097293751X', 'reviewerName': 'Carter', 'helpful': '[0, 0]', 'reviewText': "Helps me know exactly how my babies day has gone with my mother in law watching him while I go to work.  It also has a section for her to write notes and let me know anything she may need.  I couldn't be happier with this book.", 'overall': '5.0', 'summary': 'Grandmother watching baby', 'unixReviewTime': '1395187200', 'reviewTime': '03 19, 2014'}

Please help me find the cause, and suggest any other way to solve this (extracting only the "reviewText" entries).


2 Answers

You can use a simple list comprehension instead of creating a dictionary:

lst = ['reviewerID:A1HK2FQW6KXQB2',
 'asin:097293751X',
 'reviewerName:Amanda Johnsen "Amanda E. Johnsen"',
 'helpful:[0, 0]',
 "reviewText:Perfect for new parents. We were able to keep track of baby's feeding, sleep and diaper change schedule for the first two and a half months of her life. Made life easier when the doctor would ask questions about habits because we had it all right there!",
 'overall:5.0',
 'summary:Awesine',
 'unixReviewTime:1373932800',
 'reviewTime:07 16, 2013',
 'reviewerID:A19K65VY14D13R',
 'asin:097293751X',
 'reviewerName:angela',
 'helpful:[0, 0]',
 'reviewText:This book is such a life saver.  It has been so helpful to be able to go back to track trends, answer pediatrician questions, or communicate with each other when you are up at different times of the night with a newborn.  I think it is one of those things that everyone should be required to have before they leave the hospital.  We went through all the pages of the newborn version, then moved to the infant version, and will finish up the second infant book (third total) right as our baby turns 1.  See other things that are must haves for baby at [...]',
 'overall:5.0',
 'summary:Should be required for all new parents!',
 'unixReviewTime:1372464000',
 'reviewTime:06 29, 2013',
 'reviewerID:A2LL1TGG90977E',
 'asin:097293751X',
 'reviewerName:Carter',
 'helpful:[0, 0]',
 "reviewText:Helps me know exactly how my babies day has gone with my mother in law watching him while I go to work.  It also has a section for her to write notes and let me know anything she may need.  I couldn't be happier with this book.",
 'overall:5.0',
 'summary:Grandmother watching baby',
 'unixReviewTime:1395187200',
 'reviewTime:03 19, 2014']

[print(elem.split(':')[-1]) for elem in lst if 'reviewText:' in elem]

Output:

Perfect for new parents. We were able to keep track of baby's feeding, sleep and diaper change schedule for the first two and a half months of her life. Made life easier when the doctor would ask questions about habits because we had it all right there!
This book is such a life saver.  It has been so helpful to be able to go back to track trends, answer pediatrician questions, or communicate with each other when you are up at different times of the night with a newborn.  I think it is one of those things that everyone should be required to have before they leave the hospital.  We went through all the pages of the newborn version, then moved to the infant version, and will finish up the second infant book (third total) right as our baby turns 1.  See other things that are must haves for baby at [...]
Helps me know exactly how my babies day has gone with my mother in law watching him while I go to work.  It also has a section for her to write notes and let me know anything she may need.  I couldn't be happier with this book.

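The problem with your loop is that the text file contains multiple instances of the same key. When the loop walks the list and finds a new key (say "reviewText"), it stores the associated value in the dictionary. But when the loop meets "reviewText" again (the file holds several reviews, and every review contains all the keys), the value for that key is overwritten. A dictionary keeps only one value per key, so the old value is lost and the new one takes its place. That is why only the last review survives.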
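One way to keep every review is to build one dictionary per review and collect them in a list. The sketch below assumes each record begins with a `reviewerID` line, as in the sample output above; the helper name `parse_reviews` and the shortened sample lines are made up for illustration. It uses `str.partition` so that only the first colon is treated as the separator.

```python
def parse_reviews(lines, first_key="reviewerID"):
    """Group 'key:value' lines into one dict per review.

    Assumes every review starts with a line whose key is `first_key`.
    """
    reviews = []
    for line in lines:
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # skip lines that contain no colon
        if key == first_key or not reviews:
            reviews.append({})  # start a new record
        reviews[-1][key] = value
    return reviews


# Shortened sample in the same shape as the question's data
sample = [
    "reviewerID:A1HK2FQW6KXQB2",
    "reviewText:Perfect for new parents.",
    "overall:5.0",
    "reviewerID:A19K65VY14D13R",
    "reviewText:This book is such a life saver.",
    "overall:5.0",
]

records = parse_reviews(sample)
texts = [r["reviewText"] for r in records if "reviewText" in r]
print(texts)
```

With the real file you would pass `str_list` instead of `sample`. If the records in your file do not always start with `reviewerID`, the grouping rule needs to change accordingly.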

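If you only want to extract the "reviewText" of each review, you can use the list comprehension suggested in the other answer. It assumes that each string contains only one ':' and that the text 'reviewText:' never appears inside a value (treating the strings as key-value pairs). Here is a safer version of the same idea, which only looks at the start of each string: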

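prefix = 'reviewText:'
# keep only lines that start with the key, and drop the key and its colon
reviews = [r[len(prefix):] for r in str_list if r.startswith(prefix)]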

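Rubber duck debugging works very well for this kind of problem: once you trace what each line does on every iteration of the loop, the cause becomes obvious.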
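As a rough illustration of that tracing, the sketch below runs the same loop logic on a shortened, made-up sample and reports every time an existing key is overwritten. The `overwrites` list is only there for the demonstration.

```python
# Shortened sample: two reviews that share the same keys
sample = [
    "reviewerID:A1HK2FQW6KXQB2",
    "overall:5.0",
    "reviewerID:A19K65VY14D13R",
    "overall:5.0",
]

dict_temp = {}
overwrites = []  # (key, old value, new value) for every overwrite
for one in sample:
    k, _, v = one.partition(":")
    if k in dict_temp:
        # The key was already stored by an earlier review
        overwrites.append((k, dict_temp[k], v))
        print(f"overwriting {k!r}: {dict_temp[k]!r} -> {v!r}")
    dict_temp[k] = v

print(dict_temp)
```

Each "overwriting" line marks a point where data from an earlier review is discarded, which is the behaviour seen in the question.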
