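<p>I think this is what you need to do to extract all the records from the file and collect the <code>review/summary</code> values. You don't need a DataFrame for this.</p>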
<pre><code># create a dictionary to store the list of review summary values
d = {'review summary': []}

# function to extract only the review summary from a full record
def split_review_summary(full_line):
    # find review/text and exclude it (and everything after it) from the record
    found = full_line.find('review/text:')
    if found >= 0:
        full_line = full_line[:found]
    # find review/summary; all text to the right of it is the review summary
    # add this to the dictionary
    found = full_line.find('review/summary:')
    if found >= 0:
        review_summary = full_line[found + len('review/summary:'):]
        d['review summary'].append(review_summary)

# open the file for reading
with open("xyz.txt", "r") as f:
    # read the first line; it starts the first record
    new_line = f.readline().rstrip('\n')
    # loop through the rest of the lines
    for line in f:
        # remove the newline from the data
        line = line.rstrip('\n')
        # if the line starts with product/productId, it's a new record:
        # process the previous record and strip out its review summary
        # by calling split_review_summary
        if line.startswith('product/productId'):
            split_review_summary(new_line)
            # reset new_line to the current line
            new_line = line
        else:
            # append to new_line, as it's part of the previous record
            new_line += line

# the last full record has not been processed yet,
# so send it to split_review_summary to extract its review summary
split_review_summary(new_line)

# now dictionary d has all the review summary items
print(d)
</code></pre>
<p>The output will be:</p>
<pre><code>{'review summary': [' Good Quality Dog Food ', ' Not as Advertised ']}
</code></pre>
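<p>The same record-grouping logic can be packaged as functions that take any iterable of lines, which makes it easy to try out on an in-memory sample. This is a sketch, not a drop-in replacement: the function names (<code>iter_records</code>, <code>review_summary</code>, <code>extract_review_summaries</code>) are hypothetical, and the sample text is made up, assuming each field starts on its own line as in your file.</p>
<pre><code>import io

SUMMARY_KEY = 'review/summary:'
TEXT_KEY = 'review/text:'
RECORD_START = 'product/productId'

def iter_records(lines):
    """Yield one joined string per record.

    A record starts at a line beginning with 'product/productId'; the lines
    that follow are appended to it. Lines before the first record start are
    ignored.
    """
    record = None
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith(RECORD_START):
            if record is not None:
                yield record
            record = line
        elif record is not None:
            record += line  # continuation of the current record
    if record is not None:
        yield record  # the last record has no following start line

def review_summary(record):
    """Return the text between 'review/summary:' and 'review/text:', or None."""
    head, _, _ = record.partition(TEXT_KEY)  # drop review/text and what follows
    _, found, summary = head.partition(SUMMARY_KEY)
    return summary if found else None

def extract_review_summaries(lines):
    """Collect the review summaries of all records into a dictionary."""
    summaries = [review_summary(r) for r in iter_records(lines)]
    return {'review summary': [s for s in summaries if s is not None]}

# Made-up sample in the assumed layout: one field per line.
sample = io.StringIO(
    "product/productId: A1\n"
    "review/summary: First summary\n"
    "review/text: Some text.\n"
    "product/productId: B2\n"
    "review/summary: Second summary\n"
    "review/text: More text.\n"
)
print(extract_review_summaries(sample))
# {'review summary': [' First summary', ' Second summary']}
</code></pre>
<p>With the real file you would pass the open file object, e.g. <code>with open("xyz.txt") as f: d = extract_review_summaries(f)</code>. The leading space after the colon is kept, as in the loop above; call <code>.strip()</code> on each summary if you don't want it. Unlike that loop, any lines before the first <code>product/productId</code> line are skipped.</p>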
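<p>I think the scope of your question also includes writing the result to a new file.</p>
<p>You can open a file and write the dictionary to it as a single line; that line will contain all the details. I'll leave that part for you to work out.</p>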
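<p>If it helps as a starting point, here is a minimal sketch of writing the dictionary as one line, assuming JSON is an acceptable format (plain <code>str(d)</code> would also fit on one line but is harder to read back). The function names <code>write_summaries</code>/<code>read_summaries</code> and the file name <code>review_summaries.txt</code> are hypothetical.</p>
<pre><code>import json

def write_summaries(d, path):
    """Write the whole dictionary to `path` as a single JSON line."""
    with open(path, 'w', encoding='utf-8') as out:
        # json.dumps without indent emits one line; embedded newlines are escaped
        out.write(json.dumps(d, ensure_ascii=False) + '\n')

def read_summaries(path):
    """Read the dictionary back from the first line of `path`."""
    with open(path, encoding='utf-8') as f:
        return json.loads(f.readline())

# Dictionary in the shape produced by the code above;
# 'review_summaries.txt' is a hypothetical output file name.
d = {'review summary': [' Good Quality Dog Food ', ' Not as Advertised ']}
write_summaries(d, 'review_summaries.txt')
print(read_summaries('review_summaries.txt') == d)  # True
</code></pre>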