Answers to the question: Python collections.Counter excludes things in JSON


1 answer

Anonymous, 1 day ago

The problem is that you are using <code>findall</code> on the whole file. Do the following instead:

<pre><code>import collections
import json
import re


def words(s):
    # Raw string so that \w is not treated as a string escape.
    return re.findall(r'\w+', s, re.UNICODE | re.IGNORECASE)


# The context manager closes the file once the JSON has been parsed.
with open('message.json', encoding="utf8") as file:
    data = json.load(file)

# Count lowercased words from the 'content' field of every message;
# messages without that field contribute nothing.
counts = collections.Counter(w.lower() for e in data for w in words(e.get('content', '')))
most_common = counts.most_common(50)
print(most_common)
</code></pre>

Output

<pre><code>[('siä', 1), ('ci', 1), ('podobajä', 1)]
</code></pre>

This output is for a file with the following content (a list of JSON objects):

<pre><code>[
  {
    "sender_name": "xxxxxx",
    "timestamp_ms": 1540327935616,
    "content": "Podobaj\u00c4\u0085 ci si\u00c4\u0099",
    "type": "Generic"
  }
]
</code></pre>

Explanation

<code>json.load</code> loads the contents of the file as a list of dictionaries, <code>data</code>. The code then iterates over those dictionaries and counts the words in each <code>'content'</code> field using the <code>words</code> function and <code>Counter</code>.

Further

<ol>
<li>To remove words such as I, a, and but, see <a href="https://stackoverflow.com/questions/5486337/how-to-remove-stop-words-using-nltk-or-python">this</a></li>
</ol>

Update

Given the format of your file, you need to change the line <code>data = json.load(file)</code> to <code>data = json.load(file)["messages"]</code>, for the following content:

<pre><code>{
  "participants": [],
  "messages": [
    {
      "sender_name": "xxxxxx",
      "timestamp_ms": 1540327935616,
      "content": "Podobaj\u00c4\u0085 ci si\u00c4\u0099",
      "type": "Generic"
    },
    {
      "sender_name": "aaa",
      "timestamp_ms": 1540329382942,
      "content": "aaa",
      "type": "Generic"
    },
    {
      "sender_name": "aaa",
      "timestamp_ms": 1540329262248,
      "content": "aaa",
      "type": "Generic"
    }
  ]
}
</code></pre>

The output is:

<pre><code>[('aaa', 2), ('siä', 1), ('podobajä', 1), ('ci', 1)]
</code></pre>
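The two file shapes discussed in the answer, together with the stop-word step it links to, can be combined in one small sketch. This is only an illustration under some assumptions: the helper names `load_messages` and `count_words`, the tiny `STOP_WORDS` set, and the inline sample JSON are all hypothetical and not part of the original answer. A fuller stop-word list would probably come from a library such as NLTK, as the linked question describes.

```python
import collections
import json
import re

# Hypothetical stop-word set, for illustration only.
STOP_WORDS = {"i", "a", "but"}


def words(s):
    # For str patterns in Python 3, \w already matches Unicode word characters.
    return re.findall(r"\w+", s)


def load_messages(raw):
    # Accept either a bare list of messages or an object with a "messages" key.
    data = json.loads(raw)
    if isinstance(data, dict):
        return data.get("messages", [])
    return data


def count_words(messages, stop_words=STOP_WORDS):
    # Count lowercased words from each "content" field, skipping stop words.
    # Messages without a "content" field contribute nothing.
    return collections.Counter(
        w.lower()
        for e in messages
        for w in words(e.get("content", ""))
        if w.lower() not in stop_words
    )


# Inline sample data (assumed), shaped like the "messages" format above.
raw = (
    '{"participants": [], "messages": ['
    '{"content": "I said aaa but"}, {"content": "aaa"}, {"type": "Generic"}]}'
)
counts = count_words(load_messages(raw))
print(counts.most_common(50))  # [('aaa', 2), ('said', 1)]
```

Checking the type of the parsed JSON lets the same code handle both the bare list and the `"messages"` wrapper, so the line that loads the data does not need to be edited per file.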
