<h2>First solution</h2>
<p>Here is how I would approach the problem:</p>
<pre><code>import json
import collections

if __name__ == '__main__':
    # Load file into data: one JSON object per line
    with open('raw.json') as f:
        data = [json.loads(line) for line in f]

    # Calculate count and total per machine name
    time_total = collections.defaultdict(float)
    time_count = collections.defaultdict(int)
    for row in data:
        time_count[row['name']] += 1
        time_total[row['name']] += row['time']

    # Calculate average
    time_average = {}
    for name in time_count:
        time_average[name] = time_total[name] / time_count[name]

    # Report: name, count, total, average
    for name in sorted(time_count):
        print('{:<10} {:2} {:8.2f} {:8.2f}'.format(
            name,
            time_count[name],
            time_total[name],
            time_average[name]))
</code></pre>
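<h2>Discussion</h2>
<ul>
<li><code>data</code> is a list of <code>dict</code> objects, each containing <em>name</em>, <em>time</em>, ...</li>
<li>I use three more dictionaries to keep track of the count, total, and average for each machine.</li>
<li>I assume you want the calculations based on the <em>time</em> value. If not, that is easy to change: use a different key in place of <code>row['time']</code>.</li>
<li><code>defaultdict</code> is a convenient tool for counting. If a key does not exist yet, it is created on first access with a default value (0 for <code>int</code>), so no existence check is needed. It is worth looking up.</li>
</ul>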
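<p>To make the <code>defaultdict</code> point concrete, here is a minimal sketch. The sample rows are made up for illustration and only mimic the shape of the parsed lines; they are not taken from the real <code>raw.json</code>.</p>
<pre><code>import collections

# Hypothetical sample rows, shaped like the parsed lines of raw.json
rows = [
    {'name': 'machine1', 'time': 1.5},
    {'name': 'machine2', 'time': 2.0},
    {'name': 'machine1', 'time': 2.5},
]

time_count = collections.defaultdict(int)    # missing keys start at 0
time_total = collections.defaultdict(float)  # missing keys start at 0.0

for row in rows:
    # No "if name not in dict" check is needed before +=
    time_count[row['name']] += 1
    time_total[row['name']] += row['time']

print(dict(time_count))
print(dict(time_total))
</code></pre>
<p>One thing to keep in mind: merely reading a missing key from a <code>defaultdict</code> also creates it, so use <code>in</code> or <code>.get()</code> when you only want to check for a key.</p>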
<hr/>
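<h2>Second solution</h2>
<p>Here is a different approach: since your data looks like a table, why not use a database to process it? The advantage is that you don't have to do the calculations yourself.</p>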
<pre><code>import json
import sqlite3

if __name__ == '__main__':
    # Create an in-memory database for calculation
    connection = sqlite3.connect(':memory:')
    cursor = connection.cursor()
    cursor.execute('DROP TABLE IF EXISTS time_table')
    cursor.execute('CREATE TABLE time_table (name text, time real)')
    connection.commit()

    # Load file into database: one JSON object per line
    with open('raw.json') as f:
        for line in f:
            row = json.loads(line)
            cursor.execute('INSERT INTO time_table VALUES (?,?)',
                           (row['name'], row['time']))
    connection.commit()

    # Report: print the name, count, sum, and average
    cursor.execute('SELECT name, COUNT(time), SUM(time), AVG(time) '
                   'FROM time_table GROUP BY name')
    print('%-10s %8s %8s %8s' % ('NAME', 'COUNT', 'SUM', 'AVERAGE'))
    for row in cursor.fetchall():
        print('%-10s %8d %8.2f %8.2f' % row)

    connection.close()
</code></pre>
<h2>Output</h2>
<pre><code>NAME          COUNT      SUM  AVERAGE
machine1          1    12.64    12.64
machine2          1    12.64    12.64
machine3          4    50.77    12.69
machine4          3    38.03    12.68
machine5          5    63.45    12.69
</code></pre>
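<h2>Discussion</h2>
<ul>
<li>In this solution, I create an in-memory SQLite3 database.</li>
<li>Since we are only interested in the <em>name</em> and <em>time</em> columns, the table contains only those two columns.</li>
<li>By using a database, we get aggregate functions such as <code>SUM</code>, <code>COUNT</code>, and <code>AVG</code> for free.</li>
</ul>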
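<p>The same idea extends to other built-in SQLite aggregates such as <code>MIN</code> and <code>MAX</code>. The following is a minimal sketch, not part of the original script: the rows are hypothetical and inserted inline instead of being read from <code>raw.json</code>, and <code>ORDER BY name</code> is added only to make the row order predictable.</p>
<pre><code>import sqlite3

# Hypothetical sample rows; the real script reads them from raw.json
rows = [
    ('machine1', 1.5),
    ('machine2', 2.0),
    ('machine1', 2.5),
]

connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
cursor.execute('CREATE TABLE time_table (name text, time real)')
cursor.executemany('INSERT INTO time_table VALUES (?,?)', rows)
connection.commit()

# MIN and MAX are built-in SQLite aggregates, just like COUNT, SUM and AVG
cursor.execute(
    'SELECT name, COUNT(time), MIN(time), MAX(time), AVG(time) '
    'FROM time_table GROUP BY name ORDER BY name')
stats = cursor.fetchall()
connection.close()

for row in stats:
    print('%-10s %8d %8.2f %8.2f %8.2f' % row)
</code></pre>
<p><code>executemany()</code> is used here only because the rows are already in a list; the line-by-line <code>execute()</code> loop works just as well when reading from a file.</p>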
<hr/>
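<h2>Addition to the first solution</h2>
<p>To answer the question: given <em>machine5</em>, how do I get the last value? I assume you want to filter the data down to the rows for <em>machine5</em>, sort them by time, and pick the last row. For the first solution, append the following block and run it:</p>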
<pre><code># Filter data: prints all rows with 'machine5'
print('\nFilter by machine5')
machine5 = [row for row in data if row['name'] == 'machine5']
# Sort on the time value itself; int() would truncate 12.67 and 12.71
# to the same key and leave the rows effectively unsorted
machine5 = sorted(machine5, key=lambda row: row['time'])
pprint(machine5)

# Get the last instance
print('\nLast instance of machine5:')
latest_row = machine5[-1]
pprint(latest_row)
</code></pre>
<p>Don't forget to add the following at the top of the script:</p>
<pre><code>from pprint import pprint
</code></pre>
<h2>Output</h2>
<pre><code>Filter by machine5
[{u'name': u'machine5', u'time': 12.67007, u'value': 5.068},
 {u'name': u'machine5', u'time': 12.6801, u'value': 2.0868},
 {u'name': u'machine5', u'time': 12.6901, u'value': 12.633},
 {u'name': u'machine5', u'time': 12.69512, u'value': 13.13},
 {u'name': u'machine5', u'time': 12.71517, u'value': 131.633}]

Last instance of machine5
{u'name': u'machine5', u'time': 12.71517, u'value': 131.633}
</code></pre>
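<h2>Discussion</h2>
<p>If you don't want the rows sorted by time, remove the <code>sorted()</code> line. The rows then stay in their original file order, so <code>machine5[-1]</code> is the last matching row in the file.</p>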
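<p>If only the latest row is needed, sorting the whole list is optional. As a possible alternative (a sketch with made-up sample rows, not the method used above), <code>max()</code> with a key finds the row with the largest <em>time</em> in a single pass, while <code>[-1]</code> on the filtered list gives the last row in file order:</p>
<pre><code># Hypothetical sample rows, shaped like the parsed lines of raw.json
data = [
    {'name': 'machine5', 'time': 12.69, 'value': 1.0},
    {'name': 'machine4', 'time': 12.75, 'value': 2.0},
    {'name': 'machine5', 'time': 12.71, 'value': 3.0},
    {'name': 'machine5', 'time': 12.67, 'value': 4.0},
]

machine5 = [row for row in data if row['name'] == 'machine5']

# Latest by time: max() scans once, no full sort required
latest_by_time = max(machine5, key=lambda row: row['time'])

# Last in file order: simply the final element of the filtered list
last_in_file = machine5[-1]

print(latest_by_time)
print(last_in_file)
</code></pre>
<p>Both <code>max()</code> and <code>[-1]</code> raise an exception on an empty list, so check that <code>machine5</code> is non-empty first if the name might not appear in the data.</p>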