Python script not working as required

Posted 2024-06-26 14:30:25


Hello, I am writing a Python script that generates monthly and daily visit counts for web pages. Input file:

ArticleName Date        Hour    Count/Visit
Aa   20130601    10000   1
Aa   20130601    10000   1
Ew   20130601    10000   1
H    20130601    10000   2
H    20130602    10000   1
R    20130601    20000   2
R    20130602    10000   1
Ra   20130601    0   1
Ra   20130601    10000   2
Ra   20130602    10000   1
Ram  20130601    0   2
Ram  20130601    10000   3
Ram  20130602    10000   4
Re   20130601    20000   1
Re   20130602    10000   3
Rz   20130602    10000   1

I need to compute the monthly and daily page views for each article.

Output:

^{pr2}$

My script:

^{3}$

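I can get most of the output, but my output is wrong in two cases:

1. When ArticleName and ArticleDate are the same, I cannot find a way to sum the counts for that ArticleName. For example, the script prints these rows for Ra: Ra 20130601 1 1, Ra 20130601 3 3, Ra 20130602 1 1. At the end, Ra should print 1+3+1=5 as the final monthly total, not 1.

2. Because the third if condition prints every article that differs from the previous one, I get a value twice for rows with the same article name and date. For example, Ra 20130601 1 1 should not be printed.

Does anyone know how to fix this? Let me know if you need more information.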

3 answers

Try the following:

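import itertools
import operator
import sys

# Split each line into fields; drop blank lines and the header row,
# whose last field is not a number.
lines = (line.split() for line in sys.stdin)
lines = (fields for fields in lines if fields and fields[-1].isdigit())

prev_name, prev_month = '', '99999999'
month_view = 0
# groupby only merges adjacent rows, so the input must be sorted by name and date.
for (name, date), grp in itertools.groupby(lines, key=operator.itemgetter(0, 1)):
    view = sum(int(row[-1]) for row in grp)  # views for this article on this day
    if prev_name == name and date.startswith(prev_month):
        month_view += view  # same article and month: extend the running total
    else:
        prev_name = name
        prev_month = date[:6]
        month_view = view  # new article or month: restart the running total
    print('{}\t{}\t{}\t{}'.format(name, date, view, month_view))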

This uses itertools.groupby and operator.itemgetter.

The output differs:

^{pr2}$
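To see how the groupby call in the answer above forms its groups, here is a minimal sketch on a few rows taken from the question's input. The helper name daily_totals and the inline rows list are assumptions made for illustration; they are not part of the answer.

```python
import itertools
import operator

# A few rows from the question's input, already split into fields:
# [article name, date, hour, count]
rows = [
    ["Ra", "20130601", "0", "1"],
    ["Ra", "20130601", "10000", "2"],
    ["Ra", "20130602", "10000", "1"],
    ["Ram", "20130601", "0", "2"],
]


def daily_totals(rows):
    """Sum the last field of consecutive rows that share (name, date)."""
    key = operator.itemgetter(0, 1)  # builds the (name, date) tuple for a row
    return [
        (name, date, sum(int(row[-1]) for row in group))
        for (name, date), group in itertools.groupby(rows, key=key)
    ]


totals = daily_totals(rows)
for name, date, views in totals:
    print(name, date, views)
```

The two Ra rows for 20130601 collapse into a single group, which is what removes the duplicate line described in the question. Because groupby only merges adjacent rows, unsorted input would need to be sorted by name and date first.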

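The simplest approach is to build a two-level dictionary: the keys are page names and each value is itself a dictionary mapping dates to view counts. Iterate over the input once to build it, then iterate over each page's dictionary and add up the views for each month.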
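The two-level dictionary idea can be sketched as follows. This is one possible reading of the description, not the answerer's code: the function name count_views, the use of collections.defaultdict, and the inline sample rows are all assumptions.

```python
from collections import defaultdict


def count_views(lines):
    """Return (daily, monthly) view counts from whitespace-separated lines."""
    # daily[name][date] -> views on that day
    daily = defaultdict(lambda: defaultdict(int))
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header row, whose last field is not a number.
        if not fields or not fields[-1].isdigit():
            continue
        name, date, count = fields[0], fields[1], int(fields[-1])
        daily[name][date] += count

    # monthly[name][YYYYMM] -> views in that month
    monthly = defaultdict(lambda: defaultdict(int))
    for name, per_day in daily.items():
        for date, views in per_day.items():
            monthly[name][date[:6]] += views
    return daily, monthly


# Hypothetical sample: the header plus a few rows from the question.
sample = """ArticleName Date Hour Count/Visit
Ra 20130601 0 1
Ra 20130601 10000 2
Ra 20130602 10000 1
Rz 20130602 10000 1
""".splitlines()

daily, monthly = count_views(sample)
for name in sorted(daily):
    for date in sorted(daily[name]):
        print(name, date, daily[name][date], monthly[name][date[:6]])
```

Unlike the groupby-based answer, this version does not need sorted input, and it prints the complete monthly total on every row rather than a running total.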

A better approach is to use the map/reduce-style functions in itertools: http://docs.python.org/2/howto/functional.html

# Python 2 only: relies on tuple parameters and the print statement.
from itertools import groupby
from itertools import dropwhile
import sys
import datetime

# Convert list of words found in one line into
# a tuple consisting of a name, date/time and number of visits
def get_record(w):
    name = w[0]
    date = datetime.datetime.strptime((w[1] + ('%0*d' % (6, int(w[2])))), "%Y%m%d%H%M%S")
    visits = int(w[3])
    return (name, date, visits)

# Takes a tuple representing a single record and returns a tuple
# consisting of a name, year and month on which the records will
# be grouped.
def get_key_by_month((name, date, visits)):
    return (name, date.year, date.month)

# Takes a tuple representing a single record and returns a tuple
# consisting of a name, year, month and day on which the records will
# be grouped.
def get_key_by_day((name, date, visits)):
    return (name, date.year, date.month, date.day)

# Get a list containing lines, each line containing
# a list of words, skipping the first line
words = (line.split() for line in sys.stdin)
words = dropwhile(lambda x: x[0]<1, enumerate(words))
words = map(lambda x: x[1], words)

# Convert to tuples containing name, date/time and count
records = list(get_record(w) for w in words)

# Group by name, month
groups = groupby(records, get_key_by_month)

# Sum visits in each group
print('Visits per month')
for (name, year, month), g in groups:
    visits = sum(map(lambda (name,date,visits): visits, g))
    print name, year, month, visits

# Group by name, day
groups = groupby(records, get_key_by_day)

# Sum visits in each group
print('\nVisits per day')
for (name, year, month, day), g in groups:
    visits = sum(map(lambda (name,date,visits): visits, g))
    print name, year, month, day, visits

Python 3 version of the code above:

^{pr2}$
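A Python 3 rendering of the groupby approach could look like the sketch below. It is an illustration rather than the answerer's own listing: parse_record, key_by_month, key_by_day, summarize and the inline sample rows are hypothetical names introduced here, and the input is assumed to be sorted by name and date.

```python
import datetime
from itertools import groupby


def parse_record(fields):
    """Turn [name, YYYYMMDD, hour, count] into (name, datetime, visits)."""
    name, day, hour, count = fields
    # Zero-pad the hour column to six digits so it parses as HHMMSS.
    stamp = datetime.datetime.strptime(day + "%06d" % int(hour), "%Y%m%d%H%M%S")
    return name, stamp, int(count)


def key_by_month(record):
    name, stamp, _ = record
    return name, stamp.year, stamp.month


def key_by_day(record):
    name, stamp, _ = record
    return name, stamp.year, stamp.month, stamp.day


def summarize(records, key):
    """Sum visits over consecutive records that share the same key."""
    return [(k, sum(visits for _, _, visits in group))
            for k, group in groupby(records, key)]


# Hypothetical sample: the header plus a few rows from the question.
sample = [
    "ArticleName Date Hour Count/Visit",
    "Ra 20130601 0 1",
    "Ra 20130601 10000 2",
    "Ra 20130602 10000 1",
    "Re 20130601 20000 1",
]
records = [parse_record(line.split()) for line in sample[1:]]  # skip the header

per_month = summarize(records, key_by_month)
per_day = summarize(records, key_by_day)

print("Visits per month")
for key, visits in per_month:
    print(*key, visits)

print("\nVisits per day")
for key, visits in per_day:
    print(*key, visits)
```

Python 3 removed tuple parameters in function signatures and lambdas, so each key function unpacks its record in the body instead. To read real input, the sample list could be replaced with the lines of sys.stdin.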
