Counting email addresses from a mail log file into a dictionary in Python

Posted 2024-10-02 08:30:08


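Experts, I am trying to count the email addresses in a maillog file and the number of times each one occurs. I can do this with regular expressions using re.search or re.match, but I would like to get it done with re.findall, which I am currently dabbling with. Any suggestions would be appreciated.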

1) The code:

# cat maillcount31.py
#!/usr/bin/python
import re
#count = 0
mydic = {}
counts = mydic
fmt = " %-32s %-15s"
log =  open('kkmail', 'r')

for line in log.readlines():
        myre = re.search('.*from=<(.*)>,\ssize', line)
        if myre:
           name = myre.group(1)
           if name not in mydic.keys():
              mydic[name] = 0
           mydic[name] +=1

for key in counts:
   print  fmt % (key, counts[key])

2) Output from the current code:

# python maillcount31.py
 root@MyServer1.myinc.com         13
 User01@MyServer1.myinc.com       14

3 Answers

You could consider using pandas, which gives you a nicely formatted table of the results.

 import pandas as pd

 # email_list: a list of addresses already extracted from the log
 emails = pd.Series(email_list)
 individual_emails = emails.unique()

 # make a table with one row per address and a zeroed tally
 tally = pd.DataFrame({'email': individual_emails,
                       'count': [0] * len(individual_emails)},
                      columns=['email', 'count'])

 for item in range(len(tally)):
      address = tally.iloc[item, 0]
      total = len(emails[emails == address])  # occurrences of this address
      tally.iloc[item, 1] = total

 print(tally)

Hope this helps...

import re
from collections import Counter
emails = re.findall(r'.*from=<(.*)>,\ssize', line)  # Modify the regex according to your file or line pattern. If findall() is called on each line, the returned lists should be combined.
result = Counter(emails)  # type is <class 'collections.Counter'>
dict(result)  # convert to a regular dict

re.findall() returns a list. See "How can I count the occurrences of a list item in Python?" for other ways to count the items in the returned list.
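As a minimal sketch of the findall() approach, the whole log can be read as one string and passed to re.findall() once. The sample text below is hypothetical (a Postfix-like layout assumed for illustration, not taken from the question), and the pattern uses [^>]* rather than .* as an assumption about how addresses are delimited. The code is written for Python 3.

```python
import re
from collections import Counter

# Hypothetical sample in a Postfix-like layout; real log lines may differ.
SAMPLE_LOG = """\
Oct  2 08:30:01 MyServer1 postfix/qmgr[1201]: A1B2C3: from=<root@MyServer1.myinc.com>, size=1024, nrcpt=1 (queue active)
Oct  2 08:30:05 MyServer1 postfix/qmgr[1201]: D4E5F6: from=<User01@MyServer1.myinc.com>, size=2048, nrcpt=1 (queue active)
Oct  2 08:30:07 MyServer1 postfix/smtp[1300]: A1B2C3: to=<someone@example.com>, status=sent
Oct  2 08:30:08 MyServer1 postfix/qmgr[1201]: G7H8I9: from=<root@MyServer1.myinc.com>, size=512, nrcpt=1 (queue active)
"""

# [^>]* stops at the first '>' so each match is a single address.
SENDER_RE = re.compile(r'from=<([^>]*)>,\ssize')


def count_senders(text):
    """Return a dict mapping each sender address to its number of occurrences."""
    # findall() returns one captured address per matching line.
    return dict(Counter(SENDER_RE.findall(text)))


counts = count_senders(SAMPLE_LOG)
for address, n in sorted(counts.items()):
    print(" %-32s %-15s" % (address, n))
```

With a real file, the text would come from something like open('kkmail').read(); for very large logs, reading everything into memory at once may not be desirable.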

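By the way, an interesting feature of Counter: two Counter objects can be added together, which sums the counts of matching keys.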

So, if the file is large, we can count each line separately and combine the results through Counter.
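A minimal sketch of that per-line combination, assuming Python 3. The sample lines and the helper name count_senders_by_line are hypothetical, introduced only for illustration. It shows both Counter addition and the in-place update() method, which avoids building a new Counter on every line.

```python
import re
from collections import Counter

# Assumed pattern: the sender address sits between 'from=<' and '>, size'.
SENDER_RE = re.compile(r'from=<([^>]*)>,\ssize')


def count_senders_by_line(lines):
    """Accumulate sender counts one line at a time."""
    total = Counter()
    for line in lines:
        # update() adds the counts of the new matches to the running total.
        total.update(SENDER_RE.findall(line))
    return total


# Hypothetical log lines, shortened for readability.
sample_lines = [
    "X1: from=<root@MyServer1.myinc.com>, size=1024, nrcpt=1",
    "X2: from=<root@MyServer1.myinc.com>, size=2048, nrcpt=1",
    "X3: from=<User01@MyServer1.myinc.com>, size=512, nrcpt=1",
]

# Adding two Counters sums the counts of matching keys.
first = Counter(SENDER_RE.findall(sample_lines[0]))
second = Counter(SENDER_RE.findall(sample_lines[1]))
combined = first + second

total = count_senders_by_line(sample_lines)
print(dict(total))
```

Because a file object yields its lines lazily, passing an open file to count_senders_by_line (for example inside a with open('kkmail') as log: block) would keep only one line in memory at a time.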

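I hope the code below helps.

However, there are three things to note:

  1. Use with when opening files
  2. When iterating over a dictionary, use iteritems()
  3. When working with containers, collections is your best friend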

#!/usr/bin/python
import re
from collections import Counter

fmt = " %-32s %-15s"
filename = 'kkmail'

# Extract the email addresses
email_list = []
with open(filename, 'r') as log:
   for line in log.readlines():
      _re = re.search(r'.*from=<(.*)>,\ssize', line)
      if _re:
         name = _re.group(1)
         email_list.append(name)

# Count the email addresses
counts = dict(Counter(email_list)) # List to dict of counts: {'a':3, 'b':7,...}
for key, val in counts.iteritems():  # Python 2 only; use items() on Python 3
   print  fmt % (key, val)
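The three points above carry over to Python 3 with one change: iteritems() no longer exists there, and items() takes its place. The following is a rough Python 3 sketch under that assumption. The helper name report_lines, the temporary demo file, and its contents are hypothetical; Counter.most_common() is used so the busiest sender is listed first.

```python
import os
import re
import tempfile
from collections import Counter

FMT = " %-32s %-15s"
# Assumed pattern: the sender address sits between 'from=<' and '>, size'.
SENDER_RE = re.compile(r'from=<([^>]*)>,\ssize')


def report_lines(path):
    """Return formatted 'address count' rows, most frequent sender first."""
    counts = Counter()
    with open(path, 'r') as log:      # the file is closed automatically
        for line in log:              # iterate lazily, one line at a time
            match = SENDER_RE.search(line)
            if match:
                counts[match.group(1)] += 1
    return [FMT % (address, n) for address, n in counts.most_common()]


# Hypothetical demo data written to a temporary file.
with tempfile.NamedTemporaryFile('w', suffix='.log', delete=False) as tmp:
    tmp.write("X1: from=<root@MyServer1.myinc.com>, size=1024, nrcpt=1\n")
    tmp.write("X2: from=<User01@MyServer1.myinc.com>, size=512, nrcpt=1\n")
    tmp.write("X3: from=<root@MyServer1.myinc.com>, size=2048, nrcpt=1\n")
    demo_path = tmp.name

try:
    report = report_lines(demo_path)
finally:
    os.remove(demo_path)

for row in report:
    print(row)
```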
