Counting text/words in a file with Python

Posted 2024-09-19 20:41:22


In chat.txt:

ID674 25/01/1986 Thank you for choosing Optimus prime. Please wait for an Optimus prime Representative to respond. You are currently number 0 in the queue. You should be connected to an agent in approximately 0 minutes.. You are now chatting with 'Tom' 0      <br/>
ID674 2gb Hi there! Welcome to Optus Web Chat 0/0/0 . How can I help you today?  1 
ID674 25-01-1986 I would like to change my bill plan from $0 with 0 expiry to something else $136. I find it very unuseful. Sam my phone no is 9838383821   2

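The text above is only part of the file. My requirement is that every date, e.g. 25/01/1986 or 0/0/0, should be replaced with "DATE123".
Smileys such as :) should be replaced with "smileys123". Currency amounts, i.e. $0 or $136, should be replaced with "currency123".
'TOM' (usually the agent's name in single quotes) should be replaced with AGENT123.
There are many more like these. The output should be the number of times each string occurs, as shown: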

^{pr2}$

This is the approach I have so far; please advise:

  class Replace:
     dateformat=DATE123
     smileys=smileys123
     currency=currency123

  count_dict={}

  function count_data(type,count):
     global count_dict
     if type in count_dict:
        count_dict[type]+=count
     else:
        count_dict = {type:count}


  f=open("chat.txt")
  while True:
     for line in f.readlines():
        print line,
        if ":)" in line:
           smileys = line.count(":)")
           count_data("smileys",smileys)
        elif "$number" in line :    #how to see whether it is currency or nor??
           currency=line.count("$number") //how can i do this
           count_data("currecny",currency)
        elif "1/2/3" in line :    #how to validate date format
           dateformat=line.count("dateformat") #how can i do this
           count_data("currency",currency)
        elif validate-agent-name in line:
           agent_name=line.count("agentname")  #How to do this get agentname in single quotes
           count_data("agent_name",agent_name)
     else:
        break
  f.close()

  for keys in count_dict:
     print keys,count_dict[keys]


  The following would be the output

  DATE123=2  smileys123=2 Currency123=6 AGENT123=5

2 Answers

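This doesn't cover all of the replacements you said you need, but here is one way to count the items in your data using regular expressions and a defaultdict. If you actually want to replace the strings, I'm sure you can figure that out: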

lines = [
   "ID674 25/01/1986 Thank you for :) choosing Optimus prime. Please wait for an Optimus prime Representative to respond. You are currently number 0 in the queue. You should be connected to an agent in approximately 0 minutes.. You are now chatting with 'Tom' 0",
  "ID674 2gb Hi there! Welcome to Optus Web Chat 0/0/0 . $5.45 How can I help you today?  1",
  "ID674 25-01-1986 I would like to change my bill plan from $0 with 0 expiry to something else $136. I find it very unuseful. Sam my phone no is 9838383821   2'"
]

import re
from collections import defaultdict

p_smiley = re.compile(r':\)|:-\)')
p_currency = re.compile(r'\$[\d.]+')
p_date = re.compile(r'(\d{1,4}[/-]\d{1,4}[/-]\d{1,4})')

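# defaultdict(int) starts every missing key at 0
count_dict = defaultdict(int)

def count_data(kind, count):
    # Mutating the dict in place needs no `global` statement
    count_dict[kind] += count

for line in lines:
    count_data('smiley', len(p_smiley.findall(line)))
    count_data('date', len(p_date.findall(line)))
    count_data('currency', len(p_currency.findall(line)))

# Print the totals, one category per line
for kind, total in count_dict.items():
    print(kind, total)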
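
As a possible follow-up to the counting approach, here is a minimal sketch of doing the replacement and the counting in one pass with `re.subn`, which returns both the new string and the number of substitutions made. The `PATTERNS` table, the `replace_and_count` helper, and the agent-name pattern (letters inside single quotes) are assumptions made for illustration, not part of the original answer.

```python
import re

# Replacement label -> compiled pattern. The agent pattern is an assumption:
# a run of letters inside single quotes, e.g. 'Tom'.
PATTERNS = {
    "DATE123": re.compile(r"\b\d{1,4}[/-]\d{1,4}[/-]\d{1,4}\b"),
    "smileys123": re.compile(r":-?\)"),
    "currency123": re.compile(r"\$\d+(?:\.\d+)?"),
    "AGENT123": re.compile(r"'[A-Za-z]+'"),
}


def replace_and_count(text):
    """Return (new_text, counts) after substituting every pattern."""
    counts = {}
    for label, pattern in PATTERNS.items():
        # subn returns the rewritten text and how many matches were replaced
        text, n = pattern.subn(label, text)
        counts[label] = n
    return text, counts


sample = "25/01/1986 Hi :) you pay $0 or $5.45, agent 'Tom' on 0/0/0 :-)"
new_text, counts = replace_and_count(sample)
print(new_text)
print(counts)
```

The currency pattern here is slightly stricter than `\$[\d.]+`, so the period that ends a sentence such as "$136." is left in place rather than swallowed by the match.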

"Currencies, i.e. $0 or $136, should be replaced with "Currency123", and 'TOM' (usually the agent's name in single quotes) should be replaced with AGENT123, and many more."

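I think your Replace class should be replaced by a dictionary. That way you can do more while writing less code, because a dictionary comes with its own methods. A dictionary can keep track of what each item needs to be replaced with, and it gives you more options for changing your replacement requirements dynamically. Your code would probably end up cleaner and easier to understand, and certainly shorter, since you have many more replacement words.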

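Edit: You may want to keep the list of replacement words in a text file and load it into a dictionary, rather than hard-coding the replacement words in a class, which I don't think is a good idea. Since you said there are many more of them, this makes more sense and means less (and cleaner!) code.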
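
One way this could look, as a rough sketch: the rules live in a text file and are read into a dictionary that maps each replacement word to a compiled pattern. The file format (one `replacement<TAB>pattern` pair per line, with `#` comment lines), the `load_rules` helper, and the sample rules are all assumptions for illustration. `io.StringIO` stands in for a real file so that the example runs on its own.

```python
import io
import re

# Hypothetical file format (an assumption): one rule per line,
# "<replacement><TAB><regular expression>"; blank lines and lines
# starting with "#" are skipped.
RULES_TEXT = (
    "# replacement\tpattern\n"
    "DATE123\t\\b\\d{1,4}[/-]\\d{1,4}[/-]\\d{1,4}\\b\n"
    "currency123\t\\$\\d+(?:\\.\\d+)?\n"
    "\n"
    "smileys123\t:-?\\)\n"
)


def load_rules(fileobj):
    """Read replacement rules from an open text file into a dict."""
    rules = {}
    for raw in fileobj:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        replacement, pattern = line.split("\t", 1)
        rules[replacement] = re.compile(pattern)
    return rules


# io.StringIO stands in for open("rules.txt") so the sketch runs as-is.
rules = load_rules(io.StringIO(RULES_TEXT))

line = "Plan changed on 25-01-1986 from $0 to $136 :)"
for replacement, pattern in rules.items():
    line = pattern.sub(replacement, line)
print(line)
```

Adding a new replacement then means adding a line to the file rather than editing the code.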

For comments, use:

# Here is a comment

Your code style isn't the best. If you want to learn better coding style, read http://www.python.org/dev/peps/pep-0008/#pet-peeves, or even the whole document.

Below is a regular expression check for whether a string is a currency amount, a quoted name such as 'Tom', or a date.

^{pr2}$

Sample output:

Enter your string: $100
It is Money: $100
Enter your string: 100
Not good.
Enter your string: 'Tom'
It is a Name: 'Tom'
Enter your string: Tom
Not good.
Enter your string: 01/15/1989
It is a Date: 01/15/1989
Enter your string: 01151989
Not good.

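You can replace the conditions with one of the isSomething variables, depending on what exactly you need to do. I hope this helps. If you want to learn more about regular expressions, see the "Regular Expression Primer" or Python's RE Page.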
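
A minimal sketch of a checker that behaves like the sample session above. The three patterns, the `classify` helper, and the `is_money` / `is_name` / `is_date` names are assumptions chosen to reproduce those six results, and the interactive prompt is replaced by a fixed list of inputs so the example runs unattended.

```python
import re

# Assumed patterns, chosen only to agree with the sample session.
MONEY_RE = re.compile(r"\$\d+(?:\.\d+)?")
NAME_RE = re.compile(r"'[A-Za-z]+'")
DATE_RE = re.compile(r"\d{1,2}/\d{1,2}/\d{4}")


def classify(text):
    """Return a message describing what kind of token `text` is."""
    # fullmatch requires the whole string to match, so "100" is not money
    is_money = MONEY_RE.fullmatch(text)
    is_name = NAME_RE.fullmatch(text)
    is_date = DATE_RE.fullmatch(text)
    if is_money:
        return "It is Money: " + text
    if is_name:
        return "It is a Name: " + text
    if is_date:
        return "It is a Date: " + text
    return "Not good."


for candidate in ["$100", "100", "'Tom'", "Tom", "01/15/1989", "01151989"]:
    print(classify(candidate))
```

To count matches inside a full chat line instead of validating a single token, the same patterns could be used with `findall` rather than `fullmatch`.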
