I have a lexicon like the one below, with more than three thousand words. It is split across two files:
File #1:
#fabulous 7.526 2301 2
#excellent 7.247 2612 3
#superb 7.199 1660 2
#perfection 7.099 3004 4
#terrific 6.922 629 1
#magnificent 6.672 490 1
File #2:
) #perfect 6.021 511 2
? #great 5.995 249 1
! #magnificent 5.979 245 1
) #ideal 5.925 232 1
day #great 5.867 219 1
bed #perfect 5.858 217 1
) #heavenly 5.73 191 1
night #perfect 5.671 180 1
night #great 5.654 177 1
. #partytime 5.427 141 1
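I also have a file of sentences, more than 3,000 lines long.
I have to read through each line and do the following tasks:
1) Find out whether any of the lexicon words match anywhere in the sentence
2) Find out whether any of the lexicon words match the beginning or the end of the sentence
I can do part 2, but not part 1. Well, I could do it, but I am looking for an efficient way. I have the following code: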
import re
import sys

# f1 and f2 are the open lexicon files (File #1 and File #2). Read each one once
# into a set; looping over a file per sentence exhausts it after the first pass.
terms1 = set(re.split(r"\t", l.strip())[0].lower() for l in f1 if l.strip())
terms2 = set(re.split(r"\t", l.strip())[0].lower() for l in f2 if l.strip())

for line in sys.stdin:
    (id, num, senti, words) = re.split(r"\t+", line.strip())
    sentence = re.split(r"\s+", words.strip().lower())
    wordanalysis = {}
    # Set membership replaces re.match, which treated tokens such as ")" or "?" as regex patterns.
    wordanalysis["lead"] = sentence[0] in terms1
    wordanalysis["trail"] = sentence[-1] in terms1
    # File #2 terms look like two-token pairs, so compare the first and last
    # two tokens (an assumption about the intended match).
    wordanalysis["lead_2"] = " ".join(sentence[:2]) in terms2
    wordanalysis["trail_2"] = " ".join(sentence[-2:]) in terms2
Am I doing this right? Is there a better way?
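For task 1 (a match anywhere in the sentence), one possible approach on a single machine is to load the lexicon into a set once and test each token for membership. The sketch below is only an illustration under assumptions: the names `load_terms` and `analyse` are hypothetical, the term is assumed to be the first tab-separated field of a lexicon line, and the sample sentence is made up.

```python
def load_terms(lines):
    """Collect the first tab-separated field of each lexicon line, lowercased."""
    terms = set()
    for line in lines:
        line = line.strip()
        if line:
            terms.add(line.split("\t")[0].lower())
    return terms


def analyse(sentence, terms):
    """Report whether any token, the first token, or the last token is in terms."""
    tokens = sentence.lower().split()
    return {
        "anywhere": any(tok in terms for tok in tokens),
        "lead": bool(tokens) and tokens[0] in terms,
        "trail": bool(tokens) and tokens[-1] in terms,
    }


# Two lexicon lines in the File #1 layout and a made-up sentence.
lexicon = load_terms(["#fabulous\t7.526\t2301\t2", "#excellent\t7.247\t2612\t3"])
result = analyse("What a #fabulous day", lexicon)
print(result)
```

Each set lookup takes roughly constant time, so the cost per sentence grows with the number of tokens instead of the size of the lexicon.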
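This is a classic MapReduce problem. If you are serious about efficiency, you should consider something like this: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
If you are too lazy or do not have the resources to set up your own Hadoop environment, you can try a ready-made one: http://aws.amazon.com/elasticmapreduce/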
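As a rough idea of what the map step might look like in the Hadoop Streaming style, where a mapper reads records from standard input and writes tab-separated key/value pairs to standard output, here is a minimal sketch. It assumes the `id`, `num`, `senti`, `words` record layout from the question; `TERMS`, `map_line` and `run_mapper` are hypothetical names, the inline lexicon stands in for File #1, and the demo record is made up.

```python
import io
import re
import sys

# Hypothetical inline lexicon; a real job would load File #1 from a file
# shipped alongside the mapper.
TERMS = {"#fabulous", "#excellent", "#superb"}


def map_line(line, terms):
    """Turn one 'id<TAB>num<TAB>senti<TAB>words' record into 'id<TAB>flags'."""
    fields = re.split(r"\t+", line.strip())
    if len(fields) < 4:
        return None  # skip malformed records
    sent_id, tokens = fields[0], fields[3].lower().split()
    if not tokens:
        return None
    flags = (
        int(any(tok in terms for tok in tokens)),  # match anywhere
        int(tokens[0] in terms),                   # leading token
        int(tokens[-1] in terms),                  # trailing token
    )
    return "%s\t%d,%d,%d" % ((sent_id,) + flags)


def run_mapper(stream, out, terms=TERMS):
    """Read records from stream and write one key/value line per valid record."""
    for line in stream:
        record = map_line(line, terms)
        if record is not None:
            out.write(record + "\n")


# Local dry run on a made-up record; as a streaming mapper the call would
# instead be run_mapper(sys.stdin, sys.stdout).
demo_out = io.StringIO()
run_mapper(io.StringIO("1\t0\tpos\tWhat a day #superb\n"), demo_out)
print(demo_out.getvalue(), end="")
```

Because every sentence is checked independently of the others, the map step alone may already produce the per-sentence flags; a reduce step would only be needed for aggregates such as counts per sentiment label.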
Please post your code here once it is done :) It would be great to see how it translates into a MapReduce algorithm...