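<p>Okay, I think I have learned enough from your comments to offer this solution. The function below lets you choose whether words are normalized to UK or US spellings (it defaults to US, but you can of course flip that) and optionally applies some light sanitization to the string before counting.</p>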
<pre><code>import re

# UK spelling -> US spelling
ukus = {'COLOUR': 'COLOR', 'CHEQUE': 'CHECK',
        'PROGRAMME': 'PROGRAM', 'GREY': 'GRAY',
        'JEWELLERY': 'JEWELRY', 'ALUMINIUM': 'ALUMINUM',
        'THEATRE': 'THEATER', 'LICENCE': 'LICENSE', 'ARMOUR': 'ARMOR',
        'ARTEFACT': 'ARTIFACT', 'CENTRE': 'CENTER',
        'CYPHER': 'CIPHER', 'DISC': 'DISK', 'FIBRE': 'FIBER',
        'FULFIL': 'FULFILL', 'METRE': 'METER',
        'SAVOURY': 'SAVORY', 'TONNE': 'TON', 'TYRE': 'TIRE'}
# US spelling -> UK spelling, derived so the two tables cannot drift apart
usuk = {us_word: uk_word for uk_word, us_word in ukus.items()}

def str_wd_count(my_string, uk=False, hygiene=True):
    us = not uk
    # if the uk flag is True, normalize to UK spellings, else to US spellings
    print("Using the " + ("UK" if uk else "US") + " dictionary for default words")
    # optional hygiene: replace non-alphanumeric characters with spaces for pure word counting
    if hygiene:
        my_string = re.sub(r'[^ \d\w]', ' ', my_string)
    # normalize every word to the chosen spelling; split() also collapses runs of spaces
    ttl_wds = [ukus.get(w, w) if us else usuk.get(w, w) for w in my_string.upper().split()]
    wd_counts = {}
    for wd in ttl_wds:
        wd_counts[wd] = wd_counts.get(wd, 0) + 1
    return wd_counts
</code></pre>
<p>As a usage example, consider the string below. The counts are listed by frequency, highest first; the order among words with equal counts is arbitrary.</p>
<pre><code>str1 = 'The colour of the dog is not the same as the color of the tire, or is it tyre, I can never tell which one will fulfill'
# str_wd_count(str1) with default settings (US spellings, hygiene=True)
# [('THE', 5), ('TIRE', 2), ('COLOR', 2), ('OF', 2), ('IS', 2), ('FULFILL', 1), ('NEVER', 1), ('DOG', 1), ('SAME', 1), ('IT', 1), ('WILL', 1), ('I', 1), ('AS', 1), ('CAN', 1), ('WHICH', 1), ('TELL', 1), ('NOT', 1), ('ONE', 1), ('OR', 1)]
# str_wd_count(str1, hygiene=False): the commas stay attached, so 'TIRE,' and 'TYRE,' are not matched
# [('THE', 5), ('COLOR', 2), ('OF', 2), ('IS', 2), ('FULFILL', 1), ('NEVER', 1), ('DOG', 1), ('SAME', 1), ('TIRE,', 1), ('WILL', 1), ('I', 1), ('AS', 1), ('CAN', 1), ('WHICH', 1), ('TELL', 1), ('NOT', 1), ('ONE', 1), ('OR', 1), ('IT', 1), ('TYRE,', 1)]
# str_wd_count(str1, uk=True) (UK spellings, hygiene=True)
# [('THE', 5), ('OF', 2), ('IS', 2), ('TYRE', 2), ('COLOUR', 2), ('WHICH', 1), ('I', 1), ('NEVER', 1), ('DOG', 1), ('SAME', 1), ('OR', 1), ('WILL', 1), ('AS', 1), ('CAN', 1), ('TELL', 1), ('NOT', 1), ('FULFIL', 1), ('ONE', 1), ('IT', 1)]
# str_wd_count(str1, uk=True, hygiene=False)
# [('THE', 5), ('OF', 2), ('IS', 2), ('COLOUR', 2), ('ONE', 1), ('I', 1), ('NEVER', 1), ('DOG', 1), ('SAME', 1), ('TIRE,', 1), ('WILL', 1), ('AS', 1), ('CAN', 1), ('WHICH', 1), ('TELL', 1), ('NOT', 1), ('FULFIL', 1), ('TYRE,', 1), ('IT', 1), ('OR', 1)]
</code></pre>
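<p>You can use the resulting word-count dictionary however you like, and if you also need the modified version of the original string, it is simple enough to change the function to return that as well.</p>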
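<p>For illustration, here is one way that variant might look. It is only a sketch: the name <code>normalize_and_count</code>, the shortened spelling table, and the use of <code>collections.Counter</code> are my own assumptions rather than part of the function above.</p>

```python
import re
from collections import Counter

# Small illustrative subset of a UK -> US spelling table (not the full one).
UK_TO_US = {'COLOUR': 'COLOR', 'TYRE': 'TIRE', 'CENTRE': 'CENTER'}
# Reverse table derived from the first so the two always agree.
US_TO_UK = {us_word: uk_word for uk_word, us_word in UK_TO_US.items()}


def normalize_and_count(text, uk=False, hygiene=True):
    """Hypothetical variant: return (normalized_string, word_counts)."""
    mapping = US_TO_UK if uk else UK_TO_US
    if hygiene:
        # Replace anything that is not a space or word character with a space.
        text = re.sub(r'[^ \w]', ' ', text)
    # Upper-case, split on whitespace, and swap in the preferred spelling.
    words = [mapping.get(w, w) for w in text.upper().split()]
    return ' '.join(words), Counter(words)


normalized, counts = normalize_and_count(
    'The colour of the tyre, the color of the tire')
print(normalized)  # THE COLOR OF THE TIRE THE COLOR OF THE TIRE
print(counts['THE'], counts['TIRE'])  # 4 2
```

<p>Returning a <code>Counter</code> instead of a plain dict is optional; it simply makes things like <code>counts.most_common(5)</code> available without a separate sorting step.</p>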