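I need to convert a logical data dictionary into a physical (abbreviated) data dictionary. I have given 4 use cases below.
I need help with this pseudo code / requirement: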
# empty dict declaration
refDict = {}
# to catch and report on any 'not-found' dictionary words to replace
noMatchFound = {}
# read the dictionary from a comma-delimited file
# with open('dictionary.csv') as inputDict:
#     for line in inputDict:
#         busTerm, busAbbr = line.split(',')
#         refDict[busTerm] = busAbbr.replace("\n", "")
# sample data dictionary entries
refDict = {
    'user': 'USR',
    'call': 'CALL',
    'detail': 'DTL',
    'record': 'REC',
    'call detail record': 'CDR',
    'count': 'CNT'}
input_string1="user call detail record"
# output should be "USR_CDR"
# noMatchFound - will be empty - since all are matched and replaced
input_string2="user test call detail record"
# output should be "USR_TEST_CDR"
# noMatchFound - should have an entry "TEST" with a reference to "user test call detail record"
input_string3="user call count detail record"
# output should be "USR_CALL_CNT_DTL_REC"
# noMatchFound - will be empty - since all are matched and replaced
input_string4="user call detail record count"
# output should be "USR_CDR_CNT"
# noMatchFound - will be empty - since all are matched and replaced
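For the commented-out file loading above, a minimal sketch using the standard `csv` module could look like the following. The helper name `load_ref_dict` and the in-memory sample are assumptions for illustration only; lower-casing the terms is also an assumption, made so that lookups do not depend on how the file is capitalised.

```python
import csv
import io

def load_ref_dict(lines):
    """Build {business term: abbreviation} from 'term,abbreviation' rows.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    ref_dict = {}
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        # Assumption: terms are stored lower-case, abbreviations kept as given
        term, abbr = row[0].strip().lower(), row[1].strip()
        if term:
            ref_dict[term] = abbr
    return ref_dict

# In-memory stand-in for dictionary.csv (hypothetical contents)
sample = io.StringIO("user,USR\ncall detail record,CDR\n\ncount,CNT\n")
refDict = load_ref_dict(sample)
```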
So far, I have been able to work out a code snippet that matches the single largest possible expression, as follows:
import re

# using regular expressions, find the longest matching expression
def getLongestSequenceSize(inputStr, inDict):
    ret_match = ""
    ret_match_len = 0
    ret_abbr = ""
    for inKey in inDict:
        # re.escape guards against regex metacharacters in a dictionary term
        pattern = r'(?:\b%s\b\s?)+' % re.escape(inKey.strip().upper())
        matches = re.findall(pattern, inputStr.strip().upper())
        if len(matches) > 0:
            # pick the longest match, not the alphabetically greatest one
            longest_match = max(matches, key=len)
            if ret_match_len < len(longest_match):
                ret_match_len = len(longest_match)
                ret_match = longest_match.strip()
                ret_abbr = inDict[inKey]
    return [ret_match.strip(), ret_abbr.strip()]
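The idea is that you start trying to replace() with the longest string in the dictionary, then check every possible replacement from the dictionary, going from longest to shortest. This does exactly what you expect: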
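A minimal sketch of that longest-first idea, under a few assumptions: the function name `abbreviate` is hypothetical, and instead of a raw `str.replace()` it matches whole words, so that an abbreviation already placed in the result is never matched again by a shorter term. Words left over at the end are upper-cased and recorded in `noMatchFound` together with the input they came from.

```python
refDict = {
    'user': 'USR',
    'call': 'CALL',
    'detail': 'DTL',
    'record': 'REC',
    'call detail record': 'CDR',
    'count': 'CNT'}

def abbreviate(phrase, ref_dict, no_match_found):
    # Each token is (text, done); done=True once it has been abbreviated
    tokens = [(w, False) for w in phrase.lower().split()]
    # Longest terms first: more words wins, then more characters
    terms = sorted(ref_dict, key=lambda t: (len(t.split()), len(t)), reverse=True)
    for term in terms:
        term_words = term.lower().split()
        n = len(term_words)
        if n == 0:
            continue  # ignore empty dictionary keys
        i = 0
        while i + n <= len(tokens):
            window = tokens[i:i + n]
            untouched = all(not done for _, done in window)
            if untouched and [w for w, _ in window] == term_words:
                # Collapse the matched words into one abbreviated token
                tokens[i:i + n] = [(ref_dict[term], True)]
            i += 1
    result = []
    for text, done in tokens:
        if not done:
            text = text.upper()
            # Remember which input the unknown word came from
            no_match_found.setdefault(text, []).append(phrase)
        result.append(text)
    return '_'.join(result)

noMatchFound = {}
print(abbreviate("user call detail record", refDict, noMatchFound))        # USR_CDR
print(abbreviate("user test call detail record", refDict, noMatchFound))   # USR_TEST_CDR
print(abbreviate("user call count detail record", refDict, noMatchFound))  # USR_CALL_CNT_DTL_REC
print(abbreviate("user call detail record count", refDict, noMatchFound))  # USR_CDR_CNT
print(noMatchFound)  # {'TEST': ['user test call detail record']}
```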
Output: