I have an input file like this:
op.txt
user id query
4d67373f-ca45-4137-efd0-0da69c78123d , bookmy show
4d67373f-ca45-4137-efd0-0da69c78123d , book my show
4d67373f-ca45-4137-efd0-0da69c78123d , book my show
4d67373f-ca45-4137-efd0-0da69c78123d , book my show
7fda21a5-c432-4d95-f93d-6275b68bb396 , 8 gb pen drive
7fda21a5-c432-4d95-f93d-6275b68bb396 , 16 gb pen drive
dba91160-dec4-454c-f34a-c29d6d95c459 , DVD PLATERS
dba91160-dec4-454c-f34a-c29d6d95c459 , DVD PLAYERS
dba91160-dec4-454c-f34a-c29d6d95c459 , DVD PLAYERS
dba91160-dec4-454c-f34a-c29d6d95c459 , IPOD
dba91160-dec4-454c-f34a-c29d6d95c459 , IPOD
dba91160-dec4-454c-f34a-c29d6d95c459 , IPOD
dba91160-dec4-454c-f34a-c29d6d95c459 , IPAD
d900ec5f-bd71-4e2b-84d0-6a2105050923 , minoxidil
d900ec5f-bd71-4e2b-84d0-6a2105050923 , minoxidil 5
775f1159-e310-42b6-d3b0-5ea3fb959568 , printed backcase for xperia L
775f1159-e310-42b6-d3b0-5ea3fb959568 , printed backcase for xperia zr
775f1159-e310-42b6-d3b0-5ea3fb959568 , printed backcase for xperia zr
9b98a9be-bb63-4310-87d5-592a66ae602a , leggings
9b98a9be-bb63-4310-87d5-592a66ae602a , leggings
9b98a9be-bb63-4310-87d5-592a66ae602a , jeggings
83618338-70a0-4512-c763-0307fe5acef0 , woman jacket
83618338-70a0-4512-c763-0307fe5acef0 , woman jacket
83618338-70a0-4512-c763-0307fe5acef0 , man jacket
83618338-70a0-4512-c763-0307fe5acef0 , man jacket
From it I currently get the following output:
dvd platers > dvd players
ipod > ipad
bookmy show > book my show
leggings > jeggings
woman jacket > man jacket
minoxidil > minoxidil 5
printed backcase for xperia l > printed backcase for xperia zr
8 gb pen drive > 16 gb pen drive
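The main goal is to collect all the queries issued by each user and store them in a list. From that list I need the edit distance between consecutive queries, and if the distance is at most 2 I print the pair. My code finds these pairs fine, but it should not react to changes in numbers; it should only compare the words. For example, if a user types "8 gb pen drive" and a little later changes their mind and types "16 gb pen drive", I do not want that pair printed.
Here is my code: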
def min_edit_dist(s1, s2):
    # Levenshtein distance using a DP table keyed by (i, j)
    m = len(s1) + 1
    n = len(s2) + 1
    tbl = {}
    for i in range(m):
        tbl[i, 0] = i
    for j in range(n):
        tbl[0, j] = j
    for i in range(1, m):
        for j in range(1, n):
            cost = 0 if s1[i-1] == s2[j-1] else 1
            tbl[i, j] = min(tbl[i, j-1] + 1, tbl[i-1, j] + 1, tbl[i-1, j-1] + cost)
    # read the final cell explicitly instead of relying on leftover loop variables
    return tbl[m-1, n-1]

d = {}
with open("op.txt") as text:
    for line in text:
        try:
            key, val = line.split(",")
        except ValueError:
            # header or malformed line without exactly one comma
            continue
        # strip the spaces and newline around the comma-separated fields
        d.setdefault(key.strip(), []).append(val.strip().lower())

# compare each query with the next one typed by the same user
for v in d.values():
    for i in range(len(v) - 1):
        if v[i] != v[i+1]:
            if min_edit_dist(v[i], v[i+1]) <= 2:
                print(v[i] + " > " + v[i+1])
I only need the following output:
dvd platers > dvd players
ipod > ipad
bookmy show > book my show
leggings > jeggings
woman jacket > man jacket
printed backcase for xperia l > printed backcase for xperia zr
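You need to filter the value of val so that the digits are removed from the string before it is compared. Run a list comprehension over each extracted val string that keeps only the non-digit characters, then join the filtered characters back together. It is not a very efficient solution, but it should fit your needs.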
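The digit filtering described above can be sketched roughly as follows. This is only one possible reading of the suggestion, not the answerer's exact code: the helper names strip_digits and close_pairs are made up for illustration, min_edit_dist is rewritten in a shorter two-row form, and collapsing the leftover whitespace is an added assumption (without it "minoxidil 5" would reduce to "minoxidil " and still be within distance 1 of "minoxidil").

```python
def min_edit_dist(s1, s2):
    """Levenshtein distance, keeping only the previous and current DP rows."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        curr = [i]
        for j, c2 in enumerate(s2, 1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(curr[j - 1] + 1, prev[j] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def strip_digits(val):
    """Hypothetical helper: drop every digit, then collapse leftover whitespace."""
    no_digits = "".join([ch for ch in val if not ch.isdigit()])
    return " ".join(no_digits.split())


def close_pairs(queries, max_dist=2):
    """Hypothetical helper: consecutive (before, after) pairs that differ in words only."""
    pairs = []
    for before, after in zip(queries, queries[1:]):
        a, b = strip_digits(before), strip_digits(after)
        # a == b means the two queries were identical or differed only in digits
        if a != b and min_edit_dist(a, b) <= max_dist:
            pairs.append((before, after))
    return pairs


# compare on the digit-free text, but print the queries as the user typed them
sample = ["ipod", "ipad", "8 gb pen drive", "16 gb pen drive"]
for before, after in close_pairs(sample):
    print(before + " > " + after)  # prints: ipod > ipad
```

In the question's code this would amount to replacing the inner loop over each user's list v with a call such as close_pairs(v). One trade-off to be aware of: because all digits are discarded, queries that differ only in a model number written with digits would also be treated as the same query.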