How do I find the most probable string pair in a two-column dataset?

Posted 2024-09-30 16:26:47


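Given column A and column B, how do I find, for each item in column A, the most probable item in column B? Maybe something based on nested hash maps? I want to do this in Python.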

Input:

a,abd37534c7d9a2efb9465de931cd7055ffdb8879563ae98078d6d6d5
a,abd37534c7d9a2efb9465de931cd7055ffdb8879563ae98078d6d6d5
a,abd37534c7d9a2efb9465fghfghfghfghfghrewresdasdzfdghhgfhg
a,abd3753dfrtdgfdg563ae98078d6dfgfdgdfghdgasdaSADFBVFDGFD5
b,c681e18b81edaf2b66dd22376734dba5992e362bc3f91ab225854c17

Output:

a,abd37534c7d9a2efb9465de931cd7055ffdb8879563ae98078d6d6d5
b,c681e18b81edaf2b66dd22376734dba5992e362bc3f91ab225854c17

1 answer
User
#1 · Posted 2024-09-30 16:26:47

I'm assuming that "most probable" means, for each of {a, b}, the value that occurs most often.

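There may be some syntax issues, but the following approach should work. In any case, it should give you an idea of how to tackle the problem, even if it doesn't solve it for you outright.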

tupleList = [('a','abd37534c7d9a2efb9465de931cd7055ffdb8879563ae98078d6d6d5'),
             ('a','abd37534c7d9a2efb9465de931cd7055ffdb8879563ae98078d6d6d5'),
             ('a','abd37534c7d9a2efb9465fghfghfghfghfghrewresdasdzfdghhgfhg'),
             ('a','abd3753dfrtdgfdg563ae98078d6dfgfdgdfghdgasdaSADFBVFDGFD5'),
             ('b','c681e18b81edaf2b66dd22376734dba5992e362bc3f91ab225854c17')]
# Load your list of (col1, col2) pairs into tupleList.
# myHashMap maps each col1 key to a nested dict of {col2 value: count}.
myHashMap = {}
for col1, col2 in tupleList:
    if col1 not in myHashMap:
        myHashMap[col1] = {}
    if col2 not in myHashMap[col1]:
        myHashMap[col1][col2] = 0
    myHashMap[col1][col2] += 1

# Now iterate over each key to find the value with the highest occurrence.
for col in myHashMap:
    maxKey = ''
    maxVal = 0
    # Look up the counts for the current key (col), not the stale col1
    # left over from the counting loop above.
    for col2 in myHashMap[col]:
        if myHashMap[col][col2] > maxVal:
            maxVal = myHashMap[col][col2]
            maxKey = col2
    print('Most probable for %s is %s' % (col, maxKey))
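
The same nested-count idea can be written more compactly with the standard library. This is only a sketch, under the same assumption as above that "most probable" means "most frequent"; the function name most_probable and the rows list are made up for illustration, and the input is assumed to be comma-separated lines like those in the question.

```python
from collections import Counter, defaultdict


def most_probable(pairs):
    """Return {key: most frequent value} for an iterable of (key, value) pairs."""
    counts = defaultdict(Counter)  # key -> Counter of values seen for that key
    for key, value in pairs:
        counts[key][value] += 1
    # most_common(1) returns a one-element list holding the (value, count)
    # pair with the highest count for that key.
    return {key: counter.most_common(1)[0][0] for key, counter in counts.items()}


# Sample rows copied from the question (assumed format: "key,value").
rows = [
    "a,abd37534c7d9a2efb9465de931cd7055ffdb8879563ae98078d6d6d5",
    "a,abd37534c7d9a2efb9465de931cd7055ffdb8879563ae98078d6d6d5",
    "a,abd37534c7d9a2efb9465fghfghfghfghfghrewresdasdzfdghhgfhg",
    "a,abd3753dfrtdgfdg563ae98078d6dfgfdgdfghdgasdaSADFBVFDGFD5",
    "b,c681e18b81edaf2b66dd22376734dba5992e362bc3f91ab225854c17",
]

# Split each line on the first comma only, giving [key, value].
pairs = [line.split(",", 1) for line in rows]
result = most_probable(pairs)

for key, value in result.items():
    print(f"{key},{value}")
```

If two values tie for a key, this sketch simply keeps whichever one Counter.most_common lists first; a different tie-breaking rule would have to be added explicitly.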
