Select one row per category based on NUB

# print the five tab-separated columns of every line in sheet
for line in sheet.strip().split("\n"):
    parts = line.split("\t")
    print(parts[0], parts[1], parts[2], parts[3], parts[4])

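3 Answers

User

Answer 1 · edited 2024-09-28 22:25:32

You can use itertools.groupby to group the split lines by their first item, then use min with an appropriate key to pick the desired row from each group: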

>>> from itertools import groupby
>>> from operator import itemgetter
>>> s=sorted((line.split() for line in sheet.strip().split('\n')[1:]),key=itemgetter(0))
>>> [' '.join(min(g,key=lambda x:float(x[4]))) for _,g in groupby(s,itemgetter(0))]
['Steve 32 foo spam 0.01', 'rob 45 bar foo 0.0000001']
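The one-liner above is fairly dense, so here is a sketch of the same groupby approach written as a function. The name min_rows_by_group and its key_col/value_col parameters are illustrative assumptions, not part of the original answer, and the sample data is the whitespace-separated sheet from the question.

```python
from itertools import groupby
from operator import itemgetter

SHEET = """cmn1 cmn2 cmn3 cmn4 cmn5
rob  45   foo  bar  0.0001
Steve 32  foo  spam 0.01
rob   45  bar  foo  0.0000001
Steve 32  foo  bar  0.1"""


def min_rows_by_group(text, key_col=0, value_col=4):
    """Return one row per key: the row whose value column is smallest.

    Hypothetical helper; column positions are assumptions about the data.
    """
    # Skip the header and split every remaining line on whitespace.
    rows = [line.split() for line in text.strip().split("\n")[1:]]
    # groupby only merges adjacent items, so sort by the key first.
    rows.sort(key=itemgetter(key_col))
    return [
        min(group, key=lambda row: float(row[value_col]))
        for _, group in groupby(rows, key=itemgetter(key_col))
    ]


result = min_rows_by_group(SHEET)
for row in result:
    print(" ".join(row))
```

Sorting first matters because groupby starts a new group every time the key changes, so unsorted input would yield several groups for the same name.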

User

Answer 2 · edited 2024-09-28 22:25:32

sheet= """ cmn1 cmn2 cmn3 cmn4 cmn5
rob  45   foo  bar  0.0001
Steve 32  foo  spam 0.01
rob   45  bar  foo  0.0000001
Steve 32  foo  bar  0.1"""

from collections import defaultdict

d = defaultdict(list)
spl = sheet.splitlines()
header = spl[0]
# iterate over all lines except header
for line in spl[1:]:
    # split once on whitespace using name as the key 
    name = line.split(None,1)[0]
    # append each line to our list of values
    d[name].append(line)

# get min of each line in our values based on the last float value
for v in d.values():
    print(min(v,key=lambda x: float(x.split()[-1])))

Steve 32  foo  spam 0.01
rob   45  bar  foo  0.0000001

If order matters, you can use an OrderedDict and do the comparison as you go:

from collections import OrderedDict

d = OrderedDict()
spl = sheet.splitlines()
header = spl[0]
for line in spl[1:]:
    # unpack five elements after splitting
    # using name as key and f to cast to float and compare
    name, _, _, _, f = line.split()
    # if key exists compare float value to current float value
    # keeping or replacing the values based on the outcome
    if name in d and float(d[name].split()[-1]) > float(f):
        d[name] = line
    # else if first time seeing name just add it
    elif name not in d:
        d[name] = line

print(header)
for v in d.values():
    print(v)

cmn1 cmn2 cmn3 cmn4 cmn5
rob   45  bar  foo  0.0000001
Steve 32  foo  spam 0.01

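With your edited (tab-separated) lines, you can see the output is unchanged; each line comes out exactly as it was in the original: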

for v in d.values():
    print(repr(v))

'rob\t45\tbar\tfoo\t0.0000001'
'Steve\t32\tfoo\tspam\t0.01'
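On Python 3.7 and later, a plain dict also preserves insertion order, so the single-pass OrderedDict version above can be sketched without the extra import. The helper name first_seen_min is hypothetical, and storing the parsed float next to each line is a design choice of this sketch (it avoids re-splitting the stored line on every comparison), not something the original answer does.

```python
sheet = """cmn1 cmn2 cmn3 cmn4 cmn5
rob  45   foo  bar  0.0001
Steve 32  foo  spam 0.01
rob   45  bar  foo  0.0000001
Steve 32  foo  bar  0.1"""


def first_seen_min(text):
    """Return (header, lines) with one minimal line per name.

    Hypothetical helper: names keep the order in which they first appear.
    """
    header, *lines = text.strip().splitlines()
    best = {}  # name -> (smallest value seen so far, original line)
    for line in lines:
        fields = line.split()
        name, value = fields[0], float(fields[-1])
        # Overwriting an existing key keeps its original position.
        if name not in best or value < best[name][0]:
            best[name] = (value, line)
    return header, [line for _, line in best.values()]


header, rows = first_seen_min(sheet)
print(header)
for row in rows:
    print(row)
```

Because the original line string is stored untouched, tab-separated input would come out tab-separated as well.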

User

Answer 3 · edited 2024-09-28 22:25:32

You can use a dictionary to store all the rows for each unique value of column 1:

sheet= """cmn1\tcmn2\tcmn3\tcmn4\tcmn5
rob\t45\tfoo\tbar\t0.0001
Steve\t32\tfoo\tspam\t0.01
rob\t45\tbar\tfoo\t0.0000001
Steve\t32\tfoo\tbar\t0.1"""

grouped = {}
for line in sheet.split('\n')[1:]:
  parts = line.split('\t')
  print (line)
  # Parse the numbers into numerical types
  typed = (parts[0], int(parts[1]), parts[2], parts[3], float(parts[4]))
  #Add the typed list of values into a list stored in our dict
  if parts[0] in grouped.keys():
    grouped[parts[0]].append(typed) 
  else:
    grouped[parts[0]] = [typed]

#Now you can go through all the keys in the dict and select the smallest  
smallest_per_group = []
for key in grouped:
  lines = grouped[key]
  # using the 'key' parameter tells Python to give us the line with the smallest 5th column
  smallest = min(lines, key=lambda x:x[4])
  smallest_per_group.append(smallest)
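The loop above leaves its result in smallest_per_group without displaying it. As a sketch of the same group-then-min idea, the version below lets the standard csv module handle the tab splitting and uses dict.setdefault in place of the if/else. Using csv here is a swapped-in technique, not what the original answer does, and the rows are kept as lists of strings rather than typed tuples.

```python
import csv
import io

sheet = """cmn1\tcmn2\tcmn3\tcmn4\tcmn5
rob\t45\tfoo\tbar\t0.0001
Steve\t32\tfoo\tspam\t0.01
rob\t45\tbar\tfoo\t0.0000001
Steve\t32\tfoo\tbar\t0.1"""

# Parse the tab-separated text; the first row is the header.
reader = csv.reader(io.StringIO(sheet), delimiter="\t")
header = next(reader)

# Collect every row under the value of its first column.
grouped = {}
for row in reader:
    grouped.setdefault(row[0], []).append(row)

# Pick the row with the smallest fifth column from each group.
smallest_per_group = [
    min(rows, key=lambda r: float(r[4])) for rows in grouped.values()
]

print("\t".join(header))
for row in smallest_per_group:
    print("\t".join(row))
```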
