Optimizing a selection in a pandas DataFrame

  toto  tata  titi  tutu  tete
0    a    18   600   700   4.5
1    b    18   600   800  10.1
2    c    18   600   700  12.6
3    d     3   300   400   3.4
4    a    16   900  1000   6.0
5    a    18   600   800  10.1
6    c     3   300   400   3.0
7    a    16   900  1000   6.0

import pandas

indicesToKeep = []
indicesToRemove = []

reader = pandas.read_csv('/Users/steph/work/perso/sof/test.csv')
columns = reader.columns

for i in reader['titi'].unique():
    # temp = reader[[:]].query('titi == i')  # does not work!
    temp = reader.loc[(reader.titi == i), columns]
    for j in temp['tutu'].unique():
        temp2 = temp.loc[(temp.tutu == j), columns]
        minimum = min(temp2.tete)
        indicesToKeep.append(min(temp2[temp2.tete == minimum].index.tolist()))

################
# compute the complement of indicesToKeep
# but I don't remember the pythonic syntax
for i in range(len(reader)):
    if i not in indicesToKeep:
        indicesToRemove.append(i)
############################

reader = reader.drop(indicesToRemove)
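The complement step marked in the code above can be written without an explicit loop. The following is only a sketch under assumptions: the sample table is rebuilt inline instead of being read from the CSV file, the names `df`, `indices_to_keep`, `indices_to_remove` and `trimmed` are illustrative, and the kept indices are hard-coded to what the nested loops would collect for this sample.

```python
import pandas as pd

# Sample data rebuilt inline (assumption: same content as the CSV in the question).
df = pd.DataFrame({
    "toto": ["a", "b", "c", "d", "a", "a", "c", "a"],
    "tata": [18, 18, 18, 3, 16, 18, 3, 16],
    "titi": [600, 600, 600, 300, 900, 600, 300, 900],
    "tutu": [700, 800, 700, 400, 1000, 800, 400, 1000],
    "tete": [4.5, 10.1, 12.6, 3.4, 6.0, 10.1, 3.0, 6.0],
})

# Indices the nested loops would keep for this sample (hard-coded for illustration).
indices_to_keep = [0, 1, 6, 4]

# Index.difference returns the labels of df.index that are not in indices_to_keep.
indices_to_remove = df.index.difference(indices_to_keep)

# Dropping the complement leaves only the kept rows, in their original order.
trimmed = df.drop(indices_to_remove)
print(trimmed)
```

Selecting the kept rows directly with `df.loc[sorted(indices_to_keep)]` would give the same rows without computing the complement at all.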

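2 Answers

User

#1 · edited 2024-09-30 01:21:05

You can group by the two columns titi and tutu, then get the row index of the minimum value of the third column, tete, within each group. Once you have those indices, just look up the rows.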

df.loc[df.groupby(["titi", "tutu"])["tete"].idxmin()]

This returns the output:

  toto  tata  titi  tutu  tete
6    c     3   300   400   3.0
0    a    18   600   700   4.5
1    b    18   600   800  10.1
4    a    16   900  1000   6.0

This is the desired output as described above.

groupby ensures that every combination of the two columns that occurs in the data is kept.
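To make the approach above easy to try, here is a self-contained sketch. It assumes the sample table from the question is rebuilt inline rather than read from a CSV file; the names `df`, `idx` and `result` are illustrative only.

```python
import pandas as pd

# Sample data rebuilt inline (assumption: same content as the table in the question).
df = pd.DataFrame({
    "toto": ["a", "b", "c", "d", "a", "a", "c", "a"],
    "tata": [18, 18, 18, 3, 16, 18, 3, 16],
    "titi": [600, 600, 600, 300, 900, 600, 300, 900],
    "tutu": [700, 800, 700, 400, 1000, 800, 400, 1000],
    "tete": [4.5, 10.1, 12.6, 3.4, 6.0, 10.1, 3.0, 6.0],
})

# For each (titi, tutu) group, idxmin gives the index label of the smallest tete.
# When several rows tie for the minimum, the first occurrence is returned.
idx = df.groupby(["titi", "tutu"])["tete"].idxmin()

# Look up the selected rows; they come back ordered by the sorted group keys.
result = df.loc[idx]
print(result)
```

Because groupby sorts the group keys by default, the rows are ordered by (titi, tutu) rather than by their original position; appending `.sort_index()` restores the original row order if that matters.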

User

#2 · edited 2024-09-30 01:21:05

IIUC, sort_values + drop_duplicates will do it. If you are using pandas, try to avoid for loops; most of the time they are slower than vectorized methods.

df.sort_values('tete').drop_duplicates(['titi','tutu']).sort_index()
Out[583]: 
  toto  tata  titi  tutu  tete
0    a    18   600   700   4.5
1    b    18   600   800  10.1
4    a    16   900  1000   6.0
6    c     3   300   400   3.0
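A self-contained sketch of the same idea follows. Two things here are assumptions rather than part of the answer: the sample table is rebuilt inline, and `kind="stable"` is added so that rows with equal tete values keep their original relative order, which makes the choice among tied rows deterministic. The names `df` and `result` are illustrative.

```python
import pandas as pd

# Sample data rebuilt inline (assumption: same content as the table in the question).
df = pd.DataFrame({
    "toto": ["a", "b", "c", "d", "a", "a", "c", "a"],
    "tata": [18, 18, 18, 3, 16, 18, 3, 16],
    "titi": [600, 600, 600, 300, 900, 600, 300, 900],
    "tutu": [700, 800, 700, 400, 1000, 800, 400, 1000],
    "tete": [4.5, 10.1, 12.6, 3.4, 6.0, 10.1, 3.0, 6.0],
})

result = (
    df.sort_values("tete", kind="stable")   # smallest tete first; ties keep original order
      .drop_duplicates(["titi", "tutu"])    # keep the first row seen for each (titi, tutu) pair
      .sort_index()                         # restore the original row order
)
print(result)
```

With the stable sort, a tie such as rows 1 and 5 (both 10.1) resolves to the lower index, which matches what the loop in the question selects.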
