Answers to the question "Select only values below a certain threshold"



1 answer

Anonymous, 1 day ago


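Assuming the file has one number per line:

<pre><code>import itertools

threshold = 5
with open('path/to/file') as infile:
    numbers = [float(line) for line in infile if line.strip()]
# Sort ascending so takewhile stops at the first value that is not below the threshold
numbers.sort()
smaller = list(itertools.takewhile(lambda n: n < threshold, numbers))
</code></pre>

If the file looks like this: ^{pr2}$ and you want your output to be <code>set([2,3,5])</code>, then (assuming each line holds two whitespace-separated numbers, a key followed by its value):

<pre><code># Reuses itertools and threshold from the snippet above
with open('path/to/file') as infile:
    numbers = dict([float(i) for i in line.split()] for line in infile if line.strip())
lines = sorted(numbers, key=numbers.__getitem__)  # keys ordered by ascending value
answer = list(itertools.takewhile(lambda n: numbers[n] < threshold, lines))
</code></pre>

Given a file that looks like this:

<pre><code>Mod# 2 11494 Chi^2: 1.19608371367 Scale: 0.567691651772 Tin: 1499 Teff: 3400 Luminosity: 568.0 L M-dot: 4.3497e-08 Tau: 2.44E-01 Dust composition: Fe IRx1: 0.540471121182 </code></pre>

If a tab (<code>\t</code>) separates <code>11494</code> from <code>Chi^2</code>, the following script should work:

<pre><code>import itertools
import operator

def takeUntil(fpath, colname, threshold):
    lines = []
    with open(fpath) as infile:
        for line in infile:
            if not line.strip():
                continue
            # Assumed layout: tab-separated fields. The first field is the model
            # id (e.g. "Mod# 2 11494"); every other field looks like "name: value".
            first, *rest = line.rstrip('\n').split('\t')
            ldict = {'Mod#': first.split(None, 1)[-1]}
            for field in rest:
                name, _, value = field.partition(':')
                try:
                    ldict[name.strip()] = float(value)
                except ValueError:
                    ldict[name.strip()] = value.strip()  # non-numeric, e.g. "568.0 L" or "Fe"
            lines.append(ldict)
    # Sort ascending on the chosen column so takewhile keeps only rows below the threshold
    lines.sort(key=operator.itemgetter(colname))
    return [row['Mod#'] for row in itertools.takewhile(lambda row: row[colname] < threshold, lines)]
</code></pre>

With this function, you should be able to specify which column to check against the threshold (for example <code>'Chi^2'</code>). This algorithm does have higher space complexity (it uses more RAM than is strictly necessary), but you should be able to marshal/pickle <code>lines</code> after reading the file and continue from there on subsequent runs. This is especially useful if you have a huge input file that takes a while to process (which I suspect you do).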
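The pickling idea mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: <code>load_rows</code>, its <code>parse</code> argument, and the <code>.pickle</code> cache suffix are hypothetical names that do not appear in the original answer, and the cache is considered valid simply when it is at least as new as the input file.

```python
import os
import pickle


def load_rows(fpath, parse, cache_path=None):
    """Return parsed rows, reusing a pickle cache when it is still fresh.

    `parse` is any callable that takes the file path and returns the list
    of row dicts; `cache_path` defaults to fpath + '.pickle'.
    Both names are hypothetical and exist only for this illustration.
    """
    if cache_path is None:
        cache_path = fpath + '.pickle'
    # Use the cache only if it exists and is at least as new as the input file.
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(fpath)):
        with open(cache_path, 'rb') as cache:
            return pickle.load(cache)
    # Otherwise parse the slow way and store the result for the next run.
    rows = parse(fpath)
    with open(cache_path, 'wb') as cache:
        pickle.dump(rows, cache, protocol=pickle.HIGHEST_PROTOCOL)
    return rows
```

On the first run the file is parsed and the rows are written to the cache; later runs load the pickle directly, so only the sorting and thresholding steps have to be repeated for each new column or threshold. Only unpickle cache files you created yourself, since loading a pickle from an untrusted source can execute arbitrary code.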
