Python: Search text files, then output specific d…

for mergedData[(a, b, c), (e, f, g), .....]: if mergedData[(a, e, (all first sub-indices))] > 15 <delete the entire line from the .txt file> and/or <create a new text file containing only lines that meet the criteria>

1 answer

Netizen

Answer #1 · posted 2024-10-02 08:26:42

Assuming your file is a series of lines, each of which looks like what you wrote, i.e.

000892834     13.663      0.098      0.871      0.093      0.745      4.611       4795

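then you can use str.lstrip() to strip the leading 0s. When you read a file you don't get integers, you get strings, so you have to strip the '0' characters. (Alternatively, you could convert the zero-padded numbers to integers and then back to strings to write them out again, but you don't need to.)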
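A quick sketch comparing the two options on the ID from the sample line. One caveat worth knowing: an ID made entirely of zeros is stripped to an empty string by lstrip(), whereas the int() round-trip keeps a single "0".

```python
# The ID field is text, so leading zeros are just characters.
raw_id = "000892834"

# Option 1: strip the '0' characters from the string itself.
print(raw_id.lstrip("0"))        # 892834

# Option 2: round-trip through int, then back to str.
print(str(int(raw_id)))          # 892834

# Caveat: an all-zero ID becomes an empty string with lstrip,
# while the int round-trip keeps a single "0".
print(repr("000".lstrip("0")))   # ''
print(str(int("000")))           # 0
```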

Use a dictionary to group the rows by ID, where each value is a list holding the row from the first file and the row from the second file.


If the second file contains keys that aren't in the first file, you should use a collections.defaultdict for mergedData. (This addresses #1 in your edit.)
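To see why defaultdict helps, here is a small sketch that uses io.StringIO in place of the two files. The sample rows are made up for illustration; the ID 777777 exists only in the "second file", which is exactly the case where a plain dict would fail.

```python
import io
from collections import defaultdict

# In-memory stand-ins for file1.txt and file2.txt (made-up sample rows).
file1 = io.StringIO("000892834 13.663 0.098 0.871\n000123456 20.5 0.1 0.2\n")
file2 = io.StringIO("892834 16.2 0.5 0.7 9.9\n777777 30.0 0.1 0.1 1.0\n")

# A plain dict raises KeyError when you append to a key it has never seen.
plain = {}
try:
    plain["777777"].append("row")
except KeyError:
    print("plain dict: KeyError")

mergedData = defaultdict(list)
for line in file1:
    # file1 IDs are zero-padded, so strip the padding to get a shared key
    mergedData[line.split()[0].lstrip("0")].append(line.rstrip("\n"))
for line in file2:
    # a new key such as "777777" silently gets an empty list first
    mergedData[line.split()[0]].append(" ".join(line.split()[:4]))

for key, rows in mergedData.items():
    print(key, len(rows))
```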

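from collections import defaultdict

# Group rows by ID: each value is a list holding the row from file1 and the row from file2
mergedData = defaultdict(list)
with open('file1.txt', 'r') as file1, open('file2.txt', 'r') as file2, open('mergedData.txt', 'w') as outfile:
    for line in file1:
        # IDs in file1 are zero-padded; strip the padding (and the trailing newline)
        mergedData[line.split()[0].lstrip('0')].append(line.rstrip('\n'))
    for line in file2:
        # keep only the first four columns of file2
        mergedData[line.split()[0]].append(" ".join(line.split()[:4]))
    ...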

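If you only need to write the data that meets certain requirements, you can use filter() to get only the elements that meet them. filter() takes a filter function, which must return True if an element meets the requirement. A lambda expression is a good fit for a quick inline function like this.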
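If filter() is new to you, here is a minimal sketch on a toy list of (name, value) pairs, unrelated to your files, just to show the mechanics:

```python
# filter(function, iterable) yields only the items for which function returns True.
pairs = [("a", 3), ("b", 20), ("c", 16)]

# A lambda is a compact way to write a one-off predicate inline.
big = list(filter(lambda item: item[1] > 15, pairs))
print(big)  # [('b', 20), ('c', 16)]
```

In Python 3, filter() returns a lazy iterator, so wrap it in list() if you need to iterate over the result more than once or check its length.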

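    ...
    # Keep only IDs present in both files whose second-column value exceeds 15 in each
    filteredMergedData = filter(
        lambda x: len(x[1]) == 2 and float(x[1][0].split()[1]) > 15 and float(x[1][1].split()[1]) > 15,
        mergedData.items())
    for d in filteredMergedData:
        outfile.write("\n".join(d[1]) + "\n")

This is fairly involved, but basically it turns the dictionary's key/value pairs into tuples like (key, value), iterates over them, and checks whether the lambda returns True for each one. The lambda takes the value part (the list, as you'll recall) and checks whether the second column of each row holds a value greater than 15. It has to convert those values to numbers first, because they are strings and won't compare correctly against a number; float() is needed rather than int(), since values like 13.663 contain a decimal point. For the sub-indexing to work, you also have to check that the value part contains two rows, which also takes care of #3 for you.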
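Here is a short sketch of what goes wrong without the conversion. The value "13.663" is taken from the sample line; the rest is plain Python 3 behaviour.

```python
value = "13.663"  # fields from line.split() are always str

# Python 3 refuses to compare str with int outright.
try:
    value > 15
except TypeError as exc:
    print("TypeError:", exc)

# Comparing two strings is lexicographic, which gives wrong answers for numbers.
print("9" > "15")            # True ('9' sorts after '1')

# int() rejects text with a decimal point, so float() is the safe conversion here.
try:
    int(value)
except ValueError:
    print("int() cannot parse", value)
print(float(value) > 15)     # False
```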

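Now, if you want to put it all together and support arbitrary conditions and arbitrary filenames, you should put this code in a function that takes four arguments: three filenames and a function to use as the filter (yes, you can pass functions as arguments).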

from collections import defaultdict

def mergeData(file1name, file2name, outfilename, a_filter_func):
    """ Merge the data of two files, writing only the groups a_filter_func accepts. """
    mergedData = defaultdict(list)
    with open(file1name, 'r') as file1, open(file2name, 'r') as file2, open(outfilename, 'w') as outfile:
        for line in file1:
            mergedData[line.split()[0].lstrip('0')].append(line.rstrip('\n'))
        for line in file2:
            mergedData[line.split()[0]].append(" ".join(line.split()[:4]))
        # a_filter_func receives (key, list_of_rows) tuples
        filteredMergedData = filter(a_filter_func, mergedData.items())
        for d in filteredMergedData:
            outfile.write("\n".join(d[1]) + "\n")

# finally, call the function.
filter_func = lambda x: len(x[1]) == 2 and float(x[1][0].split()[1]) > 15 and float(x[1][1].split()[1]) > 15
mergeData('file1.txt', 'file2.txt', 'mergedData.txt', filter_func)

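If you need a different condition, just pass something other than the lambda filter_func. You can write a named function with def and pass that instead: for example, if you have def foo(x):, you can pass foo as the argument. Just make sure it returns True or False.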
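As a sketch, here is the same condition as filter_func written as a named function. The name both_above_15 and the sample dictionary are hypothetical, chosen only to mirror the shape of mergedData (ID mapped to a list of row strings).

```python
def both_above_15(item):
    """Return True if the ID has a row from each file and both second-column values exceed 15.

    item is a (key, rows) pair, as produced by mergedData.items().
    """
    key, rows = item
    if len(rows) != 2:
        return False
    return all(float(row.split()[1]) > 15 for row in rows)

# Hypothetical sample data in the same shape as mergedData.
sample = {
    "892834": ["000892834 16.1 0.1 0.2", "892834 18.0 0.5 0.7"],
    "123456": ["000123456 20.5 0.1 0.2"],
    "555555": ["000555555 9.0 0.1 0.2", "555555 40.0 0.3 0.4"],
}
kept = [key for key, rows in filter(both_above_15, sample.items())]
print(kept)  # ['892834']
```

With real files you would pass it the same way as the lambda: mergeData('file1.txt', 'file2.txt', 'mergedData.txt', both_above_15). A named function is easier to test on its own and has room for a docstring once the condition grows.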

Edit: On reflection, the lambda-based solution makes four linear passes over the data. Here is an optimized (and arguably simpler) version:

from collections import defaultdict

def mergeData(file1name, file2name, outfilename, a_filter_func):
    """ Merge the data of two files, keeping only rows whose second column passes a_filter_func. """
    mergedData = defaultdict(list)
    with open(file1name, 'r') as file1, open(file2name, 'r') as file2, open(outfilename, 'w') as outfile:
        for line in file1:
            splt = line.split()  # split each line only once
            if a_filter_func(float(splt[1])):
                mergedData[splt[0].lstrip('0')].append(line.rstrip('\n'))
        for line in file2:
            splt = line.split()
            if a_filter_func(float(splt[1])):
                mergedData[splt[0]].append(" ".join(splt[:4]))
        # Note: rows are filtered one at a time, so an ID no longer has to appear in both files
        for k in mergedData:
            outfile.write("\n".join(mergedData[k]) + "\n")

Now a_filter_func can be as simple as:

lambda x: x > 15

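Once I started reaching for "functional programming" functions such as filter(), I forgot that it could be done more simply. This version also splits each line only once instead of several times.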
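If you want to try the single-pass idea without touching the disk, here is a sketch that takes any iterables of lines instead of filenames. merge_filtered is a hypothetical helper name, and the io.StringIO rows are made up; the filter receives the second column already converted to float.

```python
import io
from collections import defaultdict

def merge_filtered(lines1, lines2, keep):
    """Single-pass merge (hypothetical helper): filter each row as it is read."""
    merged = defaultdict(list)
    for line in lines1:
        splt = line.split()          # split each line exactly once
        if keep(float(splt[1])):
            merged[splt[0].lstrip("0")].append(line.rstrip("\n"))
    for line in lines2:
        splt = line.split()
        if keep(float(splt[1])):
            merged[splt[0]].append(" ".join(splt[:4]))
    return merged

file1 = io.StringIO("000892834 16.1 0.1 0.2\n000123456 9.0 0.1 0.2\n")
file2 = io.StringIO("892834 18.0 0.5 0.7 4.2\n123456 30.0 0.3 0.4 1.1\n")
result = merge_filtered(file1, file2, lambda x: x > 15)
for key, rows in result.items():
    print(key, rows)
```

Note that 123456 survives with only its second-file row: because each row is filtered on its own, this version does not require both files' values to exceed 15. If you need that stricter rule, keep the len(rows) == 2 check from the filter() version.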
