Recursively navigating a file system to analyze files in pairs

A > ... > C1 > xyz_1.gz
A > ... > C1 > xyz_2.gz
A > ... > C1 > bunch of other files
A > ... > C2 > xyy_1.gz
A > ... > C2 > xyy_2.gz
A > ... > C2 > bunch of other files
A > ... > C3 > zzz_1.gz
A > ... > C3 > zzz_2.gz
A > ... > C3 > bunch of other files
A > B > some other things

2 Answers

User

Answer 1 · edited 2024-09-28 22:19:15

(Answering the edited question.)

Doing this in the shell is awkward and hard to read, so I turned to Python:

#!/usr/bin/env python3
import os
import re
import pprint
from subprocess import call

group1 = {}  # base path -> full path of the *_1.txt file
group2 = {}  # base path -> full path of the *_2.txt file

for root, directories, filenames in os.walk('.'):
    for filename in filenames:
        ff = os.path.join(root, filename)
        if filename.endswith("_1.txt"):
            base = re.sub(r'_1\.txt$', '', ff)
            group1[base] = ff
        if filename.endswith("_2.txt"):
            base = re.sub(r'_2\.txt$', '', ff)
            group2[base] = ff

#pprint.pprint(group1)
#pprint.pprint(group2)

# find the common ones: base paths for which both the _1 and the _2 file exist
# (the built-in set replaces the Python 2 'sets' module, which Python 3 does not have)
list1 = set(group1).intersection(group2)

#pprint.pprint(list1)

# call myscript.py in each directory
# ('echo' only prints the command; remove it to actually run the script)
cwd = os.getcwd()
for base in list1:
    path, filename = os.path.split(base)
    #print(path, " ", filename)
    try:
        os.chdir(path)
        call(['echo', 'myscript.py', filename + "_1.txt", filename + "_2.txt", "outputfile"])
    finally:
        os.chdir(cwd)

(Apologies for the poor Python style: I am actually a Perl programmer.)
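
As a variation on the script above, here is a minimal sketch that keeps the same pairing idea but passes `cwd=` to `subprocess.run` instead of calling `os.chdir()`, so the parent process never changes directory. The function names `find_pairs` and `run_pairs` are hypothetical, and the `_1.txt`/`_2.txt` suffixes are simply carried over from the script above. The command is a parameter so that it can be dry-run with `echo`, as the answer does.

```python
import os
import subprocess


def find_pairs(top, suffix1="_1.txt", suffix2="_2.txt"):
    """Return sorted (directory, file1, file2) tuples for every complete pair under top."""
    pairs = []
    for root, _dirs, filenames in os.walk(top):
        names = set(filenames)
        for name in filenames:
            if name.endswith(suffix1):
                # The partner shares the prefix and lives in the same directory.
                partner = name[: -len(suffix1)] + suffix2
                if partner in names:
                    pairs.append((root, name, partner))
    return sorted(pairs)


def run_pairs(top, command=("echo", "myscript.py"), output="outputfile"):
    """Run command once per pair, using the pair's directory as working directory."""
    for directory, file1, file2 in find_pairs(top):
        # cwd= changes directory for the child process only.
        subprocess.run([*command, file1, file2, output], cwd=directory, check=True)
```

Calling `run_pairs(".")` would print one `myscript.py ...` line per pair; replacing the default `command` with the real interpreter and script path would run it for real.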

Most recursive solutions I have seen so far use either find or grep for each individual file; however, I need the location as well, to get the files in pairs and write to disk at the appropriate place.

Don't iterate over files; iterate over directories. An example in the shell:

[shell code block missing from the source page]
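
The directory-first idea can also be sketched in Python. This is a minimal illustration and not the answer's own shell code: the function name `dirs_with_all` is hypothetical, and the file names in the usage note are taken from the C1 example in the question.

```python
import os


def dirs_with_all(top, required):
    """Return the directories under top that contain every file name in required."""
    required = set(required)
    matches = []
    for root, _dirs, filenames in os.walk(top):
        # Test the directory as a whole instead of inspecting files one by one.
        if required.issubset(filenames):
            matches.append(root)
    return sorted(matches)
```

For example, `dirs_with_all(".", ["xyz_1.gz", "xyz_2.gz"])` would list every directory holding both files, and the analysis script could then be run once in each of them.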

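Alternatively, you can still iterate over files and let find locate one file of each pair for us. Then extract the directory from the file name it found: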

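find . -type f -name xyz_1.gz -print |
while IFS= read -r FN; do
    DIR=$(dirname "$FN")
    # skip directories that lack the partner file or another required file
    test -r "$DIR/xyz_2.gz" && test -r "$DIR/some_other_file" || continue
    # the subshell keeps the cd from affecting the rest of the loop
    ( cd "$DIR" && myscript.py xyz_1.gz xyz_2.gz outputfile )
done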

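You could also move the initial cd $DIR (os.chdir() in Python) into the Python script itself: pass the directory as an argument or environment variable, and have the script check for its input files (for example, exit quietly if they do not exist).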
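
A minimal sketch of that suggestion follows. The script-side function receives the directory, checks for its inputs, and returns quietly when one is missing. The name `process_dir`, the default file names, and the choice to return `False` instead of raising are assumptions for illustration. The body only records the two input names as a stand-in for the real analysis, and it builds paths from the directory argument instead of changing into it.

```python
import os


def process_dir(directory, file1="xyz_1.gz", file2="xyz_2.gz", output="outputfile"):
    """Process one directory; return False without side effects if an input is missing."""
    path1 = os.path.join(directory, file1)
    path2 = os.path.join(directory, file2)
    if not (os.path.isfile(path1) and os.path.isfile(path2)):
        return False  # quiet exit: nothing to do in this directory
    # Stand-in for the real analysis: record which inputs were paired.
    with open(os.path.join(directory, output), "w") as f:
        f.write(file1 + "\n" + file2 + "\n")
    return True
```

Run as a script, the directory would typically come from `sys.argv[1]` or from an environment variable read through `os.environ`, so the calling shell loop no longer needs its own `test` and `cd`.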

User

Answer 2 · edited 2024-09-28 22:19:15

Here is a bash script that does this:

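for i in */*/*.gz
do
    echo "$i"
done | sort | while read -r line || [[ -n "$line" ]]
do
    # after sorting, each _1 file is followed by its _2 partner
    read -r nextline
    # run in a subshell (not a command substitution) so the cd stays local;
    # pass basenames because the script now runs inside the files' directory
    ( cd "$(dirname "$line")" && python3 ~/A/myscript.py "$(basename "$line")" "$(basename "$nextline")" ./outputfile ) && echo "Success"
done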

The script is rigid about recursion (it only looks at a fixed depth), but I tailored it to your directory structure.
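
If the fixed `*/*/` depth is too restrictive, one option is a recursive glob. The sketch below is in Python instead of bash; `paired_gz` is a hypothetical helper, and it assumes that pairs are named `<prefix>_1.gz` and `<prefix>_2.gz` and sit in the same directory, as in the question. Unlike pairing adjacent lines of sorted output, it looks up each partner explicitly, so a stray or unpaired `.gz` file cannot shift the pairing.

```python
from pathlib import Path


def paired_gz(top):
    """Return sorted (file1, file2) Path pairs found at any depth under top."""
    pairs = []
    for file1 in Path(top).rglob("*_1.gz"):
        # Derive the partner name from the shared prefix.
        file2 = file1.with_name(file1.name[: -len("_1.gz")] + "_2.gz")
        if file2.is_file():
            pairs.append((file1, file2))
    return sorted(pairs)
```

Each returned pair shares a parent directory (`file1.parent`), which can serve as the working directory for the analysis script.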

I don't know exactly how many files there are, but something like this could work for you:

[code block missing from the source page]

I created a dummy Python script that writes out the file names passed to it as arguments. Here is the Python script:

import sys

# sys.argv[0] is the script name itself
input_file1 = sys.argv[1]
input_file2 = sys.argv[2]
output_file = sys.argv[3]

# write the two input file names, one per line, to the output file
s = input_file1 + "\n" + input_file2 + "\n"
with open(output_file, "w") as f:
    f.write(s)
