I have 300 output (.out) files that I need to extract data from.
Typically, the data is stored in them like this:
PROPERTY 1: 1234
lines
of
unimportant text
PROPERTY 2: 1334
lines
of
unimportant text
PROPERTY 3: 1237
.
.
.
PROPERTY N: 7592
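I have 300 files like this.
I want to extract the data from these files and arrange it in neat columns: one column for all the PROPERTY 1 data points, one for PROPERTY 2, ..., and one for PROPERTY N. The end goal is to process the data further with Python and pandas.
I am using awk to extract the data.
I have two approaches, but each has a problem.
Approach 1: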
awk '/PROPERTY 1/{p1=$NF; } /PROPERTY 2/{p2=$NF} /PROPERTY 3/... {pn=$NF; print p1, p2, p3,...}' *.out
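This approach has two problems:
I can extract the individual data points and store them in a file, but the program becomes very long. Also, if PROPERTY 1 and PROPERTY 2 appear in reverse order in a file, this code gives the wrong output, i.e. PROPERTY 1 from outputfile1.out ends up on line 2 instead of line 1. How can I make it robust against that?
My second approach is to simply write each property to a separate file and join the files together with Python. Is there a way to take a column from file 1 and paste it next to the columns in file 2 using awk?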
Sample input files:
First file:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
PROPERTY 1: 1234
Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit
PROPERTY 2: 9800
At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium voluptatum deleniti atque corrupti quos dolores et quas molestias excepturi sint occaecati cupiditate non provident, similique sunt in culpa qui officia deserunt mollitia animi, id est laborum et dolorum fuga.
PROPERTY 4: 823586
On the other hand, we denounce with righteous indignation and dislike men who are so beguiled and demoralized by the charms of pleasure of the moment, so blinded by desire, that they cannot foresee the pain and trouble that are bound to ensue; and equal blame belongs to those who fail in their duty through weakness of will, which is the same as saying through shrinking from toil and pain.
PROPERTY 3: 328497
.
.
.
Second file:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
PROPERTY 1: 1
Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit
PROPERTY 2: 2
At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium voluptatum deleniti atque corrupti quos dolores et quas molestias excepturi sint occaecati cupiditate non provident, similique sunt in culpa qui officia deserunt mollitia animi, id est laborum et dolorum fuga.
PROPERTY 3: 3
On the other hand, we denounce with righteous indignation and dislike men who are so beguiled and demoralized by the charms of pleasure of the moment, so blinded by desire, that they cannot foresee the pain and trouble that are bound to ensue; and equal blame belongs to those who fail in their duty through weakness of will, which is the same as saying through shrinking from toil and pain.
PROPERTY 4: 4
.
.
.
Every file will contain all of the properties.
Expected output file: data.txt
1234 9800 823586 328497 ...
1 2 3 4
.
.
.
I am trying to optimize my code, and awk seems to be fast. Any suggestions would be appreciated.
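Use GNU awk for ENDFILE. This assumes you have a specific subset of property tags that you want to print, and that not every tag necessarily appears in every file (your posted example is not clear about that, or about whether the tags all start with PROPERTY, etc.).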
I would parse each file line by line.
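A minimal Python sketch of that line-by-line idea, since the data ends up in Python anyway. It is an illustration under stated assumptions, not the answerer's original code: the names `PROPERTY_RE`, `parse_properties`, `format_row` and `main`, the `NA` placeholder for a missing property, and the default column list `(1, 2, 3, 4)` are all hypothetical. It also assumes each property line looks like `PROPERTY <number>: <value>`, as in the sample files. Values are stored by property number, so the order in which the lines appear in a file does not affect the column order.

```python
import re
import sys

# Matches lines such as "PROPERTY 3: 328497". The label format is an
# assumption based on the sample files shown in the question.
PROPERTY_RE = re.compile(r"^\s*PROPERTY\s+(\d+)\s*:\s*(\S+)")


def parse_properties(lines):
    """Return {property_number: value_string} for the lines of one file."""
    found = {}
    for line in lines:
        match = PROPERTY_RE.match(line)
        if match:
            # Key by property number so the position in the file is irrelevant.
            found[int(match.group(1))] = match.group(2)
    return found


def format_row(found, wanted, missing="NA"):
    """Join the wanted properties into one space-separated row."""
    return " ".join(found.get(number, missing) for number in wanted)


def main(paths, wanted=(1, 2, 3, 4)):
    """Print one row per input file, with columns in the order of `wanted`."""
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            print(format_row(parse_properties(handle), wanted))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a hypothetical name such as `extract_properties.py`, it could be run as `python extract_properties.py *.out > data.txt`. For the first sample file the row comes out as `1234 9800 328497 823586`, i.e. in property order even though PROPERTY 4 appears before PROPERTY 3 in that file.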