Append every line that does not start with a given character to the previous line

Posted 2024-10-05 10:50:28


Given a text file:

[2018-07-11 20:57:08] SYSTEM RESPONSE: "hello"
[2018-07-11 20:57:19] USER INPUT (xvp_dev-0): "hi! how is it going?"
[2018-07-11 20:57:19] SYSTEM RESPONSE: "It's going pretty good. 
How about you?
What's good?
Up to anything new?
After a long time"

[2018-07-12 14:05:20] USER INPUT (xvp_dev-0): I've been doing good too!    
Thank you for asking.
Nothing is new so far. 
Just working on some projects.
[2018-07-12 20:57:19] SYSTEM RESPONSE: Great!

I want my output to look like:

[2018-07-11 20:57:08] SYSTEM RESPONSE: "hello"
[2018-07-11 20:57:19] USER INPUT (xvp_dev-0): "hi! how is it going?"
[2018-07-11 20:57:19] SYSTEM RESPONSE: "It's going pretty good. How about you?| What's good? Up to anything new?| After a long time"
[2018-07-12 14:05:20] USER INPUT (xvp_dev-0): I've been doing good too!    |Thank you for asking. | Nothing is new so far. | Just working on some projects.
[2018-07-12 20:57:19] SYSTEM RESPONSE: Great!

Basically, every line that does not start with a timestamp should be appended to the previous line. This is what I have tried so far:

import re

# text_from_index gives me the file name and the date
a, b = text_from_index.split(",")
with open("/home/Desktop/" + a) as log_fd:
    lines = log_fd.readlines()

x = ""
for line in lines:
    if b in line:  # b here is the date, e.g. 2018-07-11
        x = x + "//" + line[11:]

x = x.replace("//", "<br /> \n")
x = x.replace("]", "|")
x = re.sub(r'\(.+?\)', '', x)

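So far I can only pick out lines by searching for the date. Any suggestions would help. Thank you! Feel free to ask me any questions or for further clarification.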


2 Answers

You can do this with a regex. The regular expression below matches your timestamp format:

import re
pattern = re.compile(r"\[\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\]")

# pattern.match(line) returns a match object only when the line starts with
# a timestamp, so you can skip those lines and concatenate the others
pattern.match(line)
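
To see how the pattern separates the two kinds of lines, here is a small sketch. The sample lines are taken from the question; `starts_record` is a hypothetical helper name introduced only for illustration.

```python
import re

# Same timestamp pattern as above, written as a raw string
TIMESTAMP = re.compile(r"\[\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\]")

def starts_record(line):
    """Return True if the line begins with a [YYYY-MM-DD HH:MM:SS] timestamp."""
    # re.match anchors at the start of the string, so a timestamp
    # appearing later in the line does not count
    return TIMESTAMP.match(line) is not None

samples = [
    '[2018-07-11 20:57:08] SYSTEM RESPONSE: "hello"',
    "How about you?",
    "Thank you for asking.",
]
flags = [starts_record(s) for s in samples]
print(flags)
```

Lines flagged `True` begin a new record; the rest are continuation lines to be joined onto the previous record.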

The complete solution looks like this:

import re

# Matches a leading timestamp such as [2018-07-11 20:57:08]
pattern = re.compile(r"\[\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\]")

with open("test.txt") as log_fd:
    lines = log_fd.readlines()

x = ""
for line in lines:
    if line in ('\n', '\r\n'):
        continue  # skip blank lines
    if pattern.match(line):
        # a timestamped line starts a new output line
        x = x + '\n' + line.strip('\r\n')
    else:
        # anything else continues the previous line
        x = x + ' | ' + line.strip('\r\n')

print(x)

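The resulting string starts with an empty line, but this builds the whole result as a string and prints it. It is definitely not the most elegant solution.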
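
One way to avoid the leading empty line, as a rough sketch rather than the answerer's own code: collect each record in a list and join once at the end. The function name `merge_continuations` and its `sep` parameter are hypothetical; the regex is the same timestamp pattern, and the sample lines are adapted from the question.

```python
import re

TIMESTAMP = re.compile(r"\[\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\]")

def merge_continuations(lines, sep=" | "):
    """Join every non-timestamped line onto the preceding timestamped line."""
    records = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        if TIMESTAMP.match(line) or not records:
            # a timestamp (or the very first line) opens a new record
            records.append(line)
        else:
            # continuation: extend the most recent record
            records[-1] += sep + line
    return records

log = [
    '[2018-07-11 20:57:19] SYSTEM RESPONSE: "It\'s going pretty good.\n',
    "How about you?\n",
    "\n",
    "[2018-07-12 20:57:19] SYSTEM RESPONSE: Great!\n",
]
merged = merge_continuations(log)
print("\n".join(merged))
```

Because the records are joined with `"\n".join(...)`, no separator is emitted before the first record.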

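Store the current line in a variable, e.g. cur_line. If the next line starts with [, write cur_line to the new file and make that line the new cur_line; otherwise append the line to cur_line.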

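with open('tmp.txt') as in_file, open('out.txt', 'w') as out_file:
    cur_line = ''
    for l in in_file:
        l = l.rstrip('\r\n')
        if not l:
            continue  # skip blank lines
        if l[0] == '[':
            # a new record starts: flush the previous one, if any
            if cur_line:
                out_file.write(cur_line + '\n')
            cur_line = l
        else:
            # continuation line: join it with ' | ' as in the desired output
            cur_line = cur_line + ' | ' + l if cur_line else l
    # flush the last record
    if cur_line:
        out_file.write(cur_line + '\n')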
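
One design point worth noting: checking only the first character treats any line that begins with `[` as a new record, including a continuation line such as `[edit] typo fixed` (a made-up example, not from the question's log). The sketch below compares that check with the stricter timestamp regex from the other answer; both helper names are hypothetical.

```python
import re

TIMESTAMP = re.compile(r"\[\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\]")

def by_bracket(line):
    """First-character check, as in the answer above."""
    return line.startswith("[")

def by_timestamp(line):
    """Stricter check: the line must begin with a full timestamp."""
    return TIMESTAMP.match(line) is not None

tricky = "[edit] typo fixed"  # made-up continuation line
normal = "[2018-07-12 20:57:19] SYSTEM RESPONSE: Great!"

print(by_bracket(tricky), by_timestamp(tricky))
print(by_bracket(normal), by_timestamp(normal))
```

If continuation lines in your log can never start with `[`, the simpler check is enough; otherwise the regex is the safer choice.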
