Converting text without delimiters into a DataFrame in Python?

2021-04-01T12:54:38.156Z START RequestId: 123 Version: $LATEST 2021-04-01T12:54:42.356Z END RequestId: 123 2021-04-01T12:54:42.356Z REPORT RequestId: 123 Duration: 4194.14 ms Billed Duration: 4195 ms Memory Size: 2048 MB Max Memory Used: 608 MB

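2 answers

User

Answer 1 · edited 2024-05-03 22:29:57

There is a similar question here: Log file to Pandas Dataframe

You can use read_csv with a regular-expression separator (pandas parses such separators with its Python engine): \s*\[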
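To illustrate the read_csv suggestion, here is a minimal sketch. The log lines in it are hypothetical and not taken from the question; the sample in the question contains no "[" at all, so this separator would have to be adapted before it could be used there. The column names are likewise assumptions.

```python
import io

import pandas as pd

# Hypothetical log lines (not from the question): each field starts with "[".
log_text = (
    "2021-04-01 12:54:38 [INFO] [worker] started\n"
    "2021-04-01 12:54:42 [WARN] [worker] slow response\n"
)

df = pd.read_csv(
    io.StringIO(log_text),
    sep=r"\s*\[",        # regex: optional whitespace followed by "["
    engine="python",     # regex separators are handled by the Python engine
    header=None,
    names=["timestamp", "level", "rest"],  # assumed column names
)

# The split leaves the closing "]" on the level column, so strip it.
df["level"] = df["level"].str.rstrip("]")
```

The separator only removes the opening bracket, which is why the closing one has to be cleaned up afterwards.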

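User

Answer 2 · edited 2024-05-03 22:29:57

Apparently I'm still not good at asking the right questions on this site, but I'm glad I managed to find a solution myself. In case anyone else has the same problem, this is what I did: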

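import gzip

import pandas as pd

# file_list is assumed to be a list of paths to gzip-compressed log files
rows = []

for file in file_list:
    # open, read and decode the compressed log
    with gzip.open(file, 'rt', encoding='utf-8') as f:
        file_content = f.read()

    # split the file into lines
    for line in file_content.splitlines():
        # look for the report lines
        if 'REPORT' in line:
            tokens = line.split()

            timestamp = tokens[0]
            request_id = tokens[3]
            billed_duration = tokens[9]
            max_memory_used = tokens[18]
            # "Init Duration" is not on every REPORT line
            # (the sample in the question has none)
            init_duration = tokens[22] if len(tokens) > 22 else None

            rows.append([timestamp, request_id, billed_duration,
                         max_memory_used, init_duration])

# if you want to pack it in a dataframe
df = pd.DataFrame(rows, columns=['timestamp', 'request_id', 'billed_duration',
                                 'max_memory_used', 'init_duration'])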
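Positional token indices stop working when a field is missing, or when the log arrives as one long line with no line breaks, as in the sample in the question. A rough sketch of an alternative that matches fields by label instead, assuming the labels appear exactly as in that sample. The pattern and the names `REPORT_RE` and `parse_reports` are illustrative and not part of the original answer:

```python
import re

# Pattern built from the field labels in the question's REPORT record.
# "Init Duration" is optional because that sample does not include it.
REPORT_RE = re.compile(
    r"(?P<timestamp>\S+)\s+REPORT\s+RequestId:\s+(?P<request_id>\S+)\s+"
    r"Duration:\s+(?P<duration_ms>[\d.]+)\s+ms\s+"
    r"Billed Duration:\s+(?P<billed_duration_ms>[\d.]+)\s+ms\s+"
    r"Memory Size:\s+(?P<memory_size_mb>\d+)\s+MB\s+"
    r"Max Memory Used:\s+(?P<max_memory_used_mb>\d+)\s+MB"
    r"(?:\s+Init Duration:\s+(?P<init_duration_ms>[\d.]+)\s+ms)?"
)


def parse_reports(text):
    """Return one dict per REPORT record found anywhere in text."""
    rows = []
    for match in REPORT_RE.finditer(text):
        row = match.groupdict()
        for key, value in row.items():
            # Convert measured fields to numbers; missing ones stay None.
            if key.endswith(("_ms", "_mb")) and value is not None:
                row[key] = float(value)
        rows.append(row)
    return rows


# The log text from the question, all on one line.
sample = (
    "2021-04-01T12:54:38.156Z START RequestId: 123 Version: $LATEST "
    "2021-04-01T12:54:42.356Z END RequestId: 123 "
    "2021-04-01T12:54:42.356Z REPORT RequestId: 123 Duration: 4194.14 ms "
    "Billed Duration: 4195 ms Memory Size: 2048 MB Max Memory Used: 608 MB"
)
rows = parse_reports(sample)
```

Passing the result to `pd.DataFrame(rows)` gives one row per REPORT record, with the group names as column names.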
