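import re
import pandas as pd

# "Good" lines start with 001; any other line continues the previous record.
pattern = r'^001.*$'

with open('aws.txt') as f, open('aws_out.csv', 'w') as output:
    all_the_data = ""
    for line in f:
        if not re.search(pattern, line):
            # Continuation line: drop the newline that ended the previous
            # line so this one is appended to the same record.
            all_the_data = re.sub("\n$", "", all_the_data)
        all_the_data = "".join([all_the_data, line])
    output.write(all_the_data)

# Load the repaired tab-separated file.
df = pd.read_csv('aws_out.csv', header=None, sep='\t')
print(df)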
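This one-liner should do the job: it reads the file, replaces the unwanted newlines, and then loads the string representation of the CSV file into pandas via StringIO.

Given the text of the Google doc in the question:

… using the code from "Concatenate lines with previous line based on number of letters in first column", if the "good lines" all start with 001 (which is what the regular expression in the code checks), you can try the code above.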
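
The StringIO approach described above (read the text, remove the stray newlines, load the result straight into pandas) could be sketched roughly as follows. This is a minimal sketch, not the original one-liner: the function name load_merged, its parameters, and the sample text are hypothetical, and it assumes that every complete record starts with 001 and that fields are tab-separated.

```python
import io
import re

import pandas as pd


def load_merged(text, prefix="001", sep="\t"):
    """Glue continuation lines onto the previous record, then parse the result.

    Hypothetical helper: assumes every complete record starts with `prefix`.
    """
    # Delete each newline that is not followed by the record prefix
    # (the final newline at the end of the text is kept).
    merged = re.sub(r"\n(?!" + re.escape(prefix) + r"|$)", "", text)
    # Parse the repaired text in memory, without an intermediate file.
    return pd.read_csv(io.StringIO(merged), header=None, sep=sep)


# Made-up sample: the second record is broken across two lines.
sample = "001\tfoo\tbar\n001\tbaz\tqu\nux\n001\ta\tb\n"
print(load_merged(sample))
```

For the real file, the text would come from something like open('aws.txt').read() in place of the sample string.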
Example output: