Can anyone here tell me how to import an unstructured file into pandas?
By unstructured I mean:
2021-01-26T09:40:01.192Z info hostd[2101947] [Originator@6876 sub=Default opID=823a15d0] Accepted password for user root from 127.0.0.1
2021-01-26T09:40:01.192Z info hostd[2101947] [Originator@6876 sub=Vimsvc opID=823a15d0] [Auth]: User root
2021-01-26T09:40:01.193Z info hostd[2101947] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=823a15d0] Event 24138 : User root@127.0.0.1 logged in as pyvmomi
2021-01-26T09:40:01.268Z info hostd[2101940] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=823a15de user=root] Event 24139 : User root@127.0.0.1 logged out (login time: Tuesday, 26 January, 2021 09:40:01 AM, number of API invocations: 0, user agent: pyvmomi)
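I tried several approaches and searched Google, but everyone seems to be importing well-structured CSV files, and I couldn't find any reference for importing log files. (I'm not a programmer, I just want to write this small program with pandas.)
*Several approaches, such as: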
# giving a range for column names, but this is not adequate: if I want to search through the logs for errors later I'd have to use all 54 columns ?!
pd.read_csv("mylog", sep=r'\s+', header=None, error_bad_lines=False, engine="python", quoting=csv.QUOTE_NONE, names=range(55))
# or putting everything into index :D
pd.read_csv("mylog",sep='\t', lineterminator='\n', index_col=0)
*oh yeah, want to use timeframe as INDEX column*
pd.read_csv("mylog", sep = None, iterator = True)
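The idea is to end up with something I can search for errors, with the timestamp as the index.
Thanks in advance.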
My suggestion is to parse the file first, then clean up its contents, and finally create a DataFrame from the result.
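A minimal sketch of that parse-then-build approach, using the sample lines from the question. The regular expression and the column names (`timestamp`, `level`, `process`, `pid`, `meta`, `message`) are my own assumptions about the layout, not something the log format guarantees, so treat them as a starting point to adjust against the real file.

```python
import re
import pandas as pd

# Assumed layout: "<timestamp> <level> <process>[<pid>] [<metadata>] <message>"
# The group names below are illustrative, not part of any official format.
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+"
    r"(?P<level>\w+)\s+"
    r"(?P<process>[\w.-]+)\[(?P<pid>\d+)\]\s+"
    r"\[(?P<meta>[^\]]*)\]\s+"
    r"(?P<message>.*)$"
)
COLUMNS = ["timestamp", "level", "process", "pid", "meta", "message"]


def parse_log_lines(lines):
    """Return a DataFrame with one row per line that matches LOG_PATTERN."""
    rows = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:  # lines that do not match are skipped silently
            rows.append(match.groupdict())
    return pd.DataFrame(rows, columns=COLUMNS)


# Two of the lines from the question, used here as sample input.
sample = [
    "2021-01-26T09:40:01.192Z info hostd[2101947] [Originator@6876 sub=Default opID=823a15d0] Accepted password for user root from 127.0.0.1",
    "2021-01-26T09:40:01.192Z info hostd[2101947] [Originator@6876 sub=Vimsvc opID=823a15d0] [Auth]: User root",
]
df = parse_log_lines(sample)
```

For the real file, the same function can be fed an open file handle, for example `with open("mylog") as fh: df = parse_log_lines(fh)`. All captured values are strings at this point; converting `pid` or `timestamp` to proper types is a separate step.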
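That gives a DataFrame with one row per log line.
Once you have the DataFrame, you can do everything else from there, such as setting the date as the index.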
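As an illustration of that last step, here is a small sketch that converts a timestamp column to real datetimes and uses it as the index. The DataFrame is built by hand from values in the question, and the column names are assumptions carried over for the example.

```python
import pandas as pd

# Hand-built stand-in for the parsed log; column names are illustrative.
df = pd.DataFrame(
    {
        "timestamp": ["2021-01-26T09:40:01.192Z", "2021-01-26T09:40:01.193Z"],
        "level": ["info", "info"],
        "message": [
            "Accepted password for user root from 127.0.0.1",
            "Event 24138 : User root@127.0.0.1 logged in as pyvmomi",
        ],
    }
)

# Parse the ISO 8601 strings into timezone-aware datetimes (UTC).
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

# Use the timestamp as the index and keep the rows in chronological order.
df = df.set_index("timestamp").sort_index()

# Searching the messages afterwards is a plain string filter.
logins = df[df["message"].str.contains("logged in", regex=False)]
```

With a datetime index in place, time-based selection such as `df.loc["2021-01-26"]` also becomes available.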
Edit:
To check which lines do not match this regular expression, you can do the following:
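A sketch of that check, assuming a pattern for the layout shown in the question. Both the pattern and the second sample line are made up for illustration; swap in whatever regex you are actually using.

```python
import re

# Assumed pattern: "<timestamp> <level> <process>[<pid>] [<metadata>] <message>"
LOG_PATTERN = re.compile(r"^\S+\s+\w+\s+[\w.-]+\[\d+\]\s+\[[^\]]*\]\s+.*$")


def match_lines(lines):
    """Pair each 1-based line number with its Match object, or None on failure."""
    return [
        (number, LOG_PATTERN.match(line.rstrip("\n")))
        for number, line in enumerate(lines, start=1)
    ]


sample = [
    "2021-01-26T09:40:01.192Z info hostd[2101947] [Originator@6876 sub=Vimsvc opID=823a15d0] [Auth]: User root",
    "this line does not follow the assumed layout",  # hypothetical bad line
]

results = match_lines(sample)

# Line numbers the pattern failed to recognise.
unmatched = [number for number, match in results if match is None]
```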
Then, once you have a list of tuples of the form
(<number>, <Match_or_None>)
you can check which lines the regular expression failed to match and update the regular expression accordingly.