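This is not a duplicate. I really want to do this, but I can't manage it.
I have this log file, and I want to archive all of its information in a database: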
interface: tun0 (10.8.0.0/255.255.255.0)
filter: ( port 53 ) and (ip || ip6)
#
U 2020/03/04 16:28:01.138292 10.8.0.4:52014 -> 8.8.8.8:53 #1
.|...........www.google.com.....
#
U 2020/03/04 16:28:03.011371 10.8.0.4:57054 -> 8.8.8.8:53 #3
cm...........crm.teste.com.....
#
U 2020/03/04 16:28:03.033610 8.8.8.8:53 -> 10.8.0.4:57054 #4
cm...........crm.teste.com................/.rosa.ns
cloudflare...dns.5y3MD..'....`..:.....
#
U 2020/03/04 16:28:05.166480 10.8.0.4:57284 -> 8.8.8.8:53 #5
.{...........crm.teste.tk.....
#
U 2020/03/04 16:28:05.183755 8.8.8.8:53 -> 10.8.0.4:57284 #6
.{...........crm.teste.tk................0.a.ns...joost.zuurbier.dot..^_.H..*0......:.....
#
U 2020/03/04 16:28:11.153329 10.8.0.4:58086 -> 8.8.8.8:53 #7
.............cbdfhkrlmnsxtvwz.neverssl.com.....
#
U 2020/03/04 16:28:11.180992 8.8.8.8:53 -> 10.8.0.4:58086 #8
.............cbdfhkrlmnsxtvwz.neverssl.com..............;...............;...............;...............;.....=
#
U 2020/03/04 16:28:15.851360 10.8.0.4:60006 -> 8.8.8.8:53 #9
.............plus.l.google.com.....
#
U 2020/03/04 16:28:15.859538 8.8.8.8:53 -> 10.8.0.4:60006 #10
.............plus.l.google.com..............+...:.n
#
U 2020/03/04 16:28:17.316359 10.8.0.4:59708 -> 8.8.8.8:53 #11
.X...........endpoint.prod.eu-west-1.forester.a2z.com.....
#
U 2020/03/04 16:28:17.322547 8.8.8.8:53 -> 10.8.0.4:59708 #12
.X...........endpoint.prod.eu-west-1.forester.a2z.com.................6.T4............4./p............4.5}............cP.%............6.V)............4...............6L.G............6Le.
#
U 2020/03/04 16:28:17.335399 10.8.0.4:53174 -> 8.8.8.8:53 #13
&-...........aafreudservice.prod.us-east-1.freud.titan.assistant.a2z.com.....
#
U 2020/03/04 16:28:17.341750 8.8.8.8:53 -> 10.8.0.4:53174 #14
&-...........aafreudservice.prod.us-east-1.freud.titan.assistant.a2z.com..............,.B'aafreudservice-elb-v7u7pd55xwdw-7511167.us-east-1.elb.amazonaws.D.Y.......,..4..Z.Y.......,....8Z
#
U 2020/03/04 16:28:17.363490 10.8.0.4:56468 -> 8.8.8.8:53 #15
nr...........match.amazonbrowserapp.de.....
#
U 2020/03/04 16:28:17.369720 8.8.8.8:53 -> 10.8.0.4:56468 #16
nr...........match.amazonbrowserapp.de..............)..6.
#
U 2020/03/04 16:28:18.024460 10.8.0.4:64589 -> 8.8.8.8:53 #17
.............identity.browserapps.amazon.de.....
#
U 2020/03/04 16:28:18.030664 8.8.8.8:53 -> 10.8.0.4:64589 #18
.............identity.browserapps.amazon.de................#.identity.browserapps.amazon.co.uk..<.......7..6.$.
#
U 2020/03/04 16:28:18.473433 10.8.0.4:49952 -> 8.8.8.8:53 #19
.............titan.service.amazonbrowserapp.co.uk.....
#
U 2020/03/04 16:28:18.479444 8.8.8.8:53 -> 10.8.0.4:49952 #20
.............titan.service.amazonbrowserapp.co.uk..............%..4^.o
exit
20 received, 20 matched
I want to read these lines, process them, and send them to the DB.
I want something like this:
['2020/03/04', '16:28:01.138292', '10.8.0.4:52014', 'www.google.com']
or
['2020/03/04', '16:28:05.166480', '10.8.0.4:57284', '.{...........crm.teste.tk.....']
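I know the reference (website) lines start with different characters; getting the whole line is fine with me. I just want to process this information as best I can.
I'd like to use Python or a bash script.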
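A minimal Python 3 sketch of such a parser, assuming the layout shown in the sample: every `U ...` header line is followed directly by one payload line, and the payload is kept whole (the "full line" option mentioned above). `HEADER` and `parse_log` are hypothetical names, and continuation lines of multi-line responses are simply ignored.

```python
import re

# Header lines look like:
# "U 2020/03/04 16:28:01.138292 10.8.0.4:52014 -> 8.8.8.8:53 #1"
HEADER = re.compile(
    r"^U (\d{4}/\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}\.\d+) (\S+) -> (\S+) #\d+$"
)


def parse_log(text):
    """Return [date, time, source, payload] for each header + payload pair."""
    records = []
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            date, time, source, _dest = match.groups()
            current = [date, time, source]
        elif current is not None and line and line != "#":
            # The first non-empty, non-"#" line after a header is the payload;
            # it is kept whole, and any continuation lines are ignored.
            current.append(line)
            records.append(current)
            current = None
    return records
```

With the path from the question, `parse_log(open('/etc/openvpn/logs/teste.txt').read())` should return a list starting with `['2020/03/04', '16:28:01.138292', '10.8.0.4:52014', '.|...........www.google.com.....']`.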
I have the following script:
#!/usr/bin/python
import json
import MySQLdb
import os
import datetime
from shutil import copyfile
import time
# EXPORT EXPORT #
data = open('/etc/openvpn/logs/teste.txt', 'r')
data = data.read().split('\n')
all_results = []
result = []
for row in data:
    if row.startswith('U '):
        if result:
            result = []
        row = row.replace('U', '').split(' ')
        result.append(row)
    elif row.startswith('.|'):
        row = row.replace('.|', '').replace('..', '')
        result.append(row)
        if result:
            all_results.append(result)
            result = []
data = json.dumps(all_results)
print data
Output of this script:
[[["", "2020/03/04", "16:28:01.138292", "10.8.0.4:52014", "->", "8.8.8.8:53", "#1"], ".www.google.com."], [["", "2020/03/04", "16:28:01.146332", "8.8.8.8:53", "->", "10.8.0.4:52014", "#2"], ".www.google.com+"]]
I'd like to handle this better and use a for loop to read each position [x][0].
Thanks
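Instead of indexing with a counter, a sketch that unpacks each `[header, payload]` pair from the output above directly in the `for` loop. It assumes every entry has exactly that two-element shape; `flatten` is a hypothetical helper name (Python 3).

```python
import json

# Two entries copied from the script's output shown above.
all_results = json.loads(
    '[[["", "2020/03/04", "16:28:01.138292", "10.8.0.4:52014", "->", "8.8.8.8:53", "#1"], ".www.google.com."],'
    ' [["", "2020/03/04", "16:28:01.146332", "8.8.8.8:53", "->", "10.8.0.4:52014", "#2"], ".www.google.com+"]]'
)


def flatten(results):
    """Turn [[header_fields, payload], ...] into [date, time, source, payload]."""
    rows = []
    for header, payload in results:
        # header[0] is the empty string left by row.replace('U', '').split(' ')
        date, time, source = header[1:4]
        rows.append([date, time, source, payload.strip(".")])
    return rows


for row in flatten(all_results):
    print(row[0], row[3])  # [x][0] is the date, [x][3] the cleaned payload
```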
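Edit:
Everything works with that file. I ran ngrep for a few minutes while visiting random websites, and this is the new output:
https://github.com/henriquemota99/Bugs/blob/master/output.rtf (ignore the \ at the end of every line; GitHub added it and I don't know why)
Then, with some help from others, I ran this amazing Python script: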
#!/usr/bin/python
import MySQLdb
import json
# EXPORT EXPORT #
data = open('/etc/openvpn/logs/teste.txt', 'r')
data = data.read().split('\n')
all_results = []
result = []
for row in data:
    if row.startswith('U '):
        if result:
            result = []
        row = row.replace('U', '').split(' ')
        result.extend(row[1:4])
    elif row.startswith('.'):
        row = row.replace('.|', '').replace('..', '')
        result.append(row.strip('.'))
        if result:
            all_results.append(result)
            result = []
data = json.dumps(all_results)
print data
print all_results[0][0]
print all_results[0][1][ : all_results[0][1].rfind('.') ]
print all_results[0][2]
print all_results[0][3]
db = MySQLdb.connect(user="USER",passwd="PASSWORD",host="IP",db="DB")
cursor = db.cursor()
i = 0
for obj in all_results:
    cursor.execute("INSERT INTO logsRequests (date, hour, userIp, referer) VALUES (%s, %s, %s, %s)", (all_results[i][0], all_results[i][1][ : all_results[i][1].rfind('.') ], all_results[i][2], all_results[i][3]))
    i+=1
db.commit()
db.close()
If I run the script on the first output, all the results are fine, but with this one it doesn't seem to work:
Traceback (most recent call last):
  File "requests.py", line 46, in <module>
    cursor.execute("INSERT INTO logsRequests (date, hour, userIp, referer) VALUES (%s, %s, %s, %s)", (all_results[i][0], all_results[i][1][ : all_results[i][1].rfind('.') ], all_results[i][2], all_results[i][3]))
IndexError: list index out of range
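Why?? I noticed that the file sometimes has empty lines, leaving the third position empty, which is why it throws the error, but I think I removed them all... How should I handle this?
Thanks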
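One likely cause, judging from the script above (the new output file itself is not reproduced here): `result` is appended to `all_results` and reset as soon as the first line starting with `.` is seen. If a response spans several lines and a later line also starts with `.`, that line becomes a new one-element record, and `all_results[i][1]` raises `IndexError`. Empty lines are not the culprit, since they never start with `.` and are skipped; payloads starting with other characters (`cm...`, `nr...`, `&-...` in the sample) are silently dropped instead. A minimal Python 3 sketch of a defensive insert that skips incomplete records. It uses the standard-library `sqlite3` in memory only as a stand-in for `MySQLdb` so it runs anywhere (both follow the DB-API, but `sqlite3` uses `?` placeholders where `MySQLdb` uses `%s`); `rows_for_insert` and the sample data are hypothetical.

```python
import sqlite3


def rows_for_insert(all_results):
    """Keep only complete [date, time, ip, referer] records."""
    rows = []
    for rec in all_results:
        if len(rec) < 4:
            # e.g. a one-element record built from a stray continuation line
            continue
        date, time, user_ip, referer = rec[:4]
        # Drop fractional seconds: '16:28:01.138292' -> '16:28:01'
        rows.append((date, time.split(".")[0], user_ip, referer))
    return rows


# Hypothetical sample: one good record and one fragment of the kind that
# triggers IndexError in the original loop.
all_results = [
    ["2020/03/04", "16:28:01.138292", "10.8.0.4:52014", "www.google.com"],
    ["stray continuation line"],
]

# In-memory SQLite stands in for MySQLdb.connect(...) here.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE logsRequests (date TEXT, hour TEXT, userIp TEXT, referer TEXT)")
# sqlite3 uses ? placeholders; keep %s with MySQLdb.
db.executemany(
    "INSERT INTO logsRequests (date, hour, userIp, referer) VALUES (?, ?, ?, ?)",
    rows_for_insert(all_results),
)
db.commit()
print(db.execute("SELECT * FROM logsRequests").fetchall())
```

With MySQLdb, the same rows can be passed to `cursor.executemany(...)` with the original `%s` placeholders, followed by `db.commit()`. `split('.')[0]` is used instead of `[:rfind('.')]` because `rfind` returns -1 when there is no `.`, which would silently chop the last character off the time.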
Fix your
with
You could consider replacing
with
I'd do it like this, without regular expressions:
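It relies on the file format being consistent, as in your example: lines starting with
'U'
must come immediately before the line containing the server name. It only retrieves the first server address from that second line. The information you want is in the dictionary's values.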
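The answer's own code is not included here. A minimal Python 3 sketch of a regex-free approach under the same assumptions (each `U` line is immediately followed by its payload line; only the first host name in that line is kept), storing the records in a dictionary keyed by the packet number. `first_hostname` and `parse` are hypothetical names, and the host-name detection is a heuristic: it takes the first run of two or more non-empty dot-separated labels.

```python
def first_hostname(payload):
    """Heuristic: first run of two or more non-empty dot-separated labels."""
    run = []
    for label in payload.split("."):
        if label:
            run.append(label)
        else:
            if len(run) >= 2:
                return ".".join(run)
            run = []
    return ".".join(run) if len(run) >= 2 else None


def parse(lines):
    """Map packet number ('#1', '#2', ...) to [date, time, source, hostname]."""
    entries = {}
    # Pair every line with the one after it; a header is assumed to be
    # followed immediately by its payload line.
    for header, payload in zip(lines, lines[1:]):
        if not header.startswith("U "):
            continue
        _, date, time, source, _arrow, _dest, number = header.split()
        entries[number] = [date, time, source, first_hostname(payload)]
    return entries


lines = [
    "#",
    "U 2020/03/04 16:28:01.138292 10.8.0.4:52014 -> 8.8.8.8:53 #1",
    ".|...........www.google.com.....",
    "#",
    "U 2020/03/04 16:28:03.033610 8.8.8.8:53 -> 10.8.0.4:57054 #4",
    "cm...........crm.teste.com................/.rosa.ns",
]
for values in parse(lines).values():
    print(values)
```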
Using regular expressions:
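The regex version's code is not included either. A hedged Python 3 sketch of one way to do it, assuming `\n` line endings and the same header-then-payload layout. `RECORD`, `HOSTNAME`, and `parse_with_regex` are hypothetical names, and `HOSTNAME` is a loose pattern (letters, digits, and hyphens separated by dots), not a full host-name validator.

```python
import re

# A "U ..." header line and the payload line right after it.
RECORD = re.compile(r"^U (\S+) (\S+) (\S+) -> \S+ #\d+\n([^\n]*)", re.MULTILINE)
# Loose host-name pattern: two or more labels of letters, digits or hyphens.
HOSTNAME = re.compile(r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def parse_with_regex(text):
    results = []
    for date, time, source, payload in RECORD.findall(text):
        host = HOSTNAME.search(payload)
        # Fall back to the whole payload line if no host name is found.
        results.append([date, time, source, host.group(0) if host else payload])
    return results


sample = (
    "#\n"
    "U 2020/03/04 16:28:01.138292 10.8.0.4:52014 -> 8.8.8.8:53 #1\n"
    ".|...........www.google.com.....\n"
    "#\n"
    "U 2020/03/04 16:28:17.363490 10.8.0.4:56468 -> 8.8.8.8:53 #15\n"
    "nr...........match.amazonbrowserapp.de.....\n"
)
print(parse_with_regex(sample))
```

Because the whole file is scanned at once, stray continuation lines never form records of their own, which sidesteps the `IndexError` from the question.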