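<p>First, split the response text into lines on the newline character. Then, for each line, extract the <code>protocol</code>, <code>packet</code> and <code>bytes</code> fields with a regex and append them as a dict to a list (<code>lst_dict</code>). Finally, convert <code>lst_dict</code> into a pandas DataFrame.</p>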
<pre><code># Python 3; needs httplib2, beautifulsoup4 and pandas installed
import re

import httplib2
import pandas as pd
from bs4 import BeautifulSoup, SoupStrainer

lst_dict = []
http = httplib2.Http()
status, response = http.request('http://mawi.wide.ad.jp/mawi/ditl/ditl2017/201704131545.html')

# Parse only the pre element, which holds the plain-text protocol table
res = BeautifulSoup(response, 'html.parser', parse_only=SoupStrainer('pre'))
items = res.text.split("\n")

# Skip the first two lines, assumed to be the table header
for item in items[2:]:
    item = item.strip()
    if not item:
        continue  # blank lines (e.g. after the trailing newline) have no fields
    protocol = re.search(r'(\w+)\s.*', item).group(1)
    packet = re.search(r'\w+\s*(\w+)\s.*', item).group(1)
    byts = re.search(r'\w+\s*\w+\s\(.*\)\s+(\w+)\s.*', item).group(1)
    # Avoid shadowing the built-in name dict
    lst_dict.append({'protocol': protocol, 'packet': packet, 'bytes': byts})

df = pd.DataFrame(lst_dict)
print(df)
</code></pre>
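<p>To check the parsing step without fetching the page, the sketch below applies the same split-then-regex idea to a hard-coded string using only the standard library. The <code>SAMPLE</code> text, the <code>ROW_RE</code> pattern and the <code>parse_rows</code> helper are hypothetical: the sample only assumes each data row looks like "name, packet count, parenthesized percentage, byte count", which is what the regexes above expect, and the real page's columns may differ. It swaps the three separate searches for one compiled pattern, skips non-matching lines (header, separator, blank) instead of raising <code>AttributeError</code>, and converts the counts to integers.</p>
<pre><code>import re

# Hypothetical stand-in for the text of the page's pre element; the real
# table's columns and numbers may differ.
SAMPLE = """protocol    packets              bytes
------------------------------------------------
total   1000 (100.00%)   640000 (100.00%)   640.00
ip       990 ( 99.00%)   639000 ( 99.84%)   645.45
 tcp     700 ( 70.00%)   500000 ( 78.13%)   714.29
"""

# Assumed row layout: name, packet count, "(percent)", byte count, ...
ROW_RE = re.compile(r'(\S+)\s+(\d+)\s+\([^)]*\)\s+(\d+)')


def parse_rows(text):
    """Return one dict per line matching ROW_RE; other lines are skipped."""
    rows = []
    for line in text.split("\n"):
        match = ROW_RE.match(line.strip())
        if match is None:
            continue  # header, separator or blank line
        protocol, packet, byts = match.groups()
        rows.append({'protocol': protocol, 'packet': int(packet), 'bytes': int(byts)})
    return rows


rows = parse_rows(SAMPLE)
print(rows[0])  # {'protocol': 'total', 'packet': 1000, 'bytes': 640000}
</code></pre>
<p>Passing the resulting list to <code>pd.DataFrame</code>, as in the answer, then gives integer <code>packet</code> and <code>bytes</code> columns rather than strings.</p>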