<p>This is not a standard format, so you will need a custom parser here.</p>
<p>The following should work for any file laid out this way.</p>
<pre><code>import requests
import re
import pandas as pd
url = 'http://www.jmulti.de/download/datasets/e6.dat'
data = requests.get(url).text
# to match the header
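# (raw string so the backslashes reach the regex engine; non-greedy so only the comment block is matched)
pattern = r'/\*[\s\S]*?\*/'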
# removes the header content from data.
data = re.sub(pattern, '', data)
# data is one long string, so split it on newlines
# to get a list of lines we can iterate over
data = data.split('\n')
# discard blank (or whitespace-only) lines
data = [x for x in data if x.strip()]
# strip the angle brackets (<>) from the quarter info line so that '<1971 Q1>' becomes '1971 Q1'
# (a regex would work just as well)
quarterinfo = data[0].replace('<', '').replace('>', '')
year, quarter = quarterinfo.split()
df_data = []
for line in data[2:]:
    # strip any stray newlines
    # and surrounding blanks
    line = line.replace('\n', '').strip()
    # convert the values of each data row to float;
    # leaving them as is would make the df values str
    dp, r = [float(x) for x in line.split()]
    df_data.append({
        'year': year,
        'quarter': quarter,
        'DP': dp,
        'R': r
    })
df = pd.DataFrame(df_data)
print(df.head())
</code></pre>
<p>Output:</p>
<pre><code>   year quarter        DP      R
0  1972      Q2 -0.003133  0.083
1  1972      Q2  0.018871  0.083
2  1972      Q2  0.024804  0.087
3  1972      Q2  0.016278  0.087
4  1972      Q2  0.000290  0.102
</code></pre>
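<p>Note that the loop stamps every row with the starting year and quarter, which is why all rows in the output show 1972 Q2. If the rows are consecutive quarterly observations (an assumption; the file only states the starting quarter), one possible way to give each row its own period is <code>pd.period_range</code>. The helper name <code>quarter_index</code> is made up for this sketch, and the four sample rows are simply copied from the output above.</p>

```python
import pandas as pd


def quarter_index(start, n_rows):
    """Return n_rows consecutive quarterly periods, starting at e.g. '1972 Q2'.

    Hypothetical helper: assumes one row per quarter with no gaps.
    """
    year, quarter = start.split()
    return pd.period_range(start=f"{year}{quarter}", periods=n_rows, freq="Q")


# First four (DP, R) rows copied from the output above
rows = [
    (-0.003133, 0.083),
    (0.018871, 0.083),
    (0.024804, 0.087),
    (0.016278, 0.087),
]

idx = quarter_index("1972 Q2", len(rows))
df = pd.DataFrame(rows, columns=["DP", "R"])
# Derive per-row year and quarter labels from the period index
df.insert(0, "year", idx.year)
df.insert(1, "quarter", [f"Q{q}" for q in idx.quarter])
print(df)
```

<p>In the original script this would amount to calling the helper with <code>quarterinfo</code> and <code>len(data[2:])</code> instead of the sample values.</p>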