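<p>It is unlikely that an efficient library exists specifically for a non-standard, non-uniform file format like this one. I would therefore parse the file manually, line by line, into a <code>list of dicts</code>; the <code>DataFrame()</code> constructor then handles any missing keys (columns) by filling them with <code>NaN</code>.</p>
<p>Code:</p>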
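<pre><code>import pandas as pd

path_to_file = "/mnt/ramdisk/in.txt"
ls_dic = []  # one dict per parsed line
with open(path_to_file) as f:
    for line in f:
        ls = line.split(",")
        dic = {}
        # The first field is the bare row number (it has no key).
        dic["Number"] = ls[0].strip()
        # The remaining fields are key=value pairs.
        for k_v in ls[1:]:
            k, v = k_v.split("=")
            dic[k.strip().capitalize()] = v.strip()
        ls_dic.append(dic)
# Keys missing from a row become NaN in the resulting DataFrame.
df = pd.DataFrame(ls_dic)
</code></pre>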
<p>Result:</p>
<pre><code>print(df)
  Number     Name      Car Price     Bike
0      1    Messi     ford   234   Harley
1      2   Cavani    mazda    58  Ducatti
2      3  Dembele   toyota   NaN   Yamaha
3      4    kevin     Ford   989      NaN
4      5   Aguero      NaN   NaN  Ducatti
5      6    nadal  Ferrari   NaN   Harley
</code></pre>
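<p>If the real file contains blank lines, fields without an <code>=</code>, or values that themselves contain <code>=</code>, the loop above would raise an error. The following is a minimal sketch of a more defensive variant, not a definitive implementation. The sample text, the <code>parse_lines</code> helper, and the <code>note</code> field are hypothetical and exist only for illustration; the actual layout of the input file is an assumption. It also converts <code>Price</code> to a numeric column, since every parsed value starts out as a string.</p>

```python
import io

import pandas as pd

# Hypothetical sample input; the layout of the real file is an assumption.
SAMPLE = """\
1,name=Messi,car=ford,price=234,bike=Harley
2,name=Cavani,car=mazda,price=58
3,name=Dembele,bike=Yamaha

4,name=kevin,car=Ford,note=a=b
"""


def parse_lines(lines):
    """Turn 'N,key=value,...' lines into a list of dicts (hypothetical helper)."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        first, *pairs = line.split(",")
        record = {"Number": first.strip()}
        for pair in pairs:
            # partition splits on the first "=" only, so "note=a=b" keeps "a=b".
            key, sep, value = pair.partition("=")
            if not sep:
                continue  # ignore fields that are not key=value pairs
            record[key.strip().capitalize()] = value.strip()
        records.append(record)
    return records


records = parse_lines(io.StringIO(SAMPLE))
df = pd.DataFrame(records)
# All parsed values are strings; convert Price so it can be used in arithmetic.
df["Price"] = pd.to_numeric(df["Price"], errors="coerce")
print(df)
```

<p>Passing any iterable of lines to the helper keeps it usable with an open file object as well as with in-memory text, which makes the parsing step easy to try out before pointing it at the real file.</p>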