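<p>As Warren Weckesser's answer shows, scipy cannot read sparse ARFF files. I wrote a quick workaround that parses sparse ARFF files, and I am sharing it below in case it helps others.
If I find time to write a clean version, I will try to contribute it to scipy.</p>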
<p>Edit: sorry, I had not seen your version, but I guess it works too.</p>
<pre class="lang-py prettyprint-override"><code>from scipy.sparse import coo_matrix
import pandas as pd


def loadarff(filename):
    """Parse a sparse ARFF file with integer values into a dense DataFrame."""
    features = []
    data = []  # (row, column, value) triplets
    row_idx = 0
    with open(filename, "rb") as f:
        for line in f:
            line = line.decode("utf8").strip()
            lowered = line.lower()  # ARFF keywords are case-insensitive
            if not line or line.startswith("%"):
                # Skip blank lines and comments.
                continue
            elif lowered.startswith("@data") or lowered.startswith("@relation"):
                continue
            elif lowered.startswith("@attribute"):
                try:
                    # The attribute name is the second whitespace-separated token.
                    features.append(line.split()[1])
                except Exception:
                    print(f"Cannot parse {line}")
                    raise
            elif line.startswith("{"):
                try:
                    # A sparse row looks like "{0 1, 3 2}": "column value" pairs.
                    pairs = line.strip("{}").split(",")
                    data.extend(
                        [row_idx] + [int(x) for x in v.split()]
                        for v in pairs
                        if v.strip()
                    )
                    row_idx += 1
                except Exception:
                    print(f"Cannot parse {line}")
                    raise
            else:
                print(f"Cannot parse {line}")
    values = [x[2] for x in data]
    rows = [x[0] for x in data]
    cols = [x[1] for x in data]
    sparse_matrix = coo_matrix((values, (rows, cols)), shape=(row_idx, len(features)))
    # Note: todense() materializes the full matrix in memory.
    df = pd.DataFrame(sparse_matrix.todense(), columns=features)
    return df
</code></pre>
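<p>For anyone unfamiliar with the format: a sparse ARFF data row lists only the non-zero entries as zero-based <code>column value</code> pairs inside braces, for example <code>{1 5, 3 2}</code>. Below is a minimal pure-Python sketch of how one such row expands to a dense row. The helper name <code>parse_sparse_row</code> and its <code>n_columns</code> parameter are made up for illustration, and it assumes integer values only, like the parser above.</p>

```python
def parse_sparse_row(line, n_columns):
    """Expand one sparse ARFF data row into a dense list of ints.

    Hypothetical helper for illustration; assumes integer values only.
    """
    dense = [0] * n_columns
    body = line.strip().strip("{}")
    for pair in body.split(","):
        if not pair.strip():
            continue  # an empty row "{}" has no entries
        col, value = pair.split()
        dense[int(col)] = int(value)
    return dense


row = parse_sparse_row("{1 5, 3 2}", n_columns=5)
print(row)  # [0, 5, 0, 2, 0]
```

<p>Splitting each pair on arbitrary whitespace (rather than on a single space) keeps rows with a space after the comma from failing.</p>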