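<p>You can use the <code>StratifiedShuffleSplit</code> class from scikit-learn (or <code>StratifiedKFold</code> if you don't want shuffling, although you then need 5 splits to get an 80%/20% train/test split, because there is no other way to control the test size):</p>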
<pre><code>import sklearn.model_selection
import numpy as np

# Array similar to your structure
x = np.asarray([[0, 4136, 1], [0, 5553, 1], [0, 9089, 1], [1, 0, 2],
                [1, 224, 1], [1, 226, 1], [1, 324, 2], [1, 341, 1], [1, 530, 1]])

# Get train and test indices using x[:,0] to define the 'classes'
cv = sklearn.model_selection.StratifiedShuffleSplit(n_splits=1, test_size=0.2)

# Note, X isn't actually used in the method, np.zeros(n_samples) would also work
# Also note that cv.split is an iterator with 1 element (split),
# hence getting the first element of the list
train_idx, test_idx = list(cv.split(X=x, y=x[:, 0]))[0]

print("Training")
for i in train_idx:
    print(x[i, :2], x[i, 2])

print("Test")
for i in test_idx:
    print(x[i, :2], x[i, 2])
</code></pre>
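<p>For the non-shuffled alternative mentioned above, here is a minimal sketch using <code>StratifiedKFold</code> with 5 folds, so that each fold holds out 20% of the rows. The 30-row array is hypothetical data made up for illustration (10 rows of class 0, 20 rows of class 1), not your actual data:</p>

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical data: 30 rows, the first column is the class label
# (10 rows of class 0, 20 rows of class 1)
labels = np.array([0] * 10 + [1] * 20)
x = np.column_stack([labels, np.arange(30), np.ones(30, dtype=int)])

# 5 folds -> each test fold holds 1/5 = 20% of the rows; no shuffling by default
cv = StratifiedKFold(n_splits=5)
folds = list(cv.split(X=x, y=x[:, 0]))

# Pick any one of the 5 folds as the 80%/20% train/test split
train_idx, test_idx = folds[0]
print(len(train_idx), len(test_idx))
```

<p>Each of the 5 folds is a valid 80%/20% split that keeps the class proportions, so you can either use a single fold or iterate over all of them for cross-validation.</p>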
<p>I don't have much experience with sparse matrices, so I hope you can make the necessary adjustments based on my example.</p>
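<p>As a starting point for that adjustment, here is a rough sketch that assumes your data is held in a SciPy CSR matrix with the class label in the first column. That layout is my assumption and may not match your real data. The idea is to pull the label column out as a dense 1-D array, compute the indices from it, and then index the sparse rows with those indices:</p>

```python
import numpy as np
from scipy import sparse
from sklearn.model_selection import StratifiedShuffleSplit

# Same example array as above, stored as a CSR matrix (assumed layout)
x = np.asarray([[0, 4136, 1], [0, 5553, 1], [0, 9089, 1], [1, 0, 2],
                [1, 224, 1], [1, 226, 1], [1, 324, 2], [1, 341, 1], [1, 530, 1]])
x_sparse = sparse.csr_matrix(x)

# Extract the label column as a dense 1-D array
labels = x_sparse[:, 0].toarray().ravel()

# random_state is fixed only to make this sketch reproducible
cv = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)

# Only the labels matter for stratification, so a placeholder X is enough
train_idx, test_idx = next(cv.split(np.zeros(len(labels)), labels))

# CSR matrices support row selection with an integer index array
x_train = x_sparse[train_idx]
x_test = x_sparse[test_idx]
print(x_train.shape, x_test.shape)
```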