What does the iloc function do with the iris dataset?

X = df.iloc[0:100, [0,1]].values
plt.scatter(X[:50, 0], X[:50, 1], alpha=0.5, c='b', edgecolors='none', label='setosa %2s'%(y[0]))
plt.scatter(X[50:100, 0], X[50:100, 1], alpha=0.5, c='r', edgecolors='none', label='versicolor %2s'%(y[50]))

%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
#from sklearn import cross_validation
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from mlclass2 import simplemetrics, plot_decision_2d_lda

df = pd.read_csv('https://archive.ics.uci.edu/ml/'
        'machine-learning-databases/iris/iris.data', header=None)
X = df.iloc[0:100, [0,1]].values
y = df.iloc[0:100, 4].values
y = np.where(y == 'Iris-setosa', 0, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=5)

stdscaler = preprocessing.StandardScaler().fit(X_train)
X_scaled = stdscaler.transform(X)
X_train_scaled = stdscaler.transform(X_train)
X_test_scaled = stdscaler.transform(X_test)

# plot data
plt.scatter(X[:50, 0], X[:50, 1], alpha=0.5, c='b', edgecolors='none', label='setosa %2s'%(y[0]))
plt.scatter(X[50:100, 0], X[50:100, 1], alpha=0.5, c='r', edgecolors='none', label='versicolor %2s'%(y[50]))
plt.xlabel('sepal length [cm]')
plt.ylabel('sepal width [cm]')  # column 1 of iris.data is sepal width, not petal length
plt.legend(loc='lower right')
plt.show()

1 Answer

User

#1 · Posted 2024-09-30 14:26:02

.values returns only the values of the DataFrame as a NumPy array, with the axis labels removed.
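As a minimal sketch of what .values does, here is a tiny made-up DataFrame (the labels and numbers are illustrative and do not come from the iris file). The pandas documentation recommends .to_numpy() as the newer equivalent; both should give the same array here.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame; the labels and numbers are made up.
demo = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}, index=["r0", "r1"])

arr = demo.values        # plain NumPy array: row and column labels are gone
same = demo.to_numpy()   # equivalent call recommended by the pandas docs

print(type(arr))         # <class 'numpy.ndarray'>
print(arr.shape)         # (2, 2)
```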

.iloc uses integer-position-based indexing.

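The .iloc part of the code says that for the independent variables we want only the first 100 rows of columns 0 and 1, and for the dependent variable only the first 100 rows of column 4. If this part is still confusing, I suggest reading up on slice notation. In short, slicing with .iloc comes down to .iloc[start:stop], where start is included and stop is excluded, so 0:100 selects rows 0 through 99.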
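The same row and column selection can be tried on a few rows without downloading anything. This is only a sketch: the small frame below is a stand-in laid out like the iris file (integer column labels, as read_csv with header=None produces), and the names demo, first_two and labels are made up for illustration.

```python
import pandas as pd

# Small stand-in frame with integer column labels, laid out like the iris file.
demo = pd.DataFrame([[5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
                     [4.9, 3.0, 1.4, 0.2, "Iris-setosa"],
                     [7.0, 3.2, 4.7, 1.4, "Iris-versicolor"],
                     [6.4, 3.2, 4.5, 1.5, "Iris-versicolor"]])

first_two = demo.iloc[0:2, [0, 1]]   # rows 0 and 1 (stop is exclusive), columns 0 and 1
labels = demo.iloc[0:2, 4]           # same rows, column 4 only, returned as a Series

print(first_two.shape)   # (2, 2)
print(labels.tolist())   # ['Iris-setosa', 'Iris-setosa']
```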

The original DataFrame:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import preprocessing


df = pd.read_csv('https://archive.ics.uci.edu/ml/'
        'machine-learning-databases/iris/iris.data', header=None)
X = df.iloc[0:100, [0,1]].values
y = df.iloc[0:100, 4].values
y = np.where(y == 'Iris-setosa', 0, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=5)

stdscaler = preprocessing.StandardScaler().fit(X_train)
X_scaled  = stdscaler.transform(X)
X_train_scaled = stdscaler.transform(X_train)
X_test_scaled  = stdscaler.transform(X_test)

print(df)

Output:

       0    1    2    3               4
0    5.1  3.5  1.4  0.2     Iris-setosa
1    4.9  3.0  1.4  0.2     Iris-setosa
2    4.7  3.2  1.3  0.2     Iris-setosa
3    4.6  3.1  1.5  0.2     Iris-setosa
4    5.0  3.6  1.4  0.2     Iris-setosa
..   ...  ...  ...  ...             ...
145  6.7  3.0  5.2  2.3  Iris-virginica
146  6.3  2.5  5.0  1.9  Iris-virginica
147  6.5  3.0  5.2  2.0  Iris-virginica
148  6.2  3.4  5.4  2.3  Iris-virginica
149  5.9  3.0  5.1  1.8  Iris-virginica

[150 rows x 5 columns]

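df.iloc[0:100, [0,1]].values: see how only columns 0 and 1 are returned here? The rows start at index 0 and stop before index 100 ([start:stop]), and only columns 0 and 1 are selected because of the [0,1] list.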

[[5.1 3.5]
 [4.9 3. ]
 [4.7 3.2]
 [4.6 3.1]
 [5.  3.6]
 [5.4 3.9]
 [4.6 3.4]
 [5.  3.4]
 [4.4 2.9]
 [4.9 3.1]
 [5.4 3.7]
 [4.8 3.4]
 [4.8 3. ]
 [4.3 3. ]
 [5.8 4. ]
 [5.7 4.4]
 [5.4 3.9]
 [5.1 3.5]

df.iloc[0:100, 4].values: same as above, but selecting only column 4.

['Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa']
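The two selections also differ in shape: a list of columns gives a 2-D array, while a single column index gives a 1-D array. The scatter calls in the question then slice the 2-D array with ordinary NumPy indexing, so X[:50, 0] is column 0 of the first 50 rows. The sketch below assumes a synthetic stand-in for the first 100 rows (random numbers, 50 labels of each class) rather than the downloaded file.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the first 100 iris rows: 4 numeric columns plus a label column.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((100, 4)))
demo[4] = ["Iris-setosa"] * 50 + ["Iris-versicolor"] * 50

X = demo.iloc[0:100, [0, 1]].values      # list of columns -> 2-D array
y = demo.iloc[0:100, 4].values           # single column -> 1-D array
y = np.where(y == "Iris-setosa", 0, 1)   # encode the labels as 0 / 1

print(X.shape, y.shape)     # (100, 2) (100,)
print(X[:50, 0].shape)      # (50,)  column 0 of rows 0..49
print(X[50:100, 1].shape)   # (50,)  column 1 of rows 50..99
print(y[0], y[50])          # 0 1
```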
