Running a dataset through Tomek links in Matlab or Python

Posted 2024-09-28 01:24:18


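I have a machine learning dataset, the Thoracic Surgery Data Set, and I would like to apply Tomek links to it using Matlab or Python.

The dataset is available here: http://archive.ics.uci.edu/ml/datasets/Thoracic+Surgery+Data

Is this possible? Any help would be appreciated.

Regards.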


1 answer
User
#1 · Posted 2024-09-28 01:24:18

This link provides code and plotting details for applying Tomek links to a dataset in Python: http://contrib.scikit-learn.org/imbalanced-learn/auto_examples/under-sampling/plot_tomek_links.html

import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle

from imblearn.under_sampling import TomekLinks

rng = np.random.RandomState(0)
n_samples_1 = 500
n_samples_2 = 50
X_syn = np.r_[1.5 * rng.randn(n_samples_1, 2),
              0.5 * rng.randn(n_samples_2, 2) + [2, 2]]
y_syn = np.array([0] * (n_samples_1) + [1] * (n_samples_2))
X_syn, y_syn = shuffle(X_syn, y_syn)
X_syn_train, X_syn_test, y_syn_train, y_syn_test = train_test_split(X_syn,
                                                                    y_syn)

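# Remove Tomek links. Recent imbalanced-learn releases no longer accept
# return_indices or fit_sample; use fit_resample and read the indices of
# the kept samples from the sample_indices_ attribute instead.
tl = TomekLinks()
X_resampled, y_resampled = tl.fit_resample(X_syn, y_syn)
idx_resampled = tl.sample_indices_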

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

idx_samples_removed = np.setdiff1d(np.arange(X_syn.shape[0]),
                                   idx_resampled)
idx_class_0 = y_resampled == 0
plt.scatter(X_resampled[idx_class_0, 0], X_resampled[idx_class_0, 1],
            alpha=.8, label='Class #0')
plt.scatter(X_resampled[~idx_class_0, 0], X_resampled[~idx_class_0, 1],
            alpha=.8, label='Class #1')
plt.scatter(X_syn[idx_samples_removed, 0], X_syn[idx_samples_removed, 1],
            alpha=.8, label='Removed samples')

# Show which colour belongs to which group and display the figure
plt.legend()
plt.show()
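
To make it clearer what the library call above is doing, here is a minimal pure-Python sketch of the idea, not the imbalanced-learn implementation. A Tomek link is a pair of samples from different classes that are each other's nearest neighbour; under-sampling then drops the majority-class member of each pair. The function names and the toy data are made up for illustration, and Euclidean distance is assumed.

```python
from math import dist  # available from Python 3.8


def nearest_neighbor(i, X):
    """Index of the sample closest to X[i], excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: dist(X[i], X[j]))


def find_tomek_links(X, y):
    """Return pairs (i, j), i < j, that are mutual nearest neighbours
    and carry different class labels."""
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def remove_majority_in_links(X, y, majority_label):
    """Drop the majority-class member of every Tomek link."""
    drop = {k for pair in find_tomek_links(X, y)
            for k in pair if y[k] == majority_label}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep], keep


# Toy data, invented for this sketch: class 0 is the majority (4 vs 2).
X_toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2),
         (1.0, 1.0), (1.1, 1.0), (5.0, 5.0)]
y_toy = [0, 0, 0, 0, 1, 1]

links = find_tomek_links(X_toy, y_toy)
X_kept, y_kept, kept_idx = remove_majority_in_links(X_toy, y_toy,
                                                    majority_label=0)
print("Tomek links:", links)
print("Kept indices:", kept_idx)
```

This brute-force version compares every pair of samples, so it is only meant for reading, not for large data. Note also that it relies on a numeric distance: if your own dataset has non-numeric columns, they would need to be encoded as numbers before any nearest-neighbour method like this can be applied.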
