Can KNeighborsClassifier compare lists of different sizes?

1 answer

User

#1 · Posted 2024-09-29 21:45:05

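As far as I know, there are two (or one...) options:

1. Precompute the distances. When this answer was written, KNeighborsClassifier did not seem to support this directly, although other algorithms such as Spectral Clustering did; newer scikit-learn releases accept metric='precomputed'.
2. Pad the data with NaNs to make it square, and handle the NaNs accordingly in a custom distance function.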

"Squaring" your data with `NaN`s

So, option 2 it is. Suppose we have the following data, where each row represents a time series:

import numpy as np

series = [
    [1,2,3,4],
    [1,2,3],
    [1],
    [1,2,3,4,5,6,7,8]
]

We simply pad the shorter series with NaN so that the data becomes square.
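A minimal sketch of the padding step, assuming plain NumPy. The helper name `pad_with_nan` is hypothetical and not part of the original answer:

```python
import numpy as np

def pad_with_nan(series):
    """Right-pad every sequence with NaN up to the longest length."""
    width = max(len(s) for s in series)
    padded = np.full((len(series), width), np.nan)
    for i, s in enumerate(series):
        padded[i, :len(s)] = s
    return padded

series = [
    [1, 2, 3, 4],
    [1, 2, 3],
    [1],
    [1, 2, 3, 4, 5, 6, 7, 8],
]
X = pad_with_nan(series)
print(X.shape)  # (4, 8)
```

Every row now has the length of the longest series, with NaN marking the positions that hold no data.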

Now the data "fits" into the algorithm. You just need to adapt the distance function so that it takes the NaNs into account.
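One way such a distance function could look is sketched below: it strips the NaN padding from both rows and then compares what is left. The names `dtw_distance` and `nan_aware_distance` are made up for illustration, and the small dynamic time warping routine is only a stand-in for whatever distance you actually use. Whether the estimator's own input validation accepts NaN depends on the scikit-learn version, so treat this as a sketch of the distance function only.

```python
import numpy as np

def dtw_distance(a, b):
    """Plain O(len(a) * len(b)) dynamic time warping with |x - y| as cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def nan_aware_distance(row1, row2):
    """Drop the NaN padding from both rows, then compare the real values."""
    a = row1[~np.isnan(row1)]
    b = row2[~np.isnan(row2)]
    return dtw_distance(a, b)

row1 = np.array([1.0, 2.0, 3.0, 4.0])
row2 = np.array([1.0, 2.0, 3.0, np.nan])  # the series [1, 2, 3] after padding
print(nan_aware_distance(row1, row2))  # 1.0
```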

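Precompute and use a caching function

Oh, we can also go with option 1 (assuming you have N time series):

1. Precompute the distances into an (N, N) distance matrix D.
2. Create an (N, 1) matrix that is just the range [0, N), i.e. the index of each series in the distance matrix.
3. Create a distance function, wrapper.
4. Use this wrapper as the distance function.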

The wrapper function:

def wrapper(row1, row2):
    # Each row holds a single series index. Cast to int because
    # sklearn may hand the indices over as floats.
    i1, i2 = int(row1[0]), int(row2[0])
    return D[i1, i2]
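Putting the steps and the wrapper together, here is a minimal sketch that shows the index-lookup trick on its own, without scikit-learn. The toy distance (difference in length) and the helper `nearest` are stand-ins for illustration only and are not part of the original answer:

```python
import numpy as np

series = [[1, 2, 3, 4], [1, 2, 3], [1], [1, 2, 3, 4, 5, 6, 7, 8]]
N = len(series)

# Step 1: precompute the symmetric (N, N) distance matrix D.
# Toy distance: difference in length; swap in DTW or similar.
D = np.zeros((N, N))
for i in range(N):
    for j in range(i + 1, N):
        D[i, j] = D[j, i] = abs(len(series[i]) - len(series[j]))

# Step 2: the "data" handed to the classifier is just the row indices.
X = np.arange(N).reshape((N, 1))

# Steps 3 and 4: the distance function only looks the answer up in D.
def wrapper(row1, row2):
    return D[int(row1[0]), int(row2[0])]

def nearest(query_row):
    """Brute-force nearest other row, using nothing but wrapper()."""
    others = [r for r in X if r[0] != query_row[0]]
    return min(others, key=lambda r: wrapper(query_row, r))[0]

print(nearest(X[0]))  # 1
```

The point of the design is that the expensive distance is computed once per pair, and the neighbor search only ever sees integer indices.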

Well, I hope that is all clear.

Full example

#!/usr/bin/env python3
# encoding: utf-8
from mlpy import dtw_std  # Any DTW implementation works here; mlpy is just one option.
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

# Example data: eight time series of different lengths
series = [
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [1, 2, 3],

    [1],

    [1, 2, 3, 4, 5, 6, 7, 8],
    [1, 2, 5, 6, 7, 8],
    [1, 2, 4, 5, 6, 7, 8],
]

# Class labels, chosen by eye: one class per group of similar series
y = np.array([
    0,
    0,
    0,
    0,

    1,

    2,
    2,
    2
])

# Compute the symmetric (N, N) distance matrix
N = len(series)
D = np.zeros((N, N))

for i in range(N):
    for j in range(i + 1, N):
        D[i, j] = dtw_std(series[i], series[j])
        D[j, i] = D[i, j]

print(D)

# Create the fake data matrix: just the indices of the time series
X = np.arange(N).reshape((N, 1))


# Create the wrapper function that returns the correct distance
def wrapper(row1, row2):
    # Cast to int: sklearn converts our integer indices to floats.
    i1, i2 = int(row1[0]), int(row2[0])
    return D[i1, i2]

# The original answer used metric='pyfunc', func=wrapper and noted that only
# the ball_tree algorithm seemed to accept a custom function. Newer
# scikit-learn releases take the callable directly as `metric`.
knn = KNeighborsClassifier(weights='distance', algorithm='ball_tree', metric=wrapper)
knn.fit(X, y)
# kneighbors expects a 2-D array, hence X[:1] rather than X[0]
print(knn.kneighbors(X[:1]))
# Output reported in the original answer (the order of tied neighbors may differ):
# (array([[ 0.,  0.,  0.,  1.,  6.]]), array([[1, 2, 0, 3, 4]]))

print(knn.predict(X))
# Output reported in the original answer:
# [0 0 0 0 1 2 2 2]
