Python programming related to correlation

Posted 2024-09-27 20:20:07


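I am working with a dataset, and my program should return a 32-by-32 matrix, but it returns a 1-by-1 matrix instead. The dataset I am working with was linked from the original post.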

Code:

import pandas as pd
import numpy
import csv

def correlation(data, threshold):
    dataset1 = []
    tuples = 0
    flag = False
    with open(data, 'rt') as f:
        reader = csv.reader(f)
        for row in reader:
            # Skip the header row if the first cell is not a number
            if flag == False:
                if not row[0].isdigit():
                    flag = True
                    continue
            tuples += 1
            dataset1.append(row)

    for i in range(len(dataset1)):
        # Recode the class label in the last column. Yes : 1 No : 0
        if dataset1[i][-1] in ('Yes', '1', 'M', 'malignant'):
            dataset1[i][-1] = 1
        elif dataset1[i][-1] in ('No', '0', 'B', 'benign'):
            dataset1[i][-1] = 0
    col_corr = set()  # Set of all the names of deleted columns
    dataset = pd.DataFrame(dataset1)

    corr_matrix = set()
    dataset.astype(float)
    print(dataset)
    corr_matrix = dataset.corr()
    print(len(corr_matrix.columns))
    print(corr_matrix)
    for i in range(len(corr_matrix.columns)):
        for j in range(i):
            if corr_matrix.iloc[i, j] >= threshold:
                colname = corr_matrix.columns[i]  # getting the name of column
                col_corr.add(colname)
                if colname in dataset.columns:
                    del dataset[colname]  # deleting the column from the dataset

    print(dataset)

correlation("big.csv", 0.7)

dataset.corr() should return a 32-by-32 matrix, but it returns a 1-by-1 matrix as output. Why?
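
One likely explanation, offered as a sketch rather than a confirmed diagnosis: csv.reader yields every cell as a string, and DataFrame.astype returns a new DataFrame instead of converting in place, so the bare call dataset.astype(float) leaves the columns as strings. Older pandas releases silently skipped non-numeric columns in corr(), which would leave only the recoded label column and hence a 1-by-1 result. The rows below are made-up illustration data, not taken from big.csv, and the names frame, numeric, and dtypes_before are hypothetical.

```python
import pandas as pd

# Hypothetical rows mimicking csv.reader output: every cell is a string,
# except the last column, which has already been recoded to an integer.
rows = [
    ["1.0", "2.0", 1],
    ["2.0", "1.5", 0],
    ["3.0", "3.5", 1],
    ["4.0", "3.0", 0],
]
frame = pd.DataFrame(rows)

# astype returns a new DataFrame; without an assignment the result is discarded
# and the first two columns of frame still hold strings.
frame.astype(float)
dtypes_before = frame.dtypes.tolist()

# Keeping the returned frame gives numeric columns throughout,
# so corr() has every column available.
numeric = frame.astype(float)
corr_matrix = numeric.corr()

print(dtypes_before)
print(numeric.dtypes.tolist())
print(corr_matrix.shape)
```

In the posted function, the corresponding change would be to assign the result, for example dataset = dataset.astype(float), before calling dataset.corr(). This assumes every remaining cell can be parsed as a number; a label value outside the recoded set would still raise an error during the conversion.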

