I am working with a dataset. The program should return a 32-by-32 matrix, but it returns a 1-by-1 matrix instead. The dataset was linked from the original post.
Code:
import pandas as pd
import numpy
import csv

def correlation(data, threshold):
    dataset1 = []
    tuples = 0
    flag = False
    with open(data, 'rt') as f:
        reader = csv.reader(f)
        for row in reader:
            # Skip the header row (first cell is not a number)
            if flag == False:
                if not row[0].isdigit():
                    flag = True
                    continue
            tuples += 1
            dataset1.append(row)
    for i in range(len(dataset1)):
        # Yes : 1 No : 0
        if dataset1[i][-1] == 'Yes' or dataset1[i][-1] == '1' or dataset1[i][-1] == 'M' or dataset1[i][-1] == 'malignant':
            dataset1[i][-1] = 1
        elif dataset1[i][-1] == 'No' or dataset1[i][-1] == '0' or dataset1[i][-1] == 'B' or dataset1[i][-1] == 'benign':
            dataset1[i][-1] = 0
    col_corr = set()  # Set of all the names of deleted columns
    dataset = pd.DataFrame(dataset1)
    corr_matrix = set()
    dataset.astype(float)
    print(dataset)
    corr_matrix = dataset.corr()
    print(len(corr_matrix.columns))
    print(corr_matrix)
    for i in range(len(corr_matrix.columns)):
        for j in range(i):
            if corr_matrix.iloc[i, j] >= threshold:
                colname = corr_matrix.columns[i]  # getting the name of column
                col_corr.add(colname)
                if colname in dataset.columns:
                    del dataset[colname]  # deleting the column from the dataset
    print(dataset)

correlation("big.csv", 0.7)
dataset.corr() should return a 32-by-32 matrix, but it returns a 1-by-1 matrix as output. Why?
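Note that the corr function on a DataFrame computes pairwise correlations between columns, so a 1-by-1 result means it probably sees only one numeric column. Try converting your data. Keep in mind that dataset.astype(float) returns a new DataFrame instead of converting in place, so the result has to be assigned back, for example dataset = dataset.astype(float).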
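To illustrate the conversion, here is a minimal sketch. The rows and the names raw, converted and corr_matrix are made up for the example and stand in for what csv.reader would produce from big.csv. It assumes every field other than the label can be parsed as a float. It shows that calling astype(float) without keeping the result leaves the string columns unchanged, while assigning the result gives corr() a fully numeric frame.

```python
import pandas as pd

# Hypothetical rows, shaped like csv.reader output: every field is a string.
rows = [
    ["1.0", "2.0", "M"],
    ["2.0", "1.0", "B"],
    ["3.0", "5.0", "M"],
    ["4.0", "3.0", "B"],
]

# Map the label column to 1/0, as in the question.
for row in rows:
    row[-1] = 1 if row[-1] == "M" else 0

raw = pd.DataFrame(rows)

# astype returns a new DataFrame; discarding it leaves raw untouched.
raw.astype(float)
print(raw.dtypes)

# Keep the converted frame and compute the correlation on it.
converted = raw.astype(float)
corr_matrix = converted.corr()
print(converted.dtypes)
print(corr_matrix.shape)
```

With all columns converted, the correlation matrix has one row and one column per input column. As far as I know, older pandas versions silently dropped non-numeric columns in corr(), which would explain the 1-by-1 result in the question, since only the label column holds numbers there.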