Python DataFrame index set_names: "Length of new names must be 1, got 2"

Posted 2024-09-27 07:20:44


I have this prep_data function, and I pass it the following CSV data:

identifier,Hugo_Symbol,Tumor_Sample_Barcode,Variant_Classification,patient
1,patient,a,Silent,6
22,mutated,d,e,7
1,Hugo_Symbol,f,g,88

Inside prep_data there is this line:

  gene_mutation_df.index.set_names(['Hugo_Symbol', 'patient'], inplace=True)

However, whenever execution reaches that line, it fails with:

ValueError: Length of new names must be 1, got 2

Is the problem in that line, or somewhere else in the function?
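To narrow this down, the error can be reproduced outside the function. The sketch below assumes only that the message comes from pandas' `Index.set_names`; the two index objects and the `try_rename` helper are hypothetical stand-ins, not the real `groupby` result. It suggests that the call itself is fine when the index has two levels, and that it fails when the index it is applied to has only one.

```python
import pandas as pd

# Hypothetical stand-ins for the index of gene_mutation_df.
single_index = pd.Index(['p1'])                               # one level
multi_index = pd.MultiIndex.from_tuples([("'TP53", 'p1')])    # two levels


def try_rename(index):
    """Return (new_names, None) on success or (None, error_message) on failure."""
    try:
        renamed = index.set_names(['Hugo_Symbol', 'patient'])
        return list(renamed.names), None
    except ValueError as exc:
        return None, str(exc)


single_names, single_error = try_rename(single_index)
multi_names, multi_error = try_rename(multi_index)

print(single_index.nlevels, single_names, single_error)
print(multi_index.nlevels, multi_names, multi_error)
```

If that is what happens here, printing `gene_mutation_df.index.nlevels` just before the `set_names` call should show 1 rather than 2, which would point at the `groupby(...).apply(...)` result (and the data feeding it) rather than at the `set_names` line.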

Here is the full source code:

import pandas as pd
import numpy as np
PRIMARY_TUMOR_PATIENT_ID_REGEX = '^.{4}-.{2}-.{4}-01.*'
SHORTEN_PATIENT_REGEX = '^(.{4}-.{2}-.{4}).*'

def mutations_for_gene(df):
  mutated_patients = df['identifier'].unique()
  return pd.DataFrame({'mutated': np.ones(len(mutated_patients))}, index=mutated_patients)
  
def prep_data(mutation_path):
  df = pd.read_csv(mutation_path, low_memory=True, dtype=str, header = 0)  # Reads the CSV at mutation_path in chunks (low_memory=True) with the default ',' delimiter, takes the first row as the header, and casts every column to str
  
  df = df[~df['Hugo_Symbol'].str.contains('Hugo_Symbol')]  # Drops every row whose 'Hugo_Symbol' value contains the text 'Hugo_Symbol' (presumably repeated header rows)

  df['Hugo_Symbol'] = '\'' + df['Hugo_Symbol'].astype(str)  # Prepends a single quote (') to every remaining value in that column
  
  df['Tumor_Sample_Barcode'] = df['Tumor_Sample_Barcode'].str.strip()  # Strips leading and trailing whitespace from the barcodes
  non_silent = df.where(df['Variant_Classification'] != 'Silent')  # Same shape as df, but rows whose 'Variant_Classification' is 'Silent' are replaced with NaN

  df = non_silent.dropna(subset=['Variant_Classification'])  # Drops the rows whose 'Variant_Classification' is NaN, which removes the silent rows masked above
  
  non_01_barcodes = df[~df['Tumor_Sample_Barcode'].str.contains(PRIMARY_TUMOR_PATIENT_ID_REGEX)]['Tumor_Sample_Barcode']  # Series of the 'Tumor_Sample_Barcode' values that do not match PRIMARY_TUMOR_PATIENT_ID_REGEX
  #TODO: Double check that the extra ['Tumor_Sample_Barcode'] serves no purpose
  df = df.drop(non_01_barcodes.index)
  print(df)
  shortened_patients = df['Tumor_Sample_Barcode'].str.extract(SHORTEN_PATIENT_REGEX, expand=False)
  df['identifier'] = shortened_patients
  
  gene_mutation_df = df.groupby(['Hugo_Symbol']).apply(mutations_for_gene)
  gene_mutation_df.index.set_names(['Hugo_Symbol', 'patient'], inplace=True)
  gene_mutation_df = gene_mutation_df.reset_index()
  gene_patient_mutations = gene_mutation_df.pivot(index='Hugo_Symbol', columns='patient', values='mutated')
  
  return gene_patient_mutations.transpose().fillna(0)

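Any help would be greatly appreciated. (I know this is not very specific; I am still trying to work out what exactly this function does and how to generate data to test it.)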
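On generating test data: one thing that can be checked in isolation is which barcodes survive the `PRIMARY_TUMOR_PATIENT_ID_REGEX` filter. The sketch below reuses the two regexes from the source; the `TCGA-AB-1234-...` barcodes are made-up values chosen only to fit the pattern, not real sample IDs.

```python
import pandas as pd

PRIMARY_TUMOR_PATIENT_ID_REGEX = '^.{4}-.{2}-.{4}-01.*'
SHORTEN_PATIENT_REGEX = '^(.{4}-.{2}-.{4}).*'

# 'a', 'd', 'f' come from the sample CSV; the last two are hypothetical barcodes.
barcodes = pd.Series(['a', 'd', 'f', 'TCGA-AB-1234-01A', 'TCGA-AB-1234-11A'])

# True where the barcode looks like a primary-tumor sample (fourth field starts with 01).
is_primary = barcodes.str.contains(PRIMARY_TUMOR_PATIENT_ID_REGEX)

# Shortened patient identifier for the barcodes that pass the filter.
shortened = barcodes[is_primary].str.extract(SHORTEN_PATIENT_REGEX, expand=False)

print(is_primary.tolist())
print(shortened.tolist())
```

None of the barcodes in the sample CSV (`a`, `d`, `f`) match the pattern, so with that input every row would be dropped before the `groupby` step. That may be why the grouped result does not end up with the two index levels that `set_names` expects; test data would need barcodes shaped like the fourth value above.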


