Check whether duplicate samples have different data in their fields, and copy the data over if so?



I want to check whether a sample and its duplicate (whose name ends in _2) have data entered in their Age, Family History and Diagnosis fields. If a sample has entries and its duplicate does not (all "-" entries), I want to copy the entries from the sample into the duplicate's fields. The check should also work the other way round: if the duplicate has entries and the sample does not, copy them into the sample's fields.

Basically, I want input_df to end up looking like desired_df (shown below).

<pre><code>import pandas as pd

input_df = pd.DataFrame(
    columns=['Sample', 'Date', 'Age', 'Family History', 'Diagnosis'],
    data=[
        ['HG_12_34', '12/3/12', '23', 'Y', 'Jerusalem Syndrome'],
        ['LG_3_45', '3/4/12', '45', 'N', 'Paris Syndrome'],
        ['HG_12_34_2', '4/5/13', '-', '-', '-'],
        ['KD_89_9', '8/9/12', '-', '-', '-'],
        ['KD_98_9_2', '6/1/13', '54', 'Y', 'Chronic Hiccups'],
        ['LG_3_45_2', '4/4/10', '59', 'N', 'Dangerous Sneezing Syndrome']
    ])

desired_df = pd.DataFrame(
    columns=['Sample', 'Date', 'Age', 'Family History', 'Diagnosis'],
    data=[
        ['HG_12_34', '12/3/12', '23', 'Y', 'Jerusalem Syndrome'],
        ['LG_3_45', '3/4/12', '45', 'N', 'Paris Syndrome'],
        ['HG_12_34_2', '4/5/13', '23', 'Y', 'Jerusalem Syndrome'],
        ['KD_89_9', '8/9/12', '54', 'Y', 'Chronic Hiccups'],
        ['KD_98_9_2', '6/1/13', '54', 'Y', 'Chronic Hiccups'],
        ['LG_3_45_2', '4/4/10', '59', 'N', 'Dangerous Sneezing Syndrome']
    ])
</code></pre>

Below is my really inefficient and incomplete attempt at this:

<pre><code>PHENO_COLS = ['Age', 'Family History', 'Diagnosis']

def testing(duplicate, df):
    '''
    Check for a difference in phenotype data between duplicates and return
    the original sample's name if only the original has phenotype data.
    '''
    # only assess the duplicate
    if duplicate['Sample'][:-2] in list(df['Sample'].unique()):
        # get the original sample's row
        sam = df[df['Sample'] == duplicate['Sample'][:-2]]
        # store the Age, Family History and Diagnosis in a list for each sample
        # (selected by label; the original 2:4 slice left out Diagnosis)
        sam_pheno = sam.iloc[0][PHENO_COLS].fillna("-").tolist()
        duplicate_pheno = duplicate[PHENO_COLS].fillna("-").tolist()
        # if the duplicate sample has nothing in these fields then return the
        # original sample name
        if set(duplicate_pheno) == {"-"} and len(set(sam_pheno)) > 1:
            return duplicate['Sample'][:-2]

# This creates a column called Pheno which holds the name of the sample that
# contains the phenotype data the pair should share. This is intended so that
# I can somehow copy over the phenotype data from the sample named in the
# Pheno field. However, I have no idea how to do this.
input_df['Pheno'] = input_df.apply(lambda x: testing(x, input_df), axis=1)
</code></pre>
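One possible way to approach this, offered as a sketch rather than a definitive answer: give each sample and its duplicate a shared key, find the rows whose phenotype fields are all "-", and fill them from the paired row that does have data. The names `fill_from_pair`, `PHENO_COLS`, `suffix`, `missing` and the temporary `_key` column are made up for this illustration. Note also that the sample data pairs `KD_89_9` with `KD_98_9_2`; the sketch assumes that is a typo for `KD_89_9_2`, since the desired output treats the two rows as a pair and matching by name cannot link them otherwise.

```python
import pandas as pd

# Hypothetical name for the phenotype columns from the question.
PHENO_COLS = ["Age", "Family History", "Diagnosis"]


def fill_from_pair(df, cols=PHENO_COLS, suffix="_2", missing="-"):
    """Copy phenotype data between a sample and its duplicate when only
    one of the two rows has any data in `cols`."""
    out = df.copy()
    names = set(out["Sample"])

    def base_name(name):
        # Treat a row as a duplicate only if the name ends with the suffix
        # AND the name without the suffix is itself present as a sample.
        stem = name[: -len(suffix)]
        return stem if name.endswith(suffix) and stem in names else name

    out["_key"] = out["Sample"].map(base_name)

    # Rows where every phenotype field is the placeholder (or NaN).
    empty = out[cols].fillna(missing).eq(missing).all(axis=1)

    # One donor row per key: the first row in that pair that has data.
    donors = (
        out.loc[~empty]
        .drop_duplicates(subset="_key")
        .set_index("_key")[cols]
    )

    # Empty rows whose partner has data to offer.
    keys = out.loc[empty, "_key"]
    fillable = keys[keys.isin(donors.index)]
    if not fillable.empty:
        out.loc[fillable.index, cols] = donors.loc[fillable.tolist()].to_numpy()

    return out.drop(columns="_key")


# Data from the question, with 'KD_98_9_2' assumed to mean 'KD_89_9_2'.
input_df = pd.DataFrame(
    columns=["Sample", "Date", "Age", "Family History", "Diagnosis"],
    data=[
        ["HG_12_34", "12/3/12", "23", "Y", "Jerusalem Syndrome"],
        ["LG_3_45", "3/4/12", "45", "N", "Paris Syndrome"],
        ["HG_12_34_2", "4/5/13", "-", "-", "-"],
        ["KD_89_9", "8/9/12", "-", "-", "-"],
        ["KD_89_9_2", "6/1/13", "54", "Y", "Chronic Hiccups"],
        ["LG_3_45_2", "4/4/10", "59", "N", "Dangerous Sneezing Syndrome"],
    ],
)

result = fill_from_pair(input_df)
print(result)
```

Pairs in which both rows already have data (such as `LG_3_45` and `LG_3_45_2`) are left untouched, as are rows with no partner or with a partner that is also empty. The `Date` column is never copied, and the input frame is not modified in place.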
