Dataframe splits each row in two

Posted 2024-09-19 21:00:38


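After running BedTools coverage, I write the results to a CSV file. However, the final dataframe splits each record across two rows instead of keeping it on a single row. I have tried a space, a comma, and a tab as the separator, but the record is still not kept on one row, nor is it split into the required BED format. Any help would be greatly appreciated.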

The IMR90 and hESC input files look like this:

  track name=IMR90 description=IMR90 color=0,0,0
  chr1  226253377   226573378   IMR90b_208
  chr1  243133377   243333378   IMR90b_226
  chr1  162493376   162533377   IMR90b_145
  chr1  230533377   230773378   IMR90b_213
  chr1  3610140 3770141 IMR90b_4
  chr1  6077413 6277414 IMR90b_5

The loops input file looks like this:

chr11   111240000   111280000   GM12878_replicate
chr14   24810000    24900000    GM12878_replicate
chr1    203250000   203290000   GM12878_replicate
chr12   50040000    50100000    GM12878_replicate
chr1    46510000    46640000    GM12878_replicate
chr1    23880000    23960000    GM12878_replicate
chr12   108970000   109010000   GM12878_replicate
chr8    11280000    11320000    GM12878_replicate

My Python code:

from pybedtools import BedTool

#Read sorted IMR90 tad file
IMR90_tad = BedTool('IMR90_hg19_FINAL_W.txt').sort() # read in IMR90 tads

#Read sorted hESC tad file
hESC_tad = BedTool('hESC_hg19_FINAL_W.txt').sort() # read in hESC tads

#Read sorted loops file
loops = BedTool('all_loops_chr_.txt').sort()

#calculate coverage
coverage_IMR90_tad_cons = loops.coverage(IMR90_tad)
coverage_hESC_tad_cons = loops.coverage(hESC_tad)

# save as data-frames:
coverage_IMR90_tad_cons.to_dataframe().to_csv('coverage_IMR90_tad_cons', sep='\t')
coverage_hESC_tad_cons.to_dataframe().to_csv('coverage_hESC_tad_cons', sep='\t')

coverage_IMR90_tad_cons.to_dataframe().to_csv('cov_IMR90_tad_cons', sep=' ')
coverage_hESC_tad_cons.to_dataframe().to_csv('cov_hESC_tad_cons', sep=' ')
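For context, bedtools coverage appends four columns to each reported interval: the number of overlapping features, the number of bases covered, the interval length, and the covered fraction. The following is a minimal pure-Python sketch of that arithmetic for a single interval. It is not the bedtools implementation; the function name `coverage_columns` is hypothetical, and half-open coordinates are an assumption.

```python
def coverage_columns(a_start, a_end, b_intervals):
    """Return (count, covered_bases, length, fraction) for one interval.

    Hypothetical helper: mimics the four columns bedtools coverage appends,
    assuming half-open [start, end) coordinates.
    """
    # Clip every B interval to the A interval and keep the real overlaps.
    overlaps = []
    for b_start, b_end in b_intervals:
        lo, hi = max(a_start, b_start), min(a_end, b_end)
        if lo < hi:
            overlaps.append((lo, hi))

    # Sum covered bases, counting positions hit by several B intervals once.
    covered = 0
    cur_end = a_start
    for lo, hi in sorted(overlaps):
        lo = max(lo, cur_end)
        if hi > lo:
            covered += hi - lo
            cur_end = hi

    length = a_end - a_start
    return len(overlaps), covered, length, covered / length


result = coverage_columns(100, 200, [(90, 120), (110, 130), (150, 160), (300, 400)])
print(result)
```

Read this way, an appended group such as `23  3632  320001  0.01135` would mean 23 overlapping features covering 3632 of 320001 bases, and it belongs on the same line as the interval it describes.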

What the CSV file looks like:

    chrom   start   end name    score   strand  thickStart  thickEnd
0   chr1    145048643   145368644   hESCb_192               
1       23  3632    320001  0.01135         
2   chr1    157013376   157093377   hESCb_207               
3       10  1902    80001   0.0237747   
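One possible explanation, which the post does not confirm, is that the TAD files were saved with Windows (CRLF) line endings. In that case a carriage return could stay attached to the name field, the coverage columns would be appended after it, and any reader that treats a bare `\r` as a line break (pandas' default parser does, as far as I know) would see two rows. The sketch below reproduces that effect with the standard library only. The function `normalize_bed_text` is hypothetical, and the sample values are copied from the output above.

```python
def normalize_bed_text(text):
    """Hypothetical helper: convert CRLF or bare CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


# One TAD line as it might look if saved with CRLF endings (an assumption).
raw = "chr1\t145048643\t145368644\thESCb_192\r\n"
coverage_fields = ["23", "3632", "320001", "0.01135"]

# A tool that splits only on "\n" keeps the "\r" inside the last field,
# so the appended coverage columns end up after the carriage return.
line_with_cr = raw.split("\n")[0]
row = "\t".join([line_with_cr] + coverage_fields)

# str.splitlines() treats "\r" as a line break: the record becomes two rows.
broken = [r.split("\t") for r in row.splitlines()]
print(broken)

# After normalizing the line endings, the same record stays on one row.
clean_line = normalize_bed_text(raw).split("\n")[0]
fixed = [r.split("\t") for r in "\t".join([clean_line] + coverage_fields).splitlines()]
print(fixed)
```

If this turns out to be the cause, converting the input files to Unix line endings before passing them to `BedTool` (for example with `dos2unix`, or by writing a cleaned copy as above) should keep each record on one row.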
