Getting sequences within 300 bases of restriction sites

pos = read.csv(file="sites.csv", header=F, sep="\t")
fastq = read.csv(file="reads.sam", header=F, sep="\t")
newFastq = data.frame(fastq)
newFastq = NULL
trim <- function (x) gsub("^\\s+|\\s+$", "", x)
for(i in 1:nrow(fastq)){
  for(j in 1:nrow(pos)){
    if(as.character(fastq[i,3]) == trim(as.character(pos[j,1]))){
      if(fastq[i,4] - pos[j,2] < 300 && fastq[i,4] - pos[j,2] > -300){
        newFastq = rbind(newFastq, fastq[i,])
      }
    }
  }
}
#Write data into file
write.table(newFastq, file = "sitesFound.csv", row.names=FALSE, na="", quote=FALSE, col.names=FALSE, sep="\t")

1 answer

User

#1 · Posted 2024-09-30 22:12:39

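One overall strategy is to use the Bioconductor Rsamtools functions asBam() and indexBam() to generate an indexed BAM file. Read the first file (the sites) into a data frame and construct a GenomicRanges GRanges() object from it. Finally, read the BAM file with GenomicAlignments readGAlignments(), passing the GRanges() as the which= argument of ScanBamParam(). If you decide to go this route, the Bioconductor support site https://support.bioconductor.org is a better place to ask Bioconductor questions.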
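The first two steps of this strategy might look roughly like the sketch below. It is not taken from the original answer: the helper name sites_to_granges() is hypothetical, the toy data frame stands in for the question's sites.csv (assumed to hold the chromosome name in column 1 and the position in column 2), and the SAM file is assumed to carry @SQ header lines, which the conversion to BAM needs.

```r
library(Rsamtools)      # asBam()
library(GenomicRanges)  # GRanges(), IRanges()

# Hypothetical helper: build one width-1 range per restriction site.
# Assumes column 1 = chromosome name, column 2 = position.
sites_to_granges <- function(sites) {
  GRanges(
    seqnames = trimws(as.character(sites[[1]])),  # same job as trim() in the question
    ranges   = IRanges(start = as.integer(sites[[2]]), width = 1L)
  )
}

# Toy stand-in for read.delim("sites.csv", header = FALSE)
sites <- data.frame(V1 = c("chr1 ", " chr2"), V2 = c(1000L, 5000L))
gr <- sites_to_granges(sites)

# Convert the SAM file to a sorted BAM file, only if the question's file exists.
# By default asBam() also writes the index, so this yields reads.bam and reads.bam.bai.
if (file.exists("reads.sam")) {
  bam <- asBam("reads.sam", destination = "reads", overwrite = TRUE)
}
```

With the real data, the toy data frame would be replaced by the result of reading sites.csv, and the resulting gr is the object that gets resized and passed to ScanBamParam().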

It looks like you want the reads that fall within +/- 300 base pairs of the ranges in the GRanges object. Resize the GRanges:

library(GenomicRanges)
## create gr = GRanges(...)
gr = resize(gr, width = 600, fix="center")

Use it as the which= argument of ScanBamParam(), then read the BAM file:

library(GenomicAlignments)
param = ScanBamParam(which = gr)
reads = readGAlignments("your.bam", param = param)

Use what= to control which fields are read from the BAM file, for example:

param = ScanBamParam(which = gr, what = "seq")
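To see the pieces working together, here is a minimal end-to-end sketch. It is an illustration under assumptions rather than part of the original answer: it uses the small example file ex1.bam that ships with Rsamtools instead of the question's data, and the site at position 1000 on seq1 is made up.

```r
library(GenomicAlignments)  # also attaches Rsamtools and GenomicRanges

# Example BAM file (with index) bundled with Rsamtools; a stand-in for your.bam
bam <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# Made-up restriction site: position 1000 on reference sequence "seq1"
site <- GRanges("seq1", IRanges(start = 1000L, width = 1L))

# Widen the site to a 600 bp window centred on it
gr <- resize(site, width = 600, fix = "center")

# Read only the alignments overlapping the window, and keep their sequences
param <- ScanBamParam(which = gr, what = "seq")
reads <- readGAlignments(bam, param = param)

# Fields requested through what= end up as metadata columns
read_seqs <- mcols(reads)$seq
```

Note that which= selects reads that overlap the window, whereas the loop in the question compares the read start (column 4 of the SAM file) with the site. If the stricter start-based rule is wanted, the result could be filtered afterwards on start(reads).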
