<p>A possible solution:</p>
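<pre><code>dna_fun <- function(s, p, a) {
  # Split the longer sequence, the shorter sequence and the annotation into characters
  s <- strsplit(s, "")[[1]]
  p <- strsplit(p, "")[[1]]
  a <- strsplit(a, "")[[1]]
  ls <- length(s)
  lp <- length(p)  # assumes ls == lp + 1
  # Repetition counts: each vector doubles one character of p
  # (the first position appears twice so that there are ls candidates)
  r <- lapply(c(1, seq(lp)), function(x) {
    v <- rep(1, lp)  # one count per character of p
    v[x] <- 2
    v
  })
  # One candidate per column, each stretched to the length of s
  mat <- sapply(r, rep, x = p)
  # Position-wise comparison of every candidate with s
  tfm <- mat == matrix(rep(s, ls), ncol = ls)
  # Keep the candidate with the most matches (the first one on ties)
  m <- which.max(colSums(tfm))
  p2 <- mat[, m]
  # Mark the mismatching position as a gap in both outputs
  p2[!tfm[, m]] <- "-"
  a[!tfm[, m]] <- "-"
  p2 <- paste(p2, collapse = "")
  a <- paste(a, collapse = "")
  return(list(p2, a))
}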
</code></pre>
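<p>To see why this works, it may help to build the candidates on their own. The sketch below is an illustration rather than part of the function: the names <code>s_chars</code>, <code>p_chars</code>, <code>candidates</code> and <code>scores</code> are made up here, and it assumes the longer sequence has exactly one extra character. Each character of the shorter sequence is doubled in turn, and the resulting strings are compared with the longer sequence position by position:</p>
<pre><code>s_chars <- strsplit("ATGCAT", "")[[1]]
p_chars <- strsplit("ATCAT", "")[[1]]

# One candidate per position of p: that character is doubled to fill the gap
candidates <- sapply(seq_along(p_chars), function(i) {
  times <- rep(1, length(p_chars))
  times[i] <- 2
  paste(rep(p_chars, times = times), collapse = "")
})
candidates
# "AATCAT" "ATTCAT" "ATCCAT" "ATCAAT" "ATCATT"

# Number of positions at which each candidate agrees with the longer sequence
scores <- sapply(candidates, function(cand) {
  sum(strsplit(cand, "")[[1]] == s_chars)
})
unname(scores)
# 4 5 5 4 3
</code></pre>
<p>The best candidates differ from the longer sequence only at position 3, which is the position that gets replaced by <code>-</code>. On ties, <code>which.max</code> returns the first maximum.</p>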
<p>Called on the sample data (listed at the end of this answer):</p>
<pre><code>dna_fun(s1, s2, annot)
</code></pre>
<p>you get:</p>
<blockquote>
<pre><code>[[1]]
[1] "AT-CAT"
[[2]]
[1] "13-198"
</code></pre>
</blockquote>
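<p>The function relies on the longer sequence having exactly one more character than the shorter one, and on the annotation string being as long as the longer sequence. A small guard along these lines could be called first to fail early on other input. Note that <code>check_dna_input</code> is a hypothetical helper, not part of the code above:</p>
<pre><code># Hypothetical helper: validates the assumptions dna_fun makes about its input
check_dna_input <- function(s, p, a) {
  # Assumption: exactly one character of s has no counterpart in p
  if (nchar(s) != nchar(p) + 1) {
    stop("`s` must be exactly one character longer than `p`")
  }
  # Assumption: one annotation character per character of s
  if (nchar(a) != nchar(s)) {
    stop("`a` must have the same number of characters as `s`")
  }
  invisible(TRUE)
}

check_dna_input("ATGCAT", "ATCAT", "135198")  # passes silently
</code></pre>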
<hr/>
<p>If you have vectors of sequences and annotations, you can use <code>Map</code> together with the <code>dna_fun</code> function:</p>
<pre><code>s11 <- c("ATGCAT","ATCGAT")
s22 <- c("ATCAT","ATCAT")
annot2 <- c("135198","145892")
lm <- Map(dna_fun, s11, s22, annot2)
data.table::rbindlist(lm, idcol = "dna")
</code></pre>
<p>which gives:</p>
<blockquote>
<pre><code>      dna     V1     V2
1: ATGCAT AT-CAT 13-198
2: ATCGAT ATC-AT 145-92
</code></pre>
</blockquote>
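<p>If you would rather not depend on <code>data.table</code>, a similar table can be assembled in base R. This is only a sketch: it assumes <code>lm</code> has the shape returned by <code>Map</code> above (a list named by the first sequence, each element holding the gapped sequence and the gapped annotation), and the column names <code>aligned</code> and <code>annot</code> are made up for illustration:</p>
<pre><code># Stand-in for the result of Map(dna_fun, s11, s22, annot2)
lm <- list(ATGCAT = list("AT-CAT", "13-198"),
           ATCGAT = list("ATC-AT", "145-92"))

res <- data.frame(
  dna     = names(lm),
  # first element of each result: the gapped sequence
  aligned = vapply(lm, `[[`, character(1), 1, USE.NAMES = FALSE),
  # second element of each result: the gapped annotation
  annot   = vapply(lm, `[[`, character(1), 2, USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
res
</code></pre>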
<hr/>
<p>Data:</p>
<pre><code>s1 <- "ATGCAT"
s2 <- "ATCAT"
annot <- "135198"
</code></pre>