Extract the part of a string that lies between two substrings

Posted 2024-09-29 23:21:22


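I have three files, each containing a set of strings. File1 and File2 contain substrings of the strings in File3. From each File3 string I want to extract the part that lies between the File1 substring and the File2 substring. See my example below: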

File1 (substring 1):

 head(fivep$V2)
[1] UGAGGUAGUAGUUUGUACAGUU  UGAGGUAGUAGUUUGUGCUGUU  ACAUACUUCUUUAUAUGCCCAUA UAGCAGCACAUCAUGGUUUACA 
[5] GGGUUCCUGGCAUGCUGAUUU   AGAGCUUAGCUGAUUGGUGAAC 

File2 (substring 2):

^{pr2}$

File3:

head(hairpin$V2)
[1] UACACUGUGGAUCCGGUGAGGUAGUAGGUUGUAUAGUUUGGAAUAUUACCACCGGUGAACUAUGCAAUUUUCUACCUUACCGGAGACAGAACUCUUCGA
[2] AUGCUUCCGGCCUGUUCCCUGAGACCUCAAGUGUGAGUGUACUAUUGAUGCUUCACACCUGGGCUCUCCGGGUACCAGGACGGUUUGAGCAGAU     
[3] AAAGUGACCGUACCGAGCUGCAUACUUCCUUACAUGCCCAUACUAUAUCAUAAAUGGAUAUGGAAUGUAAAGAAGUAUGUAGAACGGGGUGGUAGU   
[4] UAAACAGUAUACAGAAAGCCAUCAAAGCGGUGGUUGAUGUGUUGCAAAUUAUGACUUUCAUAUCACAGCCAGCUUUGAUGUGCUGCCUGUUGCACUGU 
[5] CGGACAAUGCUCGAGAGGCAGUGUGGUUAGCUGGUUGCAUAUUUCCUUGACAACGGCUACCUUCACUGCCACCCCGAACAUGUCGUCCAUCUUUGAA  
[6] UCUCGGAUCAGAUCGAGCCAUUGCUGGUUUCUUCCACAGUGGUACUUUCCAUUAGAACUAUCACCGGGUGGAAACUAGCAGUGGCUCGAUCUUUUCC  

Example:

String in File1: AGGGCUUAGCUGCUUGUGAGCA
String in File2: UUCACAGUGGCUAAGUUCCGC
String in File3: CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG

Expected output for this example:

GGGUCCACACCAAGUCGUG

3 answers

In Perl, you can try the following code:

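use strict;
use warnings;

my $file1 = "AGGGCUUAGCUGCUUGUGAGCA";
my $file2 = "UUCACAGUGGCUAAGUUCCGC";
my $file3 = "CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG";

# \Q...\E makes the boundary strings match literally;
# the non-greedy (.*?) stops at the first $file2 after $file1
my ($result) = $file3 =~ /\Q$file1\E(.*?)\Q$file2\E/;

# avoid an "uninitialized value" warning when there is no match
print defined $result ? "$result\n" : "no match\n";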

Output:

GGGUCCACACCAAGUCGUG
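The `?` in `(.*?)` makes the capture non-greedy: it ends at the first File2 match after the File1 string, whereas a plain `(.*)` runs to the last File2 match. The two agree when the File2 substring occurs only once after File1, as in the example. Since the question was asked in a Python channel, here is a small Python sketch of the difference (Python's `re` uses the same `*?` syntax). The boundaries and the sequence are made up purely for illustration.

```python
import re

left, right = "AAA", "UUU"
# Toy sequence (made up for illustration) in which the right boundary occurs twice.
seq = "GAAACCUUUGGUUUC"

# re.escape makes the boundaries match literally, like \Q...\E in Perl.
lazy = re.search(re.escape(left) + "(.*?)" + re.escape(right), seq).group(1)
greedy = re.search(re.escape(left) + "(.*)" + re.escape(right), seq).group(1)

print(lazy)    # CC       (stops at the first UUU after AAA)
print(greedy)  # CCUUUGG  (runs to the last UUU)
```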

In R, using the qdapRegex package:

f1 <- "AGGGCUUAGCUGCUUGUGAGCA"
f2 <- "UUCACAGUGGCUAAGUUCCGC"
f3 <- "CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG"

library(qdapRegex)
rm_between(f3, f1, f2, extract=TRUE)

## [[1]]
## [1] "GGGUCCACACCAAGUCGUG"

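As the name suggests, rm_between removes or extracts the text between a left and a right boundary. Use extract = TRUE to get the strings between the boundaries. The return value is a list because each input string may yield more than one extraction. If you don't want a list, wrap the call in unlist, as in unlist(rm_between(f3, f1, f2, extract=TRUE)).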
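For comparison, a minimal Python sketch of the same list-valued idea, assuming literal boundaries: `re.findall` with a single capture group returns every match, so one string can yield several extractions. The function name `extract_all_between` is hypothetical; the outer list comprehension plays the role of the per-string list, and the flattening step plays the role of `unlist`.

```python
import re
from typing import List


def extract_all_between(text: str, left: str, right: str) -> List[str]:
    """Return every segment of text lying between left and the nearest right."""
    # re.escape makes both boundaries match literally; the non-greedy (.*?)
    # stops at the first right boundary, so repeated left...right pairs
    # each produce a separate result.
    pattern = re.escape(left) + "(.*?)" + re.escape(right)
    return re.findall(pattern, text)


f1 = "AGGGCUUAGCUGCUUGUGAGCA"
f2 = "UUCACAGUGGCUAAGUUCCGC"
f3 = "CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG"

per_string = [extract_all_between(s, f1, f2) for s in [f3]]  # one list per input string
flat = [seg for segs in per_string for seg in segs]          # flattened, like unlist()
print(per_string)  # [['GGGUCCACACCAAGUCGUG']]
print(flat)        # ['GGGUCCACACCAAGUCGUG']
```

A string with no match simply contributes an empty list, so nothing has to be special-cased before flattening.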

Here is a solution in base R:

file1 <- "AGGGCUUAGCUGCUUGUGAGCA"
file2 <- "UUCACAGUGGCUAAGUUCCGC"
file3 <- "CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG"

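# create a regular expression; both .* are greedy, so if a boundary
# occurs more than once this uses the last file1 and the last file2
pattern <- paste0(".*", file1, "(.*)", file2, ".*")

# extract the substring (if pattern does not match, sub() returns file3 unchanged)
sub(pattern, "\\1", file3)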
# [1] "GGGUCCACACCAAGUCGUG"
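The answers above handle a single File1/File2/File3 triple. The question, however, is about whole files. Here is a Python sketch for that case, assuming two things the question does not state: the files hold one sequence per non-empty line, and any File1 string may pair with any File2 string. The helper names and the file layout are hypothetical. For each File3 sequence, the sketch records its index together with the segment between a contained File1 string and a File2 string found after it.

```python
from typing import List, Tuple


def read_sequences(path: str) -> List[str]:
    """Read one sequence per non-empty line (hypothetical plain-text layout)."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def extract_for_all(fives: List[str], threes: List[str],
                    hairpins: List[str]) -> List[Tuple[int, str]]:
    """For every hairpin, collect the segment between each File1 string it
    contains and each File2 string found after that File1 string."""
    results = []
    for i, hairpin in enumerate(hairpins):
        for left in fives:
            start = hairpin.find(left)
            if start == -1:
                continue  # this File1 string does not occur in this hairpin
            start += len(left)  # move past the left boundary
            for right in threes:
                end = hairpin.find(right, start)  # only search after the left boundary
                if end != -1:
                    results.append((i, hairpin[start:end]))
    return results


fives = ["AGGGCUUAGCUGCUUGUGAGCA"]
threes = ["UUCACAGUGGCUAAGUUCCGC"]
hairpins = [
    "CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG",
    "ACGUACGU",  # contains neither boundary, so it contributes nothing
]
print(extract_for_all(fives, threes, hairpins))
# [(0, 'GGGUCCACACCAAGUCGUG')]
```

With real files, the call might look like `extract_for_all(read_sequences("file1.txt"), read_sequences("file2.txt"), read_sequences("file3.txt"))`; the file names here are placeholders. The nested loops keep the logic simple, but the work grows with the product of the three list sizes.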
