Java: building a substring that spans multiple lines while reading with BufferedReader


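I am reading large files that contain DNA sequences. The sequence is one very long run of characters, and I need a subset from somewhere in the file (I have the start and stop positions). Because the files are so large, I use a BufferedReader to read them. It reads one line at a time, but the subset I want may span several lines. The start and stop positions I know only make sense if the whole DNA sequence is treated as one single line without newlines. Unfortunately, the files do contain newlines. So on every line the indices run from 0 to the end of that line, rather than 0 to 20, 21 to 40, 41 to 60, and so on.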

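My problem: reading the file line by line while saving a subset/substring that may span multiple lines. I have tried several approaches but cannot extract the substring I need. I suspect that either my own logic is flawed or there is a method I do not know about yet. Is there a better way?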

Approach 1:

public String getSubSequence() {


        String fileLocation = "genome.fna";
        String referenceGenomeSub = "";
        int passedLetters = 0;
        int passedLines = 0;

        //start- and stop position
        int start = 50;
        int stop = 245;

        Path path = Paths.get(fileLocation);



        try (BufferedReader br = Files.newBufferedReader(path, Charset.defaultCharset())){

            String line;

            while ((line = br.readLine()) != null) {

                if (!line.startsWith(">")) {//Don't need lines that start with >

                    passedLines++;

                    //edit indices so I don't get out of bounds
                    if (passedLines != 1) {
                        start = start - passedLetters;
                        stop = stop - passedLetters;
                    }


                    //this is to know where I am in the file
                    passedLetters = passedLetters + line.length();


                    //if the subset is on only one line
                    if (start < passedLetters && stop <= passedLetters) {                        
                        referenceGenomeSub = referenceGenomeSub.concat(line);                        
                    }


                    //subsequence is on multiple lines
                    else if (start <= passedLetters && stop > passedLetters) {
                        referenceGenomeSub = line.substring(start);
                    }
                    else if (passedLetters > stop && !referenceGenomeSub.isEmpty()) {
                        referenceGenomeSub = referenceGenomeSub.concat(line.substring(0, stop));
                    }

                }

            }
        } catch (IOException e) {
            System.out.println("Error: " + e.getMessage());
        }

        return referenceGenomeSub;
    }

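Here I try to keep track of the number of characters I have already passed. That is how I know when I am inside the range of the desired substring.
Result: StringIndexOutOfBoundsException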
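One reading of the attempt above is that it mixes two coordinate systems: start and stop are reduced on every line, while passedLetters keeps accumulating, so line.substring(start) can be called with an index beyond the end of the current line. A minimal sketch of the same bookkeeping idea follows, assuming 0-based positions with an exclusive stop and assuming that every line starting with ">" is a header that does not count. The class name SubSequenceSketch and the method extract are hypothetical names chosen for illustration. start and stop are never modified; instead, the requested range is clipped to each line in global coordinates and only then converted to indices local to that line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; the names are illustrative, not from the original post.
class SubSequenceSketch {

    /**
     * Returns the characters in [start, stop) of the sequence, counting positions
     * (0-based, stop exclusive: an assumption) as if all non-header lines were
     * joined into one string.
     */
    static String extract(Path path, long start, long stop) throws IOException {
        StringBuilder result = new StringBuilder();
        long lineStart = 0; // global position of the first character of the current line

        try (BufferedReader br = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            // Stop reading as soon as the current line starts at or after stop.
            while (lineStart < stop && (line = br.readLine()) != null) {
                if (line.startsWith(">")) {
                    continue; // header lines do not count towards positions
                }
                long lineEnd = lineStart + line.length(); // exclusive global end of this line

                // Clip the requested range to this line, still in global coordinates.
                long from = Math.max(start, lineStart);
                long to = Math.min(stop, lineEnd);

                if (from < to) {
                    // Convert to indices local to this line before copying.
                    result.append(line, (int) (from - lineStart), (int) (to - lineStart));
                }
                lineStart = lineEnd;
            }
        }
        return result.toString();
    }
}
```

Calling SubSequenceSketch.extract(Paths.get("genome.fna"), 50, 245) would correspond to the values used in the attempt above. Only the current line and the requested substring are held in memory, and reading stops once the stop position has been passed.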

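My other approach would be to store all lines up to the line that contains my stop position and then extract a substring. This is not preferred, because my desired subset may be located at the end of the file.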

Conditions:
- memory friendly
- no BioJava, if possible. I am still learning to program, so I would like to do this without it, even if that is the hard way
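For what it is worth, both conditions can be met with java.io and java.nio alone. The sketch below is one possible alternative that avoids per-line index arithmetic entirely: it reads character by character, ignores line breaks and header lines, and counts only sequence characters. It assumes 0-based positions with an exclusive stop and assumes that a ">" at the start of a line marks a header. StreamingSubSequenceSketch and extract are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; the names are illustrative, not from the original post.
class StreamingSubSequenceSketch {

    /** Returns sequence characters at positions [start, stop), skipping newlines and headers. */
    static String extract(Path path, long start, long stop) throws IOException {
        StringBuilder result = new StringBuilder();
        long position = 0;          // number of sequence characters seen so far
        boolean atLineStart = true; // true right after a line break (and at file start)
        boolean inHeader = false;   // true while inside a line that began with '>'

        try (BufferedReader br = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            int c;
            // BufferedReader still reads the file in large chunks internally,
            // so single-character reads do not hit the disk one char at a time.
            while (position < stop && (c = br.read()) != -1) {
                if (c == '\n' || c == '\r') {
                    atLineStart = true;
                    inHeader = false;
                    continue; // line breaks never count as positions
                }
                if (atLineStart && c == '>') {
                    inHeader = true;
                }
                atLineStart = false;
                if (inHeader) {
                    continue; // skip the rest of the header line
                }
                if (position >= start) {
                    result.append((char) c);
                }
                position++;
            }
        }
        return result.toString();
    }
}
```

Memory use is bounded by the reader's internal buffer plus the length of the requested substring, regardless of how long the lines or the file are.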

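I am not looking for fixed code, but for articles or examples that can put me on the right track. I have been staring at the screen for hours without making any progress, and my brain is a bit blank by now. As I said, the problem may be flawed thinking, or I may be overlooking a better method or trick.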
