Java: Can Lucene return search results with line numbers?

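I want to implement "find in files" with Lucene, similar to the search feature in an IDE. Basically I want to search source code files such as .c, .cpp, .h, .cs and .xml. I tried the demo from the Apache website. It returns a list of files, without line numbers or the number of occurrences. I believe there should be a way to get them.

Is there a way to get these details?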


2 answers

  1. # Answer 1

    Can you share the link to the demo on the Apache website?

    Here I'll show you how to get the frequency of a term for a given set of documents:

    public static void main(final String[] args) throws CorruptIndexException,
                LockObtainFailedException, IOException {
    
            // Create the index
            final Directory directory = new RAMDirectory();
            final Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
            final IndexWriterConfig config = new IndexWriterConfig(
                    Version.LUCENE_36, analyzer);
            final IndexWriter writer = new IndexWriter(directory, config);
    
            // addDoc(writer, field, text);
            addDoc(writer, "title", "foo");
            addDoc(writer, "title", "buz qux");
            addDoc(writer, "title", "foo foo bar");
    
            // Search
            final IndexReader reader = IndexReader.open(writer, false);
            final IndexSearcher searcher = new IndexSearcher(reader);
    
            final Term term = new Term("title", "foo");
            final Query query = new TermQuery(term);
            System.out.println("Query: " + query.toString() + "\n");
    
            final int limitShow = 3;
            final TopDocs td = searcher.search(query, limitShow);
            final ScoreDoc[] hits = td.scoreDocs;
    
            // Take IDs and frequencies. Iterate over hits.length rather than
            // td.totalHits, which can exceed the number of hits returned.
            final int[] docIDs = new int[hits.length];
            for (int i = 0; i < hits.length; i++) {
                docIDs[i] = hits[i].doc;
            }
            final Map<Integer, Integer> id2freq = getFrequencies(reader, term,
                    docIDs);
    
            // Show results
            for (int i = 0; i < hits.length; i++) {
                final int docNum = hits[i].doc;
                final Document doc = searcher.doc(docNum);
                System.out.println("\tposition " + i);
                System.out.println("Title: " + doc.get("title"));
                final int freq = id2freq.get(docNum);
                System.out.println("Occurrences of \"" + term.text() + "\" in \""
                        + term.field() + "\" = " + freq);
                System.out.println("                \n");
            }
            searcher.close();
            reader.close();
            writer.close();
        }
    

    Here we add the documents to the index:

    private static void addDoc(final IndexWriter w, final String field,
                final String text) throws CorruptIndexException, IOException {
            final Document doc = new Document();
            // Note: the field is added twice, so every term occurrence is
            // counted twice per document (hence 2 and 4 in the output below).
            doc.add(new Field(field, text, Field.Store.YES, Field.Index.ANALYZED));
            doc.add(new Field(field, text, Field.Store.YES, Field.Index.ANALYZED));
            w.addDocument(doc);
    }
    

    And this is an example of how to get the number of occurrences of a term in a document:

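    // Requires java.util.Arrays, java.util.HashMap and java.util.Map.
    public static Map<Integer, Integer> getFrequencies(
            final IndexReader reader, final Term term, final int[] docIDs)
            throws CorruptIndexException, IOException {
        final Map<Integer, Integer> id2freq = new HashMap<Integer, Integer>();
        final TermDocs tds = reader.termDocs(term);
        if (tds == null) {
            return id2freq;
        }
        // skipTo() only moves forward, but hits are ordered by score,
        // so visit the doc IDs in ascending order.
        final int[] sortedIDs = docIDs.clone();
        Arrays.sort(sortedIDs);
        try {
            // Assumes every docID contains the term (true for TermQuery hits)
            for (final int docID : sortedIDs) {
                // Advance to the first entry whose doc ID is >= docID
                if (!tds.skipTo(docID)) {
                    break; // no further document contains the term
                }
                if (tds.doc() == docID) {
                    id2freq.put(docID, tds.freq());
                }
            }
        } finally {
            tds.close();
        }
        return id2freq;
    }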
    

    If you put it all together and run it, you get the following output:

    Query: title:foo
    
        position 0
    Title: foo
    Occurrences of "foo" in "title" = 2
                    
    
        position 1
    Title: foo foo bar
    Occurrences of "foo" in "title" = 4
                    
    
  2. # Answer 2

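    I tried many forums and got no response. In the end I got an idea from @Luca Mastrostefano's answer for getting the line number details.

    The tag info from the Lucene searcher returns the file name, and I think that is enough to get the line numbers. The Lucene index does not store the actual file content; it stores the indexed terms rather than the original lines. So the line number cannot be obtained from it directly, and I assume the only way is to use that path, read the file, and find the line numbers.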

    public static void PrintLines(string filepath, string key)
    {
        int counter = 1; // 1-based line number
        string line;

        // Read the file line by line and print each line that contains the key.
        // The using block closes the reader even if an exception is thrown.
        using (var file = new System.IO.StreamReader(filepath))
        {
            while ((line = file.ReadLine()) != null)
            {
                if (line.Contains(key))
                {
                    Console.WriteLine("\t" + counter + ": " + line);
                }
                counter++;
            }
        }
    }
    

    Call this function once you have the path from the Lucene searcher.
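
Since the question is about Java, the file-scanning idea from answer 2 could look roughly like the sketch below. It is only an illustration under some assumptions: the class name `LineMatcher`, the `Match` holder and the method names are made up for this example, the files are assumed to be UTF-8, and the match is a plain case-sensitive substring test, which will not always agree with what a Lucene analyzer matched (for example after lowercasing). It also counts occurrences per line, which the question asked for.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: scans one file for a key.
public class LineMatcher {

    /** One matching line: 1-based line number, text, and occurrences of the key. */
    public static final class Match {
        public final int lineNumber;
        public final String line;
        public final int occurrences;

        Match(int lineNumber, String line, int occurrences) {
            this.lineNumber = lineNumber;
            this.line = line;
            this.occurrences = occurrences;
        }
    }

    /** Returns every line of the file that contains the key (assumes UTF-8). */
    public static List<Match> findLines(Path file, String key) throws IOException {
        List<Match> matches = new ArrayList<>();
        if (key.isEmpty()) {
            return matches; // an empty key would match everywhere
        }
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            int lineNumber = 1;
            while ((line = reader.readLine()) != null) {
                int count = countOccurrences(line, key);
                if (count > 0) {
                    matches.add(new Match(lineNumber, line, count));
                }
                lineNumber++;
            }
        }
        return matches;
    }

    /** Counts non-overlapping, case-sensitive occurrences of key in line. */
    static int countOccurrences(String line, String key) {
        int count = 0;
        int from = line.indexOf(key);
        while (from >= 0) {
            count++;
            from = line.indexOf(key, from + key.length());
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: LineMatcher <file> <key>");
            return;
        }
        // args[0] would be the path returned by the Lucene searcher
        for (Match m : findLines(Paths.get(args[0]), args[1])) {
            System.out.println("\t" + m.lineNumber + ": " + m.line
                    + " (" + m.occurrences + " occurrence(s))");
        }
    }
}
```

As in answer 2, this would be called once per file path returned by the search, so Lucene narrows down the candidate files and the scan only supplies the line numbers and per-line counts.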