Multithreading: Java multithreaded parser


I'm writing a multithreaded parser. The parser class looks like this:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;

public class Parser extends HTMLEditorKit.ParserCallback implements Runnable {

    private static List<Item> itemList = Collections.synchronizedList(new ArrayList<Item>());
    private boolean h2Tag = false;
    private int count;
    private static int threadCount = 0;

    public static List<Item> parse() {
        for (int i = 1; i <= 1000; i++) { // 1000 pages of the same type that need to be parsed

            while (threadCount == 20) { // limit the number of simultaneous threads
                try {
                    Thread.sleep(50);
                } catch (InterruptedException ex) {
                    ex.printStackTrace();
                }
            }

            Thread thread = new Thread(new Parser());
            thread.setName(Integer.toString(i));
            threadCount++; // increase the number of working threads
            thread.start();
        }

        return itemList;
    }

    public void run() {
        // Here goes the code that builds the URL from the thread name
        // (the loop index i passed in via setName), opens the connection,
        // starts parsing, etc. Nothing special, so I won't paste it here.

        threadCount--; // decrease the number of running threads when the current one stops
    }

    private static void addItem(Item item) {
        itemList.add(item);
    }

    // This method extracts the needed information once an H2 tag has been detected
    @Override
    public void handleText(char[] data, int pos) {
        if (h2Tag) {
            String itemName = new String(data).trim();

            // Item: the item whose information we read from the web page
            Item item = new Item();
            item.setName(itemName);
            item.setId(count);
            addItem(item);

            // Print information about the item to the console
            System.out.println(count + " = " + itemName);
        }
    }

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (HTML.Tag.H2 == t) {
            h2Tag = true;
        }
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (HTML.Tag.H2 == t) {
            h2Tag = false;
        }
    }
}
```

I run the parser from another class like this:

```java
List<Item> list = Parser.parse();
```

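Everything works fine except for one problem. When parsing finishes, the final list `itemList` contains 980 elements instead of 1000, yet all 1000 elements (items) are printed to the console. In other words, for some reason some threads never call the `addItem` method from the `handleText` method.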

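I have already tried changing the type of `itemList` to `ArrayList`, `CopyOnWriteArrayList` and `Vector`, making the `addItem` method `synchronized`, and wrapping its call in a `synchronized` block. All of this only changes the number of elements slightly; I still can't get the full thousand.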

I also tried parsing a small number of pages (10). The resulting list is empty, yet all 10 items appear in the console.

If I remove multithreading, everything works fine but, of course, slowly. That's no good.

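If I reduce the number of concurrent threads, the number of items in the list gets close to the expected 1000; if I increase it, the count falls further short of 1000. So I suspect there is some kind of race when adding records to the list. But why doesn't synchronization help?

What is the problem?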


2 Answers

# Answer 1

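There is nothing wrong with the code; it works exactly as you wrote it. The problem is the last iteration. All the other iterations work fine, but during the last one (from 980 to 1000) the threads are created and the main thread does not wait for them to finish before returning the list. So if you run 20 threads at a time, you will get some number between 980 and 1000.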
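To illustrate the diagnosis, here is a minimal sketch (not the asker's `Parser`) of what "waiting for the other threads" means with plain threads: each batch of up to 20 `Thread` objects is kept in a list and `join()` is called on every one of them, including those in the last batch, before the next batch starts or the list is returned. `JoinDemo`, `parseAll` and `parsePage` are hypothetical names, and `parsePage` simply returns a string instead of fetching and parsing a real page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class JoinDemo {
    // Hypothetical stand-in for downloading and parsing page i
    static String parsePage(int i) {
        return "item-" + i;
    }

    public static List<String> parseAll(int pages, int batchSize) throws InterruptedException {
        List<String> items = Collections.synchronizedList(new ArrayList<String>());
        for (int start = 1; start <= pages; start += batchSize) {
            int end = Math.min(start + batchSize - 1, pages);
            List<Thread> batch = new ArrayList<>();
            for (int i = start; i <= end; i++) {
                final int page = i; // pass the index explicitly instead of via the thread name
                Thread t = new Thread(() -> items.add(parsePage(page)));
                batch.add(t);
                t.start();
            }
            // join() returns only after the thread has terminated, so every add()
            // in this batch is complete (and visible) before we continue.
            for (Thread t : batch) {
                t.join();
            }
        }
        return items;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(parseAll(1000, 20).size());
    }
}
```

Because the last batch is joined like every other one, the list is complete when `parseAll` returns. The trade-off of fixed batches is that each batch waits for its slowest page.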

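Now, you could try adding `Thread.sleep(50)` before returning the list. In that case your main thread will wait for a while, and by then the other threads may have finished processing.

Or you can use one of Java's synchronization APIs: instead of `Thread.sleep()`, use a `CountDownLatch`, which lets you wait until the threads have finished processing; after that you can create new threads.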
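A minimal sketch of the `CountDownLatch` approach, assuming the threads are started in batches of up to 20: each batch gets a latch whose count equals the number of threads in it, every thread counts down in a `finally` block, and the main thread calls `await()` before creating the next batch. `LatchDemo`, `parseAll` and `parsePage` are hypothetical; `parsePage` stands in for the real download-and-parse step.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class LatchDemo {
    // Hypothetical stand-in for fetching and parsing page i
    static String parsePage(int i) {
        return "item-" + i;
    }

    public static List<String> parseAll(int pages, int batchSize) throws InterruptedException {
        List<String> items = Collections.synchronizedList(new ArrayList<String>());
        for (int start = 1; start <= pages; start += batchSize) {
            int end = Math.min(start + batchSize - 1, pages);
            // One count per thread in this batch
            CountDownLatch done = new CountDownLatch(end - start + 1);
            for (int i = start; i <= end; i++) {
                final int page = i;
                new Thread(() -> {
                    try {
                        items.add(parsePage(page));
                    } finally {
                        done.countDown(); // count down even if parsing throws
                    }
                }).start();
            }
            done.await(); // block until every thread in this batch has counted down
        }
        return items;
    }
}
```

Counting down in `finally` matters: if a page fails to parse and the thread skips `countDown()`, `await()` would block forever.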

# Answer 2
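By the time `parse()` returns, all 1000 threads have been started, but there is no guarantee that they have finished. In fact, they haven't, which is exactly the problem you are seeing. I strongly recommend not rolling your own, but using the tools the SDK provides for this kind of work.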

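The documentation on Thread Pools and ^{} is a good starting point. Again, don't implement this yourself unless you are sure you can get it right, because writing multithreaded code like this is pure pain.

Your code should look like this: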
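```java
ExecutorService executor = Executors.newFixedThreadPool(20); // at most 20 tasks run at once
List<Future<?>> futures = new ArrayList<Future<?>>(1000);
for (int i = 0; i < 1000; i++) {
    futures.add(executor.submit(new Runnable() {...}));
}
for (Future<?> f : futures) {
    f.get(); // blocks until that task has finished; throws InterruptedException / ExecutionException
}
executor.shutdown(); // let the pool's worker threads terminate once all tasks are done
```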
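A self-contained sketch in the spirit of the snippet above, with the task supplied by a hypothetical `parsePage(int)` (a stand-in for the real download-and-parse code, assuming one item per page; `PoolDemo` and `parseAll` are also hypothetical). Two choices differ from the question's code: each task returns its result through its `Future`, so no shared synchronized list is needed, and the page index is captured in the lambda rather than read from the thread name, because pool threads are named by the pool (for example `pool-1-thread-3`) and reused across tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolDemo {
    // Hypothetical stand-in for fetching and parsing page i
    static String parsePage(int i) {
        return "item-" + i;
    }

    public static List<String> parseAll(int pages) throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newFixedThreadPool(20); // at most 20 pages in flight
        try {
            List<Future<String>> futures = new ArrayList<>(pages);
            for (int i = 1; i <= pages; i++) {
                final int page = i; // capture the index; don't rely on the worker thread's name
                futures.add(executor.submit(() -> parsePage(page))); // lambda is a Callable<String>
            }
            List<String> items = new ArrayList<>(pages);
            for (Future<String> f : futures) {
                items.add(f.get()); // blocks until this page's task has finished
            }
            return items;
        } finally {
            executor.shutdown(); // let the worker threads exit
        }
    }
}
```

Collecting the results by calling `f.get()` on the futures in submission order also keeps the items in page order, which the shared-list approach does not.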
