Performance of Java streams: Stream.concat vs Collection.addAll

To combine two sets of data in a single stream:

Stream.concat(stream1, stream2).collect(Collectors.toSet());

stream1.collect(Collectors.toSet())
       .addAll(stream2.collect(Collectors.toSet()));

Which one is more efficient, and why?


6 Answers

  1. # Answer 1

    It is impossible to tell in advance without benchmarking, but consider this: if there are many duplicates, Stream.concat(stream1, stream2) has to create one large object, and that object has to be materialized because you are calling .collect().

    Then .toSet() has to compare each occurrence against the ones seen before, possibly with a fast hash function, but still across a potentially large number of elements.

    On the other hand, stream1.collect(Collectors.toSet()).addAll(stream2.collect(Collectors.toSet())) creates two smaller sets and then merges them.

    The memory footprint of this second option could be smaller than that of the first.

    Edit:

    After reading @NoDataFound's benchmark I revisited the question. On a more elaborate version of the test, Stream.concat does indeed seem to perform faster than Collection.addAll. I tried to take into account how many distinct elements there are and how large the initial streams are. I also measured the time needed to create the input streams from the sets (which is negligible anyway). Here are sample timings obtained with the code below.

    Concat-collect   10000 elements, all distinct: 7205462 nanos
    Collect-addAll   10000 elements, all distinct: 12130107 nanos
    
    Concat-collect  100000 elements, all distinct: 78184055 nanos
    Collect-addAll  100000 elements, all distinct: 115191392 nanos
    
    Concat-collect 1000000 elements, all distinct: 555265307 nanos
    Collect-addAll 1000000 elements, all distinct: 1370210449 nanos
    
    Concat-collect 5000000 elements, all distinct: 9905958478 nanos
    Collect-addAll 5000000 elements, all distinct: 27658964935 nanos
    
    Concat-collect   10000 elements, 50% distinct: 3242675 nanos
    Collect-addAll   10000 elements, 50% distinct: 5088973 nanos
    
    Concat-collect  100000 elements, 50% distinct: 389537724 nanos
    Collect-addAll  100000 elements, 50% distinct: 48777589 nanos
    
    Concat-collect 1000000 elements, 50% distinct: 427842288 nanos
    Collect-addAll 1000000 elements, 50% distinct: 1009179744 nanos
    
    Concat-collect 5000000 elements, 50% distinct: 3317183292 nanos
    Collect-addAll 5000000 elements, 50% distinct: 4306235069 nanos
    
    Concat-collect   10000 elements, 10% distinct: 2310440 nanos
    Collect-addAll   10000 elements, 10% distinct: 2915999 nanos
    
    Concat-collect  100000 elements, 10% distinct: 68601002 nanos
    Collect-addAll  100000 elements, 10% distinct: 40163898 nanos
    
    Concat-collect 1000000 elements, 10% distinct: 315481571 nanos
    Collect-addAll 1000000 elements, 10% distinct: 494875870 nanos
    
    Concat-collect 5000000 elements, 10% distinct: 1766480800 nanos
    Collect-addAll 5000000 elements, 10% distinct: 2721430964 nanos
    
    Concat-collect   10000 elements,  1% distinct: 2097922 nanos
    Collect-addAll   10000 elements,  1% distinct: 2086072 nanos
    
    Concat-collect  100000 elements,  1% distinct: 32300739 nanos
    Collect-addAll  100000 elements,  1% distinct: 32773570 nanos
    
    Concat-collect 1000000 elements,  1% distinct: 382380451 nanos
    Collect-addAll 1000000 elements,  1% distinct: 514534562 nanos
    
    Concat-collect 5000000 elements,  1% distinct: 2468393302 nanos
    Collect-addAll 5000000 elements,  1% distinct: 6619280189 nanos
    

    The code:

    import java.util.HashSet;
    import java.util.Random;
    import java.util.Set;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;
    
    public class StreamBenchmark {
        private Set<String> s1;
        private Set<String> s2;
    
        private long createStreamsTime;
        private long concatCollectTime;
        private long collectAddAllTime;
    
        public void setUp(final int howMany, final int distinct) {
            final Set<String> valuesForA = new HashSet<>(howMany);
            final Set<String> valuesForB = new HashSet<>(howMany);
            if (-1 == distinct) {
                for (int i = 0; i < howMany; ++i) {
                    valuesForA.add(Integer.toString(i));
                    valuesForB.add(Integer.toString(howMany + i));
                }
            } else {
                Random r = new Random();
                for (int i = 0; i < howMany; ++i) {
                    int j = r.nextInt(distinct);
                    valuesForA.add(Integer.toString(i));
                    valuesForB.add(Integer.toString(distinct + j));
                }
            }
            s1 = valuesForA;
            s2 = valuesForB;
        }
    
        public void run(final int streamLength, final int distinctElements, final int times, boolean discard) {
            long startTime;
            setUp(streamLength, distinctElements);
            createStreamsTime = 0L;
            concatCollectTime = 0L;
            collectAddAllTime = 0L;
            for (int r = 0; r < times; r++) {
                startTime = System.nanoTime();
                Stream<String> st1 = s1.stream();
                Stream<String> st2 = s2.stream();
                createStreamsTime += System.nanoTime() - startTime;
                startTime = System.nanoTime();
                Set<String> set1 = Stream.concat(st1, st2).collect(Collectors.toSet());
                concatCollectTime += System.nanoTime() - startTime;
                st1 = s1.stream();
                st2 = s2.stream();
                startTime = System.nanoTime();
                Set<String> set2 = st1.collect(Collectors.toSet());
                set2.addAll(st2.collect(Collectors.toSet()));
                collectAddAllTime += System.nanoTime() - startTime;
            }
            if (!discard) {
                // System.out.println("Create streams "+streamLength+" elements,
                // "+distinctElements+" distinct: "+createStreamsTime+" nanos");
                System.out.println("Concat-collect " + streamLength + " elements, " + (distinctElements == -1 ? "all" : String.valueOf(100 * distinctElements / streamLength) + "%") + " distinct: " + concatCollectTime + " nanos");
                System.out.println("Collect-addAll " + streamLength + " elements, " + (distinctElements == -1 ? "all" : String.valueOf(100 * distinctElements / streamLength) + "%") + " distinct: " + collectAddAllTime + " nanos");
                System.out.println("");
            }
        }
    
        public static void main(String args[]) {
            StreamBenchmark test = new StreamBenchmark();
            final int times = 5;
            test.run(100000, -1, 1, true); // warm-up run (results discarded)
            test.run(10000, -1, times, false);
            test.run(100000, -1, times, false);
            test.run(1000000, -1, times, false);
            test.run(5000000, -1, times, false);
            test.run(10000, 5000, times, false);
            test.run(100000, 50000, times, false);
            test.run(1000000, 500000, times, false);
            test.run(5000000, 2500000, times, false);
            test.run(10000, 1000, times, false);
            test.run(100000, 10000, times, false);
            test.run(1000000, 100000, times, false);
            test.run(5000000, 500000, times, false);
            test.run(10000, 100, times, false);
            test.run(100000, 1000, times, false);
            test.run(1000000, 10000, times, false);
            test.run(5000000, 50000, times, false);
        }
    }
    
  2. # Answer 2

    In terms of readability and intent, Stream.concat(a, b).collect(toSet()) is far clearer than the second alternative.

    To answer the question "which is the most efficient", here is a JMH test (I should say I have not used JMH very much, and there is probably room to improve my benchmark):

    Using JMH, with the following code:

    package stackoverflow;
    
    import java.util.HashSet;
    import java.util.Set;
    import java.util.concurrent.TimeUnit;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;
    
    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Fork;
    import org.openjdk.jmh.annotations.Measurement;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.annotations.Warmup;
    import org.openjdk.jmh.infra.Blackhole;
    
    @State(Scope.Benchmark)
    @Warmup(iterations = 2)
    @Fork(1)
    @Measurement(iterations = 10)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @BenchmarkMode({ Mode.AverageTime})
    public class StreamBenchmark {
      private Set<String> s1;
      private Set<String> s2;
    
      @Setup
      public void setUp() {
        final Set<String> valuesForA = new HashSet<>();
        final Set<String> valuesForB = new HashSet<>();
        for (int i = 0; i < 1000; ++i) {
          valuesForA.add(Integer.toString(i));
          valuesForB.add(Integer.toString(1000 + i));
        }
        s1 = valuesForA;
        s2 = valuesForB;
      }
    
      @Benchmark
      public void stream_concat_then_collect_using_toSet(final Blackhole blackhole) {
        final Set<String> set = Stream.concat(s1.stream(), s2.stream()).collect(Collectors.toSet());
        blackhole.consume(set);
      }
    
      @Benchmark
      public void s1_collect_using_toSet_then_addAll_using_toSet(final Blackhole blackhole) {
        final Set<String> set = s1.stream().collect(Collectors.toSet());
        set.addAll(s2.stream().collect(Collectors.toSet()));
        blackhole.consume(set);
      }
    }
    

    You get these results (I omitted some parts for readability):

    Result "s1_collect_using_toSet_then_addAll_using_toSet":
      156969,172 ±(99.9%) 4463,129 ns/op [Average]
      (min, avg, max) = (152842,561, 156969,172, 161444,532), stdev = 2952,084
      CI (99.9%): [152506,043, 161432,301] (assumes normal distribution)
    
    Result "stream_concat_then_collect_using_toSet":
      104254,566 ±(99.9%) 4318,123 ns/op [Average]
      (min, avg, max) = (102086,234, 104254,566, 111731,085), stdev = 2856,171
      CI (99.9%): [99936,443, 108572,689] (assumes normal distribution)
    # Run complete. Total time: 00:00:25
    
    Benchmark                                                       Mode  Cnt       Score      Error  Units
    StreamBenchmark.s1_collect_using_toSet_then_addAll_using_toSet  avgt   10  156969,172 ± 4463,129  ns/op
    StreamBenchmark.stream_concat_then_collect_using_toSet          avgt   10  104254,566 ± 4318,123  ns/op
    

    The version using Stream.concat(a, b).collect(toSet()) should perform faster (if I am reading the JMH numbers correctly).

    On the other hand, I think this result is expected, because you don't create an intermediate set (which has some cost, even with a HashSet), and, as said in a comment on the first answer, the Stream is concatenated lazily.
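
    As a small illustration of that laziness, here is a standalone sketch (separate from the benchmark above; the sample values and the peek calls are only there to show when elements actually flow):

    import java.util.List;
    import java.util.Set;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class LazyConcatDemo {
        public static void main(String[] args) {
            Stream<String> a = List.of("x", "y").stream().peek(e -> System.out.println("a yields " + e));
            Stream<String> b = List.of("y", "z").stream().peek(e -> System.out.println("b yields " + e));

            // Nothing is printed here: concat only wraps the two source streams.
            Stream<String> combined = Stream.concat(a, b);

            // Elements are pulled (and printed) only now, when the terminal
            // operation runs, and they go straight into the single result set.
            Set<String> set = combined.collect(Collectors.toSet());
            System.out.println(set); // e.g. [x, y, z]
        }
    }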

    Using a profiler you might see which part is slower. You might also want to use toCollection(() -> new HashSet<>(1000)) instead of toSet() to see whether the problem lies in growing the HashSet's internal hash array.
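
    As a rough sketch of that idea (the capacity arithmetic below is just a common sizing heuristic, not something measured here), collecting into a pre-sized HashSet avoids growing the internal table while elements are added:

    import java.util.HashSet;
    import java.util.Set;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class PresizedCollectDemo {
        // Combine two sets via Stream.concat, collecting into a HashSet whose
        // backing table is sized up front so it never needs to be rehashed.
        static Set<String> concatIntoPresizedSet(Set<String> s1, Set<String> s2) {
            int expected = s1.size() + s2.size();         // upper bound on the result size
            int capacity = (int) (expected / 0.75f) + 1;  // allow for the default load factor
            return Stream.concat(s1.stream(), s2.stream())
                    .collect(Collectors.toCollection(() -> new HashSet<>(capacity)));
        }

        public static void main(String[] args) {
            Set<String> a = Set.of("1", "2", "3");
            Set<String> b = Set.of("3", "4", "5");
            System.out.println(concatIntoPresizedSet(a, b)); // e.g. [1, 2, 3, 4, 5]
        }
    }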

  3. # Answer 3

    Either one will do.

    If you have profiled your application and this code turns out to be a bottleneck, then measure it with the different implementations and use the most efficient one.

  4. # Answer 4

    Your question is an example of premature optimization. Never pick one syntax over the other just because you think it is faster. Always use the syntax that best expresses your intent and supports understanding of the logic.


    You know nothing about the task i am working on – alan7678

    That is correct.

    But I don't need to.

    There are generally two scenarios:

    1. You are developing an OLTP application. In that case the application should respond within a second or less, and the user will not notice any performance difference between the variants you presented.

    2. You are developing some kind of batch processing that runs unattended for a while. In that case the performance difference "could" matter, but only if you are charged for the time your batch process runs.

    Either way: real performance problems (where your application is slower by orders of magnitude, not by fractions) are usually caused by the logic you implemented (e.g. excessive communication, "hidden loops", or excessive object creation).
    These problems usually cannot be solved or prevented by choosing a particular syntax.

    If you sacrifice readability for a performance gain, you make the application harder to maintain.
    And changing a hard-to-maintain code base easily burns a multiple of the money that was saved over the application's lifetime by using a less readable but slightly faster syntax.

    and without a doubt this question will matter in some cases for other people as well. – alan7678

    Without a doubt, people are curious.

    Luckily for me syntax i prefer seems to perform better as well. – alan7678

    If you already know, why did you ask?

    Would you please share your measurements together with the way you measured them?

    And, more importantly: will that still hold for Java 9 or Java 10?

    Java's performance essentially comes from the JVM implementation, and that is subject to change. Of course, newer syntax constructs (such as Java streams) have a better chance of getting performance improvements with new Java versions. But there is no guarantee.

    In my case the need for performance is greater than the difference in readibility. – alan7678

    Will you still be responsible for this application in 5 years? Or are you a consultant who is paid to start a project and then moves on to the next one?

    I have never had a project in which I could solve a performance problem at the syntax level.
    But I constantly work with legacy code that has existed for 10+ years and that is hard to maintain because someone did not honor readability.

    So your non-answer does not apply to me. – alan7678

    It's a free world, take your pick.

  5. # Answer 5

    I was in a situation where I had to decide whether to use Stream.of() with flatMap(), Stream.concat(), Collection.addAll(), or Collection.add() to merge multiple lists into a single list. I ran a quick test with 10 iterations of my code and got some surprising results.

    ------------------------------------------------------------------
    1. Using getByAddAll()
    
    Benchmark                         Mode  Cnt  Score   Error  Units
    PerformaceTest.test               avgt   10  4.414 ± 0.304  ms/op
    
    Benchmark                         Mode  Cnt  Score   Error  Units
    PerformaceTest.test               avgt   10  4.291 ± 0.332  ms/op
    
    Benchmark                         Mode  Cnt  Score   Error  Units
    PerformaceTest.test               avgt   10  4.571 ± 0.622  ms/op
    
    Benchmark                         Mode  Cnt  Score   Error  Units
    PerformaceTest.test               avgt   10  4.520 ± 0.818  ms/op
    
    Average = 4.449ms
    ------------------------------------------------------------------
    
    ------------------------------------------------------------------
    2. Using getByAdd()
    
    Benchmark                         Mode  Cnt  Score   Error  Units
    PerformaceTest.test               avgt   10  4.280 ± 0.499  ms/op
    
    Benchmark                         Mode  Cnt  Score   Error  Units
    PerformaceTest.test               avgt   10  4.494 ± 0.374  ms/op
    
    Benchmark                         Mode  Cnt  Score   Error  Units
    PerformaceTest.test               avgt   10  4.575 ± 0.539  ms/op
    
    Benchmark                         Mode  Cnt  Score   Error  Units
    PerformaceTest.test               avgt   10  4.552 ± 0.272  ms/op
    
    Average = 4.475ms
    ------------------------------------------------------------------
    
    
    ------------------------------------------------------------------
    3. using getByStreamOf()
    Benchmark                         Mode  Cnt  Score   Error  Units
    PerformaceTest.test               avgt   10  4.502 ± 0.529  ms/op
    
    Benchmark                         Mode  Cnt  Score   Error  Units
    PerformaceTest.test               avgt   10  4.494 ± 0.754  ms/op
    
    Benchmark                         Mode  Cnt  Score   Error  Units
    PerformaceTest.test               avgt   10  4.676 ± 0.347  ms/op
    
    Benchmark                         Mode  Cnt  Score   Error  Units
    PerformaceTest.test               avgt   10  4.511 ± 0.950  ms/op
    
    Average = 4.545ms
    ------------------------------------------------------------------
    
    
    ------------------------------------------------------------------
    4. Using getByStreamConcat()
    
    Benchmark                         Mode  Cnt  Score   Error  Units
    PerformaceTest.test               avgt   10  4.342 ± 0.372  ms/op
    
    Benchmark                         Mode  Cnt  Score   Error  Units
    PerformaceTest.test               avgt   10  4.218 ± 0.400  ms/op
    
    Benchmark                         Mode  Cnt  Score   Error  Units
    PerformaceTest.test               avgt   10  4.892 ± 0.562  ms/op
    
    Benchmark                         Mode  Cnt  Score   Error  Units
    PerformaceTest.test               avgt   10  4.818 ± 0.608  ms/op
    
    Average = 4.567ms
    ------------------------------------------------------------------
    

    Here is my code:

    private List<ItemDTO> getByStreamOf(OfferResponseDTO catalogOfferDTO){
        return Stream.of(
                catalogOfferDTO.getCharges()
                        .stream()
                        .map(chargeWithPricePlanResponseDTO -> new ItemDTO(chargeWithPricePlanResponseDTO.getName(), catalogOfferDTO.getDisplayOrder())),
    
                catalogOfferDTO.getUsages()
                        .stream()
                        .map(usageResponseDTO -> new ItemDTO(usageResponseDTO.getDescription(), catalogOfferDTO.getDisplayOrder())),
    
                catalogOfferDTO.getNetworkElements()
                        .stream()
                        .map(networkElementResponseDTO -> new ItemDTO(networkElementResponseDTO.getName(), catalogOfferDTO.getDisplayOrder())),
    
                catalogOfferDTO.getEquipment()
                        .stream()
                        .map(equipmentResponseDTO -> new ItemDTO(equipmentResponseDTO.getInvoiceDescription(), catalogOfferDTO.getDisplayOrder())))
    
                .flatMap(Function.identity())
                .collect(Collectors.toList());
    }
    
    
    private List<ItemDTO> getByStreamConcat(OfferResponseDTO catalogOfferDTO){
        return Stream.concat(
                Stream.concat(
                        catalogOfferDTO.getCharges()
                                .stream()
                                .map(chargeWithPricePlanResponseDTO -> new ItemDTO(chargeWithPricePlanResponseDTO.getName(), catalogOfferDTO.getDisplayOrder()))
                        ,
    
                        catalogOfferDTO.getUsages()
                                .stream()
                                .map(usageResponseDTO -> new ItemDTO(usageResponseDTO.getDescription(),catalogOfferDTO.getDisplayOrder()))
                ),
                Stream.concat(
                        catalogOfferDTO.getEquipment()
                                .stream()
                                .map(equipmentResponseDTO -> new ItemDTO(equipmentResponseDTO.getInvoiceDescription(), catalogOfferDTO.getDisplayOrder())),
    
                        catalogOfferDTO.getNetworkElements()
                                .stream()
                                .map(networkElementResponseDTO -> new ItemDTO(networkElementResponseDTO.getName(), catalogOfferDTO.getDisplayOrder()))
                )
        )
                .collect(Collectors.toList());
    }
    
    
    private List<ItemDTO> getByAddAll(OfferResponseDTO catalogOfferDTO){
        List<ItemDTO> items = new ArrayList<>();
    
        items.addAll(catalogOfferDTO.getCharges()
                .stream()
                .map(chargeWithPricePlanResponseDTO -> new ItemDTO(chargeWithPricePlanResponseDTO.getName(), catalogOfferDTO.getDisplayOrder()))
                .collect(Collectors.toList()));
    
        items.addAll(catalogOfferDTO.getUsages()
                .stream()
                .map(usageResponseDTO -> new ItemDTO(usageResponseDTO.getDescription(), catalogOfferDTO.getDisplayOrder()))
                .collect(Collectors.toList()));
    
        items.addAll(catalogOfferDTO.getNetworkElements()
                .stream()
                .map(networkElementResponseDTO -> new ItemDTO(networkElementResponseDTO.getName(), catalogOfferDTO.getDisplayOrder()))
                .collect(Collectors.toList()));
    
        items.addAll(catalogOfferDTO.getEquipment()
                .stream()
                .map(equipmentResponseDTO -> new ItemDTO(equipmentResponseDTO.getInvoiceDescription(), catalogOfferDTO.getDisplayOrder()))
                .collect(Collectors.toList()));
        return items;
    }
    
    private List<ItemDTO> getByAdd(OfferResponseDTO catalogOfferDTO){
        List<ItemDTO> items = new ArrayList<>();

        // forEach is a terminal operation, so the stream pipelines run and the items are added eagerly.
        catalogOfferDTO.getCharges()
                .stream()
                .forEach(chargeWithPricePlanResponseDTO -> items.add(this.addItem(chargeWithPricePlanResponseDTO.getName(), catalogOfferDTO.getDisplayOrder())));

        catalogOfferDTO.getUsages()
                .stream()
                .forEach(usageResponseDTO -> items.add(this.addItem(usageResponseDTO.getDescription(), catalogOfferDTO.getDisplayOrder())));

        catalogOfferDTO.getEquipment()
                .stream()
                .forEach(equipmentResponseDTO -> items.add(this.addItem(equipmentResponseDTO.getInvoiceDescription(), catalogOfferDTO.getDisplayOrder())));

        catalogOfferDTO.getNetworkElements()
                .stream()
                .forEach(networkElementResponseDTO -> items.add(this.addItem(networkElementResponseDTO.getName(), catalogOfferDTO.getDisplayOrder())));

        return items;
    }
    
    
  6. # Answer 6

    First of all, it has to be emphasized that the second variant is incorrect. The toSet() collector returns a Set with "no guarantees on the type, mutability, serializability, or thread-safety". If mutability is not guaranteed, it is not correct to invoke addAll on the resulting Set.

    It happens to work with the current version of the reference implementation, where a HashSet is created, but it might stop working in a future version or with alternative implementations. To fix this, you would have to replace toSet() with toCollection(HashSet::new) for the first stream's collect operation.
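
    A minimal sketch of that corrected variant (names are illustrative only):

    import java.util.HashSet;
    import java.util.Set;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class SafeAddAllDemo {
        // The second variant made correct: the first stream is collected into an
        // explicitly mutable HashSet, so calling addAll on it is guaranteed to work.
        static Set<String> combine(Stream<String> stream1, Stream<String> stream2) {
            Set<String> result = stream1.collect(Collectors.toCollection(HashSet::new));
            result.addAll(stream2.collect(Collectors.toSet())); // the second collect can stay toSet(): its result is only read
            return result;
        }

        public static void main(String[] args) {
            System.out.println(combine(Stream.of("a", "b"), Stream.of("b", "c"))); // e.g. [a, b, c]
        }
    }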

    That leads to the situation that the second variant is not only less efficient with the current implementation, as shown in this answer; it might also prevent future optimizations of the toSet() collector by insisting that the result be of the exact type HashSet. And, unlike the toSet() collector, the toCollection(…) collector has no way of detecting whether the target collection is unordered, which may be performance-relevant in future implementations.