Java stream performance: is the complexity of collecting a stream of Long values the same as filtering it based on Set::contains?


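I have an application that accepts employee IDs as user input and then filters the employee list for matching IDs. The user input is expected to be 3-4 IDs, while the employee list contains a few thousand entries.

With performance in mind, I came up with the following two approaches using stream filters.

Method 1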

The motivation here is to avoid running the filter for every employee and instead run it over the requested IDs list, which is guaranteed to be very short.

private static Set<Long> identifyEmployees(CustomRequest request) {
  List<Long> requestedIds = request.getRequestedIDs();                            
  if (!requestedIds.isEmpty()) {
      Set<Long> allEmployeeIds = 
              employeeInfoProvider
                .getEmployeeInfoList()  // returns List<EmployeeInfo>
                .stream()
                .map(EmployeeInfo::getEmpId)  // getEmpId() returns a Long
                .collect(Collectors.toSet());

      return requestedIds.stream().filter(allEmployeeIds::contains).collect(Collectors.toSet());         
  }
  return Collections.emptySet();
}

Method 2

The motivation here is to replace the collect() over all employees in Method 1 with a filter, since the complexity would be the same. Here collect() runs on only a very small number of elements.

private static Set<Long> identifyEmployees(CustomRequest request) {
  Set<Long> requestedIds = request.getRequestedIDs()   // returns List<Long>
                          .stream()
                          .collect(Collectors.toSet());
  if (!requestedIds.isEmpty()) {
      return employeeInfoProvider
               .getEmployeeInfoList()  // returns List<EmployeeInfo>
               .stream()
               .map(EmployeeInfo::getEmpId)  // getEmpId() returns a Long
               .filter(requestedIds::contains)
               .collect(Collectors.toSet());
  }
  return Collections.emptySet();
}

Does Method 2 perform as well as Method 1, or does Method 1 perform better?

private static Set<Long> identifyEmployees(CustomRequest request) {
  Set<Long> requestedIds = request.getRequestedIDs()   // returns List<Long>
                          .stream()
                          .collect(Collectors.toSet());
  Set<Long> result = new HashSet<>();
  if (!requestedIds.isEmpty()) {
      Iterator<EmployeeInfo> employees = employeeInfoProvider.getEmployeeInfoList().iterator();
      // Stop as soon as every requested ID has been found
      while (result.size() < requestedIds.size() && employees.hasNext()) {
          Long employeeId = employees.next().getEmpId();
          if (requestedIds.contains(employeeId)) {
              result.add(employeeId);
          }
      }
  }
  return result;
}

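2 answers

# Answer 1

One potentially faster (though not cleaner) option is to return as soon as all requestedIds have been found, but I'm not sure whether that can be done with the Stream API.

However, this only makes sense if employeeInfoProvider.getEmployeeInfoList() returns multiple copies of employees with the same ID. Otherwise, as mentioned above, Method 2 is the better option.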
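As for whether the early exit can be expressed with the Stream API: one possible sketch combines filter, distinct() and limit(), relying on limit() being a short-circuiting operation. This is an illustration under assumptions, not the thread's code: the class name EarlyExitSketch, the plain List<Long> of IDs standing in for the EmployeeInfo list, and the visited counter (there only to make the early exit observable) are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

class EarlyExitSketch {

    // Hypothetical stand-in for identifyEmployees: works on plain IDs so the
    // sketch is self-contained. 'visited' counts how many elements were pulled
    // from the source; it exists only for demonstration.
    static Set<Long> identifyEmployees(List<Long> employeeIds,
                                       Set<Long> requestedIds,
                                       AtomicInteger visited) {
        if (requestedIds.isEmpty()) {
            return Collections.emptySet();
        }
        return employeeIds.stream()                     // sequential stream assumed
                .peek(id -> visited.incrementAndGet())  // demo-only counter
                .filter(requestedIds::contains)         // keep requested IDs only
                .distinct()                             // count each match once
                .limit(requestedIds.size())             // stop once all are found
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Long> employeeIds = LongStream.rangeClosed(1, 1000)
                .boxed()
                .collect(Collectors.toList());
        Set<Long> requestedIds = new HashSet<>(Arrays.asList(3L, 5L));
        AtomicInteger visited = new AtomicInteger();

        Set<Long> found = identifyEmployees(employeeIds, requestedIds, visited);
        System.out.println("found=" + found + ", visited=" + visited.get());
    }
}
```

In this run the requested IDs sit at positions 3 and 5 of a 1000-element list, so a sequential pipeline should stop pulling elements once the fifth one has been processed. If any requested ID is missing from the employee list, the whole list is still traversed, and the early exit is much less predictable on a parallel stream.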

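# Answer 2

I would expect Method 2 to perform as well or better in all scenarios.

Collecting into an intermediate set adds allocation overhead. If there are many duplicates, it reduces the number of requestedIds::contains calls you have to make later, but even then you are trading each Set::add call for a Set::contains call, each of which should be a small win.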
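To make the trade-off concrete, here is a minimal side-by-side sketch of the two shapes. With n employee IDs and k requested IDs, both do an expected O(n + k) hash operations; the difference is that the first builds an n-entry intermediate set while the second only ever holds about k entries. The class and method names are hypothetical, and plain List<Long> inputs stand in for CustomRequest and the EmployeeInfo list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

class MethodComparisonSketch {

    // Method 1 shape: n Set::add calls to build the intermediate set,
    // then k Set::contains calls against it.
    static Set<Long> viaIntermediateSet(List<Long> employeeIds, List<Long> requestedIds) {
        Set<Long> allEmployeeIds = new HashSet<>(employeeIds);
        return requestedIds.stream()
                .filter(allEmployeeIds::contains)
                .collect(Collectors.toSet());
    }

    // Method 2 shape: k Set::add calls for the small set, then n Set::contains
    // calls; only matching IDs are added to the result.
    static Set<Long> viaDirectFilter(List<Long> employeeIds, List<Long> requestedIds) {
        Set<Long> requested = new HashSet<>(requestedIds);
        return employeeIds.stream()
                .filter(requested::contains)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Long> employeeIds = LongStream.rangeClosed(1, 5000)
                .boxed()
                .collect(Collectors.toList());
        // The last requested ID deliberately matches no employee.
        List<Long> requestedIds = Arrays.asList(17L, 4200L, 99999L);

        Set<Long> first = viaIntermediateSet(employeeIds, requestedIds);
        Set<Long> second = viaDirectFilter(employeeIds, requestedIds);
        System.out.println("same result: " + first.equals(second));
    }
}
```

Both methods return the same set, so the choice comes down to constant factors and allocation, which would need a proper benchmark (for example with JMH) to quantify for a given workload.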
