为所有可能的组合优化效率

2024-05-03 18:15:19 发布

您现在位置:Python中文网/ 问答频道 /正文

假设有一个锦标赛即将举行,你必须帮助建立。你知道吗

本次锦标赛共有N支球队,名单如下:

teams = [Team1, Team2, ..., TeamN]

在每支球队中,可以有1-3名球员,每个人都是从自己的球员名单中挑选出来的。你知道吗

假设每个列表都代表了可能的领导者、新手和助手。你知道吗

每个团队必须有一个领导者,但可能没有潜在的新人或助手。你知道吗

例如,Team1可能看起来像

Team1.Leaders = [Leader1, Leader2, ..., LeaderX1]
Team1.Rookies = [Rookie1, Rookie2, ..., RookieX2]
Team1.Helpers = [Helper1, Helper2, ..., HelperX3]

或者它可能有一个新手或助手的空列表,或者两者都有。你知道吗

每个团队、领导者、新手和助手的列表可能有不同的大小。你知道吗

您必须为每个团队选择一个领导者,并且可以选择一个菜鸟或一个助手,或者两者都选择,这取决于菜鸟和助手是否存在。(同样,每队可挑选1-3名球员)

为每一队挑选球员的所有方法列出一个组合列表。你知道吗

例如

Team1 = Team()

Team1.Leaders = [Leader1, Leader2]
Team1.Rookies = [Rookie1]
Team1.Helpers = []

Team2 = Team()
Team2.Leaders = [Leader3]
Team2.Rookies = []
Team2.Helpers = [Helper1, Helper2]

combinations = team_combinations(teams)

组合的预期输出如下所示:

[{"Team1":[Leader1, Rookie1], "Team2":[Leader3, Helper1]}, 
{"Team1":[Leader1, Rookie1], "Team2":[Leader3, Helper2]}, 
{"Team1":[Leader2, Rookie1], "Team2":[Leader3, Helper1]}, 
{"Team1":[Leader2, Rookie1], "Team2":[Leader3, Helper2]}]

每个字典都是你如何选择玩家的组合。你知道吗

我正在努力使获得组合的过程尽可能快,你可以想象,当有许多球员,你可以选择,你甚至可能有数百万不同的组合。你知道吗

我尝试过使用递归,其中基本情况是len(teams)==1,递归步骤是其他步骤。我成功地创建了一个可以工作的函数,但是获取组合的过程甚至需要一个小时才能完成。你知道吗

什么是获得组合的最快方法? 使用itertools会有帮助吗? 有没有更快的方法?你知道吗


Tags: 列表助手团队球员新手teams领导者team1
1条回答
网友
1楼 · 发布于 2024-05-03 18:15:19

首先,你只能做一件事来降低问题的复杂性——减少团队和球员的数量。这是可靠地加速代码的一种可靠方法。你知道吗

另一方面,如果组合的总数小于10**9,任何微观优化都变得非常重要。使用低级编程语言也是一种选择。至于Python,我要强调以下几点:

  • 使用内置的语言特性,例如在itertools模块中
  • 使用内置的tuple作为leadersrookieshelpers的类型。它有相当快的迭代器并允许避免数据的隐式复制(例如在itertools.product
  • 缓存某些组合或团队组合。这种优化的有效性取决于可用内存的大小和小团队的数量
  • 堆栈而不是递归。尽管具有可读性和易用性,但在Python中调用函数是一种昂贵的乐趣。通过使用尾部递归,可能会消耗所有可用内存,并且超出sys.getrecursionlimit()。你知道吗
  • pythonfor-循环代替堆栈
  • python函数特性取代了for-循环。你知道吗

这是一个小草图。我写的所有类都没有__init__,应该用具体的实现来继承。你知道吗

from itertools import product, chain
from operator import attrgetter
try:
    from math import prod # python 3.8
except ImportError:
    from functools import reduce
    from operator import mul
    prod = lambda i: reduce(mul, i, 1)


class Team(object):
    __slots__ = ('leaders', 'rookies', 'helpers')
    leaders: tuple
    rookies: tuple
    helpers: tuple

    def l_product(self):
        return product(self.leaders) # optimization tricky

    def l_r_product(self):
        return product(self.leaders, self.rookies)

    def l_h_product(self):
        return product(self.leaders, self.helpers)

    def l_r_h_product(self):
        return product(self.leaders, self.rookies, self.helpers)

    def __iter__(self):
        return chain(self.l_product(), self.l_r_product(), self.l_h_product(), self.l_r_h_product())

    def __len__(self):
        return (
            (l := len(self.leaders)) +
            (l * (r := len(self.rookies))) +
            (l * (h := len(self.helpers))) +
            (l * r * h)
        )

    @property
    def combs_size(self):
        # empiric value indicating quite close size of all produced tuples-combinations in memory
        return (
            (l := len(self.leaders)) +
            ((l * (r := len(self.rookies))) * 2) +
            ((l * (h := len(self.helpers))) * 2) +
            (l * r * h * 3)
        )


class CachedTeam(Team):
    __slots__ = Team.__slots__ + ('_cached_combs',)

    def __init__(self, team):
        self.leaders = team.leaders
        self.rookies = team.rookies
        self.helpers = team.helpers
        self._cached_combs = tuple(team)

    def __iter__(self):
        return iter(self._cached_combs)


class Tournament(object):
    _teams: list
    _producer: ... # Callable
    _free_memory = float('inf') # empiric value indicating ALL memory that can be occupied 
                                # while producing elements
    _max_effective_teams_length = 100 # empiric value, it is related with max-memory
                                      # that _compiled_producer can occupy
    _min_required_memory_multiplier = 4 # empiric value, it is related with memory required to
                                        # _stack_producer
    CachedTeam = CachedTeam
    _wrapper = tuple

    def __len__(self):
        return prod(map(len, self._teams))

    def __iter__(self):
        try:
            return self._producer()
        except AttributeError:
            self._create_producer()
            return self._producer()

    @property
    def teams_combs_size(self):
        return sum(map(attrgetter('combs_size'), self._teams))

    def _create_producer(self):
        teams = self._teams
        teams_length = len(teams)
        if not teams_length:
            self._producer = ((),).__iter__
        elif self.teams_combs_size < self._free_memory:
            # itertools.product creates tuple-copy for all non-tuple input
            self._producer = self._product_producer
        else:
            self._check_memory()
            self._configure_cache()
            if teams_length <= max(2, self._max_effective_teams_length):
                self._producer = self._compiled_producer
            else:
                self._producer = self._stack_producer

    def _product_producer(self):
        # it should be used when we have a lot of memory
        return product(*self._teams)

    @property
    def _compiled_producer(self):
        # e.g: ((t0,t1,t2,) for t0 in teams[0] for t1 in teams[1] for t2 in teams[2])
        # it should be used when number of teams is not big
        teams_length = len(self._teams)
        assert teams_length >= 1
        producer_code = 'lambda: (({yielded}){for_loops})'
        return eval(
            producer_code.format(
                yielded = ''.join(map('t{},'.format, range(0, teams_length))),
                for_loops = ''.join(map(' for t{0} in teams[{0}]'.format,
                                        range(0, teams_length))),
            ),
            {'teams': self._teams}
        )

    def _stack_producer(self):
        # it should be used when number of teams is big
        wrapper = self._wrapper
        teams = self._teams
        teams_length = len(teams)
        assert teams_length >= 2
        common = [None] * teams_length # common memory
        teams_iterators = [None] * teams_length
        teams_iterators[0] = iter(teams[0])
        last_team = teams[-1]
        x = 0  # stack pointer
        max_x = teams_length - 1
        pre_max_x = max_x - 1

        while True:
            try:
                common[x] = next(teams_iterators[x])
            # except StopIteration:
            except:
                if x: # if first place loop is not over:
                    x -= 1 # decrement stack pointer
                    continue
                return
            if x == pre_max_x:
                for common[max_x] in last_team:
                    yield wrapper(common) # don't forget to use wrapper that creates copy!!
            else:
                # create next place loop and increment stack pointer
                teams_iterators[x] = iter(teams[(x := x + 1)])

    def _check_memory(self):
        self._free_memory -= (len(self._teams) * self._min_required_memory_multiplier)
        if self._free_memory < 0:
            raise ValueError('not enough memory to operate with teams')

    def _configure_cache(self):
        teams = self._teams
        CachedTeam = self.CachedTeam
        free_memory = self._free_memory

        # We want to cache small teams at first. It is also quite easy to get their iterators,
        # so we want to iterate them more often than not-cached teams:
        teams.sort(key=len, reverse=True)
        try:
            for i, t in enumerate(reversed(self._teams), 1):
                if (cs := t.combs_size) > free_memory:
                    return
                teams[-i] = CachedTeam(t)
                free_memory -= cs
        finally:
            self._free_memory = free_memory

我已经编写了三个获得所有团队笛卡尔积的基本实现:_product_producer_compiled_producer_stack_producer。要选择其中一个,我们需要知道我们拥有的内存的近似值。这就是为什么我们需要声明combs_size_free_memory_max_effective_teams_length_min_required_memory_multiplierteams_combs_size。你知道吗

内部的_teams应该是私有的,因为生成的组合的顺序取决于团队的顺序。为了优化生产商,我们可能需要度假团队。你知道吗

_configure_cache可以在__init__内部的某个地方调用,但对我来说似乎是错误的,因为初始化不应该太长。如果你想的话也可以扔掉。你知道吗

以下是一些测试:

team0 = Team()
team0.leaders = ('L00',)
team0.rookies = ('R00', 'R01', 'R02')
team0.helpers = ('H00', 'H01')

team1 = Team()
team1.leaders = ('L10', 'L11')
team1.rookies = ()
team1.helpers = ()

team2 = Team()
team2.leaders = ('L20',)
team2.rookies = ()
team2.helpers = ('H20', 'H21')

team3 = Team()
team3.leaders = ('L30',)
team3.rookies = ('R30', 'R31', 'R32', 'R33', 'R34')
team3.helpers = ()

tm0 = Tournament()
tm0._teams = [team0, team1, team2, team3]

tm1 = Tournament()
tm1._teams = list(tm0._teams)
tm1._free_memory = 30

tm2 = Tournament()
tm2._teams = list(tm0._teams)
tm2._free_memory = 30
tm2._max_effective_teams_length = 3

是的。你知道吗

>>> list(team2)
[('L20',), ('L20', 'H20'), ('L20', 'H21')]

>>> for t in tm0._teams:
...     print(len(list(t)), len(t), type(t))
12 12 <class '__main__.Team'>
2 2 <class '__main__.Team'>
3 3 <class '__main__.Team'>
6 6 <class '__main__.Team'>

>>> len(tm0) == len(list(tm0)) == len(list(tm1)) == len(list(tm2)) == 12 * 2 * 3 * 6 == 432
True

>>> for t in tm1._teams:
...     print(len(t), type(t))
12 <class '__main__.Team'>
6 <class '__main__.Team'>
3 <class '__main__.CachedTeam'>
2 <class '__main__.CachedTeam'>

>>> list(tm0._product_producer()) == list(tm0._compiled_producer()) == list(tm0._stack_producer())
True

相关问题 更多 >