如何在python中找到不同列元素的最小和集？

2条回答

网友

1楼 · 编辑于 2024-10-02 00:30:39

这个问题描述了“稳定婚姻问题”的一个变体。它有一个非常简单的算法；我在下面粘贴了最简单的版本-还有其他更严格的版本。未订婚的求婚者向对方求婚，对方拒绝或接受（解除其当前订婚）。这将一直持续到各方都参与进来

# Gale-Shapley algorithm:
algorithm stable_matching is
    Initialize all m ∈ M and w ∈ W to free
    while ∃ free man m who still has a woman w to propose to do
        w := first woman on m`s list to whom m has not yet proposed
        if w is free then
            (m, w) become engaged
        else some pair (m`, w) already exists
            if w prefers m to m` then
                m` becomes free
                (m, w) become engaged 
            else
                (m`, w) remain engaged
            end if
        end if
    repeat

对稳定婚姻问题后的项目分配解决方案进行建模，我们可以将Teammate一方视为求婚者，另一方视为一夫多妻Project，可以与多个求婚者订婚。这看起来像是很多代码，但它相对较短且易于理解

如果严格遵循上述算法，项目分布可能包含一些队友的第3或第4个选择-这不是一个理想的情况。因此，我发现随机化作业顺序，并在跟踪最佳分布的同时重试几次，会产生最佳结果

每次运行时，可能会产生一个得分相等的项目分布，但队友的安排不同。因此，如果队友出于某种原因决定不喜欢1号或2号选择，只需生成一个新的分布

可运行代码位于：https://pastebin.com/kVj0FuJP

import io
import pandas as pd
from copy import deepcopy
from itertools import cycle
from random import randint, shuffle
from statistics import mean, stdev

class Teammate:
    def __init__(self, identifier, projects, rankings):
        self._id        = identifier
        self._projects  = list(projects)
        self._rankings  = dict(zip(projects, rankings))
        self._prefs     = self.gen_project_preferences_cycle()
        self._project   = None

    @property
    def id(self):
        return self._id

    def rank(self, project):
        return self._rankings[project]

    @property
    def state(self):
        return 'ENGAGED' if self._project else 'FREE'

    def propose(self, project):
        return project.consider(self)

    def engage(self, project):
        self._project = project
        project.add(self)

    def disengage(self, project):
        self._project = None
        project.remove(self)

    @property
    def project_preferences(self):
        return self._prefs

    def gen_project_preferences_cycle(self):
        """ Returns a generator for a cyclical list of preferences in 
        order.
        """
        prefs = sorted(self._projects, key=self.rank)
        if randint(0, 1):
            prefs.insert(0, prefs[1])
        return cycle(prefs)

    def reset(self):
        self._project = None
        self._prefs   = self.gen_project_preferences_cycle()

通过将第二个选择随机分配到一些Teammate的首选项列表的前面，我们在放置上获得了一个有趣的效果。通过让一些队友开始他们的第二选择，他们很可能会被淘汰出他们的第二选择，然后找到另一个他们评为1的项目。最终的效果是你得到的矩阵大多是1，2的一小部分，没有3和4

Project类中更有趣的方法是consider()方法，其中Project考虑了Teammate的参与提议。这个变化无常的项目将抛弃现有求婚者中任何一个订婚的求婚者，如果另一个求婚者提供更好的报价的话

class Project:
    def __init__(self, identifier, seats):
        self._id    = identifier
        self._seats = seats
        self._team  = []

    @property
    def id(self):
        return self._id

    @property
    def score(self):
        return sum(tm.rank(self) for tm in self._team)

    def consider(self, teammate):
        """ Consider a proposal, have the Teammate establish the engagement
        if accepted, then return True; otherwise return False.
        """
        rank    = teammate.rank(self)
        team    = self._team
        seats   = self._seats
        if len(team) < seats or any(rank < tm.rank(self) for tm in team):
            # Either there's a seat available, or the newcomer has a better
            # offer than one of the current suitors.
            if len(team) >= seats:
                shuffle(team)
                tm_out = max(team, key=lambda tm: tm.rank(self))
                tm_out.disengage(self)  # Hit the road Jack...
            teammate.engage(self)  # What? and no ring!?...
            return True
        return False

    def add(self, teammate):
        self._team.append(teammate)

    def remove(self, teammate):
        self._team.remove(teammate)

    def clear(self):
        self._team = []

    @property
    def state(self):
        return 'FREE' if len(self._team) < self._seats else 'ENGAGED'

    @property
    def roster(self):
        self._team.sort(key=lambda tm: tm.id)
        return f"{self.id}: {[(tm.id, tm.rank(self)) for tm in self._team]}"

当然，在现代世界中，每个人都在使用婚介服务，Teammate和Projects。。。不过，这项服务有点古怪，在某种程度上依赖于随机尝试，直到找到适合各方的最佳匹配

class ProjectMatcher:
    def __init__(self, projects, teammates):
        self._projects  = projects
        self._teammates = teammates

    def match(self, num_iterations=1000):
        best_score   = 99999
        best_distrib = None

        for _ in range(num_iterations):
            # This is the piece that does the actual matching. You can 
            # see each line corresponds to a step in the Gale-Shapely
            # algorithm. You can find the details of some steps within
            # the Teammate and Project methods.               

            for teammate in self.gen_free_teammates_cycle():
                for project in teammate.project_preferences:
                    accepted = teammate.propose(project)
                    if accepted:
                        break

            # Determine if this distribution is the best so far.
            if self.evaluate() < best_score:
                best_score   = self.evaluate()
                best_distrib = deepcopy(self._projects)

            # Get ready for another round of matchmaking.
            self.reset()

        # Print out the best distribution of assignments found.
        print("PROJECT ASSIGNMENTS:")
        for project in best_distrib:
            print(f"    {project.roster}")

    def evaluate(self):
        # Determine the quality of the project assignments. We want a
        # low average score, preferably without outliers.
        s = [p.score for p in self._projects]
        m = mean(s)
        d = stdev(s, m)
        return m + d

    def gen_free_teammates_cycle(self):
        """ Returns a cyclical generator of the list of Teammates that are
        still unengaged.
        """
        teammates = list(self._teammates)
        shuffle(teammates)
        while any(tm.state == 'FREE' for tm in teammates):
            for tm in teammates:
                if tm.state == 'FREE':
                    yield tm
            shuffle(teammates)

    def reset(self):
        [p.clear() for p in self._projects]
        [t.reset() for t in self._teammates]

    @staticmethod
    def from_dataframe(df):
        n_seats   = len(df.values) // (len(df.columns) - 1)
        projects  = [Project(proj, n_seats) for proj in df.columns[1:]]
        teammates = [Teammate(tm[0], projects, tm[1:]) for tm in df.values]
        return ProjectMatcher(projects, teammates)

把这一切放在一起

if __name__ == '__main__':

    df = pd.read_csv(io.StringIO("""
    employee  proj_A  proj_B  proj_C  proj_D  proj_E
          A1       1       5       3       4       2
          B1       5       4       1       2       3
          C1       2       3       4       1       5
          A2       4       2       1       3       5
          B2       4       5       3       2       1
          C2       3       1       2       5       4
          A3       1       2       4       3       5
          B3       2       3       1       5       4
          C3       5       3       4       1       2
          A4       4       5       3       2       1
          B4       5       3       4       2       1
          C4       1       2       3       4       5
          A5       1       3       2       5       4
          B5       2       1       3       5       4
          C5       2       1       4       5       4
      """), sep=r'\s+')

    pm = ProjectMatcher.from_dataframe(df)

    pm.match()

输出：

$ python projectmatcher.py 
PROJECT ASSIGNMENTS:
    proj_A: [('A1', 1), ('A3', 1), ('C4', 1)]
    proj_B: [('B5', 1), ('C2', 1), ('C5', 1)]
    proj_C: [('A2', 1), ('A5', 2), ('B3', 1)]
    proj_D: [('B1', 2), ('C1', 1), ('C3', 1)]
    proj_E: [('A4', 1), ('B2', 1), ('B4', 1)]

结果始终是好的，可能有100次迭代，它默认为1K表示好的度量，这只需要眨眼的时间。不知何故，尽管部分基于偶然，它还是成功地制作出了高质量的配对

可以修改该程序以收集得分最低的所有唯一分发，并生成一个列表。每一项都有相同的最低分数，但分布不同：

PROJECT ASSIGNMENTS:
    option 1
        proj_A: [('A3', 1), ('A5', 1), ('C4', 1)]
        proj_B: [('B5', 1), ('C2', 1), ('C5', 1)]
        proj_C: [('A2', 1), ('B1', 1), ('B3', 1)]
        proj_D: [('B4', 2), ('C1', 1), ('C3', 1)]
        proj_E: [('A1', 2), ('A4', 1), ('B2', 1)]
    option 2
        proj_A: [('A3', 1), ('A5', 1), ('C4', 1)]
        proj_B: [('B5', 1), ('C2', 1), ('C5', 1)]
        proj_C: [('A2', 1), ('B1', 1), ('B3', 1)]
        proj_D: [('A4', 2), ('C1', 1), ('C3', 1)]
        proj_E: [('A1', 2), ('B2', 1), ('B4', 1)]
    option 3
        proj_A: [('A1', 1), ('A3', 1), ('C4', 1)]
        proj_B: [('B5', 1), ('C2', 1), ('C5', 1)]
        proj_C: [('A2', 1), ('A5', 2), ('B3', 1)]
        proj_D: [('B1', 2), ('C1', 1), ('C3', 1)]
        proj_E: [('A4', 1), ('B2', 1), ('B4', 1)]
    option 4
        proj_A: [('A3', 1), ('A5', 1), ('C4', 1)]
        proj_B: [('B5', 1), ('C2', 1), ('C5', 1)]
        proj_C: [('A2', 1), ('B1', 1), ('B3', 1)]
        proj_D: [('B2', 2), ('C1', 1), ('C3', 1)]
        proj_E: [('A1', 2), ('A4', 1), ('B4', 1)]

网友

2楼 · 编辑于 2024-10-02 00:30:39

我想用遗传算法进行实验，这似乎是一个很好的优化类型的问题。有15行可以是任意顺序，一共有15行！排列，或1.0e+12。尝试所有排列的蛮力方法是不实际的

我有下面的函数来计算人口中个体的“适合度”。分数是平均值和标准差的组合。我的数学可能不完全正确，我肯定是用numpy来做的，但它似乎产生了很好的结果

def calculate_fitness(population):
    fitness_scores = []

    for individual in population:
        # Group the rows in 3's according to the columns.
        proj_a = individual[  : 3,1]  # First 3 rows, column 1.
        proj_b = individual[ 3: 6,2]  # Next  3 rows, column 2, etc.
        proj_c = individual[ 6: 9,3]
        proj_d = individual[ 9:12,4]
        proj_e = individual[12:15,5]  # Bottom 3 rows, last column.

        arr = np.array([proj_a, proj_b, proj_c, proj_d, proj_e])

        mean = arr.mean()          # Mean.
        std  = np.abs(arr.std())   # Standard deviation.

        # We want both the lowest mean and lowest standard deviation.
        # For simplicity, let's just add them and use that as the score.
        fitness_scores.append(mean + std)

    # Invert and scale the values so they can be used as weights
    # for random selection.
    fitness_scores  = np.array(fitness_scores)
    fitness_scores  = (fitness_scores.max() + .3 ) - fitness_scores
    fitness_scores /= (fitness_scores.max() + .07)
    fitness_scores *= 100

    return fitness_scores

输出-前3行属于A，后3行属于B，依此类推：

employee proj_A proj_B proj_C proj_D proj_E
      A3      1      2      4      3      5
      C4      1      2      3      4      5
      A1      1      5      3      4      2
      C2      3      1      2      5      4
      B5      2      1      3      5      4
      C5      2      1      4      5      4
      A2      4      2      1      3      5
      A5      1      3      2      5      4
      B3      2      3      1      5      4
      B1      5      4      1      2      3
      C3      5      3      4      1      2
      C1      2      3      4      1      5
      B2      4      5      3      2      1
      B4      5      3      4      2      1
      A4      4      5      3      2      1

在这一组中，似乎每个人都很高兴，这可能是最佳组合

在这里，每个人都非常满意所有的1分，除了A3得到3分

employee proj_A proj_B proj_C proj_D proj_E
      C4      1      _      _      _      _
      A1      1      _      _      _      _
      A5      1      _      _      _      _
      B5      _      1      _      _      _
      C2      _      1      _      _      _
      C5      _      1      _      _      _
      A2      _      _      1      _      _
      B3      _      _      1      _      _
      B1      _      _      1      _      _
      C1      _      _      _      1      _
      A3      _      _      _      3      _
      C3      _      _      _      1      _
      A4      _      _      _      _      1
      B4      _      _      _      _      1
      B2      _      _      _      _      1

我发现，调整高突变率，保护前5名个体不受突变和死亡的影响，大大提高了结果

父母是通过随机抽取4个人，将他们的健康分数作为权重来选择，以选择更健康的父母。然后，将4个中的前4个中的任何一个与不具有相同适应度分数的其他任何一个进行匹配，以尝试防止近亲繁殖，并将种群多样性保持在一个良好的范围内

每次迭代，一个个体死亡，两个父代被选择并产生一个子代，并且以50%的速率通过随机交换其两行来选择和变异一个个体

我发现最好的群体是150个成员，1k到2k的迭代似乎得到了一致的结果

相关问题更多 >

编程相关推荐

热门问题

热门文章