如何在python中找到不同列元素的最小和集?

2024-10-02 00:30:39 发布

您现在位置:Python中文网/ 问答频道 /正文

问题陈述:共有5个项目和15名员工,每列的数字表示每个员工对给定项目的兴趣。每个项目最多可有3名员工。分数为1-5分,1分为最高偏好,5分为最低偏好。我必须在项目中分配员工,以使不满意的人数最少或得分最低。请注意,我的算法创建所有可能的组合,然后按升序对这些组合进行排序,并选择具有不同雇员的前5个组合

但问题是,例如,我的sortedsum矩阵是[1,1,1,4,9,9,…]现在这个算法是正确的,但问题是如果我选择前5个,我的总和将是16。但是有一种可能,如果我把[2,1,1,4,4]作为前四个,那么第五个项目团队的总和会变成3,这样最小值就会改变,这就是我的算法失败的地方

我有一个3nXn矩阵,在这个例子中,我将把它作为一个15x5: 所以矩阵看起来像这样(https://i.stack.imgur.com/omcq7.png):

df = pd.read_csv(io.StringIO("""

employee  proj_A  proj_B  proj_C  proj_D  proj_E
      A1       1       5       3       4       2
      B1       5       4       1       2       3
      C1       2       3       4       1       5
      A2       4       2       1       3       5
      B2       4       5       3       2       1
      C2       3       1       2       5       4
      A3       1       2       4       3       5
      B3       2       3       1       5       4
      C3       5       3       4       1       2
      A4       4       5       3       2       1
      B4       5       3       4       2       1
      C4       1       2       3       4       5
      A5       1       3       2       5       4
      B5       2       1       3       5       4
      C5       2       1       4       5       4

      """), sep=r"\s+")

[格式化以便于粘贴到shell中]

我想解决的问题是在每一列中选择三个不同的元素,不同的行有不同的含义,以这样的方式,它们对所有5列的总和仍然是最小的

例如,这里,如果我选择A1,B1,C1作为A,然后选择A2,B2,C2作为B等等,那么A的和是1+5+2=8,B的和是2+5+1=8,当加在一起时,也就是8+8+。。。应为所有可能组合的最小和。请注意,如果A1、B1和C1被分配到A,则它们不能切换到B或任何其他下一列

我尝试的是创建所有可能的组合,从A1、B1、C1到A5、B5和C5,计算它们的总和并按递增顺序排序,然后选择具有不同元素的前五个,如下所示:

我的代码的限制: 1.我正在优化的矩阵(一个30x10的矩阵)花费了太多时间,因为组合太多了。 2.它将忽略任何情况,即通过将初始元素的分数降低到稍高一点,我们可以得到可以大幅降低的中间分数

import pandas as pd
data=pd.read_csv("csvfile.csv")
teamsize=3
employes=data["Name"]
PScore=[]
for i in range(10):
    PScore.append(data[f"Project {i+1}"])
Scorings_combo=[]
for i in range(len(employes)):
    for j in range(len(employes)):
        for k in range(len(employes)):
            for l in range(10):
                if i==j or j==k or k==i:
                    break
                score=0
                score=score+PScore[l][i]+PScore[l][j]+PScore[l][k]
                Scorings_combo.append([i+1,j+1,k+1,l+1,score])
a=[Scorings_combo[i][4]for i in range(len(Scorings_combo))]
#b=sorted(a,reverse=True)
b=sorted(a)
emps=[]
sig=1
empl=[]
passigned=[]
countee=0
for i in range(len(b)):
    for j in range(3):
        if Scorings_combo[a.index(b[i])][j] in emps or Scorings_combo[a.index(b[i])][3] in passigned:
            a[a.index(b[i])]=-1
            sig=0
            break
    if sig!=0:
        print("New")
        for k in range(3):emps.append(Scorings_combo[a.index(b[i])][k])
        empl.append(Scorings_combo[a.index(b[i])])
        passigned.append(Scorings_combo[a.index(b[i])][3])
        countee=countee+1
        if count==8:
            break
    sig=1
    print(f"Iteration:{i}/{len(b)}")

例如: 3,3,3,4,9 即使有以下可能,也将是解决方案: 4,4,4,4,4 因为它将寻找降序的不同元素,这给了我第一个解决方案

如果你有任何想法,请帮助我。谢谢

这是数据的驱动器链接:https://drive.google.com/file/d/1yaswBEi3RzrhQ743hJTnUeZFZNo-QBBR/view?usp=sharing

下面是一个简单的例子: 矩阵=[[1,2],[2,1],[1,2],[1,2],[2,1],[1,2]] 现在,最小可能的组合是:对于第一列:[1,1,1]和第二列[1,1,2]


Tags: 项目inforindexlena1员工range
2条回答

这个问题描述了“稳定婚姻问题”的一个变体。它有一个非常简单的算法;我在下面粘贴了最简单的版本-还有其他更严格的版本。未订婚的求婚者向对方求婚,对方拒绝或接受(解除其当前订婚)。这将一直持续到各方都参与进来

# Gale-Shapley algorithm:
algorithm stable_matching is
    Initialize all m ∈ M and w ∈ W to free
    while ∃ free man m who still has a woman w to propose to do
        w := first woman on m`s list to whom m has not yet proposed
        if w is free then
            (m, w) become engaged
        else some pair (m`, w) already exists
            if w prefers m to m` then
                m` becomes free
                (m, w) become engaged 
            else
                (m`, w) remain engaged
            end if
        end if
    repeat

对稳定婚姻问题后的项目分配解决方案进行建模,我们可以将Teammate一方视为求婚者,另一方视为一夫多妻Project,可以与多个求婚者订婚。这看起来像是很多代码,但它相对较短且易于理解

如果严格遵循上述算法,项目分布可能包含一些队友的第3或第4个选择-这不是一个理想的情况。因此,我发现随机化作业顺序,并在跟踪最佳分布的同时重试几次,会产生最佳结果

每次运行时,可能会产生一个得分相等的项目分布,但队友的安排不同。因此,如果队友出于某种原因决定不喜欢1号或2号选择,只需生成一个新的分布

可运行代码位于:https://pastebin.com/kVj0FuJP

import io
import pandas as pd
from copy import deepcopy
from itertools import cycle
from random import randint, shuffle
from statistics import mean, stdev

class Teammate:
    def __init__(self, identifier, projects, rankings):
        self._id        = identifier
        self._projects  = list(projects)
        self._rankings  = dict(zip(projects, rankings))
        self._prefs     = self.gen_project_preferences_cycle()
        self._project   = None

    @property
    def id(self):
        return self._id

    def rank(self, project):
        return self._rankings[project]

    @property
    def state(self):
        return 'ENGAGED' if self._project else 'FREE'

    def propose(self, project):
        return project.consider(self)

    def engage(self, project):
        self._project = project
        project.add(self)

    def disengage(self, project):
        self._project = None
        project.remove(self)

    @property
    def project_preferences(self):
        return self._prefs

    def gen_project_preferences_cycle(self):
        """ Returns a generator for a cyclical list of preferences in 
        order.
        """
        prefs = sorted(self._projects, key=self.rank)
        if randint(0, 1):
            prefs.insert(0, prefs[1])
        return cycle(prefs)

    def reset(self):
        self._project = None
        self._prefs   = self.gen_project_preferences_cycle()

通过将第二个选择随机分配到一些Teammate的首选项列表的前面,我们在放置上获得了一个有趣的效果。通过让一些队友开始他们的第二选择,他们很可能会被淘汰出他们的第二选择,然后找到另一个他们评为1的项目。最终的效果是你得到的矩阵大多是1,2的一小部分,没有3和4

Project类中更有趣的方法是consider()方法,其中Project考虑了Teammate的参与提议。这个变化无常的项目将抛弃现有求婚者中任何一个订婚的求婚者,如果另一个求婚者提供更好的报价的话

class Project:
    def __init__(self, identifier, seats):
        self._id    = identifier
        self._seats = seats
        self._team  = []

    @property
    def id(self):
        return self._id

    @property
    def score(self):
        return sum(tm.rank(self) for tm in self._team)

    def consider(self, teammate):
        """ Consider a proposal, have the Teammate establish the engagement
        if accepted, then return True; otherwise return False.
        """
        rank    = teammate.rank(self)
        team    = self._team
        seats   = self._seats
        if len(team) < seats or any(rank < tm.rank(self) for tm in team):
            # Either there's a seat available, or the newcomer has a better
            # offer than one of the current suitors.
            if len(team) >= seats:
                shuffle(team)
                tm_out = max(team, key=lambda tm: tm.rank(self))
                tm_out.disengage(self)  # Hit the road Jack...
            teammate.engage(self)  # What? and no ring!?...
            return True
        return False

    def add(self, teammate):
        self._team.append(teammate)

    def remove(self, teammate):
        self._team.remove(teammate)

    def clear(self):
        self._team = []

    @property
    def state(self):
        return 'FREE' if len(self._team) < self._seats else 'ENGAGED'

    @property
    def roster(self):
        self._team.sort(key=lambda tm: tm.id)
        return f"{self.id}: {[(tm.id, tm.rank(self)) for tm in self._team]}"

当然,在现代世界中,每个人都在使用婚介服务,TeammateProjects。。。不过,这项服务有点古怪,在某种程度上依赖于随机尝试,直到找到适合各方的最佳匹配

class ProjectMatcher:
    def __init__(self, projects, teammates):
        self._projects  = projects
        self._teammates = teammates

    def match(self, num_iterations=1000):
        best_score   = 99999
        best_distrib = None

        for _ in range(num_iterations):
            # This is the piece that does the actual matching. You can 
            # see each line corresponds to a step in the Gale-Shapely
            # algorithm. You can find the details of some steps within
            # the Teammate and Project methods.               

            for teammate in self.gen_free_teammates_cycle():
                for project in teammate.project_preferences:
                    accepted = teammate.propose(project)
                    if accepted:
                        break

            # Determine if this distribution is the best so far.
            if self.evaluate() < best_score:
                best_score   = self.evaluate()
                best_distrib = deepcopy(self._projects)

            # Get ready for another round of matchmaking.
            self.reset()

        # Print out the best distribution of assignments found.
        print("PROJECT ASSIGNMENTS:")
        for project in best_distrib:
            print(f"    {project.roster}")

    def evaluate(self):
        # Determine the quality of the project assignments. We want a
        # low average score, preferably without outliers.
        s = [p.score for p in self._projects]
        m = mean(s)
        d = stdev(s, m)
        return m + d

    def gen_free_teammates_cycle(self):
        """ Returns a cyclical generator of the list of Teammates that are
        still unengaged.
        """
        teammates = list(self._teammates)
        shuffle(teammates)
        while any(tm.state == 'FREE' for tm in teammates):
            for tm in teammates:
                if tm.state == 'FREE':
                    yield tm
            shuffle(teammates)

    def reset(self):
        [p.clear() for p in self._projects]
        [t.reset() for t in self._teammates]

    @staticmethod
    def from_dataframe(df):
        n_seats   = len(df.values) // (len(df.columns) - 1)
        projects  = [Project(proj, n_seats) for proj in df.columns[1:]]
        teammates = [Teammate(tm[0], projects, tm[1:]) for tm in df.values]
        return ProjectMatcher(projects, teammates)

把这一切放在一起

if __name__ == '__main__':

    df = pd.read_csv(io.StringIO("""
    employee  proj_A  proj_B  proj_C  proj_D  proj_E
          A1       1       5       3       4       2
          B1       5       4       1       2       3
          C1       2       3       4       1       5
          A2       4       2       1       3       5
          B2       4       5       3       2       1
          C2       3       1       2       5       4
          A3       1       2       4       3       5
          B3       2       3       1       5       4
          C3       5       3       4       1       2
          A4       4       5       3       2       1
          B4       5       3       4       2       1
          C4       1       2       3       4       5
          A5       1       3       2       5       4
          B5       2       1       3       5       4
          C5       2       1       4       5       4
      """), sep=r'\s+')

    pm = ProjectMatcher.from_dataframe(df)

    pm.match()

输出:

$ python projectmatcher.py 
PROJECT ASSIGNMENTS:
    proj_A: [('A1', 1), ('A3', 1), ('C4', 1)]
    proj_B: [('B5', 1), ('C2', 1), ('C5', 1)]
    proj_C: [('A2', 1), ('A5', 2), ('B3', 1)]
    proj_D: [('B1', 2), ('C1', 1), ('C3', 1)]
    proj_E: [('A4', 1), ('B2', 1), ('B4', 1)]

结果始终是好的,可能有100次迭代,它默认为1K表示好的度量,这只需要眨眼的时间。不知何故,尽管部分基于偶然,它还是成功地制作出了高质量的配对

可以修改该程序以收集得分最低的所有唯一分发,并生成一个列表。每一项都有相同的最低分数,但分布不同:

PROJECT ASSIGNMENTS:
    option 1
        proj_A: [('A3', 1), ('A5', 1), ('C4', 1)]
        proj_B: [('B5', 1), ('C2', 1), ('C5', 1)]
        proj_C: [('A2', 1), ('B1', 1), ('B3', 1)]
        proj_D: [('B4', 2), ('C1', 1), ('C3', 1)]
        proj_E: [('A1', 2), ('A4', 1), ('B2', 1)]
    option 2
        proj_A: [('A3', 1), ('A5', 1), ('C4', 1)]
        proj_B: [('B5', 1), ('C2', 1), ('C5', 1)]
        proj_C: [('A2', 1), ('B1', 1), ('B3', 1)]
        proj_D: [('A4', 2), ('C1', 1), ('C3', 1)]
        proj_E: [('A1', 2), ('B2', 1), ('B4', 1)]
    option 3
        proj_A: [('A1', 1), ('A3', 1), ('C4', 1)]
        proj_B: [('B5', 1), ('C2', 1), ('C5', 1)]
        proj_C: [('A2', 1), ('A5', 2), ('B3', 1)]
        proj_D: [('B1', 2), ('C1', 1), ('C3', 1)]
        proj_E: [('A4', 1), ('B2', 1), ('B4', 1)]
    option 4
        proj_A: [('A3', 1), ('A5', 1), ('C4', 1)]
        proj_B: [('B5', 1), ('C2', 1), ('C5', 1)]
        proj_C: [('A2', 1), ('B1', 1), ('B3', 1)]
        proj_D: [('B2', 2), ('C1', 1), ('C3', 1)]
        proj_E: [('A1', 2), ('A4', 1), ('B4', 1)]

我想用遗传算法进行实验,这似乎是一个很好的优化类型的问题。有15行可以是任意顺序,一共有15行!排列,或1.0e+12。尝试所有排列的蛮力方法是不实际的

我有下面的函数来计算人口中个体的“适合度”。分数是平均值和标准差的组合。我的数学可能不完全正确,我肯定是用numpy来做的,但它似乎产生了很好的结果

def calculate_fitness(population):
    fitness_scores = []

    for individual in population:
        # Group the rows in 3's according to the columns.
        proj_a = individual[  : 3,1]  # First 3 rows, column 1.
        proj_b = individual[ 3: 6,2]  # Next  3 rows, column 2, etc.
        proj_c = individual[ 6: 9,3]
        proj_d = individual[ 9:12,4]
        proj_e = individual[12:15,5]  # Bottom 3 rows, last column.

        arr = np.array([proj_a, proj_b, proj_c, proj_d, proj_e])

        mean = arr.mean()          # Mean.
        std  = np.abs(arr.std())   # Standard deviation.

        # We want both the lowest mean and lowest standard deviation.
        # For simplicity, let's just add them and use that as the score.
        fitness_scores.append(mean + std)

    # Invert and scale the values so they can be used as weights
    # for random selection.
    fitness_scores  = np.array(fitness_scores)
    fitness_scores  = (fitness_scores.max() + .3 ) - fitness_scores
    fitness_scores /= (fitness_scores.max() + .07)
    fitness_scores *= 100

    return fitness_scores

输出-前3行属于A,后3行属于B,依此类推:

employee proj_A proj_B proj_C proj_D proj_E
      A3      1      2      4      3      5
      C4      1      2      3      4      5
      A1      1      5      3      4      2
      C2      3      1      2      5      4
      B5      2      1      3      5      4
      C5      2      1      4      5      4
      A2      4      2      1      3      5
      A5      1      3      2      5      4
      B3      2      3      1      5      4
      B1      5      4      1      2      3
      C3      5      3      4      1      2
      C1      2      3      4      1      5
      B2      4      5      3      2      1
      B4      5      3      4      2      1
      A4      4      5      3      2      1

在这一组中,似乎每个人都很高兴,这可能是最佳组合

在这里,每个人都非常满意所有的1分,除了A3得到3分

employee proj_A proj_B proj_C proj_D proj_E
      C4      1      _      _      _      _
      A1      1      _      _      _      _
      A5      1      _      _      _      _
      B5      _      1      _      _      _
      C2      _      1      _      _      _
      C5      _      1      _      _      _
      A2      _      _      1      _      _
      B3      _      _      1      _      _
      B1      _      _      1      _      _
      C1      _      _      _      1      _
      A3      _      _      _      3      _
      C3      _      _      _      1      _
      A4      _      _      _      _      1
      B4      _      _      _      _      1
      B2      _      _      _      _      1

我发现,调整高突变率,保护前5名个体不受突变和死亡的影响,大大提高了结果

父母是通过随机抽取4个人,将他们的健康分数作为权重来选择,以选择更健康的父母。然后,将4个中的前4个中的任何一个与不具有相同适应度分数的其他任何一个进行匹配,以尝试防止近亲繁殖,并将种群多样性保持在一个良好的范围内

每次迭代,一个个体死亡,两个父代被选择并产生一个子代,并且以50%的速率通过随机交换其两行来选择和变异一个个体

我发现最好的群体是150个成员,1k到2k的迭代似乎得到了一致的结果

相关问题 更多 >

    热门问题