Code-efficiency question in Python: how can I reduce the running time when generating a network from the Euclidean distance between nodes?

Published 2024-09-30 04:40:01


I want to create links between the nodes of a network in Python based on a similarity metric, defined as the Euclidean distance between nodes. The problem is that building the network alone takes about 200 seconds, and when I calibrate my model the code is executed at least 100 times, so this one section makes the whole program slow.

The nodes are actually customers, and I defined a Customer class for them. Each customer has two attributes, gender (encoded as the number 0 or 1) and age (24 to 44 in the original survey; the synthetic data I generate below uses 22 to 39), stored in a CSV file. I generate the data as follows:

import random
import pandas as pd

#number of customers
ncons = 5000
gender = [random.randint(0, 1) for i in range(ncons)]
age = [random.randint(22, 39) for i in range(ncons)]
customer_df = pd.DataFrame(
    {'customer_gender': gender,
     'customer_age': age
    })
customer_df.to_csv('customer_df.csv', mode='w', index=False)

The Euclidean distance delta_ik is defined as

delta_ik = sqrt( sum over f = 1..n of ((S_f,i - S_f,k) / max d_f)^2 )

where n is the number of attributes (here gender and age). For customers i and k, S_f,i - S_f,k is their difference on attribute f, and it is divided by the maximum range of attribute f over all customers (max d_f). So this is a distance in attribute space, not a geographical one. From delta_ik I then define the similarity metric H_ik, which maps it to a number between 0 and 1:

H_ik = 1 - delta_ik / max(delta)

where max(delta) is the largest delta_ik over all customer pairs. Finally, for each pair of customers i and k, I draw a random number rho between 0 and 1; if rho is smaller than H_ik, the two nodes are connected.
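To make the two formulas above concrete, here is a tiny standalone sketch (the helper name `delta` and the example values are my own; the age range 22 to 39 matches the synthetic data below, giving max d = 17):

```python
import math

def delta(s_i, s_k, max_d):
    """Scaled Euclidean distance over the n attributes of two customers."""
    return math.sqrt(sum(((a - b) / d) ** 2
                         for a, b, d in zip(s_i, s_k, max_d)))

max_d = [1, 17]                        # gender range 0..1, age range 22..39
d_ik = delta([0, 25], [1, 30], max_d)  # distance between two example customers
# H_ik would then be 1 - d_ik / max_delta, where max_delta is the
# largest delta over all customer pairs
```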

The code that stores delta_ik in a matrix and then uses that matrix to generate the network looks like this:

import random
import pandas as pd
import time
import csv
import networkx as nx
import numpy as np
import math
#Read the csv file containing the gender and age of the customers
def readCSVPWU():
    global headers
    global Attr
    Attr = []
    with open('customer_df.csv') as csvfile:
        csvreader = csv.reader(csvfile,delimiter=',')
        headers = next(csvreader)  # skip the first row of the CSV file.
        #CSV header cells are string and should be turned to a float number.
        for i in range(len(headers)):   
            if headers[i].isnumeric():
                headers[i] = float(headers[i])
        for row in csvreader:
            AttrS = row
            Attr.append(AttrS)
    #convert strings to float numbers
    Attr = [[float(j) for j in i] for i in Attr]
    #Return the CSV as a matrix with one row per customer
    return Attr

#customer class
class Customer:
    def __init__(self, PWU = None, Ut = None):
        self.Ut = Ut
        #NOTE: the PWU argument is currently ignored
        self.PWU = Attr[random.randint(0,len(Attr)-1)]  # Pick random row from survey utility data


#Generate a network by connecting nodes based on their similarity metric
def Network_generation(cust_agent):
    start_time = time.time() # track execution time

    #we form links/connections between consumer agents based on their degree of socio-demographic similarity
    global ncons
    Gcons = nx.Graph()
    #add nodes
    for i in range(ncons):
        Gcons.add_node(i, data = cust_agent[i])
    #**********Compute the node to node distance
    #Initialize Deltaik with zero's
    Deltaik = [[0 for xi in range(ncons)] for yi in range(ncons)] 
    #For each attribute, find the maximum range of that attribute; for instance max age diff = max age - min age = 39 - 22 = 17
    maxdiff = []
    allval = []
    #the last two columns of Attr keep gender and age data
    #Make a 2D numpy array to slice the last 2 columns
    np_Attr = np.array(Attr)
    #Take the last two columns, gender and age of the customers, respectively
    socio = np_Attr[:, [len(Attr[0])-2, len(Attr[0])-1]]
    #convert numpy array to a list of list
    socio = socio.tolist()
    #Max diff for each attribute

    for f in range(len(socio[0])):
        for node1 in Gcons.nodes():
        #keep all values of an attribute to find the max range
            allval.append((Gcons.nodes[node1]['data'].PWU[-2:][f]))
        maxdiff.append((max(allval)-min(allval)))
        allval = []
# THE SECOND MOST TIME CONSUMING PART ********************

    for node1 in Gcons.nodes():
        for node2 in Gcons.nodes():
            tempdelta = 0
            #for each feature (attribute)
            for f in range(len(socio[0])):
                Deltaik[node1][node2] = (Gcons.nodes[node1]['data'].PWU[-2:][f]-Gcons.nodes[node2]['data'].PWU[-2:][f])
                #max difference
                insidepar = (Deltaik[node1][node2] / maxdiff[f])**2
                tempdelta += insidepar
            Deltaik[node1][node2] = math.sqrt(tempdelta)
     # THE END OF THE SECOND MOST TIME CONSUMING PART ********************
       
    #Find maximum of a matrix
    maxdel = max(map(max, Deltaik))
    #Find the homopholic weight
    import copy
    Hik = copy.deepcopy(Deltaik)
    for i in range(len(Deltaik)):
        for j in range(len(Deltaik[0])):
            
            Hik[i][j] =1 - (Deltaik[i][j]/maxdel)
    #Define a dataframe to save Hik
    dfHik = pd.DataFrame(columns = list(range(ncons) ),index = list(range(ncons) ))
    temp_h = []
    #For every consumer pair i and k, a random number rho from a uniform distribution U(0,1) is drawn and compared with H_ik. The two consumers are connected in the social network if rho is smaller than H_ik (Wolf et al., 2015).
# THE MOST TIME CONSUMING PART ********************
    for node1 in Gcons.nodes():
        for node2 in Gcons.nodes():
            #Add Hik to the dataframe
            temp_h.append(Hik[node1][node2])
            rho = np.random.uniform(0, 1)  # scalar draw instead of a length-1 array
            if node1 != node2:
                if rho < Hik[node1][node2]:
                    Gcons.add_edge(node1, node2)
        #Row idd for consumer idd keeps homophily with every other consumer
        dfHik.loc[node1] = temp_h
        temp_h = []
    # nx.draw(Gcons, with_labels=True)            
    print("Simulation time: %.3f seconds" % (time.time() - start_time))
# THE END OF THE MOST TIME CONSUMING PART ********************

    return Gcons     
#%%
#number of customers
ncons = 5000
gender = [random.randint(0, 1) for i in range(ncons)]
age = [random.randint(22, 39) for i in range(ncons)]
customer_df = pd.DataFrame(
    {'customer_gender': gender,
     'customer_age': age
    })
customer_df.to_csv('customer_df.csv', mode = 'w', index=False)
readCSVPWU()
customer_agent = dict(enumerate([Customer(PWU = [], Ut = []) for ij in range(ncons)])) # Ut=[]
G = Network_generation(customer_agent)
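One direction I have been considering is replacing the nested Python loops with NumPy broadcasting. Below is a minimal standalone sketch of that idea (with a smaller ncons just to keep it quick, and using `nx.from_numpy_array`, which needs networkx >= 2.6); I have not yet integrated it into my model:

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
ncons = 500  # smaller than 5000, just for the sketch

# two attributes per customer: gender (0/1) and age (22..39)
socio = np.column_stack([rng.integers(0, 2, ncons),
                         rng.integers(22, 40, ncons)]).astype(float)

maxdiff = socio.max(axis=0) - socio.min(axis=0)  # per-attribute range
scaled = socio / maxdiff                         # scale each attribute once

# all pairwise scaled Euclidean distances at once via broadcasting
diff = scaled[:, None, :] - scaled[None, :, :]   # shape (ncons, ncons, 2)
Deltaik = np.sqrt((diff ** 2).sum(axis=2))       # shape (ncons, ncons)

Hik = 1 - Deltaik / Deltaik.max()

# one uniform draw per ordered pair; connect where rho < H_ik, no self-loops
rho = rng.uniform(size=(ncons, ncons))
mask = (rho < Hik) & ~np.eye(ncons, dtype=bool)
Gcons = nx.from_numpy_array(mask.astype(int))    # undirected graph
```

Like my loop version, this draws rho independently for each ordered pair and keeps the edge if either draw passes, since the graph is undirected.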

I would appreciate any advice on more pythonic ways to reduce the running time.
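As a side note, I suspect the whole readCSVPWU() function could collapse into a single pandas call; a self-contained sketch (reading from a string instead of customer_df.csv, just so it runs on its own) would be:

```python
import io
import pandas as pd

# stand-in for customer_df.csv so the sketch is self-contained
csv_text = "customer_gender,customer_age\n0,25\n1,30\n"

# one call replaces the manual csv.reader loop and the float conversion
Attr = pd.read_csv(io.StringIO(csv_text)).astype(float).values.tolist()
# Attr is now a list of [gender, age] rows: [[0.0, 25.0], [1.0, 30.0]]
```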

Thank you all.


