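I want to create links between nodes in a network in Python based on a similarity measure, defined as the Euclidean distance between the nodes. The problem is that the code takes about 200 seconds just to create the network. Because the code runs at least 100 times while I tune my model, this long execution time makes the whole program slow.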
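The nodes are actually customers, and I defined a class for them. Each customer has two attributes, gender (numeric: 0 or 1) and age (an integer; the sample data below uses ages from 22 to 39), stored in a CSV file. I generate them as follows: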
```python
import random

import pandas as pd

# number of customers
ncons = 5000
gender = [random.randint(0, 1) for i in range(ncons)]
age = [random.randint(22, 39) for i in range(ncons)]
customer_df = pd.DataFrame({'customer_gender': gender,
                            'customer_age': age})
customer_df.to_csv('customer_df.csv', mode='w', index=False)
```
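For reference, the same kind of sample data can be drawn without a Python-level loop using NumPy's `Generator` API. This is a minimal sketch; the helper name `make_customers` and the seed are illustrative assumptions. Note that `Generator.integers` excludes its upper bound by default, so `integers(22, 40)` covers the same ages as `random.randint(22, 39)`.

```python
import numpy as np
import pandas as pd

def make_customers(ncons, seed=None):
    """Hypothetical helper: draw gender (0/1) and age (22..39) for ncons customers."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        'customer_gender': rng.integers(0, 2, size=ncons),  # 0 or 1
        'customer_age': rng.integers(22, 40, size=ncons),   # 22..39 inclusive
    })

customer_df = make_customers(5000, seed=42)
```

The resulting DataFrame can then be written with `to_csv` exactly as in the loop-based version.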
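The Euclidean distance $\delta_{ik}$ is defined as

$$\delta_{ik} = \sqrt{\sum_{f=1}^{n} \left( \frac{S_{f,i} - S_{f,k}}{\max d_f} \right)^2}$$

where $n$ is the number of attributes; here the attributes are gender and age, so $n = 2$. For customers $i$ and $k$, $S_{f,i} - S_{f,k}$ is the difference in attribute $f$ ($f = 1, 2$), and it is divided by the maximum range of attribute $f$ over all customers, $\max d_f$. So the distance is measured in attribute space, not in geographic space.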
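The formula above can be evaluated for all pairs at once with NumPy broadcasting instead of a double loop over nodes. This is a minimal sketch, assuming the attributes are given as an `(N, n)` array; the name `delta_matrix` and the handling of a zero-range attribute are assumptions, not part of the question. Dividing each column by its range first gives the same result as dividing each difference by $\max d_f$.

```python
import numpy as np

def delta_matrix(S):
    """Pairwise normalized Euclidean distances delta_ik for an (N, n) attribute array S.

    Each attribute f is divided by its range max d_f = max(S[:, f]) - min(S[:, f]),
    then the ordinary Euclidean distance is taken over the n scaled attributes.
    """
    S = np.asarray(S, dtype=float)
    max_range = S.max(axis=0) - S.min(axis=0)
    # Assumption: an attribute with zero range contributes nothing (avoids 0/0).
    max_range[max_range == 0] = 1.0
    scaled = S / max_range
    N = scaled.shape[0]
    sq = np.zeros((N, N))
    # Loop only over the (few) attributes; broadcasting handles all N*N pairs at once.
    for f in range(scaled.shape[1]):
        d = scaled[:, f, None] - scaled[None, :, f]  # (N, 1) - (1, N) -> (N, N)
        sq += d * d
    return np.sqrt(sq)
```

For 5000 customers, each 5000 × 5000 float64 matrix takes about 200 MB, which is usually acceptable but worth keeping in mind.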
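Then I define the similarity measure $H_{ik}$, which maps $\delta_{ik}$ to a number between 0 and 1:

$$H_{ik} = 1 - \frac{\delta_{ik}}{\max_{i,k} \delta_{ik}}$$

Finally, for each pair of customers $i$ and $k$, I draw a random number $\rho$ uniformly between 0 and 1. If $\rho$ is smaller than $H_{ik}$, the two nodes are connected.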
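The linking rule can be sketched with NumPy as well: compute $H$ from the whole $\delta$ matrix, then draw one $\rho$ per unordered pair using the upper-triangle indices. This is a minimal sketch; the name `sample_edges` and the fallback for the case where all customers are identical (maximum distance 0) are assumptions. Drawing only for pairs with $i < k$ gives each pair exactly one chance to connect, whereas looping over all ordered pairs of an undirected graph gives each pair two chances.

```python
import numpy as np

def sample_edges(delta, rng):
    """Return (i, k) pairs, i < k, linked with probability H_ik = 1 - delta_ik / max(delta)."""
    delta = np.asarray(delta, dtype=float)
    max_delta = delta.max()
    # Assumption: if all customers are identical, treat every pair as fully similar.
    H = 1.0 - delta / max_delta if max_delta > 0 else np.ones_like(delta)
    i, k = np.triu_indices(delta.shape[0], k=1)  # each unordered pair exactly once
    rho = rng.random(i.size)                     # one U[0, 1) draw per pair
    keep = rho < H[i, k]
    return list(zip(i[keep].tolist(), k[keep].tolist()))

rng = np.random.default_rng(0)
edges = sample_edges(np.array([[0, 0, 2], [0, 0, 2], [2, 2, 0]]), rng)
```

Here customers 0 and 1 are identical ($H = 1$), so they are always linked, while both are maximally distant from customer 2 ($H = 0$), so neither links to it.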
The code that stores $\delta_{ik}$ in a matrix and then uses that matrix to generate the network is as follows:
```python
import copy
import csv
import math
import random
import time

import networkx as nx
import numpy as np
import pandas as pd

# Read the CSV file containing the customer attributes (gender, age)
def readCSVPWU():
    global headers
    global Attr
    Attr = []
    with open('customer_df.csv') as csvfile:
        csvreader = csv.reader(csvfile, delimiter=',')
        headers = next(csvreader)  # skip the header row of the CSV file
        # CSV header cells are strings; convert any numeric ones to floats
        for i in range(len(headers)):
            if headers[i].isnumeric():
                headers[i] = float(headers[i])
        for row in csvreader:
            Attr.append(row)
    # convert strings to float numbers
    Attr = [[float(j) for j in i] for i in Attr]
    # Return the CSV as a matrix: one row per customer, one column per attribute
    return Attr

# customer class
class Customer:
    def __init__(self, PWU=None, Ut=None):
        self.Ut = Ut
        # Pick a random row from the attribute data (sampling with replacement)
        self.PWU = Attr[random.randint(0, len(Attr) - 1)]

# Generate a network by connecting nodes based on their similarity metric
def Network_generation(cust_agent):
    start_time = time.time()  # track execution time
    # Form links between customer agents based on their degree of socio-demographic similarity.
    global ncons
    Gcons = nx.Graph()
    # add nodes
    for i in range(ncons):
        Gcons.add_node(i, data=cust_agent[i])
    # ********** Compute the node-to-node distance
    # Initialize Deltaik with zeros
    Deltaik = [[0 for xi in range(ncons)] for yi in range(ncons)]
    # For each attribute, find its maximum range, e.g. max age diff = max age - min age
    maxdiff = []
    allval = []
    # The last two columns of Attr hold gender and age
    # Make a 2D numpy array to slice the last 2 columns
    np_Attr = np.array(Attr)
    # Take the last two columns: gender and age, respectively
    socio = np_Attr[:, [len(Attr[0]) - 2, len(Attr[0]) - 1]]
    # convert numpy array to a list of lists
    socio = socio.tolist()
    # Max diff for each attribute
    for f in range(len(socio[0])):
        for node1 in Gcons.nodes():
            # keep all values of an attribute to find the max range
            allval.append(Gcons.nodes[node1]['data'].PWU[-2:][f])
        maxdiff.append(max(allval) - min(allval))
        allval = []
    # THE SECOND MOST TIME-CONSUMING PART ********************
    for node1 in Gcons.nodes():
        for node2 in Gcons.nodes():
            tempdelta = 0
            # for each feature (attribute)
            for f in range(len(socio[0])):
                # difference in attribute f, divided by its maximum range
                diff = (Gcons.nodes[node1]['data'].PWU[-2:][f]
                        - Gcons.nodes[node2]['data'].PWU[-2:][f])
                tempdelta += (diff / maxdiff[f]) ** 2
            Deltaik[node1][node2] = math.sqrt(tempdelta)
    # THE END OF THE SECOND MOST TIME-CONSUMING PART ********************
    # Find the maximum of the matrix
    maxdel = max(map(max, Deltaik))
    # Find the homophily weight
    Hik = copy.deepcopy(Deltaik)
    for i in range(len(Deltaik)):
        for j in range(len(Deltaik[0])):
            Hik[i][j] = 1 - (Deltaik[i][j] / maxdel)
    # Define a dataframe to save Hik
    dfHik = pd.DataFrame(columns=list(range(ncons)), index=list(range(ncons)))
    temp_h = []
    # For every consumer pair $i$ and $k$, a random number $\rho$ from a uniform distribution $U(0,1)$ is drawn and compared with $H_{i,k}$. The two consumers are connected in the social network if $\rho$ is smaller than $H_{i,k}$~\cite{wolf2015changing}.
    # THE MOST TIME-CONSUMING PART ********************
    for node1 in Gcons.nodes():
        for node2 in Gcons.nodes():
            # Add Hik to the dataframe
            temp_h.append(Hik[node1][node2])
            # Draw rho once per unordered pair (node1 < node2); looping over all
            # ordered pairs would give each pair two chances to connect
            if node1 < node2:
                rho = np.random.uniform(0, 1)
                if rho < Hik[node1][node2]:
                    Gcons.add_edge(node1, node2)
        # Row node1 keeps the homophily of consumer node1 with every other consumer
        dfHik.loc[node1] = temp_h
        temp_h = []
    # nx.draw(Gcons, with_labels=True)
    print("Simulation time: %.3f seconds" % (time.time() - start_time))
    # THE END OF THE MOST TIME-CONSUMING PART ********************
    return Gcons

#%%
# number of customers
ncons = 5000
gender = [random.randint(0, 1) for i in range(ncons)]
age = [random.randint(22, 39) for i in range(ncons)]
customer_df = pd.DataFrame({'customer_gender': gender,
                            'customer_age': age})
customer_df.to_csv('customer_df.csv', mode='w', index=False)
readCSVPWU()
customer_agent = dict(enumerate([Customer(PWU=[], Ut=[]) for ij in range(ncons)]))  # Ut=[]
G = Network_generation(customer_agent)
```
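In the loop marked as the most time-consuming part, much of the cost comes from calling `add_edge` once per pair and assigning `dfHik.loc[node1]` row by row. A minimal sketch of doing both in bulk follows; the helper name `build_graph` is hypothetical, and it assumes the homophily matrix `H` is already a NumPy array and the accepted pairs are already in an edge list. Node data is stored under the same `'data'` key as in the question.

```python
import networkx as nx
import numpy as np
import pandas as pd

def build_graph(H, edges, cust_agent):
    """Hypothetical helper: build the graph and the dfHik table in bulk.

    H          -- (N, N) array of homophily weights H_ik
    edges      -- iterable of (i, k) pairs that passed the rho < H_ik test
    cust_agent -- dict mapping node index -> Customer object
    """
    G = nx.Graph()
    # Add all nodes at once, each carrying its customer under the 'data' key
    G.add_nodes_from((i, {'data': cust_agent[i]}) for i in range(len(H)))
    # Add all accepted edges in one call instead of one add_edge per pair
    G.add_edges_from(edges)
    # Wrap the whole matrix in a DataFrame instead of filling it row by row with .loc
    dfHik = pd.DataFrame(np.asarray(H))
    return G, dfHik
```

One difference from the original: `dfHik` here has a float64 dtype, whereas a DataFrame created empty and filled with `.loc` ends up with object columns.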
I would appreciate any suggestions for more Pythonic ways to reduce the running time.
Thanks, everyone.