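I am trying to generate random samples of partners over 100 iterations, with no location repeated within a sample. The partners and their locations are in df. At the end of each iteration, I want to know the share and ratings of each randomly assigned partner, combined with the earlier data in old_df.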
# Data
import pandas as pd
old_df = pd.DataFrame({'location': ['Hyderabad', 'Assam', 'Kolkata'],
                       'partner': ['x', 'y', 'z'],
                       'share': [0.1, 0.4, 0.2],
                       'ratings': [20, 20, 10]})
df = pd.DataFrame({'location': ['Bangalore', 'Bangalore', 'Mumbai', 'Mumbai', 'Mumbai', 'Pune', 'Pune', 'Pune', 'Chennai', 'Chennai'],
                   'partner': ['x', 'y', 'z', 'y', 'z', 'x', 'y', 'z', 'z', 'x'],
                   'share': [0.1, 0.1, 0.4, 0.4, 0.4, 0.2, 0.2, 0.2, 0.1, 0.1],
                   'ratings': [20, 10, 10, 20, 30, 20, 20, 20, 10, 20]})
# Simulation
import time

simulations = 100
all_stats = []
start_time = time.time()
for num in range(simulations):
    # Shuffle all rows, then keep the first row per location (one random partner each)
    random_sample = df.sample(frac=1.0).groupby('location').head(1)
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    random_sample = pd.concat([old_df, random_sample[['location', 'partner', 'share', 'ratings']]])
    # Total share and ratings per partner across old and newly sampled locations
    condn = random_sample.groupby('partner')[['share', 'ratings']].sum().reset_index()
    all_stats.append([num,
                      condn.share[0].round(2),
                      condn.share[1].round(2),
                      condn.share[2].round(2),
                      random_sample['ratings'].sum().round(0)])
all_stats
print("--- %s seconds ---" % (round(time.time() - start_time, 3)))
100 iterations take 1.4 seconds. When I scale up (more data), I want to run many more iterations. Is there a faster way to achieve this?
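One possible direction, offered only as a sketch: since the loop picks one row uniformly at random per location, the picks for all iterations can be drawn at once with NumPy, looping over locations rather than over iterations. The function name `simulate_shares` and its `n_sims` and `seed` parameters are hypothetical, introduced here for illustration. The sketch assumes every partner's share from old_df is added as a fixed baseline, and it has not been timed against the loop above.

```python
import numpy as np
import pandas as pd

old_df = pd.DataFrame({'location': ['Hyderabad', 'Assam', 'Kolkata'],
                       'partner': ['x', 'y', 'z'],
                       'share': [0.1, 0.4, 0.2],
                       'ratings': [20, 20, 10]})
df = pd.DataFrame({'location': ['Bangalore', 'Bangalore', 'Mumbai', 'Mumbai', 'Mumbai',
                                'Pune', 'Pune', 'Pune', 'Chennai', 'Chennai'],
                   'partner': ['x', 'y', 'z', 'y', 'z', 'x', 'y', 'z', 'z', 'x'],
                   'share': [0.1, 0.1, 0.4, 0.4, 0.4, 0.2, 0.2, 0.2, 0.1, 0.1],
                   'ratings': [20, 10, 10, 20, 30, 20, 20, 20, 10, 20]})


def simulate_shares(df, old_df, n_sims, seed=None):
    """Hypothetical helper: draw one random row per location for all simulations at once."""
    rng = np.random.default_rng(seed)
    # Sorted partner labels, matching the order groupby('partner') would produce
    partners = np.sort(pd.concat([old_df['partner'], df['partner']]).unique())
    # Integer column index of each row's partner
    codes = pd.Categorical(df['partner'], categories=partners).codes
    shares = df['share'].to_numpy()
    ratings = df['ratings'].to_numpy()

    # Baseline totals contributed by old_df, repeated for every simulation
    base = (old_df.groupby('partner')['share'].sum()
            .reindex(partners, fill_value=0.0).to_numpy())
    share_totals = np.tile(base, (n_sims, 1))
    rating_totals = np.full(n_sims, old_df['ratings'].sum(), dtype=float)

    sim_idx = np.arange(n_sims)
    # GroupBy.indices maps each location to the positional indices of its rows
    for rows in df.groupby('location').indices.values():
        # One uniformly random row of this location per simulation
        picked = rows[rng.integers(0, len(rows), size=n_sims)]
        # Each (simulation, partner) pair occurs once per location, so += is safe
        share_totals[sim_idx, codes[picked]] += shares[picked]
        rating_totals += ratings[picked]

    out = pd.DataFrame(share_totals.round(2), columns=partners)
    out['ratings'] = rating_totals
    return out


result = simulate_shares(df, old_df, n_sims=1000, seed=0)
print(result.head())
```

Each row of `result` corresponds to one iteration: the partner columns hold the total share per partner, and `ratings` holds the summed ratings, mirroring what the loop collects in all_stats.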