Generating simulated data in Pandas

Posted 2024-05-18 08:19:49


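I'm generating simulated order data in which the number of orders for each product is determined by a weight. I pass the weights to random.choices(), and it somehow knows which product has which weight, but I don't understand how it makes that connection. By the way, the second value in each of the dict's value lists is the weight.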

import datetime as dt
import calendar
import random
import numpy as np
import pandas as pd
import uuid

products = {'iPhone': [700, 10],
  'Google Phone': [600, 8],
  'Vareebadd Phone': [400, 3],
  '20in Monitor': [109.99,6],
  '34in Ultrawide Monitor': [379.99, 9],
  '27in 4K Gaming Monitor': [389.99,9],
  '27in FHD Monitor': [149.99, 11],
  'Flatscreen TV': [300, 7],
  'Macbook Pro Laptop': [1700, 7],
  'ThinkPad Laptop': [999.99, 6],
  'AA Batteries (4-pack)': [3.84, 30],
  'AAA Batteries (4-pack)': [2.99, 30],
  'USB-C Charging Cable': [11.95, 30],
  'Lightning Charging Cable': [14.95, 30],
  'Wired Headphones': [11.99, 26],
  'Bose SoundSport Headphones': [99.99, 19],
  'Apple Airpods Headphones': [150, 22],
  'LG Washing Machine': [600.00, 1],
  'LG Dryer': [600.00, 1]}


columns = ['Order ID', 'Product', 'Quantity Ordered', 'Price Each', 'Order Date', 'Purchase Address']

df = pd.DataFrame(columns=columns)

for i in range(999):
  products_list = [product for product in products]
  weights = [products[key][1] for key in products_list]
  
  product = random.choices(products_list, weights=weights)[0]
  price = products[product][0]


  df.loc[i] = [i, product, "NA" ,price, "NA", "NA"]

df.groupby("Product").count()

This is the result I get:

                            Order ID  Quantity Ordered  Price Each  Order Date  Purchase Address
Product
20in Monitor                      30                30          30          30                30
27in 4K Gaming Monitor            38                38          38          38                38
27in FHD Monitor                  49                49          49          49                49
34in Ultrawide Monitor            35                35          35          35                35
AA Batteries (4-pack)            114               114         114         114               114
AAA Batteries (4-pack)           111               111         111         111               111
Apple Airpods Headphones          81                81          81          81                81
Bose SoundSport Headphones        68                68          68          68                68
Flatscreen TV                     23                23          23          23                23
Google Phone                      41                41          41          41                41
LG Dryer                           5                 5           5           5                 5
LG Washing Machine                 6                 6           6           6                 6
Lightning Charging Cable         110               110         110         110               110
Macbook Pro Laptop                24                24          24          24                24
ThinkPad Laptop                   17                17          17          17                17
USB-C Charging Cable             116               116         116         116               116
Vareebadd Phone                    7                 7           7           7                 7
Wired Headphones                  90                90          90          90                90
iPhone                            34                34          34          34                34

1 answer
Anonymous user
Answer #1 · Posted 2024-05-18 08:19:49

The product and weight lists are built with this code:

products_list = [product for product in products]
weights = [products[key][1] for key in products_list]

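The first line produces a list of the keys in products; it could also be written as list(products.keys()) or simply list(products). The second line builds the list of weights by looking up each key in that same order. Iterating over a list yields its values in index order, so the weight at weights[i] corresponds to the product at products_list[i]. random.choices() relies on exactly this positional correspondence: it never sees the dict, only two parallel lists, and it pairs them by index. You can print products_list and weights side by side to check: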

w = max(len(p) for p in products)
print(f"{'i':>2}  {'product':{w}} {'wt':2}")
print(f"{' '}  {'-'*w} {' '}")
for i, (product, weight) in enumerate(zip(products_list, weights)):
    print(f"{i:2}: {product:.<{w}} {weight:2}")

Output:

 i  product                    wt
   --------------------------
 0: iPhone.................... 10
 1: Google Phone..............  8
 2: Vareebadd Phone...........  3
 3: 20in Monitor..............  6
 4: 34in Ultrawide Monitor....  9
 5: 27in 4K Gaming Monitor....  9
 6: 27in FHD Monitor.......... 11
 7: Flatscreen TV.............  7
 8: Macbook Pro Laptop........  7
 9: ThinkPad Laptop...........  6
10: AA Batteries (4-pack)..... 30
11: AAA Batteries (4-pack).... 30
12: USB-C Charging Cable...... 30
13: Lightning Charging Cable.. 30
14: Wired Headphones.......... 26
15: Bose SoundSport Headphones 19
16: Apple Airpods Headphones.. 22
17: LG Washing Machine........  1
18: LG Dryer..................  1
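To make the positional pairing concrete, here is a minimal sketch of weighted selection using cumulative weights and binary search. CPython's random.choices() works roughly along these lines internally, but weighted_choices below is a hypothetical helper written for illustration, not the library's actual code:

```python
import random
from bisect import bisect_right
from itertools import accumulate


def weighted_choices(population, weights, k=1, rng=None):
    """Hypothetical helper: draw k items, where weights[i] belongs to population[i]."""
    if len(population) != len(weights):
        raise ValueError("population and weights must have the same length")
    if rng is None:
        rng = random.Random()
    # Running totals, e.g. weights [10, 8, 3] -> cum_weights [10, 18, 21].
    cum_weights = list(accumulate(weights))
    total = cum_weights[-1]
    last = len(cum_weights) - 1
    picks = []
    for _ in range(k):
        # Item i owns the interval [cum_weights[i-1], cum_weights[i]), whose
        # width is weights[i]; a uniform point in [0, total) lands in one of them.
        x = rng.random() * total
        # bisect_right finds that interval; hi=last guards against float rounding.
        picks.append(population[bisect_right(cum_weights, x, 0, last)])
    return picks


# A zero weight means that position is never drawn, whatever the item's name.
print(weighted_choices(["iPhone", "Google Phone", "LG Dryer"], [0, 1, 0], k=3))
# ['Google Phone', 'Google Phone', 'Google Phone']
```

Because cum_weights[i] is built from weights[i], reordering one list without the other would silently attach weights to the wrong products, which is why both lists are built from the same pass over the dict.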

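Note: products_list and weights are identical on every pass through the loop, so move those two lines before the loop for efficiency. In addition, random.choices() accepts a parameter k that specifies how many selections to make, so the loop can be removed altogether. Revised code: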

k = 999

product_list = list(products)
weights = [value[1] for value in products.values()]

random_products = random.choices(product_list, weights=weights, k=k)

price = [products[product][0] for product in random_products]

df = pd.DataFrame({'Order ID':list(range(k)),
                   'Product':random_products,
                   'Quantity Ordered':['NA']*k,
                   'Price Each':price,
                   'Order Date':['NA']*k,
                   'Purchase Address':['NA']*k,
                  })

df.groupby("Product").count()
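A quick sanity check on the counts: each product's expected share of the draws is its weight divided by the total weight. The weights in products sum to 265, so AA Batteries (4-pack), with weight 30, should appear about 999 × 30 / 265 ≈ 113 times, close to the 114 in the question's output. The sketch below compares observed and expected counts; the seeded random.Random(0) generator, the larger k, and the five-product subset are assumptions added to keep it short and reproducible:

```python
import random

import pandas as pd

# A subset of the question's products dict: name -> [price, weight].
products = {
    "iPhone": [700, 10],
    "Vareebadd Phone": [400, 3],
    "AA Batteries (4-pack)": [3.84, 30],
    "Wired Headphones": [11.99, 26],
    "LG Dryer": [600.00, 1],
}

k = 100_000
rng = random.Random(0)  # seeded so the run is reproducible

product_list = list(products)
weights = [value[1] for value in products.values()]
random_products = rng.choices(product_list, weights=weights, k=k)

# Observed counts per product versus k * weight / total_weight.
observed = pd.Series(random_products).value_counts()
total_weight = sum(weights)
expected = pd.Series({name: k * value[1] / total_weight for name, value in products.items()})

comparison = pd.DataFrame({"observed": observed, "expected": expected.round(1)})
print(comparison)
```

With this many draws the observed counts track the expected ones closely; with 999 draws, as in the question, deviations of a few orders per product are normal.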

Alternative sampling approach

First, convert products to a DataFrame:

df = pd.DataFrame.from_dict(products,
                            orient="index",
                            columns=["price", "weight"])

Then use DataFrame.sample():

number_of_samples = 10  # 999
sample = df.sample(number_of_samples, replace=True, weights="weight")
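If the simulated data needs to be reproducible, DataFrame.sample() also accepts a random_state argument; passing the same integer gives the same draws on every run. A small sketch, using a hypothetical three-product table in the same price/weight layout:

```python
import pandas as pd

# Same layout as the DataFrame built with from_dict(..., orient="index").
df = pd.DataFrame.from_dict(
    {"iPhone": [700, 10], "AA Batteries (4-pack)": [3.84, 30], "LG Dryer": [600.00, 1]},
    orient="index",
    columns=["price", "weight"],
)

# replace=True lets a product be drawn more than once; weights="weight"
# tells pandas to read the sampling weights from that column.
first = df.sample(10, replace=True, weights="weight", random_state=42)
second = df.sample(10, replace=True, weights="weight", random_state=42)

print(first.index.equals(second.index))  # True: same seed, same rows in the same order
```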

Convert the index to a column:

sample = sample.rename_axis('product').reset_index()

Finally, get the order counts:

sample.groupby('product').size()

Sample output:

product
AA Batteries (4-pack)         3
AAA Batteries (4-pack)        1
Bose SoundSport Headphones    1
Flatscreen TV                 1
Google Phone                  1
Lightning Charging Cable      2
USB-C Charging Cable          1
dtype: int64
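NumPy offers a similar route without building a DataFrame first: numpy.random.Generator.choice() takes a p argument of probabilities, which must sum to 1, so the weights are normalised before the call. This is a different library call from the ones above, shown as a sketch with a seeded generator and a subset of the products:

```python
import numpy as np
import pandas as pd

# Subset of the question's products dict: name -> [price, weight].
products = {
    "iPhone": [700, 10],
    "Google Phone": [600, 8],
    "AA Batteries (4-pack)": [3.84, 30],
    "LG Washing Machine": [600.00, 1],
}

names = list(products)
weights = np.array([value[1] for value in products.values()], dtype=float)
probabilities = weights / weights.sum()  # p must sum to 1, unlike random.choices weights

rng = np.random.default_rng(0)  # seeded for reproducibility
picks = rng.choice(names, size=999, p=probabilities)

# Order counts per product, most frequent first.
print(pd.Series(picks).value_counts())
```

As with random.choices(), names[i] is paired with probabilities[i] purely by position.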
