Generating mock data in Pandas

import datetime as dt
import calendar
import random
import numpy as np
import pandas as pd
import uuid

products = {'iPhone': [700, 10],
            'Google Phone': [600, 8],
            'Vareebadd Phone': [400, 3],
            '20in Monitor': [109.99, 6],
            '34in Ultrawide Monitor': [379.99, 9],
            '27in 4K Gaming Monitor': [389.99, 9],
            '27in FHD Monitor': [149.99, 11],
            'Flatscreen TV': [300, 7],
            'Macbook Pro Laptop': [1700, 7],
            'ThinkPad Laptop': [999.99, 6],
            'AA Batteries (4-pack)': [3.84, 30],
            'AAA Batteries (4-pack)': [2.99, 30],
            'USB-C Charging Cable': [11.95, 30],
            'Lightning Charging Cable': [14.95, 30],
            'Wired Headphones': [11.99, 26],
            'Bose SoundSport Headphones': [99.99, 19],
            'Apple Airpods Headphones': [150, 22],
            'LG Washing Machine': [600.00, 1],
            'LG Dryer': [600.00, 1]}

columns = ['Order ID', 'Product', 'Quantity Ordered', 'Price Each',
           'Order Date', 'Purchase Address']

df = pd.DataFrame(columns=columns)

for i in range(999):
    products_list = [product for product in products]
    weights = [products[key][1] for key in products_list]
    product = random.choices(products_list, weights=weights)[0]
    price = products[product][0]
    df.loc[i] = [i, product, "NA", price, "NA", "NA"]

df.groupby("Product").count()

                            Order ID  Quantity Ordered  Price Each  Order Date  Purchase Address
Product
20in Monitor                      30                30          30          30                30
27in 4K Gaming Monitor            38                38          38          38                38
27in FHD Monitor                  49                49          49          49                49
34in Ultrawide Monitor            35                35          35          35                35
AA Batteries (4-pack)            114               114         114         114               114
AAA Batteries (4-pack)           111               111         111         111               111
Apple Airpods Headphones          81                81          81          81                81
Bose SoundSport Headphones        68                68          68          68                68
Flatscreen TV                     23                23          23          23                23
Google Phone                      41                41          41          41                41
LG Dryer                           5                 5           5           5                 5
LG Washing Machine                 6                 6           6           6                 6
Lightning Charging Cable         110               110         110         110               110
Macbook Pro Laptop                24                24          24          24                24
ThinkPad Laptop                   17                17          17          17                17
USB-C Charging Cable             116               116         116         116               116
Vareebadd Phone                    7                 7           7           7                 7
Wired Headphones                  90                90          90          90                90
iPhone                            34                34          34          34                34

1 answer

User

#1 · Posted 2024-05-18 08:19:49

The product list and the weight list are built with this code:

products_list = [product for product in products]
weights = [products[key][1] for key in products_list]

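The first line builds a list of the keys in products. It could also be written as list(products.keys()) or even list(products). The next line builds the list of weights. Iterating over a list yields its values in index order, so the weight at weights[i] corresponds to the product at products_list[i]. random.choices() relies on that correspondence to know the weight of each item. You can print products_list and weights to check: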

w = max(len(p) for p in products)
print(f"{'i':>2}  {'product':{w}} {'wt':2}")
print(f"{' '}  {'-'*w} {' '}")
for i, (product, weight) in enumerate(zip(products_list, weights)):
    print(f"{i:2}: {product:.<{w}} {weight:2}")

Output:

 i  product                    wt
   --------------------------
 0: iPhone.................... 10
 1: Google Phone..............  8
 2: Vareebadd Phone...........  3
 3: 20in Monitor..............  6
 4: 34in Ultrawide Monitor....  9
 5: 27in 4K Gaming Monitor....  9
 6: 27in FHD Monitor.......... 11
 7: Flatscreen TV.............  7
 8: Macbook Pro Laptop........  7
 9: ThinkPad Laptop...........  6
10: AA Batteries (4-pack)..... 30
11: AAA Batteries (4-pack).... 30
12: USB-C Charging Cable...... 30
13: Lightning Charging Cable.. 30
14: Wired Headphones.......... 26
15: Bose SoundSport Headphones 19
16: Apple Airpods Headphones.. 22
17: LG Washing Machine........  1
18: LG Dryer..................  1
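
As a rough check on what these weights mean: random.choices() treats them as relative weights, so the expected share of each product is its weight divided by the sum of all weights. A minimal sketch, using a three-item subset of products (the subset and the names sample_products and expected_share are illustrative, not from the question):

```python
import random
from collections import Counter

# Illustrative subset of the products dict: name -> [price, weight]
sample_products = {
    'iPhone': [700, 10],
    'AA Batteries (4-pack)': [3.84, 30],
    'LG Dryer': [600.00, 1],
}

names = list(sample_products)
weights = [value[1] for value in sample_products.values()]
total = sum(weights)

# Expected share of each product = its weight / sum of all weights
expected_share = {name: w / total for name, w in zip(names, weights)}

# Draw many samples and compare observed shares with the expected ones
k = 10_000
counts = Counter(random.choices(names, weights=weights, k=k))
for name in names:
    observed = counts[name] / k
    print(f"{name:.<24} expected {expected_share[name]:.3f}  observed {observed:.3f}")
```

With the full 19-product dict the weights sum to 265, so iPhone (weight 10) should show up in roughly 999 × 10 / 265 ≈ 38 of 999 orders, which is in line with the 34 in the question's output.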

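Note: products_list and weights are identical on every pass through the loop, so move those two lines in front of the loop to improve efficiency. random.choices() also accepts a parameter k that specifies how many selections to make, so the loop can be removed altogether. Revised code: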

k = 999

product_list = list(products)
weights = [value[1] for value in products.values()]

random_products = random.choices(product_list, weights=weights, k=k)

price = [products[product][0] for product in random_products]

df = pd.DataFrame({'Order ID':list(range(k)),
                   'Product':random_products,
                   'Quantity Ordered':['NA']*k,
                   'Price Each':price,
                   'Order Date':['NA']*k,
                   'Purchase Address':['NA']*k,
                  })

df.groupby("Product").count()
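
Because the draws are random, the counts change on every run. If repeatable mock data is wanted, one option is to draw from a seeded random.Random instance. A minimal sketch, where the seed value 42, the helper name draw, and the two-item dict are arbitrary choices for illustration:

```python
import random

# Illustrative two-item stand-in for the products dict: name -> [price, weight]
demo_products = {'iPhone': [700, 10], 'LG Dryer': [600.00, 1]}

names = list(demo_products)
weights = [value[1] for value in demo_products.values()]

def draw(seed, k=5):
    """Return k weighted draws from a generator seeded with `seed`."""
    rng = random.Random(seed)  # separate generator; the global one is untouched
    return rng.choices(names, weights=weights, k=k)

first = draw(42)
second = draw(42)
print(first == second)  # same seed, same sequence of products
```

The same rng.choices(...) call can stand in for random.choices(...) in the revised code when a fixed data set is needed, for example in tests.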

Alternative sampling method

First, convert products to a DataFrame:

df = pd.DataFrame.from_dict(products,
                            orient="index",
                            columns=["price", "weight"])

Then use DataFrame.sample():

number_of_samples = 10  # 999
sample = df.sample(number_of_samples, replace=True, weights="weight")

Convert the index to a column:

sample = sample.rename_axis('product').reset_index()

Finally, get the order counts:

sample.groupby('product').size()

Sample output:

product
AA Batteries (4-pack)         3
AAA Batteries (4-pack)        1
Bose SoundSport Headphones    1
Flatscreen TV                 1
Google Phone                  1
Lightning Charging Cable      2
USB-C Charging Cable          1
dtype: int64
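
Putting the steps above together, a self-contained sketch might look like the following. The three-item dict, the random_state value, the names catalog and orders, and the final Order ID column are assumptions added for illustration. DataFrame.sample() normalizes the weights itself, so they do not need to sum to 1.

```python
import pandas as pd

# Illustrative subset of the products dict: name -> [price, weight]
demo_products = {
    'iPhone': [700, 10],
    'AA Batteries (4-pack)': [3.84, 30],
    'LG Dryer': [600.00, 1],
}

# One row per product, with the product name as the index
catalog = pd.DataFrame.from_dict(demo_products, orient="index",
                                 columns=["price", "weight"])

# Weighted sampling with replacement; random_state makes the draw repeatable
n = 20
orders = (catalog.sample(n, replace=True, weights="weight", random_state=0)
                 .rename_axis("product")
                 .reset_index()
                 .drop(columns="weight"))

# Add a sequential order ID as the first column
orders.insert(0, "Order ID", list(range(n)))

print(orders.groupby("product").size())
```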
