Converting a nested DataFrame with sorted unique values to a nested dictionary in Python

   reviewerName  title                             reviewerRatings
0  Charles       Harry Potter Book Seven News:...  3.0
1  Katherine     Harry Potter Boxed Set, Books...  5.0
2  Lora          Harry Potter and the Sorcerer...  5.0
3  Cait          Harry Potter and the Half-Blo...  5.0
4  Diane         Harry Potter and the Order of...  5.0

                                                reviewerRatings
reviewerName  title
Charles       Harry Potter Book Seven News:...  3.0
              Harry Potter and the Half-Blo...  3.5
              Harry Potter and the Order of...  4.0
Katherine     Harry Potter Boxed Set, Books...  5.0
              Harry Potter and the Half-Blo...  2.5
              Harry Potter and the Order of...  5.0
...
230898 rows x 1 columns

{'reviewerRatings': {
    ('Charles', 'Harry Potter Book Seven News:...'): 3.0,
    ('Charles', 'Harry Potter and the Half-Blo...'): 3.5,
    ('Charles', 'Harry Potter and the Order of...'): 4.0,
    ('Katherine', 'Harry Potter Boxed Set, Books...'): 5.0,
    ('Katherine', 'Harry Potter and the Half-Blo...'): 2.5,
    ('Katherine', 'Harry Potter and the Order of...'): 5.0,
    ...}
}

{'Charles': {'Harry Potter Book Seven News:...': 3.0,
             'Harry Potter and the Half-Blo...': 3.5,
             'Harry Potter and the Order of...': 4.0},
 'Katherine': {'Harry Potter Boxed Set, Books...': 5.0,
               'Harry Potter and the Half-Blo...': 2.5,
               'Harry Potter and the Order of...': 5.0},
 ...}

2 answers

User
Answer #1 · edited 2024-10-01 09:31:30

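There are a few approaches. You can use groupby with to_dict, or iterate over the rows with collections.defaultdict. Notably, the latter is not necessarily less efficient.

groupby + to_dict

Construct a series from each groupby object and convert it to a dictionary, which gives a series whose values are dictionaries. Finally, convert that series to a dictionary with another to_dict call.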
res = df.groupby('reviewerName')\
        .apply(lambda x: x.set_index('title')['reviewerRatings'].to_dict())\
        .to_dict()
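If the data is already indexed by (reviewerName, title), as in the grouped frame shown in the question, a similar result can probably be reached without resetting the index. The following is a minimal sketch on made-up sample data: the titles 'Book A' and 'Book B' and all variable names are illustrative assumptions, and Series.droplevel needs pandas 0.24 or later.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's frame.
df = pd.DataFrame({
    'reviewerName': ['Charles', 'Charles', 'Katherine'],
    'title': ['Book A', 'Book B', 'Book A'],
    'reviewerRatings': [3.0, 3.5, 5.0],
})

# Rebuild the two-level index from the question and take the single column.
ratings = df.set_index(['reviewerName', 'title'])['reviewerRatings']

# Group on the outer index level, drop that level inside each group,
# and turn what remains (title -> rating) into a plain dict.
nested = {
    name: group.droplevel('reviewerName').to_dict()
    for name, group in ratings.groupby(level='reviewerName')
}

print(nested)
```

The dict comprehension avoids groupby.apply altogether, which may be easier to read when the index levels already carry the grouping keys.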
collections.defaultdict

Define a defaultdict of dict objects and iterate over the dataframe row by row.
from collections import defaultdict

res = defaultdict(dict)
for row in df.itertuples(index=False):
    res[row.reviewerName][row.title] = row.reviewerRatings
The resulting defaultdict does not need to be converted back to a regular dict, because defaultdict is a subclass of dict.
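One caveat is worth illustrating: a defaultdict behaves like a dict in most places, but looking up a missing key inserts it. The sketch below uses only the standard library and made-up rows (the names and titles are illustrative assumptions) to show both the subclass relationship and when an explicit dict(...) copy might still be preferable.

```python
from collections import defaultdict
import json

# Hypothetical (reviewerName, title, rating) rows.
rows = [
    ('Charles', 'Book A', 3.0),
    ('Charles', 'Book B', 3.5),
    ('Katherine', 'Book A', 5.0),
]

res = defaultdict(dict)
for name, title, rating in rows:
    res[name][title] = rating

# A defaultdict passes isinstance checks and serializes like a dict.
is_dict = isinstance(res, dict)
as_json = json.dumps(res)

# Take a regular-dict copy before any accidental lookups happen.
plain = dict(res)

# Indexing a missing key on the defaultdict silently creates it ...
_ = res['Nobody']
created = 'Nobody' in res

# ... whereas the plain copy is unaffected and would raise KeyError instead.
unaffected = 'Nobody' not in plain
```

Note that dict(res) is a shallow copy: the inner per-reviewer dictionaries are shared between res and plain.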
Performance benchmarking

Benchmark results depend on your set-up and your data. You should test with your own data to see what works best.
# Python 3.6.5, Pandas 0.19.2
from collections import defaultdict
from random import sample

import numpy as np
import pandas as pd

# construct sample dataframe
np.random.seed(0)
n = 10**4  # number of rows
names = np.random.choice(['Charles', 'Lora', 'Katherine', 'Matthew',
                          'Mark', 'Luke', 'John'], n)
books = [f'Book_{i}' for i in sample(range(10**5), n)]
ratings = np.random.randint(0, 6, n)
df = pd.DataFrame({'reviewerName': names, 'title': books, 'reviewerRatings': ratings})

def jez(df):
    # a list of columns is used here; newer pandas rejects the tuple form ['a', 'b']
    return df.groupby('reviewerName')[['title', 'reviewerRatings']]\
             .apply(lambda x: dict(x.values))\
             .to_dict()

def jpp1(df):
    return df.groupby('reviewerName')\
             .apply(lambda x: x.set_index('title')['reviewerRatings'].to_dict())\
             .to_dict()

def jpp2(df):
    dd = defaultdict(dict)
    for row in df.itertuples(index=False):
        dd[row.reviewerName][row.title] = row.reviewerRatings
    return dd

# %timeit is an IPython/Jupyter magic
%timeit jez(df)   # 33.5 ms per loop
%timeit jpp1(df)  # 17 ms per loop
%timeit jpp2(df)  # 21.1 ms per loop
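Outside IPython the %timeit magic is not available, so a comparison like this one could be run with the standard-library timeit module instead. The following is only a sketch under assumptions: the function names, the smaller row count, and the repeat settings are illustrative choices, not part of the original benchmark, and the timings will differ from those quoted above.

```python
import timeit
from collections import defaultdict

import numpy as np
import pandas as pd

# Small synthetic frame; sizes and names are illustrative assumptions.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    'reviewerName': rng.choice(['Charles', 'Lora', 'Katherine'], n),
    'title': [f'Book_{i}' for i in range(n)],
    'reviewerRatings': rng.integers(0, 6, n),
})

def with_groupby(frame):
    # groupby + to_dict, selecting only the columns each group needs
    return (frame.groupby('reviewerName')[['title', 'reviewerRatings']]
                 .apply(lambda x: x.set_index('title')['reviewerRatings'].to_dict())
                 .to_dict())

def with_defaultdict(frame):
    # row-by-row iteration into a defaultdict of dicts
    out = defaultdict(dict)
    for row in frame.itertuples(index=False):
        out[row.reviewerName][row.title] = row.reviewerRatings
    return out

# Best-of-3 average time per call, in seconds.
timings = {}
for fn in (with_groupby, with_defaultdict):
    runs = timeit.repeat(lambda: fn(df), number=5, repeat=3)
    timings[fn.__name__] = min(runs) / 5

for name, seconds in timings.items():
    print(f'{name}: {seconds * 1000:.2f} ms per call')
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; both functions should also be checked to return the same mapping before their speeds are compared.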

User
Answer #2 · edited 2024-10-01 09:31:30

Use groupby with a lambda function to build one dictionary per reviewerName, then convert the resulting Series with to_dict:
print(df)
   reviewerName  title                             reviewerRatings
0  Charles       Harry Potter Book Seven News:...  3.0
1  Charles       Harry Potter Boxed Set, Books...  5.0
2  Charles       Harry Potter and the Sorcerer...  5.0
3  Katherine     Harry Potter and the Half-Blo...  5.0
4  Katherine     Harry otter and the Order of...   5.0

# select the two columns with a list; newer pandas rejects the tuple form ['a', 'b']
d = (df.groupby('reviewerName')[['title', 'reviewerRatings']]
       .apply(lambda x: dict(x.values))
       .to_dict())

print(d)
{
    'Charles': {
        'Harry Potter Book Seven News:...': 3.0,
        'Harry Potter Boxed Set, Books...': 5.0,
        'Harry Potter and the Sorcerer...': 5.0
    },
    'Katherine': {
        'Harry Potter and the Half-Blo...': 5.0,
        'Harry otter and the Order of...': 5.0
    }
}
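The dict(x.values) step works because dict() accepts any iterable of two-item sequences, and the values of a two-column group form exactly such a sequence of [title, rating] rows. The sketch below imitates that with plain Python lists; the rows are made-up stand-ins rather than real pandas output.

```python
# Each inner list plays the role of one row of x.values: [title, rating].
rows = [
    ['Book A', 3.0],
    ['Book B', 3.5],
]

# dict() treats the first item of each row as the key, the second as the value.
per_reviewer = dict(rows)

# If a title appears twice for the same reviewer, the later row wins.
with_duplicate = dict([['Book A', 3.0], ['Book A', 4.0]])

print(per_reviewer)
print(with_duplicate)
```

The second case suggests checking for duplicate (reviewerName, title) pairs beforehand if every rating has to be preserved.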
