Convert a nested DataFrame with sorted unique values into a nested dictionary in Python



I am trying to take a nested DataFrame and convert it into a nested dictionary.

Here is my original DataFrame, with the following unique values:

Input: <code>df.head(5)</code>

Output:

<pre><code>  reviewerName                             title  reviewerRatings
0      Charles  Harry Potter Book Seven News:...              3.0
1    Katherine  Harry Potter Boxed Set, Books...              5.0
2         Lora  Harry Potter and the Sorcerer...              5.0
3         Cait  Harry Potter and the Half-Blo...              5.0
4        Diane  Harry Potter and the Order of...              5.0
</code></pre>

Input: <code>len(df['reviewerName'].unique())</code>

Output: <code>66130</code>

Given that each of the 66130 unique values has multiple values associated with it (e.g. 'Charles' appears 3 times), I used the 66130 unique 'reviewerName' values as the keys of a new nested DataFrame, and then used 'title' and 'reviewerRatings' as a second layer of key:value pairs in the same nested DataFrame.

Input: <code>df = df.set_index(['reviewerName', 'title']).sort_index()</code>

Output:

<pre><code>                                              reviewerRatings
reviewerName title
Charles      Harry Potter Book Seven News:...              3.0
             Harry Potter and the Half-Blo...              3.5
             Harry Potter and the Order of...              4.0
Katherine    Harry Potter Boxed Set, Books...              5.0
             Harry Potter and the Half-Blo...              2.5
             Harry Potter and the Order of...              5.0
...
230898 rows x 1 columns
</code></pre>

As a follow-up to my <a href="https://stackoverflow.com/questions/54209548/filter-all-unique-items-in-column1-as-a-key-along-with-column2-and-column3-as-k?noredirect=1#comment95248962_54209548">first question</a>, I am trying to convert this nested DataFrame (a DataFrame with a two-level <code>MultiIndex</code>) into a nested dictionary.

The header of the nested DataFrame above shows 'reviewerRatings' on the first row (column 3) and 'reviewerName' and 'title' on the second row (columns 1 and 2). When I run the <code>df.to_dict()</code> method below, the output has the form <code>{reviewerRatingsIndexName: {(reviewerName, title): reviewerRatings}}</code>.

Input: <code>df.to_dict()</code>

Output:

<pre><code>{'reviewerRatings': {
    ('Charles', 'Harry Potter Book Seven News:...'): 3.0,
    ('Charles', 'Harry Potter and the Half-Blo...'): 3.5,
    ('Charles', 'Harry Potter and the Order of...'): 4.0,
    ('Katherine', 'Harry Potter Boxed Set, Books...'): 5.0,
    ('Katherine', 'Harry Potter and the Half-Blo...'): 2.5,
    ('Katherine', 'Harry Potter and the Order of...'): 5.0,
    ...}
}
</code></pre>

However, I want my output in the form <code>{reviewerName: {title: reviewerRating}}</code>, which is exactly how I sorted the nested DataFrame:

<pre><code>{'Charles': {'Harry Potter Book Seven News:...': 3.0,
             'Harry Potter and the Half-Blo...': 3.5,
             'Harry Potter and the Order of...': 4.0},
 'Katherine': {'Harry Potter Boxed Set, Books...': 5.0,
               'Harry Potter and the Half-Blo...': 2.5,
               'Harry Potter and the Order of...': 5.0},
 ...}
</code></pre>

Is there any way to manipulate the nested DataFrame or the nested dictionary so that running <code>df.to_dict()</code> produces <code>{reviewerName: {title: reviewerRating}}</code>?

Thanks!
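The tuple keys follow from how <code>DataFrame.to_dict()</code> works with its default <code>orient='dict'</code>: it returns <code>{column: {index_label: value}}</code>, and with a two-level <code>MultiIndex</code> each index label is a <code>(reviewerName, title)</code> tuple. A minimal reproduction, assuming a small hypothetical three-row frame in place of the real 230898-row data:

```python
import pandas as pd

# Hypothetical sample rows standing in for the real review data
df = pd.DataFrame({
    'reviewerName': ['Charles', 'Charles', 'Katherine'],
    'title': ['Book A', 'Book B', 'Book A'],
    'reviewerRatings': [3.0, 3.5, 5.0],
})

# Two-level MultiIndex, built the same way as in the question
indexed = df.set_index(['reviewerName', 'title']).sort_index()

# Default orient='dict' gives {column: {index_label: value}};
# here every index_label is a (reviewerName, title) tuple
as_dict = indexed.to_dict()
print(as_dict)
```

The single outer key is the column name, which is why <code>to_dict()</code> alone cannot produce the reviewer-keyed layout.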



1 Answer

Anonymous, 1 day ago


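There are several ways to do this. You can use <code>groupby</code> together with <code>to_dict</code>, or iterate over the rows with <code>collections.defaultdict</code>. Notably, the row-by-row loop is not necessarily less efficient (see the benchmarks below). Both snippets start from the original flat DataFrame, where <code>reviewerName</code>, <code>title</code> and <code>reviewerRatings</code> are ordinary columns, i.e. before <code>set_index</code> is applied.

<h3><a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html" rel="nofollow noreferrer"><code>groupby</code></a> + <a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.to_dict.html" rel="nofollow noreferrer"><code>to_dict</code></a></h3>

For each group, build a Series of ratings indexed by title and convert it to a dictionary; this yields a Series whose values are dictionaries. A second <code>to_dict</code> call then converts that Series into the outer dictionary.

<pre><code>res = df.groupby('reviewerName')\
        .apply(lambda x: x.set_index('title')['reviewerRatings'].to_dict())\
        .to_dict()
</code></pre>

<h3><a href="https://docs.python.org/3/library/collections.html#collections.defaultdict" rel="nofollow noreferrer"><code>collections.defaultdict</code></a></h3>

Define a <code>defaultdict</code> of <code>dict</code> objects and iterate over the DataFrame row by row.

<pre><code>from collections import defaultdict

res = defaultdict(dict)
for row in df.itertuples(index=False):
    res[row.reviewerName][row.title] = row.reviewerRatings
</code></pre>

The resulting <code>defaultdict</code> does not need to be converted back to a regular <code>dict</code>, because <code>defaultdict</code> is a subclass of <code>dict</code>. Convert it with <code>dict(res)</code> only if you want a lookup of an unknown reviewer to raise <code>KeyError</code> instead of silently inserting an empty entry.

<h3>Performance benchmarking</h3>

Benchmark results depend on your setup and your data. You should test with your own data to see what works best.

<pre><code># Python 3.6.5, Pandas 0.19.2
from collections import defaultdict
from random import sample

import numpy as np
import pandas as pd

# construct sample dataframe
np.random.seed(0)
n = 10**4  # number of rows
names = np.random.choice(['Charles', 'Lora', 'Katherine', 'Matthew', 'Mark', 'Luke', 'John'], n)
books = [f'Book_{i}' for i in sample(range(10**5), n)]
ratings = np.random.randint(0, 6, n)

df = pd.DataFrame({'reviewerName': names, 'title': books, 'reviewerRatings': ratings})

def jez(df):
    # select the two columns with a list; newer pandas rejects a bare tuple of keys here
    return df.groupby('reviewerName')[['title', 'reviewerRatings']]\
             .apply(lambda x: dict(x.values))\
             .to_dict()

def jpp1(df):
    return df.groupby('reviewerName')\
             .apply(lambda x: x.set_index('title')['reviewerRatings'].to_dict())\
             .to_dict()

def jpp2(df):
    dd = defaultdict(dict)
    for row in df.itertuples(index=False):
        dd[row.reviewerName][row.title] = row.reviewerRatings
    return dd

# timings via the IPython %timeit magic
%timeit jez(df)   # 33.5 ms per loop
%timeit jpp1(df)  # 17 ms per loop
%timeit jpp2(df)  # 21.1 ms per loop
</code></pre>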
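The answer's snippets operate on the flat DataFrame. If you already have the sorted MultiIndex frame from the question, one possible variant (not taken from the answer) applies the same <code>defaultdict</code> idea to the <code>reviewerRatings</code> Series: its index labels are <code>(reviewerName, title)</code> tuples, so each one can be unpacked into the two dictionary levels. This is a sketch; <code>nest_ratings</code> is a hypothetical helper name and the sample rows are made up.

```python
from collections import defaultdict

import pandas as pd


def nest_ratings(indexed: pd.DataFrame) -> dict:
    """Turn a (reviewerName, title)-indexed frame into {reviewerName: {title: rating}}.

    Hypothetical helper; assumes a two-level MultiIndex and a 'reviewerRatings' column.
    """
    nested = defaultdict(dict)
    # Series.items() yields (index_label, value); with a two-level
    # MultiIndex each label is a (reviewerName, title) tuple
    for (name, title), rating in indexed['reviewerRatings'].items():
        nested[name][title] = rating
    # Plain dict, so looking up an unknown reviewer raises KeyError
    return dict(nested)


# Hypothetical sample data mirroring the question's layout
df = pd.DataFrame({
    'reviewerName': ['Charles', 'Charles', 'Katherine'],
    'title': ['Book A', 'Book B', 'Book A'],
    'reviewerRatings': [3.0, 3.5, 5.0],
})
indexed = df.set_index(['reviewerName', 'title']).sort_index()

res = nest_ratings(indexed)
# res == {'Charles': {'Book A': 3.0, 'Book B': 3.5}, 'Katherine': {'Book A': 5.0}}
```

Because the frame is already sorted by the MultiIndex, reviewers and titles are inserted into the dictionaries in sorted order, matching the layout the question asks for.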
