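<p>I tried to partially solve this problem using a <code>DataFrame</code>.</p>
<p>The idea is to use <code>recursion</code> to flatten the dictionary into a single list, <code>res</code>.</p>
<p>A window of size 5 then pulls out five consecutive elements at a time (<code>divide_chunks_sliding</code>), so each chunk becomes one row. Note that the chunks do not overlap.</p>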
<pre><code>d = {108: {'Wallmart': {'ca': {'good': 'busy'}}},
     204: {'Wallmart': {'ny': {'good': 'busy'}}},
     205: {'Wallmart': {'ny': {'great': 'busy'}}},
     110: {'CVS': {'ny': {'great': 'busy'}}},
     184: {'Wallmart': {'fl': {'great': 'busy'}}},
     185: {'Wallmart': {'fl': {'bad': 'busy'}}},
     105: {'Wallmart': {'ga': {'bad': 'busy'}}},
     497: {'Wallmart': {'ga': {'bad': 'busy'}}},
     400: {'RiteAid': {'dc': {'good': 'busy'}}},
     406: {'RidaAid': {'dc': {'geat': 'busy'}}},
     367: {'Other': {'tx': {'bad': 'busy'}}}}

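def recur_dict(inp, res=None):
    # Flatten keys and leaf values into one list, depth first.
    # res defaults to None (not []) so repeated calls do not share state.
    if res is None:
        res = []
    for x in inp:
        res.append(x)
        if isinstance(inp[x], dict):
            recur_dict(inp[x], res)
        else:
            res.append(inp[x])
    return res

def divide_chunks_sliding(in_arr, chunk):
    # Yield consecutive, non-overlapping slices of length `chunk`.
    for i in range(0, len(in_arr), chunk):
        yield in_arr[i:i + chunk]
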
##### Divide Chunks Usage Example #####
>>> print(list(divide_chunks_sliding([1,2,3,4,5,6,7,8],2)))
[[1, 2], [3, 4], [5, 6], [7, 8]]
</code></pre>
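<p>The chunk size of 5 only works because every entry in <code>d</code> flattens to exactly five items. As a minimal sketch of an alternative, assuming the same nesting but not a fixed depth, a generator can yield one tuple per leaf directly. The name <code>iter_rows</code> is hypothetical and not part of the code above.</p>

```python
def iter_rows(node, path=()):
    """Yield one tuple per leaf: every key on the way down, then the leaf value."""
    for key, value in node.items():
        if isinstance(value, dict):
            # Go one level deeper, remembering the keys seen so far.
            yield from iter_rows(value, path + (key,))
        else:
            yield path + (key, value)


# Small sample in the same shape as d (illustrative subset only).
sample = {108: {'Wallmart': {'ca': {'good': 'busy'}}},
          110: {'CVS': {'ny': {'great': 'busy'}}}}

rows = list(iter_rows(sample))
print(rows)
```

<p>Each tuple can be used as one row, so no separate chunking step is needed.</p>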
<p>Using the functions above, create <code>df</code> and merge it with itself on <code>store</code>.</p>
<pre><code>import pandas as pd

res = recur_dict(d)
values = list(divide_chunks_sliding(res, 5))
df = pd.DataFrame(data=values, columns=['Key', 'Brand', 'store', 'review', 'flag'])
# Self-merge on store: each row is paired with every Brand seen at that store.
df_merge = pd.merge(df, df[['store', 'Brand']], on=['store'], suffixes=('_Self_Left', '_Self_Right'))

>>> print(df_merge[df_merge['Brand_Self_Left'] != df_merge['Brand_Self_Right']])
    Key Brand_Self_Left store review  flag Brand_Self_Right
3   204        Wallmart    ny   good  busy              CVS
6   205        Wallmart    ny  great  busy              CVS
7   110             CVS    ny  great  busy         Wallmart
8   110             CVS    ny  great  busy         Wallmart
19  400         RiteAid    dc   good  busy          RidaAid
20  406         RidaAid    dc   geat  busy          RiteAid
</code></pre>
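<p>As a cross-check on what the self-merge and filter compute, here is a pandas-free sketch that finds the stores associated with more than one brand. It is an illustration on a small hand-written subset of rows, not a replacement for the merge. The names <code>brands_by_store</code>, <code>mixed_stores</code> and <code>conflicts</code> are hypothetical.</p>

```python
from collections import defaultdict

# (Key, Brand, store) triples: an illustrative subset of the data above.
rows = [
    (204, 'Wallmart', 'ny'),
    (110, 'CVS', 'ny'),
    (184, 'Wallmart', 'fl'),
    (185, 'Wallmart', 'fl'),
]

# Collect the distinct brands seen at each store.
brands_by_store = defaultdict(set)
for key, brand, store in rows:
    brands_by_store[store].add(brand)

# A store is "mixed" when more than one brand appears for it.
mixed_stores = {store for store, brands in brands_by_store.items() if len(brands) > 1}

# Keep only the rows that belong to a mixed store.
conflicts = [row for row in rows if row[2] in mixed_stores]
print(conflicts)
```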
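<p>Filtered this way, <code>df_merge</code> contains every row that shares a <code>store</code> with another row but has a different <code>Brand</code>. Converting that result back to the original nested structure is still pending.</p>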
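<p>One possible way to do that last step, as a rough sketch only: rebuild the nesting from flat five-item rows with chained <code>setdefault</code> calls. It assumes the rows have already been reduced to <code>(Key, Brand, store, review, flag)</code> tuples, for example by selecting those five columns from the filtered frame and calling <code>itertuples(index=False, name=None)</code>. The helper name <code>rows_to_nested</code> is hypothetical.</p>

```python
def rows_to_nested(rows):
    """Rebuild {Key: {Brand: {store: {review: flag}}}} from flat 5-tuples."""
    nested = {}
    for key, brand, store, review, flag in rows:
        # setdefault creates each missing level on the way down.
        nested.setdefault(key, {}).setdefault(brand, {}).setdefault(store, {})[review] = flag
    return nested


# Rows shaped like the filtered output above, including the repeated row for Key 110.
rows = [
    (204, 'Wallmart', 'ny', 'good', 'busy'),
    (205, 'Wallmart', 'ny', 'great', 'busy'),
    (110, 'CVS', 'ny', 'great', 'busy'),
    (110, 'CVS', 'ny', 'great', 'busy'),
]

result = rows_to_nested(rows)
print(result)
```

<p>Repeated rows, such as the two for <code>Key</code> 110, collapse into a single entry because they write to the same keys.</p>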