回答此问题可获得 20 贡献值,回答如果被采纳可获得 50 分。
<p>以下是指向原始数据集源的链接:
<a href="http://data.bayanat.ae/en_GB/dataset/mosques-by-capacity-and-emirate/resource/5cc69203-3419-429e-b917-7dc13e917419" rel="nofollow noreferrer">dataset for capacity</a>和<a href="http://data.bayanat.ae/en_GB/dataset/mosques-by-type-and-emirate/resource/5fa5479f-4580-4e1f-b343-73a8e1c270c0" rel="nofollow noreferrer">dataset for type</a></p>
<p>或修改版本<a href="https://justpaste.it/edit/26958949/a89cf081da03ba29" rel="nofollow noreferrer">dataset modified1</a>和<a href="https://justpaste.it/edit/26970805/9d75c79a18f2e497" rel="nofollow noreferrer">dataset modified2</a></p>
<p>我有两个数据帧要合并:</p>
<pre><code> first_df=pd.DataFrame([['2001','Abu Dhabi','100-','462'],['2001','Abu Dhabi','100','44'],['2001','Abu Dhabi','200','462'],['2001','Dubai','100-','40'],['2001','Dubai','100','30'],['2001','Dubai','200','51'],['2002','Abu Dhabi','100-','300'],['2002','Abu Dhabi','100','220'],['2002','Abu Dhabi','200','56'],['2002','Dubai','100-','55'],['2002','Dubai','100','67'],['2002','Dubai','200','89']],columns=['Year','Emirate','Capacity','Number'])
second_df=pd.DataFrame([['2001','Abu Dhabi','Performed','45'],['2001','Abu Dhabi','Not Performed','76'],['2001','Dubai','Performed','90'],['2001','Dubai','Not Performed','50'],['2002','Abu Dhabi','Performed','78'],['2002','Abu Dhabi','Not Performed','45'],['2002','Dubai','Performed','76'],['2002','Dubai','Not Performed','58']],columns=['Year','Emirate','Type','Value'])
</code></pre>
<p>所以我为两个数据帧设置了多索引:</p>
<pre><code>first=first_df.set_index(['Year','Emirate'])
second=second_df.set_index(['Year','Emirate'])
</code></pre>
<p>合并后:</p>
<pre><code>merged=first.merge(second,how='outer',right_index=True,left_index=True)
</code></pre>
<p>结果如下:</p>
<h2>合并</h2>
<p><code>| Year , Emirate | Capacity | count | friday | count |
|:----------------------|:-----------|--------:|:--------------|--------:|
| ('2001', 'Abu Dhabi') | 100- | 462 | Performed | 45 |
| ('2001', 'Abu Dhabi') | 100- | 462 | Not Performed | 76 |
| ('2001', 'Abu Dhabi') | 100 | 44 | Performed | 45 |
| ('2001', 'Abu Dhabi') | 100 | 44 | Not Performed | 76 |
| ('2001', 'Abu Dhabi') | 200 | 657 | Performed | 45 |
| ('2001', 'Abu Dhabi') | 200 | 657 | Not Performed | 76 |
| ('2001', 'Dubai') | 100- | 40 | Performed | 90 |
| ('2001', 'Dubai') | 100- | 40 | Not Performed | 50 |
| ('2001', 'Dubai') | 100 | 30 | Performed | 90 |
| ('2001', 'Dubai') | 100 | 30 | Not Performed | 50 |
| ('2001', 'Dubai') | 200 | 51 | Performed | 90 |
| ('2001', 'Dubai') | 200 | 51 | Not Performed | 50 |
| ('2002', 'Abu Dhabi') | 100- | 300 | Performed | 78 |
| ('2002', 'Abu Dhabi') | 100- | 300 | Not Performed | 45 |
| ('2002', 'Abu Dhabi') | 100 | 220 | Performed | 78 |
| ('2002', 'Abu Dhabi') | 100 | 220 | Not Performed | 45 |
| ('2002', 'Abu Dhabi') | 200 | 56 | Performed | 78 |
| ('2002', 'Abu Dhabi') | 200 | 56 | Not Performed | 45 |
| ('2002', 'Dubai') | 100- | 55 | Performed | 76 |
| ('2002', 'Dubai') | 100- | 55 | Not Performed | 58 |
| ('2002', 'Dubai') | 100 | 67 | Performed | 76 |
| ('2002', 'Dubai') | 100 | 67 | Not Performed | 58 |
| ('2002', 'Dubai') | 200 | 89 | Performed | 76 |
| ('2002', 'Dubai') | 200 | 89 | Not Performed | 58 |</code></p>
<p>并试图得出以下结果:</p>
<pre><code>joined=pd.concat([first,second])
</code></pre>
<h2>加入</h2>
<p><code>| Year , Emirate | Capacity | Number | Type | Value |
|:----------------------|:-----------|---------:|:--------------|--------:|
| ('2001', 'Abu Dhabi') | 100- | 462 | nan | nan |
| ('2001', 'Abu Dhabi') | 100 | 44 | nan | nan |
| ('2001', 'Abu Dhabi') | 200 | 657 | nan | nan |
| ('2001', 'Dubai') | 100- | 40 | nan | nan |
| ('2001', 'Dubai') | 100 | 30 | nan | nan |
| ('2001', 'Dubai') | 200 | 51 | nan | nan |
| ('2002', 'Abu Dhabi') | 100- | 300 | nan | nan |
| ('2002', 'Abu Dhabi') | 100 | 220 | nan | nan |
| ('2002', 'Abu Dhabi') | 200 | 56 | nan | nan |
| ('2002', 'Dubai') | 100- | 55 | nan | nan |
| ('2002', 'Dubai') | 100 | 67 | nan | nan |
| ('2002', 'Dubai') | 200 | 89 | nan | nan |
| ('2001', 'Abu Dhabi') | nan | nan | Performed | 45 |
| ('2001', 'Abu Dhabi') | nan | nan | Not Performed | 76 |
| ('2001', 'Dubai') | nan | nan | Performed | 90 |
| ('2001', 'Dubai') | nan | nan | Not Performed | 50 |
| ('2002', 'Abu Dhabi') | nan | nan | Performed | 78 |
| ('2002', 'Abu Dhabi') | nan | nan | Not Performed | 45 |
| ('2002', 'Dubai') | nan | nan | Performed | 76 |
| ('2002', 'Dubai') | nan | nan | Not Performed | 58 |</code></p>
<p>所以连接在一起的两个数据帧不应该有重复(比如第一次合并)或者下移(比如concat变量)。
有什么解决方案可以使两个数据帧很好地对齐?你知道吗</p>
<p>下面是所需输出的样子:</p>
<p><code>| | Year | Emirate | Capacity | Number | Type | Value |
|---:|-------:|:----------|:-----------|---------:|:--------------|--------:|
| 0 | | | 100- | 462 | Performed | 45 |
| 1 | | Abu Dhabi | 100 | 44 | Not Performed | 76 |
| 2 | | | 200 | 657 | NaN | nan |
| 3 | 2001 | | 100- | 40 | Performed | 90 |
| 4 | | Dubai | 100 | 30 | Not Performed | 50 |
| 5 | | | 200 | 51 | NaN | nan |
| 6 | | | 100- | 300 | Performed | 78 |
| 7 | | Abu Dhabi | 100 | 220 | Not Performed | 45 |
| 8 | 2002 | | 200 | 56 | NaN | nan |
| 9 | | | 100- | 55 | Performed | 76 |
| 10 | | Dubai | 100 | 67 | Not Performed | 58 |
| 11 | | | 200 | 89 | NaN | nan |</code></p>
<pre><code>enter code here
</code></pre>