<p><strong>DataFrame:</strong></p>
<p>Let me clarify my question. My <code>pandas.DataFrame</code> looks like this:</p>
<pre><code>import pandas as pd

data = [
['word11', 'word12', 'word13', 'word14', 0, 0, 0, 0, 0],
['word21', 'word22', 'word23', 'word24', 0, -3, 34, 0, 0],
['word31', 'word32', 'word33', 'word34', 0, 1.6, 0, 0, 0],
['word41', 'word42', 'word43', 'word44', 0, 0, 0, 0, 0]
]
df = pd.DataFrame(data, columns=['word1', 'word2', 'word3', 'word4', 'C1', 'C2', 'C3', 'C4', 'C5'])
</code></pre>
<p><strong>Output to be generated:</strong></p>
<p>From this, I want to get a DataFrame that looks like:</p>
<pre><code>    word1   word2   word3   word4  C1   C2  C3  C4  C5
0  word11  word12  word13  word14   0  0.0   0   0   0
1  word21  word22  word23  word24   0 -3.0  34   0   0
2  word31  word32  word33  word34   0  1.6   0   0   0
3  word41  word42  word43  word44   0  0.0   0   0   0
</code></pre>
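<p><strong>My code:</strong></p>
<p>Here is what I did, starting from the DataFrame above. First I melted the <code>C*</code> columns into long format:</p>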
<pre><code>primary_columns = ['word1', 'word2', 'word3', 'word4']
transposing_columns = ['C1', 'C2', 'C3', 'C4', 'C5']
transposed_df = df.melt(id_vars=primary_columns, value_vars=transposing_columns)
compare_columns = primary_columns + ['value']
</code></pre>
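<p>For reference, a small self-contained check of what the melt step produces on the sample data. This is only a sketch: the name <code>melted</code> is illustrative, and it calls the module-level <code>pd.melt</code>, which takes the same <code>id_vars</code> and <code>value_vars</code> arguments. Each input row is repeated once per <code>C*</code> column, so 4 rows and 5 value columns give 20 rows.</p>

```python
import pandas as pd

data = [
    ['word11', 'word12', 'word13', 'word14', 0, 0, 0, 0, 0],
    ['word21', 'word22', 'word23', 'word24', 0, -3, 34, 0, 0],
    ['word31', 'word32', 'word33', 'word34', 0, 1.6, 0, 0, 0],
    ['word41', 'word42', 'word43', 'word44', 0, 0, 0, 0, 0],
]
primary_columns = ['word1', 'word2', 'word3', 'word4']
transposing_columns = ['C1', 'C2', 'C3', 'C4', 'C5']
df = pd.DataFrame(data, columns=primary_columns + transposing_columns)

# Long format: one row per (original row, C* column) pair.
# The former column name lands in 'variable', the cell in 'value'.
melted = pd.melt(df, id_vars=primary_columns, value_vars=transposing_columns)

print(melted.shape)
print(list(melted.columns))
# Number of long-format rows whose value is nonzero
print(int((melted['value'] != 0).sum()))
```

<p>Only 3 of the 20 melted rows carry a nonzero value here, which is why the zero rows dominate the result.</p>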
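<p>Then I split the DataFrame in two based on the <code>value</code> column, dropped duplicates from the zero-valued part, and put the two parts back together:</p>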
<pre><code>nonzero_df = transposed_df[transposed_df['value'] != 0]
zero_df = transposed_df[transposed_df['value'] == 0]
zero_df = zero_df.drop_duplicates(subset=compare_columns, keep='first')
df = pd.concat([nonzero_df, zero_df])  # same result as nonzero_df.append(zero_df), which pandas 2.0 removed
</code></pre>
<p>This gave me the following output:</p>
<pre><code>df = df.reset_index(drop=True)
df
    word1   word2   word3   word4 variable  value
0  word21  word22  word23  word24       C2   -3.0
1  word31  word32  word33  word34       C2    1.6
2  word21  word22  word23  word24       C3   34.0
3  word11  word12  word13  word14       C1    0.0
4  word21  word22  word23  word24       C1    0.0
5  word31  word32  word33  word34       C1    0.0
6  word41  word42  word43  word44       C1    0.0
</code></pre>
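<p><strong>Problem:</strong></p>
<p>I don't want to see <code>df.iloc[4]</code> and <code>df.iloc[5]</code>.</p>
<p>If several rows have the same values in <code>word1</code>, <code>word2</code>, <code>word3</code> and <code>word4</code> and differ only in the <code>value</code> column, I want to keep the rows whose value is nonzero and drop the rows whose value is 0. A zero row should survive only when no nonzero row shares its <code>word*</code> values. I don't care about the value in the <code>variable</code> column.</p>
<p>How can I do this?</p>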
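<p>To make the rule concrete, here is a minimal sketch of the behaviour I am after on the sample data. It is not meant as the definitive solution: the names <code>melted</code>, <code>nonzero_keys</code>, <code>marked</code>, <code>zero_only</code> and <code>result</code> are illustrative, and it assumes a left merge with <code>indicator=True</code> is an acceptable way to find the zero rows whose <code>word*</code> key has no nonzero row.</p>

```python
import pandas as pd

data = [
    ['word11', 'word12', 'word13', 'word14', 0, 0, 0, 0, 0],
    ['word21', 'word22', 'word23', 'word24', 0, -3, 34, 0, 0],
    ['word31', 'word32', 'word33', 'word34', 0, 1.6, 0, 0, 0],
    ['word41', 'word42', 'word43', 'word44', 0, 0, 0, 0, 0],
]
primary_columns = ['word1', 'word2', 'word3', 'word4']
transposing_columns = ['C1', 'C2', 'C3', 'C4', 'C5']
df = pd.DataFrame(data, columns=primary_columns + transposing_columns)

melted = pd.melt(df, id_vars=primary_columns, value_vars=transposing_columns)

# All nonzero rows are kept as they are.
nonzero_df = melted[melted['value'] != 0]

# At most one zero row per word* key (the 'variable' column is ignored).
zero_df = melted[melted['value'] == 0].drop_duplicates(subset=primary_columns)

# Keys that already have at least one nonzero row.
nonzero_keys = nonzero_df[primary_columns].drop_duplicates()

# A left merge with indicator=True tags each zero row as 'left_only'
# when its key never appears among the nonzero keys.
marked = zero_df.merge(nonzero_keys, on=primary_columns, how='left', indicator=True)
zero_only = marked[marked['_merge'] == 'left_only'].drop('_merge', axis=1)

result = pd.concat([nonzero_df, zero_only]).reset_index(drop=True)
print(result)
```

<p>On the sample this keeps the three nonzero rows plus one zero row each for <code>word11</code> and <code>word41</code>, and drops the zero rows for <code>word21</code> and <code>word31</code>.</p>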
<p><strong>Note:</strong></p>
<ol>
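<li>My DataFrame is large. It has nearly a million rows, more than 15 <code>word*</code>-type columns and more than 115 <code>C*</code>-type columns (<code>word*</code> and <code>C*</code> are the column names I used in the example).</li>
<li>I am using <code>Python 2.7</code> with <code>Pandas 0.17</code>.</li>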
</ol>