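<p>I have a large DataFrame of financial transactions (150,000 rows x 25 columns). Many (but not all) of these transactions are reversed at a later date: the reversal has the same category, type, and source as the original and the same amount with the opposite sign. I want to create a new column, <code>reversed</code>, that holds the date of the offsetting transaction, or <code>'none'</code> if the transaction was never reversed.</p>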
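<p>I have tried functions such as <code>drop_duplicates()</code> and <code>duplicated()</code> on the category, type, and source columns, but I couldn't narrow it down to a solution. Any suggestions?</p>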
<pre><code>import pandas as pd

# Sample input: each reversal repeats category/type/source with the amount negated
d_in = {'key' : ['81371453', '93045710', '22123452', '18233745', '84933451', '95832374', '20283456', '20239485', '95843745'],
'date' : ['20200901', '20200901', '20200902', '20200902', '20200902','20200903', '20200904', '20200905', '20200905'],
'category' : ['Z293', 'B993', 'Z293', 'B993', 'W884', 'C123', 'V332', 'C123', 'V332'],
'type' : ['tools', 'supplies', 'tools', 'supplies', 'repairs', 'custom', 'misc', 'custom', 'misc'],
'source' : ['Q112', 'E443', 'Q112', 'E443', 'P443', 'B334', 'E449', 'B334', 'E449'],
'amount' : [123.21, 3.12, -123.21, -3.12, 9312.00, 312.23, -13.23, -312.23, 13.23]}
df_in = pd.DataFrame(data=d_in)

# Desired output: 'reversed' holds the date of the offsetting transaction, or 'none'
d_out = {'key' : ['81371453', '93045710', '22123452', '18233745', '84933451', '95832374', '20283456', '20239485', '95843745'],
'date' : ['20200901', '20200901', '20200902', '20200902', '20200902','20200903', '20200904', '20200905', '20200905'],
'category' : ['Z293', 'B993', 'Z293', 'B993', 'W884', 'C123', 'V332', 'C123', 'V332'],
'type' : ['tools', 'supplies', 'tools', 'supplies', 'repairs', 'custom', 'misc', 'custom', 'misc'],
'source' : ['Q112', 'E443', 'Q112', 'E443', 'P443', 'B334', 'E449', 'B334', 'E449'],
'amount' : [123.21, 3.12, -123.21, -3.12, 9312.00, 312.23, -13.23, -312.23, 13.23],
'reversed' : ['20200902', '20200902', '20200901', '20200901', 'none', '20200905', '20200905', '20200903', '20200904']}
df_out = pd.DataFrame(data=d_out)
</code></pre>
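One possible approach, sketched here under the assumption that a reversal is any row with identical <code>category</code>, <code>type</code>, and <code>source</code> and the exactly negated <code>amount</code>: instead of looking for duplicates, negate the amounts in a copy of the frame and left-join it back onto the original with <code>DataFrame.merge</code>. The helper name <code>flag_reversals</code> is hypothetical, and it also assumes <code>key</code> is unique.

```python
import pandas as pd


def flag_reversals(df, match_cols=("category", "type", "source")):
    """Hypothetical helper: add a 'reversed' column with the offsetting row's date.

    Assumption: a reversal is another row with the same values in `match_cols`
    and exactly the negated `amount`; rows without a match get 'none'.
    """
    cols = list(match_cols)
    # Negate each amount so a transaction and its reversal share a merge key
    candidates = df[cols + ["amount", "date"]].copy()
    candidates["amount"] = -candidates["amount"]
    candidates = candidates.rename(columns={"date": "reversed"})
    # Left join keeps every original row, in the original row order
    out = df.merge(candidates, on=cols + ["amount"], how="left")
    # Assumption: 'key' is unique; if several rows could offset the same
    # transaction, keep only the first match to avoid duplicated rows
    out = out.drop_duplicates(subset="key", keep="first").reset_index(drop=True)
    out["reversed"] = out["reversed"].fillna("none")
    return out


# Usage with the sample data from the question
d_in = {
    "key": ["81371453", "93045710", "22123452", "18233745", "84933451",
            "95832374", "20283456", "20239485", "95843745"],
    "date": ["20200901", "20200901", "20200902", "20200902", "20200902",
             "20200903", "20200904", "20200905", "20200905"],
    "category": ["Z293", "B993", "Z293", "B993", "W884", "C123", "V332", "C123", "V332"],
    "type": ["tools", "supplies", "tools", "supplies", "repairs",
             "custom", "misc", "custom", "misc"],
    "source": ["Q112", "E443", "Q112", "E443", "P443", "B334", "E449", "B334", "E449"],
    "amount": [123.21, 3.12, -123.21, -3.12, 9312.00, 312.23, -13.23, -312.23, 13.23],
}
df_in = pd.DataFrame(data=d_in)
df_res = flag_reversals(df_in)
print(df_res[["key", "amount", "reversed"]])
```

On the sample data this should reproduce the <code>reversed</code> column of <code>df_out</code>. Because the join is hash-based rather than pairwise, it should scale reasonably to 150,000 rows. Two caveats: amounts are matched as exact floats (rounding both sides first may be safer for real data), and a zero amount would match itself.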