<p>Updated solution:</p>
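<p>Some columns in my data are wrapped in <code>'|'</code> characters, so it is not strictly <code>csv</code>. I import it as csv and try to strip the extra <code>'|'</code> from specific columns. My data looks like this:</p>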
<pre><code>import pandas as pd
from io import StringIO
dfy = pd.read_csv('Thesis/CRSP/CampaignFin14/pacs14.txt', header=0)
# Replace '|' in cells with series.str methods
for col in dfy:
    if dfy[col].dtype == 'object':
        dfy[col] = dfy[col].str.replace('|', '')
dfy.head()
|2014| |4111920141231643319| |C00206136| |N00029285| 1000 05/15/2014 \
0 2014 |4021120141205164809| |C00307397| |N00026722| 5000 10/22/2013
1 2014 |4053020141213944220| |C00009985| |N00030676| 4 03/26/2014
2 2014 |4063020141216281752| |C00104299| |N00032088| 1000 05/06/2014
3 2014 |4061920141215566782| |C00164145| |N00034277| 2500 05/22/2014
4 2014 |4102420141226480432| |C00439216| |N00036023| 1000 09/29/2014
</code></pre>
<p>For some reason, the loop does not remove the <code>|</code> characters.</p>
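<p>One thing visible in the output above: with <code>header=0</code>, pandas uses the first record of the file as the column labels, which is why the labels themselves still read <code>|2014|</code>, <code>|C00206136|</code> and so on; a loop over cell values never touches the labels. Below is a minimal sketch of the same loop, assuming the file has no header row. It reads with <code>header=None</code> and passes <code>regex=False</code> so that <code>'|'</code> is always matched as a literal character (in a regular expression, <code>|</code> is the alternation operator). <code>pd.api.types.is_string_dtype</code> replaces the <code>== 'object'</code> check so the test also covers pandas' dedicated string dtype. The two sample rows are copied from the <code>.txt</code> excerpt in the question:</p>

```python
import pandas as pd
from io import StringIO

# Two sample rows copied from the .txt excerpt in the question.
raw = (
    "|2014|,|4111920141231643319|,|C00206136|,|N00029285|,1000,05/15/2014,|E1600|,|24K|,|D|,|H8TX22107|\n"
    "|2014|,|4021120141205164809|,|C00307397|,|N00026722|,5000,10/22/2013,|G4600|,|24K|,|D|,|H4TX28046|\n"
)

# header=None: the file has no header row, so header=0 would turn the first
# record into column labels, which a loop over cell values never cleans.
dfy = pd.read_csv(StringIO(raw), header=None)

for col in dfy:
    # Only text columns have the .str accessor; numeric columns are skipped.
    if pd.api.types.is_string_dtype(dfy[col]):
        # regex=False: match '|' as a literal character, not regex alternation.
        dfy[col] = dfy[col].str.replace("|", "", regex=False)

print(dfy.head())
```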
<p>Handling one column at a time works, but I want to do all of the columns at once.</p>
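<p>To clean every column in a single call, <code>DataFrame.replace</code> with <code>regex=True</code> substitutes the pattern inside each string cell and leaves numeric columns alone. The pipe has to be escaped as <code>\|</code> in the pattern, because a bare <code>|</code> is regex alternation. A sketch on two sample rows from the question, again assuming the file has no header row:</p>

```python
import pandas as pd
from io import StringIO

# Two sample rows copied from the .txt excerpt in the question.
raw = (
    "|2014|,|4111920141231643319|,|C00206136|,|N00029285|,1000,05/15/2014,|E1600|,|24K|,|D|,|H8TX22107|\n"
    "|2014|,|4021120141205164809|,|C00307397|,|N00026722|,5000,10/22/2013,|G4600|,|24K|,|D|,|H4TX28046|\n"
)

dfy = pd.read_csv(StringIO(raw), header=None)

# One call over the whole frame: every literal '|' inside a string cell is
# substituted with ''. Numeric columns (e.g. the amount column) are unchanged.
dfy = dfy.replace(r"\|", "", regex=True)

print(dfy.head())
```

<p>The cleaned values stay strings (for example <code>'2014'</code>), so any column that should be numeric still needs an explicit conversion such as <code>pd.to_numeric</code>.</p>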
<p>This is what the data looks like when I import it using <code>.csv</code> with <code>sep=</code>:</p>
<pre><code> cycle cid amount date realcode type di feccandid
0 |2014| |N00029285| 1000 05/15/2014 |E1600| |24K| |D| |H8TX22107|
1 |2014| |N00026722| 5000 10/22/2013 |G4600| |24K| |D| |H4TX28046|
2 |2014| |N00030676| 4 03/26/2014 |C2100| |24Z| |D| |H0MO07113|
</code></pre>
<p>This is what it looks like in the <code>.txt</code> file:</p>
<pre><code>|2014|,|4111920141231643319|,|C00206136|,|N00029285|,1000,05/15/2014,|E1600|,|24K|,|D|,|H8TX22107|
|2014|,|4021120141205164809|,|C00307397|,|N00026722|,5000,10/22/2013,|G4600|,|24K|,|D|,|H4TX28046|
|2014|,|4053020141213944220|,|C00009985|,|N00030676|,4,03/26/2014,|C2100|,|24Z|,|D|,|H0MO07113|
|2014|,|4063020141216281752|,|C00104299|,|N00032088|,1000,05/06/2014,|F1100|,|24K|,|D|,|H0OH06189|
|2014|,|4061920141215566782|,|C00164145|,|N00034277|,2500,05/22/2014,|F3100|,|24K|,|D|,|H2NY22139|
</code></pre>
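<p>Because every text field in the file is wrapped in <code>|</code> the same way standard CSV wraps fields in double quotes, another option is to strip the pipes while parsing, by passing <code>quotechar='|'</code> to <code>read_csv</code>, so no cleanup loop is needed afterwards. A sketch under these assumptions: there is no header row; the names <code>fecrecno</code> and <code>pacid</code> for the second and third fields are hypothetical (the question only names the other eight); and <code>dtype=str</code> keeps the long ID-like values as text, with <code>amount</code> converted back to a number afterwards:</p>

```python
import pandas as pd
from io import StringIO

# Three sample rows copied from the .txt excerpt in the question.
raw = (
    "|2014|,|4111920141231643319|,|C00206136|,|N00029285|,1000,05/15/2014,|E1600|,|24K|,|D|,|H8TX22107|\n"
    "|2014|,|4021120141205164809|,|C00307397|,|N00026722|,5000,10/22/2013,|G4600|,|24K|,|D|,|H4TX28046|\n"
    "|2014|,|4053020141213944220|,|C00009985|,|N00030676|,4,03/26/2014,|C2100|,|24Z|,|D|,|H0MO07113|\n"
)

# "fecrecno" and "pacid" are hypothetical names for the 2nd and 3rd fields;
# the other eight names are the ones shown in the question.
names = ["cycle", "fecrecno", "pacid", "cid", "amount", "date",
         "realcode", "type", "di", "feccandid"]

# quotechar="|" makes the parser treat |...| the way it treats "...",
# so the pipes are removed during parsing. dtype=str keeps IDs as text.
df = pd.read_csv(StringIO(raw), header=None, names=names,
                 quotechar="|", dtype=str)

# amount was read as text because of dtype=str; convert it back to numbers.
df["amount"] = pd.to_numeric(df["amount"])

print(df.head())
```

<p>For the real file, the same call would take the path from the question instead of <code>StringIO(raw)</code>, e.g. <code>pd.read_csv('Thesis/CRSP/CampaignFin14/pacs14.txt', header=None, names=names, quotechar='|', dtype=str)</code>.</p>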
<p>Here is a link to my <a href="https://github.com/108michael/ms_thesis/blob/master/pac_other14.txt.zip" rel="nofollow">raw data</a>.</p>