Pandas: remove the first and last character from the columns of a DataFrame

Published 2024-10-01 04:55:06


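Updated solution:

Some of the columns in my data are wrapped in '|', so the file is not strictly a csv. I import it as a csv and then try to strip the extra '|' from specific columns. My code and data look like this: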

import pandas as pd
from io import StringIO

dfy = pd.read_csv('Thesis/CRSP/CampaignFin14/pacs14.txt', header=0)

# Replace '|' in cells with series.str methods
for col in dfy:
    if dfy[col].dtype == 'object':
        dfy[col] = dfy[col].str.replace('|', '')

dfy.head()



  |2014|  |4111920141231643319|  |C00206136|  |N00029285|  1000  05/15/2014  \
0   2014  |4021120141205164809|  |C00307397|  |N00026722|  5000  10/22/2013   
1   2014  |4053020141213944220|  |C00009985|  |N00030676|     4  03/26/2014   
2   2014  |4063020141216281752|  |C00104299|  |N00032088|  1000  05/06/2014   
3   2014  |4061920141215566782|  |C00164145|  |N00034277|  2500  05/22/2014   
4   2014  |4102420141226480432|  |C00439216|  |N00036023|  1000  09/29/2014   

For some reason, the loop does not strip out the '|'.

The following works, but I would like to handle all of the columns at once.

[code snippet not preserved in the source]

This is what the data looks like when I import it as a .csv with sep=:

   cycle   cid          amount  date        realcode  type   di   feccandid
0  |2014|  |N00029285|  1000    05/15/2014  |E1600|   |24K|  |D|  |H8TX22107|
1  |2014|  |N00026722|  5000    10/22/2013  |G4600|   |24K|  |D|  |H4TX28046|
2  |2014|  |N00030676|  4       03/26/2014  |C2100|   |24Z|  |D|  |H0MO07113|

This is what it looks like in the .txt file:

|2014|,|4111920141231643319|,|C00206136|,|N00029285|,1000,05/15/2014,|E1600|,|24K|,|D|,|H8TX22107|
|2014|,|4021120141205164809|,|C00307397|,|N00026722|,5000,10/22/2013,|G4600|,|24K|,|D|,|H4TX28046|
|2014|,|4053020141213944220|,|C00009985|,|N00030676|,4,03/26/2014,|C2100|,|24Z|,|D|,|H0MO07113|
|2014|,|4063020141216281752|,|C00104299|,|N00032088|,1000,05/06/2014,|F1100|,|24K|,|D|,|H0OH06189|
|2014|,|4061920141215566782|,|C00164145|,|N00034277|,2500,05/22/2014,|F3100|,|24K|,|D|,|H2NY22139|

Here is a link to my raw data.


2 answers

You can preprocess the file in memory, removing every '|' from each line, and then pass the result to pandas.

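import io
import pandas as pd

# Read the whole file and drop every '|' before pandas sees the text
with open('Thesis/CRSP/CampaignFin14/pacs14.txt', 'r') as fi:
    content = fi.read().replace('|', '')

# Wrap the cleaned text in a file-like buffer for read_csv
block = io.StringIO(content)
# The fields are comma-separated, so the default separator applies;
# add header=None if the file has no header row
dfy2 = pd.read_csv(block, skipinitialspace=True)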

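Here, io.StringIO() first creates a buffer-like object from the string. That object is then passed to pd.read_csv, whose first argument accepts either a file name or a buffer.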
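The same idea can be tried without the original file. The following is a minimal sketch, assuming the three rows from the question's .txt sample stand in for the file contents and that the file has no header row. The helper name read_pipe_wrapped is hypothetical and not part of pandas.

```python
import io
import pandas as pd

# Three rows copied from the question's .txt sample. This string stands in
# for the real file, which is an assumption made to keep the sketch runnable.
raw_text = (
    "|2014|,|4111920141231643319|,|C00206136|,|N00029285|,1000,05/15/2014,|E1600|,|24K|,|D|,|H8TX22107|\n"
    "|2014|,|4021120141205164809|,|C00307397|,|N00026722|,5000,10/22/2013,|G4600|,|24K|,|D|,|H4TX28046|\n"
    "|2014|,|4053020141213944220|,|C00009985|,|N00030676|,4,03/26/2014,|C2100|,|24Z|,|D|,|H0MO07113|\n"
)


def read_pipe_wrapped(source):
    """Hypothetical helper: strip every '|' from a file-like object, then parse it."""
    cleaned = source.read().replace("|", "")
    # header=None because the sample rows carry no header line,
    # so pandas labels the columns 0, 1, 2, ...
    return pd.read_csv(io.StringIO(cleaned), header=None)


# io.StringIO plays the role of the opened file here
df = read_pipe_wrapped(io.StringIO(raw_text))
print(df.shape)
print(df.dtypes)
```

With a real file, the call would presumably look like read_pipe_wrapped(open(path)) inside a with block. Because the pipes are gone before parsing, pandas can infer numeric types for columns such as the cycle.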

Read in the csv and use Series.str methods, such as str.replace.

import pandas as pd
from io import StringIO  # Python 3 replacement for cStringIO

# Fake csv text for example
textcsv = '''
cycle,cid,amount,date,realcode,type,di,feccandid
|2014|,|N00029285|,1000,05/15/2014,|E1600|,|24K|,|D|,|H8TX22107|
|2014|,|N00026722|,5000,10/22/2013,|G4600|,|24K|,|D|,|H4TX28046|
|2014|,|N00030676|,4   ,03/26/2014,|C2100|,|24Z|,|D|,|H0MO07113|
'''
# Read in fake csv
# normally you would use: dfy = pd.read_csv('/path/to/file.csv')
dfy = pd.read_csv(StringIO(textcsv))

# Replace '|' in cells with series.str methods
for col in dfy:
    # Only text columns have the .str accessor
    if pd.api.types.is_string_dtype(dfy[col]):
        # regex=False treats '|' as a literal character, not regex alternation
        dfy[col] = dfy[col].str.replace('|', '', regex=False)

print(dfy)

[output not preserved in the source]
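Since the question asks about dropping only the first and last character and about handling every column in one go, Series.str.strip('|') is another option worth trying. The following is a minimal sketch, assuming the pipes appear only at the start and end of a value. The helper name strip_pipes is hypothetical, and the sample rows are adapted from the fake csv above.

```python
from io import StringIO

import pandas as pd

# Sample data adapted from the fake csv in the answer above
textcsv = """cycle,cid,amount,date,realcode,type,di,feccandid
|2014|,|N00029285|,1000,05/15/2014,|E1600|,|24K|,|D|,|H8TX22107|
|2014|,|N00026722|,5000,10/22/2013,|G4600|,|24K|,|D|,|H4TX28046|
|2014|,|N00030676|,4,03/26/2014,|C2100|,|24Z|,|D|,|H0MO07113|
"""


def strip_pipes(frame):
    """Hypothetical helper: return a copy with leading/trailing '|' removed from text columns."""
    out = frame.copy()
    for col in out.columns:
        # Numeric columns such as amount are skipped
        if pd.api.types.is_string_dtype(out[col]):
            out[col] = out[col].str.strip("|")
    return out


df = pd.read_csv(StringIO(textcsv))
clean = strip_pipes(df)

# The cycle values are still text after stripping; convert them explicitly
clean["cycle"] = pd.to_numeric(clean["cycle"])
print(clean)
```

Unlike str.replace, str.strip leaves any '|' in the middle of a value untouched, which may or may not be what the data needs.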
