Rows to columns in Python

Posted 2024-10-16 22:36:06

I have a CSV file containing a time series of daily precipitation. The problem is how the data is organized. Here is a small sample:

date        p01 p02 p03 p04 p05 p06
01-01-1941  33.6 7.1 22.3 0 0 0
01-02-1941  0 0 1.1 11.3 0 0

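So there is one column for each day of the month (p01 is the precipitation for day 1, p02 for day 2, and so on). I would like this structure instead: one column for the date and another for the precipitation value.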

date        p
01-01-1941  33.6
02-01-1941  7.1
03-01-1941  22.3
04-01-1941  0
05-01-1941  0
06-01-1941  0
01-02-1941  0
02-02-1941  0
03-02-1941  1.1
04-02-1941  11.3
05-02-1941  0
06-02-1941  0

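I have found some code examples, but none of them worked for this specific problem. In general they suggest trying pandas. Can anyone suggest a solution, or give me good advice to guide my study? Thanks. (Sorry for my poor English.)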


3 answers

Well, I got the answer, but it isn't a single command or a magic function. This is how I got there. You can optimize this code further. Hope this helps!

import pandas as pd
from datetime import timedelta

df = pd.read_csv('myfile.csv')
# The dates in the file are day-first (dd-mm-yyyy)
df['date'] = pd.to_datetime(df['date'], dayfirst=True)

# One two-column frame per day column (add p06 and later columns the same way)
p1 = df[['date', 'p01']].copy()
p2 = df[['date', 'p02']].copy()
p3 = df[['date', 'p03']].copy()
p4 = df[['date', 'p04']].copy()
p5 = df[['date', 'p05']].copy()

# Rename the columns so the frames can be stacked
p1.columns = ['date', 'val']
p2.columns = ['date', 'val']
p3.columns = ['date', 'val']
p4.columns = ['date', 'val']
p5.columns = ['date', 'val']

# Remember which day column each value came from
p1['col'] = 'p01'
p2['col'] = 'p02'
p3['col'] = 'p03'
p4['col'] = 'p04'
p5['col'] = 'p05'

main = pd.concat([p1, p2, p3, p4, p5])

# 'p03' means day 3, i.e. 2 days after the first of the month
main['days2add'] = main['col'].apply(lambda x: int(x.strip('p')) - 1)

# Shift each month-start date by its day offset
ff = lambda row: row['date'] + timedelta(days=int(row['days2add']))
main['new_date'] = main.apply(ff, axis=1)

Contents of my CSV file:

In [210]: df
Out[210]:
        date   p01  p02   p03  p04  p05  p06
0 1941-01-01  33.6  7.1  22.3    0    0    0

My output:

In [209]: main[['new_date', u'val']]
Out[209]:
    new_date   val
0 1941-01-01  33.6
0 1941-01-02   7.1
0 1941-01-03  22.3
0 1941-01-04   0.0
0 1941-01-05   0.0
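
As the answer says, the code can be tightened. The following is a minimal sketch of the same idea written as a loop, assuming every value column is named `p` followed by the day number. The helper name `unpivot_days` and the small inline DataFrame are made up for illustration and are not part of the original answer.

```python
import pandas as pd

def unpivot_days(df):
    """Turn one column per day (p01, p02, ...) into one row per day."""
    pieces = []
    for col in df.columns:
        if col == 'date':
            continue
        piece = df[['date', col]].copy()
        piece.columns = ['date', 'val']
        # 'p03' means day 3, i.e. 2 days after the first of the month
        offset = int(col.lstrip('p')) - 1
        piece['new_date'] = piece['date'] + pd.Timedelta(days=offset)
        pieces.append(piece)
    main = pd.concat(pieces, ignore_index=True)
    return main.sort_values('new_date')[['new_date', 'val']].reset_index(drop=True)

# Hypothetical sample data shaped like the question's file (first three days only)
df = pd.DataFrame({
    'date': pd.to_datetime(['01-01-1941', '01-02-1941'], format='%d-%m-%Y'),
    'p01': [33.6, 0.0],
    'p02': [7.1, 0.0],
    'p03': [22.3, 1.1],
})
result = unpivot_days(df)
print(result)
```

Looping over `df.columns` picks up p06 and any later day columns without listing them by hand.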

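I think you can first convert the date column with to_datetime, then use stack and reset_index to reshape the DataFrame, then convert the days column with to_timedelta and add it to the date column.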

import pandas as pd
import io

temp=u"""date;p01;p02;p03;p04;p05;p06
01-01-1941;33.6;7.1;22.3;0;0;0
01-02-1941;0;0;1.1;11.3;0;0"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep=";")
print(df)
         date   p01  p02   p03   p04  p05  p06
0  01-01-1941  33.6  7.1  22.3   0.0    0    0
1  01-02-1941   0.0  0.0   1.1  11.3    0    0
# convert the date column to datetime
df.date = pd.to_datetime(df.date, dayfirst=True)
print(df)
        date   p01  p02   p03   p04  p05  p06
0 1941-01-01  33.6  7.1  22.3   0.0    0    0
1 1941-02-01   0.0  0.0   1.1  11.3    0    0

#stack, rename columns
df1 = df.set_index('date').stack().reset_index(name='p').rename(columns={'level_1':'days'})
print(df1)
         date days     p
0  1941-01-01  p01  33.6
1  1941-01-01  p02   7.1
2  1941-01-01  p03  22.3
3  1941-01-01  p04   0.0
4  1941-01-01  p05   0.0
5  1941-01-01  p06   0.0
6  1941-02-01  p01   0.0
7  1941-02-01  p02   0.0
8  1941-02-01  p03   1.1
9  1941-02-01  p04  11.3
10 1941-02-01  p05   0.0
11 1941-02-01  p06   0.0
#convert column to timedelta in days
df1.days = pd.to_timedelta(df1.days.str[1:].astype(int) - 1, unit='D')
print(df1.days)
0    0 days
1    1 days
2    2 days
3    3 days
4    4 days
5    5 days
6    0 days
7    1 days
8    2 days
9    3 days
10   4 days
11   5 days
Name: days, dtype: timedelta64[ns]

#add timedelta
df1['date'] = df1['date'] + df1['days']
#remove unnecessary column
df1 = df1.drop('days', axis=1)
print(df1)
         date     p
0  1941-01-01  33.6
1  1941-01-02   7.1
2  1941-01-03  22.3
3  1941-01-04   0.0
4  1941-01-05   0.0
5  1941-01-06   0.0
6  1941-02-01   0.0
7  1941-02-02   0.0
8  1941-02-03   1.1
9  1941-02-04  11.3
10 1941-02-05   0.0
11 1941-02-06   0.0
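
The same reshaping can also be sketched with `DataFrame.melt` in place of `set_index().stack()`. This is an alternative to the answer's code, not a correction of it, and the variable name `tidy` is arbitrary. It assumes the dates are written as dd-mm-yyyy, as in the question.

```python
import io
import pandas as pd

temp = """date;p01;p02;p03;p04;p05;p06
01-01-1941;33.6;7.1;22.3;0;0;0
01-02-1941;0;0;1.1;11.3;0;0"""

df = pd.read_csv(io.StringIO(temp), sep=";")
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')

# melt keeps 'date' as the identifier and stacks every p column into rows
tidy = df.melt(id_vars='date', var_name='days', value_name='p')

# 'p03' -> 2 days; shift the month-start date by that offset
offset = pd.to_timedelta(tidy['days'].str[1:].astype(int) - 1, unit='D')
tidy['date'] = tidy['date'] + offset

# melt orders rows column by column, so sort to get chronological order
tidy = tidy.drop(columns='days').sort_values('date').reset_index(drop=True)
print(tidy)
```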

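Edit: Sorry, the title of the question was a bit misleading. For the example output you gave (collapsing all the p columns into one), you can do the following: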

# Read the example file you gave
with open('csv.txt', 'r') as fid:
    lines = fid.readlines()

# Write one "date value" line per precipitation value
with open('output2.txt', 'w') as fid:
    fid.write('%15s %15s\n' % (lines[0].split()[0], 'p'))
    for line in lines[1:]:
        iline = line.split()
        for value in iline[1:]:
            fid.write('%15s %15s\n' % (iline[0], value))

which results in:

       date               p
 01-01-1941            33.6
 01-01-1941             7.1
 01-01-1941            22.3
 01-01-1941               0
 01-01-1941               0
 01-01-1941               0
 01-02-1941               0
 01-02-1941               0
 01-02-1941             1.1
 01-02-1941            11.3
 01-02-1941               0
 01-02-1941               0
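
Note that this output repeats the month-start date on every row, whereas the question asks for the day to advance with each column. A minimal plain-Python sketch of that extra step follows. It assumes the dates are dd-mm-yyyy and that the p columns appear in day order with none missing, so the column position can serve as the day offset. The function name `unpivot_lines` and the inline sample are illustrative only.

```python
from datetime import datetime, timedelta

def unpivot_lines(lines):
    """Return (date_string, value) pairs, advancing the date one day per column."""
    rows = []
    for line in lines[1:]:                 # skip the header line
        fields = line.split()
        if not fields:                     # ignore blank lines
            continue
        start = datetime.strptime(fields[0], '%d-%m-%Y')
        for offset, value in enumerate(fields[1:]):
            day = start + timedelta(days=offset)
            rows.append((day.strftime('%d-%m-%Y'), value))
    return rows

# Sample lines copied from the question
sample = [
    'date        p01 p02 p03 p04 p05 p06',
    '01-01-1941  33.6 7.1 22.3 0 0 0',
    '01-02-1941  0 0 1.1 11.3 0 0',
]
rows = unpivot_lines(sample)
for date, value in rows:
    print('%15s %15s' % (date, value))
```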

Original answer, which may still be relevant to someone:

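There are indeed many ways to do this. But given that you have no particular preference (and if the file is not very large), you may just want to use plain Python.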

def rows2columns(lines):
    # Split every line into its whitespace-separated fields
    ilines = [line.split() for line in lines]
    # Transpose: the j-th field of every line becomes the j-th output row
    new = []
    for j in range(len(ilines[0])):
        new.append([iline[j] for iline in ilines])
    return new

def writefile(new, path='output.txt'):
    # Write each row as right-aligned, 15-character-wide fields
    with open(path, 'w') as fid:
        for row in new:
            for item in row:
                fid.write('%15s' % item)
            fid.write('\n')

# Read the example file you gave
with open('csv.txt', 'r') as fid:
    lines = fid.readlines()

# Transpose the list of lines
new = rows2columns(lines)
# Write the result to a file
writefile(new, path='output.txt')

and the output file looks like this:

       date     01-01-1941     01-02-1941
        p01           33.6              0
        p02            7.1              0
        p03           22.3            1.1
        p04              0           11.3
        p05              0              0
        p06              0              0

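This is probably the simplest (or close to the simplest) plain-Python approach you can have. The csv module, numpy, or pandas offer additional features you may want to take advantage of. This one in particular needs no imports.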
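
For comparison, here is a rough sketch of the same transposition using the standard-library csv module and `zip`. It assumes a comma-separated file, which the whitespace-separated sample in the question is not, and it uses in-memory `io.StringIO` buffers in place of real files. The helper name `transpose_rows` is made up for this example.

```python
import csv
import io

def transpose_rows(rows):
    """Swap rows and columns of a rectangular list of rows."""
    return [list(col) for col in zip(*rows)]

# In-memory stand-in for a comma-separated input file (assumed format)
sample = io.StringIO(
    "date,p01,p02,p03\n"
    "01-01-1941,33.6,7.1,22.3\n"
    "01-02-1941,0,0,1.1\n"
)
rows = list(csv.reader(sample))
transposed = transpose_rows(rows)

# Write the transposed rows back out as CSV text
out = io.StringIO()
csv.writer(out, lineterminator='\n').writerows(transposed)
print(out.getvalue())
```

The csv module handles quoting and embedded separators, which the `split()`-based version does not.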
