基于其他列的值合并大型数据帧

2024-09-30 08:23:27 发布

您现在位置:Python中文网/ 问答频道 /正文

我该怎么做?我在.csv文件中有以下数据集:

    +------------+----------------+------------+---------------+------------+----------------+------------+----------------+------------+---------------+
    |    Date    | NBDG LN Equity |    Date    | P2P LN Equity |    Date    | HWSL LN Equity |    Date    | BPCR LN Equity |    Date    | AXI LN Equity |
    +------------+----------------+------------+---------------+------------+----------------+------------+----------------+------------+---------------+
    | 09-08-2017 |           78,5 | 09-08-2017 |       877,061 | 09-08-2017 |        107,082 | 09-08-2017 |         1,0981 | 08-08-2017 |            94 |
    | 08-08-2017 |           78,5 | 08-08-2017 |      878,7899 | 08-08-2017 |          106,5 | 08-08-2017 |         1,1021 | 07-08-2017 |            94 |
    | 03-08-2017 |           78,5 | 07-08-2017 |       879,709 | 07-08-2017 |          106,2 | 07-08-2017 |         1,0945 | 02-08-2017 |       98,2472 |
    | 01-08-2017 |           78,5 | 04-08-2017 |      879,6708 | 04-08-2017 |       105,4882 | 04-08-2017 |         1,0932 | 27-07-2017 |          98,5 |
    +------------+----------------+------------+---------------+------------+----------------+------------+----------------+------------+---------------+

我想“合并”成以下格式:

+------------+----------------+---------------+----------------+----------------+---------------+
|    Date    | NBDG LN Equity | P2P LN Equity | HWSL LN Equity | BPCR LN Equity | AXI LN Equity |
+------------+----------------+---------------+----------------+----------------+---------------+
| 09-08-2017 | 78,5           | 877,061       | 107,082        | 1,0981         | NA            |
| 08-08-2017 | 78,5           | 878,7899      | 106,5          | 1,1021         | 94            |
| 07-08-2017 | NA             | 879,709       | 106,2          | 1,0945         | 94            |
| 04-08-2017 | NA             | 879,6708      | 105,4882       | 1,0932         | NA            |
| 03-08-2017 | 78,5           | NA            | NA             | NA             | NA            |
| 02-08-2017 | NA             | NA            | NA             | NA             | 98,2472       |
| 01-08-2017 | 78,5           | NA            | NA             | NA             | NA            |
| 27-07-2017 | NA             | NA            | NA             | NA             | 98,5          |
+------------+----------------+---------------+----------------+----------------+---------------+

我怎样才能做到这一点没有硬编码太多?我首先使用

dfData = local_csv('Data.csv', timezone='DK', sep=';')
lDateColumns = [col for col in dfData.columns if 'Date' in col]
dfData[dfData[lDateColumns].apply(pd.Series.nunique, axis=1)==1]

直到我注意到索引有时会相对偏移,结果只剩下4行。你知道吗

谢谢


Tags: 文件csvindatecolp2plnna
1条回答
网友
1楼 · 发布于 2024-09-30 08:23:27

我将数据帧逐段分解(更准确地说,是2列乘2列),然后将所有内容合并到一起:

In [103]: df
Out[103]: 
         Date NBDG LN Equity      Date.1 P2P LN Equity      Date.2  \
0  09-08-2017           78,5  09-08-2017       877,061  09-08-2017   
1  08-08-2017           78,5  08-08-2017      878,7899  08-08-2017   
2  03-08-2017           78,5  07-08-2017       879,709  07-08-2017   
3  01-08-2017           78,5  04-08-2017      879,6708  04-08-2017   

  HWSL LN Equity      Date.3 BPCR LN Equity      Date.4 AXI LN Equity  
0        107,082  09-08-2017         1,0981  08-08-2017            94  
1          106,5  08-08-2017         1,1021  07-08-2017            94  
2          106,2  07-08-2017         1,0945  02-08-2017       98,2472  
3       105,4882  04-08-2017         1,0932  27-07-2017          98,5

In [114]: res = []

In [115]: for i in range(5):
     ...:     df_temp = pd.concat([df.iloc[:, 2*i], df.iloc[:, 2*i+1]], axis=1)
     ...:     df_temp.columns = ['Date', df_temp.columns[1]]
     ...:     res.append(df_temp)
     ...:     

我们现在有了一个dataframe数组,它的第一列总是日期(称为“date”),第二列是相关的度量。我们将使用^{}将所有内容合并在一起

In [117]: from functools import reduce

In [120]: reduce(lambda df1,df2: df1.merge(df2, on='Date', how='outer'), res)
Out[120]: 
         Date NBDG LN Equity P2P LN Equity HWSL LN Equity BPCR LN Equity  \
0  09-08-2017           78,5       877,061        107,082         1,0981   
1  08-08-2017           78,5      878,7899          106,5         1,1021   
2  03-08-2017           78,5           NaN            NaN            NaN   
3  01-08-2017           78,5           NaN            NaN            NaN   
4  07-08-2017            NaN       879,709          106,2         1,0945   
5  04-08-2017            NaN      879,6708       105,4882         1,0932   
6  02-08-2017            NaN           NaN            NaN            NaN   
7  27-07-2017            NaN           NaN            NaN            NaN   

  AXI LN Equity  
0           NaN  
1            94  
2           NaN  
3           NaN  
4            94  
5           NaN  
6       98,2472  
7          98,5  

相关问题 更多 >

    热门问题