如何使用不同数据帧的不同列进行计算?

2024-05-19 10:52:52 发布

您现在位置:Python中文网/ 问答频道 /正文

我有这个数据框:

               DP1         DP2         DP3          DP4         DP5         DP6        DP7        DP8          DP9         DP10       DP11         DP12
    OP1     2,197,389   6,136,646   5,827,172   6,964,450   2,406,747   1,682,154   1,052,454   1,883,367               796,337     
    OP2     8,859,402   25,194,531  18,199,918  14,156,169  5,011,560   4,255,085   3,523,277   3,812,709   2,487,246   699,860     626,996     502,772
    OP3     14,787,376  35,731,942  24,747,885  35,725,650  8,723,298   6,430,740   7,750,373   1,577,873   3,353,858   2,583,798   872,799 
    OP4     21,493,250  44,408,206  51,271,615  46,638,746  14,476,529  11,773,563  8,582,066   2,106,156   495,311     3,733,716   4,242,765   600,503
    OP5     21,700,836  51,857,950  42,238,240  44,930,278  13,755,160  9,137,746   14,041,838  4,419,652   2,735,681   1,929,539   3,726,549   1,857,538
    OP6     27,197,016  56,597,455  53,092,978  43,047,300  17,452,924  15,215,678  11,417,947  2,712,132   1,300,171   4,584,131   4,052,668   2,033,861
    OP7     28,312,408  48,072,399  47,204,687  42,068,090  14,582,445  13,172,692  13,017,322  3,151,407   3,974,425   2,813,653   1,898,478   2,175,732
    OP8     21,161,654  50,484,594  45,726,512  33,390,233  10,678,218  14,829,062  11,180,288  4,075,341   2,817,544   3,821,326   2,615,968   343,419
    OP9     22,030,953  51,888,402  44,944,629  31,351,736  7,582,371   12,409,486  8,837,942   3,858,787   1,460,977   3,381,860   1,991,642   1,463,380
    OP10    19,800,999  39,785,669  40,423,641  34,994,271  11,765,266  13,458,795  10,641,247  2,257,935   1,221,440   4,426,476   1,895,439   
    OP11    26,744,235  44,861,464  37,066,722  33,657,069  10,375,969  11,308,453  7,970,174   2,786,659   1,582,985   9,757,740       
    OP12    14,967,698  33,305,571  34,915,179  26,178,516  11,533,264  7,838,632   14,209,013  2,109,380   7,943,767           
    OP13    15,860,127  35,304,111  35,374,476  27,839,885  13,643,958  8,819,719   7,806,556   8,604,275               
    OP14    16,421,831  29,630,667  26,810,400  27,094,657  9,360,617   10,228,159  17,245,611                  
    OP15    15,452,224  32,517,582  31,610,664  28,885,592  8,201,507   26,459,944                      
    OP16    12,994,042  35,936,216  28,716,273  24,939,457  32,432,699                          
    OP17    14,602,038  32,599,189  30,928,124  55,971,379                              
    OP18    17,740,235  38,745,384  95,247,120                                  
    OP19    23,064,250  164,908,882                                     
    OP20    221,345,563                                         
    Grand Total 566,733,526 857,966,859 694,346,236 557,833,478 191,982,531 167,019,909 137,276,106 43,355,674  29,373,406  38,528,438  21,923,305  8,977,206

使用上面的数据帧,我需要创建下面的数据帧。要创建的公式如下:dataframe是:

低于dataframe的第一列值与高于dataframe的第一列值相同。 查找低于数据帧公式的第二列值为=低于数据帧的第一列+高于数据帧的第二列(2197389+6136646)…依此类推

            DP1           DP2       DP3            DP4         DP5         DP6          DP7        DP8          DP9       DP10       DP11          DP12
OP1      2,197,389  8,334,035   14,161,207  21,125,657  23,532,404  25,214,558  26,267,012  28,150,378  28,150,378  28,946,716  28,946,716  28,946,716
OP2      8,859,402  34,053,934  52,253,852  66,410,020  71,421,580  75,676,666  79,199,942  83,012,651  85,499,897  86,199,757  86,826,753  87,329,526
OP3     14,787,376  50,519,318  75,267,203  110,992,853 119,716,151 126,146,890 133,897,263 135,475,136 138,828,994 141,412,792 142,285,590 142,285,590
OP4     21,493,250  65,901,455  117,173,070 163,811,816 178,288,345 190,061,908 198,643,974 200,750,131 201,245,441 204,979,157 209,221,922 209,822,425
OP5     21,700,836  73,558,786  115,797,026 160,727,304 174,482,464 183,620,210 197,662,049 202,081,701 204,817,382 206,746,921 210,473,470 212,331,007
OP6     27,197,016  83,794,471  136,887,449 179,934,749 197,387,673 212,603,351 224,021,298 226,733,430 228,033,601 232,617,732 236,670,400 238,704,261
OP7     28,312,408  76,384,807  123,589,494 165,657,584 180,240,029 193,412,721 206,430,043 209,581,449 213,555,875 216,369,528 218,268,006 220,443,738
OP8     21,161,654  71,646,248  117,372,760 150,762,994 161,441,212 176,270,274 187,450,562 191,525,903 194,343,447 198,164,774 200,780,742 201,124,162
OP9     22,030,953  73,919,354  118,863,984 150,215,720 157,798,090 170,207,577 179,045,519 182,904,306 184,365,283 187,747,144 189,738,786 191,202,166
OP10    19,800,999  59,586,667  100,010,308 135,004,580 146,769,845 160,228,640 170,869,887 173,127,822 174,349,262 178,775,739 180,671,178 
OP11    26,744,235  71,605,699  108,672,421 142,329,490 152,705,459 164,013,912 171,984,085 174,770,744 176,353,730 186,111,470     
OP12    14,967,698  48,273,269  83,188,448  109,366,964 120,900,227 128,738,859 142,947,872 145,057,252 153,001,020         
OP13    15,860,127  51,164,239  86,538,714  114,378,599 128,022,558 136,842,277 144,648,833 153,253,108             
OP14    16,421,831  46,052,498  72,862,898  99,957,555  109,318,172 119,546,332 136,791,942                 
OP15    15,452,224  47,969,806  79,580,470  108,466,062 116,667,569 143,127,512                     
OP16    12,994,042  48,930,258  77,646,531  102,585,988 135,018,687                         
OP17    14,602,038  47,201,227  78,129,351  134,100,730                             
OP18    17,740,235  56,485,619  151,732,740                                 
OP19    23,064,250  187,973,132                                     
OP20    221,345,563                                         
Grand Total 566,733,526 1,203,354,822   1,709,727,926   2,115,828,664   2,173,710,465   2,205,711,687   2,199,860,280   2,106,424,012   1,982,544,310   1,868,071,728   1,703,883,563   1,532,189,592

到目前为止,我掌握的代码是:

    cursor = self.conn.cursor()
    cursor.execute("select * from incremental_paid")
    IncrementalPaidTriangleres = cursor.fetchall() 
    IncrementalPaidTriangledf = pd.DataFrame(IncrementalPaidTriangleres)

    numRows = IncrementalPaidTriangledf.shape[0]
    numCols = IncrementalPaidTriangledf.shape[1]
    columns = ["DP"+str(x) for x in range(1,numCols+1)]
    index = ["OP"+str(x) for x in range(1,numRows+1)]

    IncrementalPaidTriangledf.columns = columns
    IncrementalPaidTriangledf.index = index
    
    print(IncrementalPaidTriangledf,"\n\n")

Tags: columns数据dataframeindexcursordp1dp2dp4
2条回答

{a1}做你想做的事吗

import pandas as pd
import numpy as np

df = pd.DataFrame({f'DP{i}': [x * 10**i for x in range(5 - i)] + [np.nan] * i 
                   for i in range(3)})
#    DP0   DP1    DP2
# 0    0   0.0    0.0
# 1    1  10.0  100.0
# 2    2  20.0  200.0
# 3    3  30.0    NaN
# 4    4   NaN    NaN


df.cumsum(axis=1)
#    DP0   DP1    DP2
# 0  0.0   0.0    0.0
# 1  1.0  11.0  111.0
# 2  2.0  22.0  222.0
# 3  3.0  33.0    NaN
# 4  4.0   NaN    NaN

更新: 阅读您的示例数据框并应用cumsum对我很有用。我怀疑您的数据框中还有其他奇怪的东西

data = """
               DP1         DP2         DP3          DP4         DP5         DP6        DP7        DP8          DP9         DP10       DP11         DP12
    OP1     2,197,389   6,136,646   5,827,172   6,964,450   2,406,747   1,682,154   1,052,454   1,883,367               796,337     
    OP2     8,859,402   25,194,531  18,199,918  14,156,169  5,011,560   4,255,085   3,523,277   3,812,709   2,487,246   699,860     626,996     502,772
    OP3     14,787,376  35,731,942  24,747,885  35,725,650  8,723,298   6,430,740   7,750,373   1,577,873   3,353,858   2,583,798   872,799 
    OP4     21,493,250  44,408,206  51,271,615  46,638,746  14,476,529  11,773,563  8,582,066   2,106,156   495,311     3,733,716   4,242,765   600,503
    OP5     21,700,836  51,857,950  42,238,240  44,930,278  13,755,160  9,137,746   14,041,838  4,419,652   2,735,681   1,929,539   3,726,549   1,857,538
    OP6     27,197,016  56,597,455  53,092,978  43,047,300  17,452,924  15,215,678  11,417,947  2,712,132   1,300,171   4,584,131   4,052,668   2,033,861
    OP7     28,312,408  48,072,399  47,204,687  42,068,090  14,582,445  13,172,692  13,017,322  3,151,407   3,974,425   2,813,653   1,898,478   2,175,732
    OP8     21,161,654  50,484,594  45,726,512  33,390,233  10,678,218  14,829,062  11,180,288  4,075,341   2,817,544   3,821,326   2,615,968   343,419
    OP9     22,030,953  51,888,402  44,944,629  31,351,736  7,582,371   12,409,486  8,837,942   3,858,787   1,460,977   3,381,860   1,991,642   1,463,380
    OP10    19,800,999  39,785,669  40,423,641  34,994,271  11,765,266  13,458,795  10,641,247  2,257,935   1,221,440   4,426,476   1,895,439   
    OP11    26,744,235  44,861,464  37,066,722  33,657,069  10,375,969  11,308,453  7,970,174   2,786,659   1,582,985   9,757,740       
    OP12    14,967,698  33,305,571  34,915,179  26,178,516  11,533,264  7,838,632   14,209,013  2,109,380   7,943,767           
    OP13    15,860,127  35,304,111  35,374,476  27,839,885  13,643,958  8,819,719   7,806,556   8,604,275               
    OP14    16,421,831  29,630,667  26,810,400  27,094,657  9,360,617   10,228,159  17,245,611                  
    OP15    15,452,224  32,517,582  31,610,664  28,885,592  8,201,507   26,459,944                      
    OP16    12,994,042  35,936,216  28,716,273  24,939,457  32,432,699                          
    OP17    14,602,038  32,599,189  30,928,124  55,971,379                              
    OP18    17,740,235  38,745,384  95,247,120                                  
    OP19    23,064,250  164,908,882                                     
    OP20    221,345,563                                         
    Grand Total 566,733,526 857,966,859 694,346,236 557,833,478 191,982,531 167,019,909 137,276,106 43,355,674  29,373,406  38,528,438  21,923,305  8,977,206
    """

dr = list()
for dl in data.split('\n')[2:-2]:
    fields = [dl[i:i+12] for i in range(0, len(dl), 12)]
    
    values = [f.replace(',', '').strip() for f in fields[1:]]
    values_int = [int(f) if f != '' else np.nan for f in values]
    dr.append(values_int)
df = pd.DataFrame(dr, dtype=int)

numRows = df.shape[0]
numCols = df.shape[1]
columns = ["DP"+str(x) for x in range(1,numCols+1)]
index = ["OP"+str(x) for x in range(1,numRows+1)]

df.columns = columns
df.index = index
#           DP1         DP2         DP3         DP4         DP5         DP6  
# OP1   2197389   6136646.0   5827172.0   6964450.0   2406747.0   1682154.0   
# OP2   8859402  25194531.0  18199918.0  14156169.0   5011560.0   4255085.0   
# OP3  14787376  35731942.0  24747885.0  35725650.0   8723298.0   6430740.0   
# OP4  21493250  44408206.0  51271615.0  46638746.0  14476529.0  11773563.0   
# OP5  21700836  51857950.0  42238240.0  44930278.0  13755160.0   9137746.0   

df.cumsum(axis=1)
#             DP1          DP2          DP3          DP4          DP5  \
# OP1     2197389.0    8334035.0   14161207.0   21125657.0   23532404.0   
# OP2     8859402.0   34053933.0   52253851.0   66410020.0   71421580.0   
# OP3    14787376.0   50519318.0   75267203.0  110992853.0  119716151.0   
# OP4    21493250.0   65901456.0  117173071.0  163811817.0  178288346.0   
# OP5    21700836.0   73558786.0  115797026.0  160727304.0  174482464.0   

我想您需要先将列转换为数字类型,然后才能使用cumsum-

for col in df.columns.values:
    df[col] = df[col].str.replace(',', '')
    df[col] = pd.to_numeric(df[col],errors='coerce')
df = df.cumsum(axis=1)

相关问题 更多 >

    热门问题