How do I transpose values based on the index?

Posted 2024-09-29 06:33:23


I have a question.

Suppose I have the following df:

Document type  Invoicenumber  Invoicedate  description  quantity  unit price  line amount

Invoice        123            28-08-2020
0              NaN            17-09-2020   test         1,5       5           20
0              NaN            16-04-2020   test2        1,5       5           20

Invoice        456            02-03-2020
0              NaN            NaN          test3        21        3           64
0              0              NaN          test3        21        3           64
0              0              NaN          test3        21        3           64

The rows with 0 in "Document type" belong to the row above them and are the line items of that same document.

My goal is to transpose the line items so that they end up on the same row as their invoice:

Document type  Invoicenumber  Invoicedate  description#1  description#2  quantity  quantity#2  unit price  unit price #2  line amount  line amount #2

Invoice        123            28-08-2020   test           test2          1,5       1,5         5           5              20           20

The same applies to the rows of the second invoice. How can I do this?

Please help.

As requested:

{'Document Type': {0: 'IngramMicro.AccountsPayable.Invoice',
  1: 0,
  2: 0,
  3: 'IngramMicro.AccountsPayable.Invoice',
  4: 0,
  5: 0,
  6: 0},
 'Factuurnummer': {0: '0.78861803',
  1: 'NaN',
  2: 'NaN',
  3: '202130534',
  4: 'NaN',
  5: 'NaN',
  6: 'NaN'},
 'Factuurdatum': {0: '2021-05-03',
  1: nan,
  2: nan,
  3: '2021-09-03',
  4: nan,
  5: nan,
  6: nan},
 'description': {0: nan,
  1: 'TM 300 incl onderstel 3058C003 84433210 4549292119381',
  2: 'ESP 5Y 36 inch 7950A539 00000000 4960999794266',
  3: nan,
  4: 'Basistarief A3 Office',
  5: 'Toeslag 100 km enkele reis Leveren installeren Xerox VL C7020 05-03-2021',
  6: 'Toeslag 100 km enkele reis Leveren installeren Xerox VL C7020 05-03-2021'},
 'quantity': {0: nan, 1: 1.0, 2: 1.0, 3: nan, 4: 1.0, 5: 1.0, 6: 2.0},
 'unit price': {0: nan,
  1: 1211.63,
  2: 742.79,
  3: nan,
  4: 260.0,
  5: 30.0,
  6: 30.0},
 'line amount': {0: nan, 1: 21.0, 2: 21.0, 3: nan, 4: 260.0, 5: 30.0, 6: 30.0}}

1 answer
User
#1 · Posted 2024-09-29 06:33:23

A not very clever but workable approach is to iterate over the rows of the original dataframe and build a new dataframe with the desired columns:

import pandas as pd
from io import StringIO

# Sample data from the question; the "Invoice" header rows only have three fields
TESTDATA = StringIO("""Document type;Invoicenumber;Invoicedate;description;quantity;unit price;line amount
Invoice;123;28-08-2020
0;NaN;17-09-2020;test;1,5;5;20
0;NaN;16-04-2020;test2;1,5;5;20
Invoice;456;02-03-2020
0;NaN;NaN;test3;21;3;64
0;0;NaN;test3;21;3;64
0;0;NaN;test3;21;3;64
""")

df = pd.read_csv(TESTDATA, delimiter=";")

# Start from an empty frame that has the original columns
df_new = pd.DataFrame(columns=df.columns)

first_type_index = 0  # index of the current invoice header row
type_count = 0        # number of line items seen so far for that invoice

for index, row in df.iterrows():
    # The column mixes "Invoice" and "0", so it is read as strings
    if row['Document type'] == '0':
        # Line item: store its values in numbered columns on the header row
        type_count += 1

        df_new.loc[first_type_index, f'description#{type_count}'] = row['description']
        df_new.loc[first_type_index, f'quantity#{type_count}'] = row['quantity']
        df_new.loc[first_type_index, f'unit price#{type_count}'] = row['unit price']
        df_new.loc[first_type_index, f'line amount#{type_count}'] = row['line amount']
    else:
        # Header row: start a new invoice and reset the item counter
        first_type_index = index
        type_count = 0
        df_new.loc[first_type_index] = row

# The unnumbered item columns are empty on header rows, so drop them
df_new = df_new.drop(columns=['description', 'quantity', 'unit price', 'line amount'])

This produces:

|   | Document type | Invoicenumber | Invoicedate | description#1 | quantity#1 | unit price#1 | line amount#1 | description#2 | quantity#2 | unit price#2 | line amount#2 | description#3 | quantity#3 | unit price#3 | line amount#3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Invoice       |         123.0 |  28-08-2020 | test          | "1,5"      |          5.0 |          20.0 | test2         | "1,5"      |          5.0 |          20.0 |               |            |              |               |
| 3 | Invoice       |         456.0 |  02-03-2020 | test3         | 21         |          3.0 |          64.0 | test3         | 21         |          3.0 |          64.0 | test3         |         21 |          3.0 |          64.0 |
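
A side note on the quoted "1,5" values above: they stay strings because the sample data uses a comma as the decimal separator. If the real file does the same, a minimal sketch of one way to get numbers is the `decimal` parameter of `pd.read_csv`. The small CSV and the name `df_dec` below are made up for illustration only.

```python
import pandas as pd
from io import StringIO

# Hypothetical mini sample with a comma as decimal separator
CSV = """description;quantity;unit price
test;1,5;5
test3;21;3
"""

# decimal="," tells the parser to read "1,5" as the float 1.5
df_dec = pd.read_csv(StringIO(CSV), delimiter=";", decimal=",")

print(df_dec["quantity"].tolist())  # [1.5, 21.0]
```

With numeric quantities, later arithmetic such as quantity times unit price works without extra string handling.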

To group the columns by name instead, you can use:

# Everything after the three header columns is a numbered item column
column_to_swap = df_new.columns[3:]
len_swap = len(column_to_swap)

column_swap = [None] * len_swap

j = 0

# 4 is the count of duplicated columns
# description, quantity, unit price, line amount
for i in range(4):
    fast_i = i

    # Walk in steps of 4 to collect description#1, description#2, ... together
    while fast_i < len_swap:
        column_swap[j] = column_to_swap[fast_i]
        fast_i += 4
        j += 1

df_new = df_new.reindex(columns=list(df_new.columns.values[0:3])+column_swap)

This produces:

|   | Document type | Invoicenumber | Invoicedate | description#1 | description#2 | description#3 | quantity#1 | quantity#2 | quantity#3 | unit price#1 | unit price#2 | unit price#3 | line amount#1 | line amount#2 | line amount#3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Invoice       |         123.0 |  28-08-2020 | test          | test2         |               | "1,5"      | "1,5"      |            |          5.0 |          5.0 |              |          20.0 |          20.0 |               |
| 3 | Invoice       |         456.0 |  02-03-2020 | test3         | test3         | test3         | 21         | 21         |         21 |          3.0 |          3.0 |          3.0 |          64.0 |          64.0 |          64.0 |
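
For larger frames, the same reshaping can probably be done without a Python-level loop. The following is only a sketch of an alternative, not the approach used above: it labels each row with an invoice id via `cumsum`, numbers the line items with `groupby(...).cumcount()`, and spreads them into columns with `unstack`. The names `is_item`, `invoice_id`, `n`, `wide` and `result` are introduced here for illustration, and it assumes that line items are exactly the rows whose "Document type" is 0.

```python
import pandas as pd
from io import StringIO

CSV = """Document type;Invoicenumber;Invoicedate;description;quantity;unit price;line amount
Invoice;123;28-08-2020
0;NaN;17-09-2020;test;1,5;5;20
0;NaN;16-04-2020;test2;1,5;5;20
Invoice;456;02-03-2020
0;NaN;NaN;test3;21;3;64
0;0;NaN;test3;21;3;64
0;0;NaN;test3;21;3;64
"""
df = pd.read_csv(StringIO(CSV), delimiter=";")

header_cols = ["Document type", "Invoicenumber", "Invoicedate"]
item_cols = ["description", "quantity", "unit price", "line amount"]

# Assumption: line items are the rows whose "Document type" is 0.
# Comparing as text matches both the string "0" and the integer 0.
is_item = df["Document type"].astype(str) == "0"

# Every header row starts a new invoice, so a running count of header rows
# gives each row the id of the invoice it belongs to.
invoice_id = (~is_item).cumsum().rename("invoice_id")

headers = df.loc[~is_item, header_cols].copy()
headers.index = invoice_id[~is_item].to_numpy()

items = df.loc[is_item, item_cols].copy()
items["invoice_id"] = invoice_id[is_item]
# Number the line items 1, 2, 3, ... within each invoice
items["n"] = items.groupby("invoice_id").cumcount() + 1

# One row per invoice, one column per (field, item number)
wide = items.set_index(["invoice_id", "n"]).unstack("n")
wide.columns = [f"{field}#{n}" for field, n in wide.columns]

# Order the columns grouped by field name: description#1, description#2, ...
max_n = int(items["n"].max())
wide = wide[[f"{field}#{n}" for field in item_cols for n in range(1, max_n + 1)]]

result = headers.join(wide)
print(result)
```

The result is indexed by invoice id (1, 2) rather than by the original row position (0, 3), and invoices with fewer line items get NaN in the unused numbered columns.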
