Python: How do I multiply two columns?

Published 2024-10-02 22:23:28


I have a simple DataFrame and I want to add a 'Pow_calkowita' column. If 'liczba_kon' is 0, 'Pow_calkowita' should equal 'Powierzchn'; if 'liczba_kon' is not 0, 'Pow_calkowita' should equal 'liczba_kon' * 'Powierzchn'. Why can't I do it like this?

for index, row in df.iterrows():
    if row['liczba_kon'] == 0:
        row['Pow_calkowita'] = row['Powierzchn']
    elif row['liczba_kon'] != 0:
        row['Pow_calkowita'] = row['Powierzchn'] * row['liczba_kon']

My code does not produce any values. This is the data:

    liczba_kon  Powierzchn
0            3    69.60495
1            1    39.27270
2            1   130.41225
3            1   129.29570
4            1   294.94400
5            1    64.79345
6            1   108.75560
7            1    35.12290
8            1   178.23905
9            1   263.00930
10           1    32.02235
11           1   125.41480
12           1    47.05420
13           1    45.97135
14           1   154.87120
15           1    37.17370
16           1    37.80705
17           1    38.78760
18           1    35.50065
19           1    74.68940

I found a workaround:

result = []
for index, row in df.iterrows():
    if row['liczba_kon'] == 0:
        result.append(row['Powierzchn'])
    elif row['liczba_kon'] != 0:
        result.append(row['Powierzchn'] * row['liczba_kon'])
df['Pow_calkowita'] = result

Is this a good way to do it?


3 answers

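To write idiomatic pandas code and take advantage of its efficient array processing, you should avoid writing your own loops over the data. Pandas lets you write concise code using vectorized operations on top of efficient numpy ndarray data structures. Under the hood it uses optimized, compiled C code for fast array processing. Pandas already takes care of the necessary looping behind the scenes, which is the advantage of expressing the operation as a single pandas statement rather than writing an explicit loop over all the elements. That way you get processing that is fast and efficient, yet concise.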

Because the formula depends on a condition, a plain multiplication will not work. Instead, you can use np.where() as follows:

import numpy as np

df['Pow_calkowita'] = np.where(df['liczba_kon'] == 0,  df['Powierzchn'], df['Powierzchn'] * df['liczba_kon'])

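Where the test condition in the first argument is true, the value is taken from the second argument; otherwise it is taken from the third argument.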

Test run output (with two extra test cases appended at the end, one of which has liczba_kon equal to 0):

print(df)

    liczba_kon  Powierzchn  Pow_calkowita
0            3    69.60495      208.81485
1            1    39.27270       39.27270
2            1   130.41225      130.41225
3            1   129.29570      129.29570
4            1   294.94400      294.94400
5            1    64.79345       64.79345
6            1   108.75560      108.75560
7            1    35.12290       35.12290
8            1   178.23905      178.23905
9            1   263.00930      263.00930
10           1    32.02235       32.02235
11           1   125.41480      125.41480
12           1    47.05420       47.05420
13           1    45.97135       45.97135
14           1   154.87120      154.87120
15           1    37.17370       37.17370
16           1    37.80705       37.80705
17           1    38.78760       38.78760
18           1    35.50065       35.50065
19           1    74.68940       74.68940
20           0    69.60495       69.60495
21           2    74.68940      149.37880
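As a minimal sketch of how the three arguments of np.where fit together, the snippet below rebuilds three of the rows above in a small frame. The helper names `is_zero`, `product` and `alt` are only illustrative and are not part of the original answer. It also shows pandas' own `Series.where`, which keeps the original value where the condition is True and substitutes the second argument elsewhere.

```python
import numpy as np
import pandas as pd

# Three illustrative rows (values taken from the table above).
df = pd.DataFrame({
    "liczba_kon": [3, 0, 2],
    "Powierzchn": [69.60495, 69.60495, 74.68940],
})

is_zero = df["liczba_kon"] == 0                 # boolean mask, one entry per row
product = df["Powierzchn"] * df["liczba_kon"]   # element-wise product

# np.where(condition, value_if_true, value_if_false), evaluated element-wise.
df["Pow_calkowita"] = np.where(is_zero, df["Powierzchn"], product)

# Series.where keeps the original value where the condition is True
# and uses the second argument where it is False.
alt = df["Powierzchn"].where(is_zero, product)

print(df)
print(alt)
```

Both forms should give the same column; which one reads better is largely a matter of taste.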

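DataFrames are designed for vectorized operations. You can think of a DataFrame as a database table, so you should rely on its built-in functionality as far as possible.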

tdf = df.copy()                                              # temporary copy, so df itself is not modified
tdf['liczba_kon'] = tdf['liczba_kon'].replace(0, 1)          # replace 0 with 1
tdf['Pow_calkowita'] = tdf['liczba_kon'] * tdf['Powierzchn'] # multiply
df['Pow_calkowita'] = tdf['Pow_calkowita']                   # copy the column back

This simplifies the code and improves performance. We can compare the performance of the two approaches:

import time

import numpy as np
import pandas as pd

sampleSize = 100000
df = pd.DataFrame({
    'liczba_kon': np.random.randint(3, size=sampleSize),
    'Powierzchn': np.random.randint(1000, size=sampleSize),
    })

# vectorization
s = time.time()
tdf = df.copy()                                              # temporary copy, so the zeros in df survive for the loop below
tdf['liczba_kon'] = tdf['liczba_kon'].replace(0, 1)          # replace 0 with 1
tdf['Pow_calkowita'] = tdf['liczba_kon'] * tdf['Powierzchn'] # multiply
df['Pow_calkowita'] = tdf['Pow_calkowita']                   # copy the column back
print(time.time() - s)

# iteration
s = time.time()
result = []
for index, row in df.iterrows():
    if row['liczba_kon'] == 0:
        result.append(row['Powierzchn'])
    elif row['liczba_kon'] != 0:
        result.append(row['Powierzchn'] * row['liczba_kon'])
df['Pow_calkowita'] = result
print(time.time() - s)

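We can see that the vectorized version runs much faster (first line: vectorized, second line: iteration, both in seconds):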

0.0034716129302978516
6.193516492843628
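Speed only matters if the approaches agree, so here is a small sanity check, offered as a sketch rather than as part of the original answer. It computes the column three ways (np.where, the replace-then-multiply trick, and the explicit loop) on a random frame and compares the results. The seed, the sample size and the names `via_where`, `via_replace` and `via_loop` are arbitrary choices made for this illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # fixed seed, chosen only for repeatability
n = 1_000                        # kept small on purpose; iterrows is slow
df = pd.DataFrame({
    "liczba_kon": rng.integers(0, 3, size=n),
    "Powierzchn": rng.integers(0, 1000, size=n),
})

# 1) np.where: pick Powierzchn where liczba_kon is 0, else the product
via_where = np.where(df["liczba_kon"] == 0,
                     df["Powierzchn"],
                     df["Powierzchn"] * df["liczba_kon"])

# 2) replace 0 with 1, then multiply (no temporary frame, df is left unchanged)
via_replace = df["liczba_kon"].replace(0, 1) * df["Powierzchn"]

# 3) explicit loop that collects the results in a list
via_loop = []
for _, row in df.iterrows():
    if row["liczba_kon"] == 0:
        via_loop.append(row["Powierzchn"])
    else:
        via_loop.append(row["Powierzchn"] * row["liczba_kon"])

print(np.array_equal(via_where, via_replace.to_numpy()))
print(np.array_equal(via_where, np.array(via_loop)))
```

Note that the second variant works on a Series derived from `df["liczba_kon"]`, so the original column keeps its zeros.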

To answer the first question: "Why can't I do it that way?"

The documentation for iterrows states (in its Notes):

Because iterrows returns a Series for each row, ....

You should never modify something you are iterating over. [...] the iterator returns a copy and not a view, and writing to it will have no effect.

This basically means that it returns a new Series containing the values of that row.

So what you get is not the actual row, and certainly not the DataFrame.

What you did does work, though, just not in the way you wanted:

df = pd.DataFrame(dict(a=[1, 2, 3], b=list("abc")))   # assumes: import pandas as pd
df                 # To demonstrate what you are doing
   a  b
0  1  a
1  2  b
2  3  c

for index, row in df.iterrows():
...     print("\n         \n>>> Next Row:\n")
...     print(row)

...     row["c"] = "ADDED"           ####### HERE I am adding to 'the row'

...     print("\n   >> added:")
...     print(row)
...     print("           ")
...     
         
 Next Row:     # as you can see, this Series has the same values
a    1         # as the row that it represents
b    a
Name: 0, dtype: object  

   >> added:
a        1
b        a
c    ADDED     # and adding to it works... but you aren't doing anything 
Name: 0, dtype: object   # with it, unless you append it to a list
           


         
 Next Row:
a    2
b    b
Name: 1, dtype: object
                       ### same here
   >> added:
a        2
b        b
c    ADDED
Name: 1, dtype: object
           


         
 Next Row:
a    3
b    c
Name: 2, dtype: object
                          ### and here
   >> added:
a        3
b        c
c    ADDED
Name: 2, dtype: object
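If an explicit loop is really needed, the write has to target the frame itself rather than the per-row Series. The sketch below uses made-up values and the illustrative variable `column_created`. It first repeats the pattern from the question to show that the frame gains no column, then writes through `df.at`, looping over the index so that the frame is not modified while `iterrows` is running.

```python
import pandas as pd

# Made-up values, only for illustration.
df = pd.DataFrame({"liczba_kon": [3, 0, 2],
                   "Powierzchn": [10.0, 20.0, 30.0]})

# Pattern from the question: the assignment lands on the Series that
# iterrows hands out for each row, not on df.
for index, row in df.iterrows():
    row["Pow_calkowita"] = row["Powierzchn"]
column_created = "Pow_calkowita" in df.columns
print(column_created)

# Writing through df.at addresses a cell of the frame directly.
df["Pow_calkowita"] = 0.0        # create the column first
for index in df.index:
    kon = df.at[index, "liczba_kon"]
    area = df.at[index, "Powierzchn"]
    df.at[index, "Pow_calkowita"] = area if kon == 0 else area * kon

print(df)
```

This is still far slower than a vectorized expression on large frames, so it is best kept for cases where the per-row logic cannot be vectorized.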
           

To answer the second question: "Is this a good way to do it?"

No.

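The multiplication shown by SeaBean is preferable because it takes advantage of the vectorized operations in numpy and pandas. Vectorization over numpy arrays is worth reading up on, since those arrays are essentially the building blocks of pandas DataFrames and Series.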
