Python: How do I multiply two columns?

3 answers

User

Answer 1 · edited 2024-10-02 22:23:28

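To write idiomatic pandas code and take advantage of pandas' efficient array processing, you should avoid writing your own loops over the data. Pandas lets you write concise code that uses vectorized operations on efficient NumPy ndarray structures; under the hood, the looping is done in optimized compiled C code. Because pandas already performs the necessary looping behind the scenes, a single vectorized statement can replace an explicit loop over all elements. Used this way, pandas gives you processing that is fast, efficient and concise at the same time.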
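As a minimal sketch of that difference (the column names a and b and their values are hypothetical), an explicit Python loop and a single vectorized expression produce the same numbers, but the vectorized form lets pandas run the loop in compiled code:

```python
import pandas as pd

# Hypothetical sample data: two numeric columns to multiply.
df = pd.DataFrame({"a": [1.5, 2.0, 3.0], "b": [4, 5, 6]})

# Explicit Python loop: one interpreter-level multiplication per element.
looped = [x * y for x, y in zip(df["a"], df["b"])]

# Vectorized: a single expression; the element-wise loop runs inside pandas/NumPy.
df["product"] = df["a"] * df["b"]

print(df["product"].tolist())  # [6.0, 10.0, 18.0]
```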

Since the formula depends on a condition, a plain multiplication on its own won't do. Instead, you can use np.where as follows:

import numpy as np

df['Pow_calkowita'] = np.where(df['liczba_kon'] == 0,                 # condition
                               df['Powierzchn'],                       # value where liczba_kon == 0
                               df['Powierzchn'] * df['liczba_kon'])    # value everywhere else

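Wherever the condition in the first argument is true, the value is taken from the second argument; otherwise it is taken from the third. The condition is evaluated element-wise, row by row.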
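A minimal, self-contained check of that behaviour, using three rows taken from the test output (rows 0, 20 and 21, where liczba_kon is 3, 0 and 2). The products are floating-point, so a tolerance-based comparison is the safe way to check them:

```python
import numpy as np
import pandas as pd

# Three rows copied from the test output: liczba_kon = 3, 0 and 2.
df = pd.DataFrame({
    "liczba_kon": [3, 0, 2],
    "Powierzchn": [69.60495, 69.60495, 74.68940],
})

# np.where(condition, value_if_true, value_if_false), evaluated element-wise.
df["Pow_calkowita"] = np.where(
    df["liczba_kon"] == 0,                # condition
    df["Powierzchn"],                     # used where liczba_kon == 0
    df["Powierzchn"] * df["liczba_kon"],  # used everywhere else
)

print(df)
```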

Test run output (I added two more test cases at the end; one of them has liczba_kon equal to 0):

print(df)

    liczba_kon  Powierzchn  Pow_calkowita
0            3    69.60495      208.81485
1            1    39.27270       39.27270
2            1   130.41225      130.41225
3            1   129.29570      129.29570
4            1   294.94400      294.94400
5            1    64.79345       64.79345
6            1   108.75560      108.75560
7            1    35.12290       35.12290
8            1   178.23905      178.23905
9            1   263.00930      263.00930
10           1    32.02235       32.02235
11           1   125.41480      125.41480
12           1    47.05420       47.05420
13           1    45.97135       45.97135
14           1   154.87120      154.87120
15           1    37.17370       37.17370
16           1    37.80705       37.80705
17           1    38.78760       38.78760
18           1    35.50065       35.50065
19           1    74.68940       74.68940
20           0    69.60495       69.60495
21           2    74.68940      149.37880

User

Answer 2 · edited 2024-10-02 22:23:28

A DataFrame is designed for vectorized operations; think of it as a database table. So you should rely on its built-in functionality for as long as you can.

tdf = df.copy()                                               # temporary copy of df
tdf['liczba_kon'] = tdf['liczba_kon'].replace(0, 1)          # replace 0 with 1
tdf['Pow_calkowita'] = tdf['liczba_kon'] * tdf['Powierzchn'] # multiply
df['Pow_calkowita'] = tdf['Pow_calkowita']                   # copy the column back to df
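One caveat about the temporary DataFrame: in Python, a plain assignment such as tdf = df binds a second name to the same object rather than copying it, so replacing zeros through that name also overwrites liczba_kon in df itself; df.copy() avoids this. A minimal check, with hypothetical sample values (the names alias and tdf_copy are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({"liczba_kon": [0, 2], "Powierzchn": [10.0, 20.0]})

alias = df              # same object, nothing is copied
tdf_copy = df.copy()    # independent copy of the data

print(alias is df)      # True
print(tdf_copy is df)   # False

# Writing through the copy leaves the original untouched.
tdf_copy["liczba_kon"] = tdf_copy["liczba_kon"].replace(0, 1)
print(df["liczba_kon"].tolist())  # [0, 2]

# Writing through the alias changes df itself.
alias["liczba_kon"] = alias["liczba_kon"].replace(0, 1)
print(df["liczba_kon"].tolist())  # [1, 2]
```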

This simplifies the code and improves performance. We can compare the two approaches:

import time

import numpy as np
import pandas as pd

sampleSize = 100000
df = pd.DataFrame({
    'liczba_kon': np.random.randint(3, size=(sampleSize)),
    'Powierzchn': np.random.randint(1000, size=(sampleSize)),
    })

# vectorization
s = time.time()
tdf = df.copy()                                               # temporary copy of df
tdf['liczba_kon'] = tdf['liczba_kon'].replace(0, 1)          # replace 0 with 1
tdf['Pow_calkowita'] = tdf['liczba_kon'] * tdf['Powierzchn'] # multiply
df['Pow_calkowita'] = tdf['Pow_calkowita']                   # copy the column back to df
print(time.time() - s)

# iteration
s = time.time()
result = []
for index, row in df.iterrows():
    if row['liczba_kon'] == 0:
        result.append(row['Powierzchn'])
    else:
        result.append(row['Powierzchn'] * row['liczba_kon'])
df['Pow_calkowita'] = result
print(time.time() - s)

As we can see, the vectorized version is much faster (times in seconds):

0.0034716129302978516
6.193516492843628
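Single time.time() measurements like these vary between runs and machines. A minimal sketch of a steadier comparison uses timeit and also checks that both approaches give the same numbers. It assumes a smaller, seeded sample of 5,000 rows so the iterrows loop finishes quickly; the function names vectorized and looped are illustrative:

```python
import timeit

import numpy as np
import pandas as pd

sample_size = 5_000
rng = np.random.default_rng(0)  # seeded so runs are reproducible
df = pd.DataFrame({
    "liczba_kon": rng.integers(0, 3, size=sample_size),
    "Powierzchn": rng.integers(0, 1000, size=sample_size),
})

def vectorized(frame):
    # Treat 0 as 1, then multiply whole columns at once.
    return frame["liczba_kon"].replace(0, 1) * frame["Powierzchn"]

def looped(frame):
    # Row-by-row equivalent using iterrows.
    out = []
    for _, row in frame.iterrows():
        k = row["liczba_kon"]
        out.append(row["Powierzchn"] * (k if k != 0 else 1))
    return pd.Series(out, index=frame.index)

# Both approaches must agree before their speed is worth comparing.
assert np.array_equal(vectorized(df).to_numpy(), looped(df).to_numpy())

# Best of 3 repeats; the vectorized time is averaged over 10 calls per repeat.
t_vec = min(timeit.repeat(lambda: vectorized(df), number=10, repeat=3)) / 10
t_loop = min(timeit.repeat(lambda: looped(df), number=1, repeat=3))
print(f"vectorized: {t_vec:.6f} s, iterrows: {t_loop:.6f} s")
```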

User

Answer 3 · edited 2024-10-02 22:23:28

To answer the first question, "Why can't I do this?":

The pandas documentation for iterrows states (in the Notes):

Because iterrows returns a Series for each row, ....

and

You should never modify something you are iterating over. [...] the iterator returns a copy and not a view, and writing to it will have no effect.

This basically means that it returns a new Series containing the values of that row.

So what you get is not the actual row, and certainly not the DataFrame.
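A minimal sketch of that behaviour, assuming a small DataFrame with an integer column and a string column (with mixed types like these, each row Series is built from a freshly created object array): writing to row inside the loop leaves df unchanged.

```python
import pandas as pd

df = pd.DataFrame(dict(a=[1, 2, 3], b=list("abc")))

for index, row in df.iterrows():
    row["a"] = 99        # changes the row Series only
    row["c"] = "ADDED"   # adds a label to the row Series only

print(df)
#    a  b
# 0  1  a
# 1  2  b
# 2  3  c
```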

But what you are doing does work, just not in the way you want:

>>> import pandas as pd
>>> df = pd.DataFrame(dict(a=[1, 2, 3], b=list("abc")))
>>> df                 # To demonstrate what you are doing
   a  b
0  1  a
1  2  b
2  3  c

>>> for index, row in df.iterrows():
...     print("\n         \n>>> Next Row:\n")
...     print(row)
...     row["c"] = "ADDED"           ####### HERE I am adding to 'the row'
...     print("\n   >> added:")
...     print(row)
...     print("           ")
...
         
>>> Next Row:     # as you can see, this Series has the same values
a    1         # as the row that it represents
b    a
Name: 0, dtype: object  

   >> added:
a        1
b        a
c    ADDED     # and adding to it works... but you aren't doing anything 
Name: 0, dtype: object   # with it, unless you append it to a list
           


         
>>> Next Row:
a    2
b    b
Name: 1, dtype: object
                       ### same here
   >> added:
a        2
b        b
c    ADDED
Name: 1, dtype: object
           


         
>>> Next Row:
a    3
b    c
Name: 2, dtype: object
                          ### and here
   >> added:
a        3
b        c
c    ADDED
Name: 2, dtype: object
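If the goal really is to add a column, here is a minimal sketch of two options on the same small DataFrame: keep each modified row Series in a list and rebuild a DataFrame from them, as the comment in the output hints, or, much more simply, assign the whole column directly. The name rebuilt is illustrative only.

```python
import pandas as pd

df = pd.DataFrame(dict(a=[1, 2, 3], b=list("abc")))

# Option 1: collect the modified row copies and build a new DataFrame from them.
rows = []
for index, row in df.iterrows():
    row["c"] = "ADDED"
    rows.append(row)
rebuilt = pd.DataFrame(rows)  # each Series' name (0, 1, 2) becomes a row label

# Option 2 (idiomatic): assign the column to the original DataFrame in one step.
df["c"] = "ADDED"

print(rebuilt)
print(df)
```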

To answer the second question, "Is this a good approach?":

No.

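The multiplication that SeaBean showed is better because it uses the vectorized operations of NumPy and pandas. This is a link to a good article on vectorization in NumPy arrays, which are essentially the building blocks of pandas DataFrames and Series.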
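As a small illustration of that last point (the sample values are hypothetical), the same element-wise multiplication can be performed directly on the underlying NumPy arrays returned by Series.to_numpy(), and it gives the same values as the pandas expression:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"liczba_kon": [3, 0, 2], "Powierzchn": [10.0, 20.0, 30.0]})

# pandas: element-wise product of two Series (aligned on the index).
via_pandas = df["liczba_kon"] * df["Powierzchn"]

# NumPy: the same product computed on the raw arrays behind the Series.
via_numpy = df["liczba_kon"].to_numpy() * df["Powierzchn"].to_numpy()

print(np.array_equal(via_pandas.to_numpy(), via_numpy))  # True
```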
