New column based on a conditional choice between the values of two other columns in a Pandas DataFrame

Traceback (most recent call last):
  File "<pyshell#116>", line 1, in <module>
    Data[1]['Test'] =Data[1]['Close'] if Data[1]['Close'] > Data[1]['Open'] else Data[1]['Open']
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

3 Answers

User

Answer 1 · edited 2024-06-01 10:05:42

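The problem is that you are asking Python to evaluate a condition (Data['Close'] > Data['Open']) that contains many boolean values, one per row. You don't want to use any or all, because that would set the whole of Data['Test'] to either Data['Open'] or Data['Close'].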
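To see why any/all does not help, here is a minimal sketch using a small hypothetical frame named data (the Open/Close values are borrowed from the sample data in answer 2). .any() reduces the whole comparison to a single bool, so the conditional expression assigns one complete column instead of choosing per row.

```python
import pandas as pd

# Hypothetical frame standing in for Data[1] from the question
data = pd.DataFrame({"Open": [76.91, 77.04, 72.87],
                     "Close": [77.04, 72.87, 93.23]})

# .any() collapses the boolean Series to one True/False for the whole column
cond = (data["Close"] > data["Open"]).any()  # True: at least one row has Close > Open

# The if/else therefore picks an entire column, not a per-row value
data["Test"] = data["Close"] if cond else data["Open"]
print(data["Test"].tolist())  # [77.04, 72.87, 93.23]
```

Row 1 ends up with 72.87 even though its Open (77.04) is larger, which is exactly the behavior described above.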

There may be a cleaner way, but one approach is to use a mask (a boolean array):

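import pandas
# mask is True on rows where Close > Open; take Close there and Open elsewhere,
mask = Data['Close'] > Data['Open']
# then reindex_like(Data) restores the original row order
Data['Test'] = pandas.concat([Data['Close'][mask].dropna(), Data['Open'][~mask].dropna()]).reindex_like(Data)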
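A self-contained sketch of the mask idea, on a hypothetical frame named frame. It uses .reindex(frame.index) in place of .reindex_like and drops the .dropna() calls, which this sample data does not need. It also shows numpy.where, a swapped-in alternative (not part of the answer above) that makes the same per-row choice in a single call.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame standing in for Data[1] from the question
frame = pd.DataFrame({"Open": [76.91, 77.04, 72.87],
                      "Close": [77.04, 72.87, 93.23]})

# Boolean mask: one True/False per row
mask = frame["Close"] > frame["Open"]

# Mask-and-concat approach; reindex puts the rows back in their original order
via_concat = pd.concat([frame["Close"][mask], frame["Open"][~mask]]).reindex(frame.index)

# Same per-row choice: np.where(condition, value_if_true, value_if_false)
frame["Test"] = np.where(mask, frame["Close"], frame["Open"])

print(frame["Test"].tolist())  # [77.04, 77.04, 93.23]
```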

User

Answer 2 · edited 2024-06-01 10:05:42

Starting from a DataFrame like this:

>>> df
         Date   Open   High    Low  Close   Volume  Adj Close
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04

The simplest thing I can think of is:

>>> df["Test"] = df[["Open", "Close"]].max(axis=1)
>>> df
         Date   Open   High    Low  Close   Volume  Adj Close   Test
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04  77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04  77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04  93.23

df.ix[:,["Open", "Close"]].max(axis=1) might be a little faster, but I don't think it looks as nice.
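Note that .ix was deprecated in pandas 0.20 and removed in pandas 1.0, so on current versions the label-based equivalent is .loc. A minimal sketch on a trimmed sample frame (only Open and Close) showing that the two spellings give the same result:

```python
import pandas as pd

df = pd.DataFrame({"Open": [76.91, 77.04, 72.87],
                   "Close": [77.04, 72.87, 93.23]})

# Label-based selection of all rows and the two columns (replaces the removed .ix)
via_loc = df.loc[:, ["Open", "Close"]].max(axis=1)

# Plain column selection, as in the answer above
via_brackets = df[["Open", "Close"]].max(axis=1)

print(via_loc.equals(via_brackets))  # True
```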

Alternatively, you can use .apply over the rows:

>>> df["Test"] = df.apply(lambda row: max(row["Open"], row["Close"]), axis=1)
>>> df
         Date   Open   High    Low  Close   Volume  Adj Close   Test
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04  77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04  77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04  93.23
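Because apply(axis=1) passes the lambda one row at a time, row["Close"] > row["Open"] compares two scalars, so the conditional expression from the question can be used unchanged. A small sketch on a trimmed sample frame:

```python
import pandas as pd

df = pd.DataFrame({"Open": [76.91, 77.04, 72.87],
                   "Close": [77.04, 72.87, 93.23]})

# Each row is a Series of scalars, so the comparison yields a single bool
# and the question's original if/else works as written.
df["Test"] = df.apply(
    lambda row: row["Close"] if row["Close"] > row["Open"] else row["Open"],
    axis=1,
)
print(df["Test"].tolist())  # [77.04, 77.04, 93.23]
```

This form is handy for arbitrary per-row logic, but it calls Python once per row, so on large frames it is typically slower than the vectorized max or NumPy versions.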

Or fall back to NumPy:

>>> import numpy as np
>>> df["Test"] = np.maximum(df["Open"], df["Close"])
>>> df
         Date   Open   High    Low  Close   Volume  Adj Close   Test
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04  77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04  77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04  93.23
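One difference between these versions that is easy to miss: DataFrame.max(axis=1) skips missing values by default (skipna=True), while np.maximum propagates NaN. A minimal sketch on a hypothetical two-row frame with one missing Open value:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the second row has no Open value
df = pd.DataFrame({"Open": [76.91, np.nan], "Close": [77.04, 72.87]})

via_max = df[["Open", "Close"]].max(axis=1)     # skipna=True by default
via_numpy = np.maximum(df["Open"], df["Close"])  # NaN propagates

print(via_max.tolist())    # [77.04, 72.87]
print(via_numpy.tolist())  # [77.04, nan]
```

If you want the NumPy route with NaN-skipping behavior, np.fmax returns the non-NaN value when only one of the pair is missing.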

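The basic problem is that if/else doesn't work well with arrays, because if (something) always coerces something to a single bool. It does not mean "for each element in the array, if the condition holds" or anything like that.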
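A minimal sketch of that coercion failing: calling bool() on a boolean Series (which is what if does implicitly) raises ValueError. The exact wording of the message differs between pandas versions; the traceback in the question shows an older NumPy-style message.

```python
import pandas as pd

close = pd.Series([77.04, 72.87, 93.23])
open_ = pd.Series([76.91, 77.04, 72.87])

cond = close > open_   # element-wise comparison: a boolean Series
print(cond.tolist())   # [True, False, True]

# `if cond:` needs a single bool, and pandas refuses to guess which one
try:
    bool(cond)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)  # a "truth value ... is ambiguous" message
```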

User

Answer 3 · edited 2024-06-01 10:05:42

In [6]: from pandas import DataFrame; from numpy.random import randn

In [7]: df = DataFrame(randn(10,2),columns=list('AB'))

In [8]: df
Out[8]: 
          A         B
0 -0.954317 -0.485977
1  0.364845 -0.193453
2  0.020029 -1.839100
3  0.778569  0.706864
4  0.033878  0.437513
5  0.362016  0.171303
6  2.880953  0.856434
7 -0.109541  0.624493
8  1.015952  0.395829
9 -0.337494  1.843267

This is a where-conditional: it means "give me the value of A if A > B, otherwise give me B".

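# this gives the same values as the in-place assignment
# df.loc[df['A']<=df['B'],'A'] = df['B']
# (.where keeps A where the condition is True and uses B where it is False)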

In [9]: df['A'].where(df['A']>df['B'],df['B'])
Out[9]: 
0   -0.485977
1    0.364845
2    0.020029
3    0.778569
4    0.437513
5    0.362016
6    2.880953
7    0.624493
8    1.015952
9    1.843267
dtype: float64
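A small check of how .where relates to an in-place .loc assignment: .where keeps A where the condition is True and substitutes B where it is False, so the in-place counterpart overwrites A on the rows where A <= B. This sketch uses a seeded NumPy generator instead of randn so the result is repeatable, and df2 is a hypothetical copy so the original frame is left intact.

```python
import numpy as np
import pandas as pd

# Random data as in the session above, seeded so the check is repeatable
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 2)), columns=list("AB"))

# Non-mutating: keep A where A > B, take B elsewhere
via_where = df["A"].where(df["A"] > df["B"], df["B"])

# In-place equivalent on a copy: overwrite A with B where the condition is False
df2 = df.copy()
df2.loc[df2["A"] <= df2["B"], "A"] = df2["B"]

print(via_where.equals(df2["A"]))  # True
```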

In this case, because the condition simply picks the larger of the two columns, a row-wise max is equivalent:

In [10]: df.max(1)
Out[10]: 
0   -0.485977
1    0.364845
2    0.020029
3    0.778569
4    0.437513
5    0.362016
6    2.880953
7    0.624493
8    1.015952
9    1.843267
dtype: float64
