Find the latest non-null value in each row in PySpark

Posted 2024-09-29 22:28:35

Male | A programmer who enjoys writing Python code.

I have a PySpark DataFrame like this:

+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+
|id        |201806|201807|201808|201809|201810|201811|201812|201901|201902|201903|201904|201905|201906|
+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+
|  1       |    15|    15|    15|    15|    15|    15|    15|    15|    15|  null|    15|    15|    15|
|  2       |     4|     4|     4|     4|     4|     4|     4|     4|     4|     4|     4|     4|     4|
|  3       |     7|     7|     7|     7|     7|     7|     7|     7|  null|  null|  null|  null|  null|
+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+

I want to find the latest non-null value in each row of this data.

I expect the following result:

+----------+------+
|id        |latest|
+----------+------+
|  1       |    15|
|  2       |     4|
|  3       |     7|
+----------+------+

I followed this answer, but I couldn't make it work per row.

I used:

from pyspark.sql.functions import last

df.select([last(x, ignorenulls=True).alias(x) for x in df.columns])

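But this code only works column-wise (last() is an aggregate that runs down each column across rows); I want the same operation row-wise, across the columns of each row.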
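To make the column-wise vs row-wise distinction concrete, here is a plain-Python sketch (no Spark involved) that models the three sample rows as lists, with None standing in for null. It is only an illustration of the two directions, assuming the month columns are listed oldest to newest; it is not how Spark actually evaluates last(), and the helper name last_non_null is hypothetical.

```python
# Sample data from the question: id -> monthly values, oldest month first.
# None stands in for Spark's null.
MONTHS = ["201806", "201807", "201808", "201809", "201810", "201811", "201812",
          "201901", "201902", "201903", "201904", "201905", "201906"]
ROWS = {
    1: [15, 15, 15, 15, 15, 15, 15, 15, 15, None, 15, 15, 15],
    2: [4] * 13,
    3: [7, 7, 7, 7, 7, 7, 7, 7, None, None, None, None, None],
}


def last_non_null(values):
    """Return the last non-None element of values, or None if all are None."""
    for v in reversed(values):
        if v is not None:
            return v
    return None


# Column-wise (the direction last(x, ignorenulls=True) aggregates in):
# one result per month, taken down the rows. Row order here is the dict's
# insertion order; in Spark, row order is not guaranteed without an ordering.
column_wise = {m: last_non_null([vals[i] for vals in ROWS.values()])
               for i, m in enumerate(MONTHS)}

# Row-wise (what the question asks for): one result per id, across the months.
row_wise = {rid: last_non_null(vals) for rid, vals in ROWS.items()}

print(row_wise)  # {1: 15, 2: 4, 3: 7}
```

The column-wise result has one entry per month (13 values), while the row-wise result has one entry per id, which matches the expected output table.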


1 answer

User

#1 · Posted 2024-09-29 22:28:35

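Assuming your columns are ordered from oldest to newest, you can get the latest value with coalesce. coalesce returns its first non-null argument, so passing the month columns in reverse order (newest first) yields the latest non-null value in each row: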

from pyspark.sql.functions import coalesce

df.select('id', coalesce(*[i for i in df.columns[::-1] if i != 'id']).alias('latest')).show()

Output:

+---+------+
| id|latest|
+---+------+
|  1|    15|
|  2|     4|
|  3|     7|
+---+------+
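The answer relies on the month columns already being in chronological order in df.columns. Because the column names here are zero-padded yyyyMM strings, sorting them as strings also sorts them chronologically, so one option is to build the coalesce argument list explicitly instead of trusting the DataFrame's column order. A minimal sketch, assuming every non-id column is a yyyyMM month; the helper name latest_first_columns is hypothetical:

```python
def latest_first_columns(columns, id_col="id"):
    """Return the month columns newest first, skipping the id column.

    Assumes every other column name is a zero-padded yyyyMM string, so
    lexicographic order is the same as chronological order.
    """
    return sorted((c for c in columns if c != id_col), reverse=True)


# Deliberately shuffled column order to show the sort does the work.
cols = ["id", "201901", "201806", "201812", "201906"]
print(latest_first_columns(cols))  # ['201906', '201901', '201812', '201806']
```

With PySpark, the result would be passed the same way as in the answer, e.g. `df.select('id', coalesce(*latest_first_columns(df.columns)).alias('latest'))`. If the column names used a different date format (for example `6/2018`), string sorting would no longer match chronological order and the names would need to be parsed first.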
