我正在使用geospark
的ST_Intersects
函数来生成点和多边形之间的交点
queryOverlap = """
SELECT p.ID, z.COUNTYNS as zone, p.date, timestamp, p.point
FROM gpsPingTable as p, zoneShapes as z
WHERE ST_Intersects(p.point, z.geometry)
"""
pingsDay = spark.sql(queryOverlap)
pingsDay.show()
为什么每行的返回都是重复的
+--------------------+--------+----------+-------------------+--------------------+
| ID| zone| date| timestamp| point|
+--------------------+--------+----------+-------------------+--------------------+
|45cdaabc-a804-46b...|01529224|2020-03-17|2020-03-17 12:29:24|POINT (-122.38825...|
|45cdaabc-a804-46b...|01529224|2020-03-17|2020-03-17 12:29:24|POINT (-122.38825...|
|45cdaabc-a804-46b...|01529224|2020-03-18|2020-03-18 11:21:27|POINT (-122.38851...|
|45cdaabc-a804-46b...|01529224|2020-03-18|2020-03-18 11:21:27|POINT (-122.38851...|
|aae0bb4e-4899-4ce...|01531402|2020-03-18|2020-03-18 06:58:03|POINT (-122.23097...|
|aae0bb4e-4899-4ce...|01531402|2020-03-18|2020-03-18 06:58:03|POINT (-122.23097...|
|f9b58c70-0665-4f5...|01531928|2020-03-17|2020-03-17 17:32:46|POINT (-119.43811...|
|f9b58c70-0665-4f5...|01531928|2020-03-17|2020-03-17 17:32:46|POINT (-119.43811...|
|f9b58c70-0665-4f5...|01531928|2020-03-18|2020-03-18 08:21:34|POINT (-119.41080...|
|f9b58c70-0665-4f5...|01531928|2020-03-18|2020-03-18 08:21:34|POINT (-119.41080...|
|f9b58c70-0665-4f5...|01531928|2020-03-19|2020-03-19 00:26:43|POINT (-119.43623...|
|f9b58c70-0665-4f5...|01531928|2020-03-19|2020-03-19 00:26:43|POINT (-119.43623...|
|fb768b89-b92a-4f0...|01531402|2020-03-18|2020-03-18 06:30:43|POINT (-122.22106...|
|fb768b89-b92a-4f0...|01531402|2020-03-18|2020-03-18 06:30:43|POINT (-122.22106...|
|fb768b89-b92a-4f0...|01531402|2020-03-18|2020-03-18 07:57:47|POINT (-122.22102...|
|fb768b89-b92a-4f0...|01531402|2020-03-18|2020-03-18 07:57:47|POINT (-122.22102...|
|a32f727d-566b-4ad...|01529224|2020-03-18|2020-03-18 14:38:13|POINT (-122.59499...|
|a32f727d-566b-4ad...|01529224|2020-03-18|2020-03-18 14:38:13|POINT (-122.59499...|
|ad7e4d7e-f8e5-45b...|01529224|2020-03-18|2020-03-18 07:58:51|POINT (-122.14959...|
|ad7e4d7e-f8e5-45b...|01529224|2020-03-18|2020-03-18 07:58:51|POINT (-122.14959...|
+--------------------+--------+----------+-------------------+--------------------+
最明显的原因是源表中的点或分区不是唯一的。如果存在重复的点或分区,显然会得到重复的点或分区
检查源表的唯一性:
这将报告重复的点。这将报告重复区域:
相关问题 更多 >
编程相关推荐