How can I efficiently filter out rows of a geopandas df that are not within the boundary of a shape polygon?

geo_df['withinPolygon'] = ""
withinQlist = []
for lon, lat in zip(geo_df['longitude'], geo_df['latitude']):
    pt = Point(lon, lat)
    withinQ = pt.within(grid_polygon)
    withinQlist.append(withinQ)
geo_df['withinPolygon'] = withinQlist
geo_df = geo_df[geo_df.withinPolygon==True]

1 answer

Anonymous user

#1 · Posted 2024-09-30 20:28:06

As a first step, as you mentioned in the comments, your code can be simplified as follows:

import geopandas

# Build the point geometries in one vectorised call instead of a Python loop
geo_df = geopandas.GeoDataFrame(
    input_df,
    geometry=geopandas.points_from_xy(input_df.Longitude, input_df.Latitude),
)

geo_df_filtered = geo_df.loc[geo_df.within(grid_polygon)]
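To make the simplification concrete, here is a minimal, self-contained sketch. The sample coordinates and the unit-square grid_polygon are made up for illustration; only the GeoDataFrame construction and the within filter come from the snippet above. It also rebuilds the row-by-row mask from the question so the two results can be compared.

```python
import geopandas
import pandas as pd
from shapely.geometry import Point, Polygon

# Hypothetical sample data; column names follow the snippet above.
input_df = pd.DataFrame({
    "Longitude": [0.5, 1.5, 0.2, 3.0],
    "Latitude": [0.5, 0.5, 0.9, 3.0],
})

# Hypothetical polygon: the unit square.
grid_polygon = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])

geo_df = geopandas.GeoDataFrame(
    input_df,
    geometry=geopandas.points_from_xy(input_df.Longitude, input_df.Latitude),
)

# Vectorised containment check: one boolean per row.
mask = geo_df.within(grid_polygon)
geo_df_filtered = geo_df.loc[mask]

# Row-by-row mask, as in the question, for comparison.
loop_mask = [
    Point(lon, lat).within(grid_polygon)
    for lon, lat in zip(input_df.Longitude, input_df.Latitude)
]
```

Both masks should mark the same rows; the vectorised version simply avoids building each Point and calling within from a Python loop.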

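However, depending on the kind of data you have and on your usage pattern, there are a few techniques that can speed things up:

Using prepared geometries

If your polygon is very complex, creating a prepared geometry will speed up the containment checks. Preparing a geometry pre-computes various data structures up front, which makes the subsequent operations faster. (The Shapely documentation on prepared geometry operations has more details.)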

from shapely.prepared import prep

grid_polygon_prep = prep(grid_polygon)
geo_df_filtered = geo_df.loc[geo_df.geometry.apply(lambda p: grid_polygon_prep.contains(p))]

(You cannot write geo_df.loc[geo_df.within(grid_polygon_prep)] as in the first snippet, because geopandas does not accept prepared geometries there.)
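As a small sanity check, the sketch below applies the prepared-geometry filter to made-up data and compares it with the plain within filter. The circle-like polygon (a buffered point with many vertices) and the three sample points are assumptions chosen only for illustration.

```python
import geopandas
from shapely.geometry import Point
from shapely.prepared import prep

# Hypothetical "complex" polygon: a circle of radius 1 approximated
# with 64 segments per quarter circle.
grid_polygon = Point(0, 0).buffer(1.0, 64)

# Hypothetical sample points: two inside the circle, one outside.
geo_df = geopandas.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=geopandas.points_from_xy([0.0, 0.5, 2.0], [0.0, 0.5, 0.0]),
)

# Prepare once, then reuse the prepared geometry for every point.
grid_polygon_prep = prep(grid_polygon)
mask = geo_df.geometry.apply(lambda p: grid_polygon_prep.contains(p))
geo_df_filtered = geo_df.loc[mask]

# Unprepared, vectorised check for comparison.
plain_mask = geo_df.within(grid_polygon)
```

Whether the prepared version is actually faster depends on how complex the polygon is and how many points there are, so it is worth timing both on your own data.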

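Using a spatial index

If you need to run the containment check for a given set of points against several grid_polygons (not just one), it makes sense to use a spatial index on the points. This speeds things up considerably, especially when there are many points.

Geopandas provides GeoDataFrame.sindex.query for this: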

match_indices = geo_df.sindex.query(grid_polygon, predicate="contains")
# note that using `iloc` instead of `loc` is important here
geo_df_filtered = geo_df.iloc[match_indices]
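The multi-polygon case described above could look roughly like the following sketch. The grid cells, the point coordinates, and the points_per_cell dictionary are hypothetical names and data introduced for illustration; only sindex.query with predicate="contains" and the iloc lookup come from the snippet above.

```python
import geopandas
from shapely.geometry import box

# Hypothetical sample points; "d" lies outside every grid cell.
geo_df = geopandas.GeoDataFrame(
    {"name": ["a", "b", "c", "d"]},
    geometry=geopandas.points_from_xy(
        [0.5, 1.5, 2.5, 9.0],
        [0.5, 0.5, 0.5, 9.0],
    ),
)

# Hypothetical grid: three adjacent unit squares.
grid_polygons = {
    "cell_0": box(0, 0, 1, 1),
    "cell_1": box(1, 0, 2, 1),
    "cell_2": box(2, 0, 3, 1),
}

# The index is built on first access and reused for every polygon.
sindex = geo_df.sindex

points_per_cell = {}
for cell_id, polygon in grid_polygons.items():
    # Integer positions of the points that the polygon contains.
    match_indices = sindex.query(polygon, predicate="contains")
    # Positions, not index labels, hence iloc.
    points_per_cell[cell_id] = geo_df.iloc[match_indices]
```

The query returns integer positions into geo_df rather than index labels, which is why iloc is needed; with a non-default index, loc would select the wrong rows or raise an error.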

A good blog post with more explanation: https://geoffboeing.com/2016/10/r-tree-spatial-index-python/
