Reading a shapefile from HDFS with geopandas

Posted 2024-10-04 11:21:57

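I have a shapefile on my HDFS and I want to import it into my Jupyter notebook with geopandas (version 0.8.1).
I tried the standard read_file method, but it does not recognize the HDFS directory; instead, I believe it searches the local filesystem, because I tested with a local directory and the shapefile was read correctly.

This is the code I used: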

import geopandas as gpd

shp = gpd.read_file('hdfs://hdfsha/my_hdfs_directory/my_shapefile.shp')

The error I get is:

---------------------------------------------------------------------------
CPLE_OpenFailedError                      Traceback (most recent call last)
fiona/_shim.pyx in fiona._shim.gdal_open_vector()

fiona/_err.pyx in fiona._err.exc_wrap_pointer()

CPLE_OpenFailedError: hdfs://hdfsha/my_hdfs_directory/my_shapefile.shp: No such file or directory

During handling of the above exception, another exception occurred:

DriverError                               Traceback (most recent call last)
<ipython-input-17-3118e740e4a9> in <module>
----> 2 shp = gpd.read_file('hdfs://hdfsha/my_hdfs_directory/my_shapefile.shp')
      3 print(shp.shape)
      4 shp.head(3)

/opt/venv/geocoding/lib/python3.6/site-packages/geopandas/io/file.py in _read_file(filename, bbox, mask, rows, **kwargs)
     94 
     95     with fiona_env():
---> 96         with reader(path_or_bytes, **kwargs) as features:
     97 
     98             # In a future Fiona release the crs attribute of features will

/opt/venv/geocoding/lib/python3.6/site-packages/fiona/env.py in wrapper(*args, **kwargs)
    398     def wrapper(*args, **kwargs):
    399         if local._env:
--> 400             return f(*args, **kwargs)
    401         else:
    402             if isinstance(args[0], str):

/opt/venv/geocoding/lib/python3.6/site-packages/fiona/__init__.py in open(fp, mode, driver, schema, crs, encoding, layer, vfs, enabled_drivers, crs_wkt, **kwargs)
    255         if mode in ('a', 'r'):
    256             c = Collection(path, mode, driver=driver, encoding=encoding,
--> 257                            layer=layer, enabled_drivers=enabled_drivers, **kwargs)
    258         elif mode == 'w':
    259             if schema:

/opt/venv/geocoding/lib/python3.6/site-packages/fiona/collection.py in __init__(self, path, mode, driver, schema, crs, encoding, layer, vsi, archive, enabled_drivers, crs_wkt, ignore_fields, ignore_geometry, **kwargs)
    160             if self.mode == 'r':
    161                 self.session = Session()
--> 162                 self.session.start(self, **kwargs)
    163             elif self.mode in ('a', 'w'):
    164                 self.session = WritingSession()

fiona/ogrext.pyx in fiona.ogrext.Session.start()

fiona/_shim.pyx in fiona._shim.gdal_open_vector()

DriverError: hdfs://hdfsha/my_hdfs_directory/my_shapefile.shp: No such file or directory

So I would like to know whether it is actually possible to read a shapefile stored in HDFS with geopandas, and if so, how.


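1 answer
User
#1 · posted 2024-10-04 11:21:57

In case anyone is still looking for an answer to this question, I managed to find a workaround.

First, you need a .zip file containing all the files that belong to the shapefile (.shp, .shx, .dbf, ...). Then we use pyarrow to establish a connection to HDFS and fiona to read the zipped shapefile.

Package versions I am using: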

  • pyarrow==2.0.0
  • fiona==1.8.18

Code:

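# import packages
import geopandas as gpd
import fiona
import pyarrow

# establish a connection to HDFS
fs = pyarrow.hdfs.connect()

# read the zipped shapefile into memory, closing the HDFS handle afterwards
with fs.open('hdfs://my_hdfs_directory/my_zipped_shapefile.zip') as f:
    zip_bytes = f.read()

# open the archive in memory and load the layer stored inside it
with fiona.io.ZipMemoryFile(zip_bytes) as z:
    with z.open('my_shp_file_within_zip.shp') as collection:
        # pass the CRS along; from_features does not pick it up by itself
        gdf = gpd.GeoDataFrame.from_features(collection, crs=collection.crs)
        print(gdf.shape)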
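
The workaround needs the shapefile and its sidecar files packed into a single .zip before it is put on HDFS. The answer does not show that step, so here is a minimal sketch of one way to do it with the standard library. The helper name zip_shapefile and its arguments are my own, not part of geopandas or fiona, and it assumes all sidecar files sit next to the .shp and share its base name.

```python
import zipfile
from pathlib import Path


def zip_shapefile(shp_path, zip_path):
    """Bundle a .shp file and its same-named sidecar files into one zip.

    Hypothetical helper for illustration; returns the archived file names.
    """
    shp_path = Path(shp_path)
    zip_path = Path(zip_path)
    # Sidecars share the shapefile's stem: name.shx, name.dbf, name.prj, ...
    members = sorted(
        p
        for p in shp_path.parent.glob(shp_path.stem + ".*")
        # Skip directories and a pre-existing output archive with the same stem
        if p.is_file() and p.resolve() != zip_path.resolve()
    )
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for member in members:
            # Store files at the archive root so the .shp can be opened
            # inside the zip by its bare file name
            archive.write(member, arcname=member.name)
    return [member.name for member in members]
```

The resulting archive can then be copied to HDFS, for example with hdfs dfs -put, and the name passed to z.open() in the code above is the .shp file name at the root of the archive.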
