extractall (or replace) has no effect on a column read from ndjson

Posted 2024-09-30 22:18:54


I am reading a file that has one JSON object per line (ndjson):

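import pandas as pd

# JsonFicMain is whatever is passed as path_or_buf: the path (or buffer) of the ndjson file
dfjson = pd.read_json(path_or_buf=JsonFicMain, orient='records', lines=True)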

Here are two sample rows of the dataframe (after dropping some columns):

              nomCommune  codeCommune numeroComplet                    nomVoie  codePostal                                                                                            meilleurePosition    codesParcelles
0        Ablon-sur-Seine        94001            21        Rue Robert Schumann       94480  {'type': 'parcelle', 'geometry': {'type': 'Point', 'coordinates': [2.411247955172414, 48.726054248275865]}}  [94001000AG0013]
1        Ablon-sur-Seine        94001            13        Rue Robert Schumann       94480   {'type': 'parcelle', 'geometry': {'type': 'Point', 'coordinates': [2.412065866666666, 48.72614911111111]}}  [94001000AG0020]

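It contains several million rows. I want to extract the geographic coordinates, which sit between square brackets in one specific column (named meilleurePosition). The expected output is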

[2.411247955172414, 48.726054248275865]

I tried either to extract the coordinates or to strip out every other unwanted character. With extract or extractall, nothing matches:

test=dfjson['meilleurePosition'].str.extract(pat='(\d+\.\d+)')
test2=dfjson['meilleurePosition'].str.extractall(pat='(\d+\.\d+)')
Empty DataFrame
Columns: [0]
Index: []

Using replace or str.replace has no effect either:

test3=dfjson["meilleurePosition"].replace(to_replace=r'[^0-9.,:]',value='',regex=True)
0       {'type': 'parcelle', 'geometry': {'type': 'Point', 'coordinates': [2.411247955172414, 48.726054248275865]}}
1        {'type': 'parcelle', 'geometry': {'type': 'Point', 'coordinates': [2.412065866666666, 48.72614911111111]}}

Even a plain, non-regex replacement does not work:

test4=dfjson['meilleurePosition'].str.replace('type','whatever')
0      NaN
1      NaN

print(test)

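I tried to work out why none of this works:

The column dtype is "object" (which seemed fine, since I assumed it held strings).
Using inplace=True, without copying the dataframe, gives a similar result.

Why can't I manipulate this column? Is it because of its special characters? How can I get these coordinates in a clean format?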

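OK, after further investigation: the column holds nested dicts, not strings, which is why none of the string methods worked. This answer helped me a lot: "python pandas use map with regular expressions". I then used the following code to create a new column with the expected coordinates: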

def extract_coord(meilleurepositiondict):
    # Only dict cells carry a geometry; anything else (e.g. NaN) yields None.
    if isinstance(meilleurepositiondict, dict):
        return meilleurepositiondict['geometry']['coordinates']
    return None

dfjson['meilleurePositionclean'] = dfjson['meilleurePosition'].apply(extract_coord)
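
To see why the string methods appeared to do nothing, it can help to look at the Python type of each cell rather than at the column dtype. The sketch below is only an illustration: the two-row `sample` frame and the names `cell_types` and `replaced` are made up here, and the missing value is assumed to be stored as NaN.

```python
import pandas as pd

# Hypothetical stand-in for the real data: one dict cell and one missing cell.
sample = pd.DataFrame({
    "meilleurePosition": [
        {"type": "parcelle",
         "geometry": {"type": "Point",
                      "coordinates": [2.411247955172414, 48.726054248275865]}},
        float("nan"),
    ]
})

# dtype "object" only means "arbitrary Python objects", not "strings".
print(sample["meilleurePosition"].dtype)

# Inspect the actual Python type held in each cell.
cell_types = [type(v).__name__ for v in sample["meilleurePosition"]]
print(cell_types)

# .str methods only act on string cells; every other cell comes back as NaN.
replaced = sample["meilleurePosition"].str.replace("type", "whatever")
print(replaced)
```

If the first cell reports `dict`, regular expressions are the wrong tool and the values can be reached by key instead.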


1 answer

User

#1 · Posted 2024-09-30 22:18:54

I found a solution using the code below:

# x == x is False only for NaN, so missing cells get the default value instead
dfjson['meilleurePosition'] = dfjson['meilleurePosition'].apply(lambda x: extract_coord(x) if x == x else defaultmeilleurepositionvalue)

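This was necessary because empty rows raised an error (one that was not caught inside the function definition). However, I still believe there must be a much simpler way to assign a value from the column's dicts back to the column itself; I am still looking.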
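
One possibly simpler route, offered as a sketch rather than a definitive answer: the pandas documentation describes `Series.str.get` as working on dict and list cells as well as strings, and it leaves missing cells as NaN. The `sample` frame and the `coords`, `lon` and `lat` names below are hypothetical, and the missing value is assumed to be NaN.

```python
import pandas as pd

# Hypothetical stand-in for the real data: one dict cell and one missing cell.
sample = pd.DataFrame({
    "meilleurePosition": [
        {"type": "parcelle",
         "geometry": {"type": "Point",
                      "coordinates": [2.411247955172414, 48.726054248275865]}},
        float("nan"),
    ]
})

# Walk the nested dicts key by key; NaN cells stay NaN at every step.
coords = sample["meilleurePosition"].str.get("geometry").str.get("coordinates")

# Optionally split the [x, y] pair into two numeric columns.
sample["lon"] = coords.str[0]
sample["lat"] = coords.str[1]
print(sample[["lon", "lat"]])
```

This avoids both the helper function and the `x == x` test, at the cost of relying on the `.str` accessor for non-string data. On millions of rows it is worth timing against the `apply` version before committing to either.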
