pandas/python: split a URL and add columns to a database

2 answers

User

#1 · edited 2024-06-26 02:53:39

Using pure Python:

data = [
    'w.lejournal.fr/actualite/politique/sarkozy-terminator_1557749.html',
    'w.lejournal.fr/palmares/palmares-immobilier/',
    'w.lejournal.fr/actualite/societe/adeline-hazan-devient-la-nouvelle-controleuse-des-lieux-de-privation-de-liberte_1558176.html'
]

result = []

for x in data:
    # Without a scheme, split('/') gives [host, first segment, second segment, ...]
    cols = x.split('/')
    result.append([x, cols[1], cols[2]])

print(result)


After that, you only have to read from and write to the database.
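The database step could look roughly like the sketch below. It assumes SQLite via the standard-library sqlite3 module; the table name pages and the column names section and subsection are hypothetical and not taken from the question.

```python
import sqlite3

# Hypothetical schema: an in-memory table "pages" with a single "url" column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY)")
urls = [
    "w.lejournal.fr/actualite/politique/sarkozy-terminator_1557749.html",
    "w.lejournal.fr/palmares/palmares-immobilier/",
]
conn.executemany("INSERT INTO pages (url) VALUES (?)", [(u,) for u in urls])

# Add the two new columns (names are assumptions), then fill them row by row.
conn.execute("ALTER TABLE pages ADD COLUMN section TEXT")
conn.execute("ALTER TABLE pages ADD COLUMN subsection TEXT")

rows = conn.execute("SELECT url FROM pages").fetchall()
for (url,) in rows:
    cols = url.split("/")
    conn.execute(
        "UPDATE pages SET section = ?, subsection = ? WHERE url = ?",
        (cols[1], cols[2], url),
    )
conn.commit()

stored = conn.execute(
    "SELECT url, section, subsection FROM pages ORDER BY url"
).fetchall()
print(stored)
```

With another database engine the connection call and the parameter placeholder style would differ, but the read, split, update loop stays the same.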

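If every URL starts with http://, you need cols[3] and cols[4] instead, because splitting on '/' puts 'http:' and an empty string before the host: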

data = [
    'http://w.lejournal.fr/actualite/politique/sarkozy-terminator_1557749.html',
    'http://w.lejournal.fr/palmares/palmares-immobilier/',
    'http://w.lejournal.fr/actualite/societe/adeline-hazan-devient-la-nouvelle-controleuse-des-lieux-de-privation-de-liberte_1558176.html'
]

result = []

for x in data:
    # With a scheme, split('/') gives ['http:', '', host, first segment, second segment, ...]
    cols = x.split('/')
    result.append([x, cols[3], cols[4]])

print(result)
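If the data mixes URLs with and without http://, hard-coded indexes become fragile. One possible alternative, not part of the original answer, is to let urllib.parse find the path first. The helper name first_two_segments is made up for this sketch.

```python
from urllib.parse import urlsplit


def first_two_segments(url):
    """Return the first two path segments of url, using None for missing ones."""
    # urlsplit only recognises the host when a scheme or a leading '//' is present.
    if "://" not in url:
        url = "//" + url
    segments = [s for s in urlsplit(url).path.split("/") if s]
    segments += [None, None]  # pad so short paths do not raise IndexError
    return segments[0], segments[1]


print(first_two_segments("http://w.lejournal.fr/palmares/palmares-immobilier/"))
print(first_two_segments("w.lejournal.fr/actualite/politique/sarkozy-terminator_1557749.html"))
```

The padding means a URL with fewer than two path segments yields None values rather than an exception, which maps naturally to NULL in a database column.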

User

#2 · edited 2024-06-26 02:53:39

You don't need pandas; a regex can do this quite efficiently:

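import re

ts = ['w.lejournal.fr/actualite/politique/sarkozy-terminator_1557749.html',
    'w.lejournal.fr/palmares/palmares-immobilier/',
    'w.lejournal.fr/actualite/societe/adeline-hazan-devient-la-nouvelle-controleuse-des-lieux-de-privation-de-liberte_1558176.html']

# Dots are escaped so they match literally; [A-Za-z] replaces [aA-zZ],
# which also matched the punctuation between 'Z' and 'a' in ASCII.
rgx = r'(?<=w\.lejournal\.fr/)([A-Za-z]*)/([A-Za-z_-]*)(?=/)'

for url_address in ts:
    # findall returns one (first segment, second segment) tuple per match
    found_group = re.findall(rgx, url_address)
    for item in found_group:
        print(item)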

This is what it returns:

('actualite', 'politique')
('palmares', 'palmares-immobilier')
('actualite', 'societe')

Of course, you don't have to run this over a list of URLs.
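For a single URL, a small helper along these lines might be enough. This is only a sketch: the names SECTION_RE and extract_sections are invented here, and the pattern is a simplified variant of the one above that uses capture groups with re.search instead of lookarounds with re.findall.

```python
import re

# Simplified variant of the pattern above: literal dots, letters-only first
# segment, and a trailing '/' required after the second segment.
SECTION_RE = re.compile(r"w\.lejournal\.fr/([A-Za-z]+)/([A-Za-z_-]+)/")


def extract_sections(url):
    """Return (first segment, second segment) for one URL, or None if it does not match."""
    match = SECTION_RE.search(url)
    return match.groups() if match else None


print(extract_sections("w.lejournal.fr/palmares/palmares-immobilier/"))
```

Returning None for non-matching input lets the caller decide whether to skip the row or store empty values.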
