Extracting text from "Samuel L. JacksonJessica BielBrian Presley50 CentChristina RicciChad Michael Murray" using Python

Posted 2024-06-15 01:59:12


I have a string like this:

"Samuel L. JacksonJessica BielBrian Presley50 CentChristina RicciChad Michael Murray"

I want to turn it into this:

Samuel L. Jackson,
Jessica Biel,
Brian Presley,
50 Cent,
Christina Ricci,
Chad Michael,
Murray,

using Python.


1 answer
User
#1 · Posted 2024-06-15 01:59:12

In pandas, you can do this:

import pandas as pd

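# Insert a comma wherever a lowercase letter is directly followed by an
# uppercase letter or a digit. regex=True is required in pandas 2.0+,
# where str.replace treats the pattern as a literal string by default.
a = pd.Series("Samuel L. JacksonJessica BielBrian Presley50 CentChristina RicciChad Michael Murray").str.replace(r'([a-z])([A-Z0-9])', r'\1,\2', regex=True)
a.to_list()[0]

# 'Samuel L. Jackson,Jessica Biel,Brian Presley,50 Cent,Christina Ricci,Chad Michael Murray'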

Or, to put each name on its own line:

# Same pattern, but the replacement adds a newline after each comma.
a = pd.Series("Samuel L. JacksonJessica BielBrian Presley50 CentChristina RicciChad Michael Murray").str.replace(r'([a-z])([A-Z0-9])', r'\1,\n\2', regex=True)

print(a.to_list()[0])

Output:

Samuel L. Jackson,
Jessica Biel,
Brian Presley,
50 Cent,
Christina Ricci,
Chad Michael Murray
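
The pandas answer above relies on a single regular expression, so the same idea can be sketched with only the standard library `re` module. This is a minimal sketch, not part of the original answer: the names `raw`, `BOUNDARY`, and `names` are chosen here for illustration, and it assumes Python 3.7 or later, where `re.split` accepts a pattern that matches an empty string.

```python
import re

raw = ("Samuel L. JacksonJessica BielBrian Presley50 Cent"
       "Christina RicciChad Michael Murray")

# Zero-width boundary: the position is preceded by a lowercase letter and
# followed by an uppercase letter or a digit. Nothing is consumed, so no
# characters are lost when splitting.
BOUNDARY = re.compile(r'(?<=[a-z])(?=[A-Z0-9])')

# Splitting gives a list of names instead of one joined string.
names = BOUNDARY.split(raw)

print(",\n".join(names))
```

The heuristic only looks at letter case, so a name with an internal capital (for example "McDonald") would be split in the wrong place. It works here because none of the names in this string has that shape.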

Or did you mean getting the names directly from the source page:

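import requests
from bs4 import BeautifulSoup

link = 'https://en.wikipedia.org/wiki/Home_of_the_Brave_(2006_film)'

# Download the page and parse it (the 'lxml' parser must be installed).
response = requests.get(link)
soup = BeautifulSoup(response.content, 'lxml')

# This assumes the cast list is the fourth <ul> on the page;
# the index may need adjusting if the page layout changes.
cast_list = soup.find_all('ul')[3]

# Each cast member is a link, so printing each link's text gives one name per line.
for item in cast_list.find_all('a'):
    print(item.text)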

Output:

Samuel L. Jackson
Jessica Biel
Brian Presley
50 Cent
Chad Michael Murray
Christina Ricci
Victoria Rowell
Vyto Ruginis
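
The scraping approach avoids the fused string altogether because it reads each link's text separately. As a rough offline illustration of that idea, the sketch below uses only the standard library `html.parser` module. `SAMPLE_HTML` is a hypothetical fragment made up for this example, not the real page markup, and `LinkTextCollector` is an illustrative helper name.

```python
from html.parser import HTMLParser

# Hypothetical fragment standing in for a cast list; not the real page markup.
SAMPLE_HTML = """
<ul>
<li><a href="#">Samuel L. Jackson</a></li>
<li><a href="#">Jessica Biel</a></li>
<li><a href="#">50 Cent</a></li>
</ul>
"""


class LinkTextCollector(HTMLParser):
    """Collect the text of every <a> element as a separate list entry."""

    def __init__(self):
        super().__init__()
        self._in_link = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self.names.append("")  # start a new entry for this link

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        # Only text that appears inside a link is kept.
        if self._in_link:
            self.names[-1] += data


collector = LinkTextCollector()
collector.feed(SAMPLE_HTML)
names = collector.names

print(names)
```

Keeping each link's text as its own entry means no case-based guessing is needed to find where one name ends and the next begins.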
