Remove substrings from a column based on a list in another column

tweet                            | url
---------------------------------|----------------
"Hello World url1 url12"         | [url1, url12]
"Good morning url2 engine url41" | [url2, url41]
"Nice to meet you url3 "         | [url3]
"You are fantastic "             | []

tweet                            | url           | urls_free_tweet
---------------------------------|---------------|----------------------
"Hello World url1 url12"         | [url1, url12] | "Hello World "
"Good morning url2 engine url41" | [url2, url41] | "Good morning engine"
"Nice to meet you url3 "         | [url3]        | "Nice to meet you"
"You are fantastic "             | []            | "You are fantastic "

3 answers

User

Answer 1 · edited 2024-10-05 10:16:07

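You can join each list into a single string with the | character and substitute row by row with re.sub, because every tweet has to be compared against the URL list in its own row. A vectorized version would only work if you could collect one list of all URLs for the whole DataFrame and replace them all in a single pass, but I am not sure that is what you want, so a row-by-row comparison is probably needed: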

Input:

import pandas as pd
import re
df = pd.DataFrame({'tweet': {0: 'Hello World url1 url12',
  1: 'Good morning url2 engine url41',
  2: 'Nice to meet you url3         ',
  3: 'You are fantastic '},
 'url': {0: ['url1', 'url12 '], 1: ['url2', 'url41'], 2: ['url3'], 3: [' ']}})

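Code:

# Join each row's URLs with | into one pattern; escape them and put longer URLs
# first so that 'url1' cannot match the start of 'url12'
df['url'] = df['url'].apply(lambda urls: '|'.join(map(re.escape, sorted(
    (u.strip() for u in urls if u.strip()), key=len, reverse=True))))
df['tweet'] = (df.apply(lambda x: re.sub(x['url'], '', x['tweet']), axis=1)
                 .str.strip().str.replace(r'\s+', ' ', regex=True))

Output:

df
Out[1]:
                 tweet         url
0          Hello World  url12|url1
1  Good morning engine  url41|url2
2     Nice to meet you        url3
3    You are fantastic

If you want url back as a list at the end, you can run df['url'] = df['url'].str.split('|'). Note that the list then holds the stripped, regex-escaped strings, longest first.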
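
As a sketch of the same row-wise idea that leaves the url column untouched, the block below writes the result to a new urls_free_tweet column, as in the expected output of the question. The helper name remove_urls is hypothetical, and the sample frame is rebuilt here only so the block runs on its own.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'tweet': ['Hello World url1 url12',
              'Good morning url2 engine url41',
              'Nice to meet you url3 ',
              'You are fantastic '],
    'url': [['url1', 'url12'], ['url2', 'url41'], ['url3'], []],
})

def remove_urls(tweet, urls):
    """Hypothetical helper: delete every entry of `urls` from `tweet`."""
    # Drop empty entries; try longer URLs first so 'url1' cannot eat part of 'url12'.
    urls = sorted((u.strip() for u in urls if u.strip()), key=len, reverse=True)
    if urls:
        pattern = '|'.join(re.escape(u) for u in urls)
        tweet = re.sub(pattern, '', tweet)
    # Collapse the whitespace left behind by the removed URLs.
    return ' '.join(tweet.split())

# Pair each tweet with the URL list from its own row.
df['urls_free_tweet'] = [remove_urls(t, u) for t, u in zip(df['tweet'], df['url'])]
print(df['urls_free_tweet'].tolist())
```

This version also trims trailing whitespace, so the last row comes out as "You are fantastic" without the trailing space shown in the question; drop the final ' '.join(tweet.split()) step if the original spacing has to be preserved.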

User

Answer 2 · edited 2024-10-05 10:16:07

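It is not clear whether your data comes as a list, a DataFrame, or something else. So here is a solution that works in any of those cases: iterate over the data and apply the logic below on each iteration, replacing the text and url variables with the values of the current item. Note that it removes url only where it appears verbatim as one contiguous substring.

text = "Hello world url1 url2"   # named text rather than str, which would shadow the built-in type
url = "url1 url2"
removed = text.split(url)        # split the text around the substring to remove
final = "".join(removed).strip()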
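
When the URLs are not adjacent in the text, a token-based variation may be easier to apply. The sketch below is not the answer's method: it assumes URLs are separated from other words by whitespace, and the helper name drop_tokens is hypothetical.

```python
def drop_tokens(text, urls):
    """Hypothetical helper: remove whitespace-separated tokens listed in `urls`."""
    # A set gives fast membership tests; strip() tolerates entries such as 'url12 '.
    unwanted = {u.strip() for u in urls}
    # Keep every word that is not one of the URLs, then rejoin with single spaces.
    return ' '.join(word for word in text.split() if word not in unwanted)

print(drop_tokens("Good morning url2 engine url41", ["url2", "url41"]))
```

Because whole tokens are compared, 'url1' can never remove part of 'url12', at the cost of normalizing all runs of whitespace to single spaces.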

User

Answer 3 · edited 2024-10-05 10:16:07

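I think it can be simplified as follows. Note that the pattern deletes everything from the first url to the end of the string, so the word "engine" in row 1 is lost as well:

df['tweet'].str.replace(r'url.*', '', regex=True)  # regex=True is required in pandas 2.0+, where the default is literal matching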
0          Hello World
1         Good morning
2     Nice to meet you
3    You are fantastic

Or

df['urls_free_tweet'] = df['tweet'].str.replace(r'url.*', '', regex=True)
print(df)
                            tweet             url    urls_free_tweet
0          Hello World url1 url12  [url1, url12 ]        Hello World
1  Good morning url2 engine url41   [url2, url41]       Good morning
2  Nice to meet you url3                   [url3]   Nice to meet you
3              You are fantastic              [ ]  You are fantastic
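
The pattern 'url.*' removes everything from the first match to the end of the string, so words that sit between two URLs disappear too. A minimal sketch that removes only the URL tokens is shown below; it assumes every URL looks like 'url' followed by digits, as in the sample data, which will not hold for real links.

```python
import pandas as pd

tweets = pd.Series(['Hello World url1 url12',
                    'Good morning url2 engine url41',
                    'Nice to meet you url3 ',
                    'You are fantastic '])

# Assumption: every URL token is 'url' followed by digits (true only for the sample data).
cleaned = (tweets.str.replace(r'\burl\d+\b', '', regex=True)  # delete the URL tokens only
                 .str.replace(r'\s+', ' ', regex=True)        # collapse leftover whitespace
                 .str.strip())
print(cleaned.tolist())
```

This stays fully vectorized because it never consults the url column; if the tweets can contain URL-like text that is not listed in url, the row-wise comparison is the safer choice.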
