Removing strings that start with certain expressions from a list

Posted 2024-10-03 19:25:58


I have a list of strings associated with a Twitter hashtag. I want to remove every whole word that starts with certain prefixes.

For example:

testlist = ['Just caught up with #FlirtyDancing. Just so cute! Loved it. ', 'After work drinks with this one @MrLukeBenjamin no dancing tonight though @flirtydancing @AshleyBanjo #FlirtyDancing pic.twitter.com/GJpRUZxUe8', 'Only just catching up and @AshleyBanjo you are gorgeous #FlirtyDancing', 'Loved working on this. Always a pleasure getting to assist the wonderful @kendrahorsburgh on @ashleybanjogram wonderful new show !! #flirtydancing pic.twitter.com/URMjUcgmyi', 'Just watching #FlirtyDancing & \n@AshleyBanjo what an amazing way to meet someone.. It made my heart all warm & fuzzy for these people! both couples meet back up.. pic.twitter.com/iwCLRmAi5n',]

I want to remove the picture URLs, the hashtags, and the @ mentions.

So far I have tried a few things, namely the startswith() and replace() methods.

For example:

prefixes = ['pic.twitter.com', '#', '@']
bestlist = []

for line in testlist:
    for word in prefixes:
        line = line.replace(word,"")
    bestlist.append(line)  # append once per line, after all prefixes are stripped

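This seems to get rid of pic.twitter.com, but not the series of letters and numbers at the end of the URL. These strings are dynamic and end differently every time, which is why I want to drop the whole word whenever it starts with that prefix.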

I also tried tokenizing everything, but replace() still does not get rid of the whole word:

import nltk

for line in testlist:
    tokens = nltk.tokenize.word_tokenize(line)
    for token in tokens:
        for word in prefixes:
            if token.startswith(word):
                token = token.replace(word, "")
                print(token)

I am starting to lose hope in startswith() and replace(), and I suspect I am barking up the wrong tree with these two methods.

Is there a better way? How can I remove all words that start with #, @, and pic.twitter?


3 Answers
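prefixes = {'pic.twitter.com', '#', '@'}  # use a set for faster lookups

def clean_tweet(tweet):
    # Keep a word only if neither its first 15 characters (the length of
    # 'pic.twitter.com') nor its first character is a known prefix.
    return " ".join(word for word in tweet.split()
                    if word[:15] not in prefixes and word[0] not in prefixes)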

Alternatively, take a look at:

https://www.nltk.org/api/nltk.tokenize.html

TweetTokenizer can take care of many of these problems.

This solution does not use regex or any other imports.

prefixes = ['pic.twitter.com', '#', '@']
testlist = ['Just caught up with #FlirtyDancing. Just so cute! Loved it. ', 'After work drinks with this one @MrLukeBenjamin no dancing tonight though @flirtydancing @AshleyBanjo #FlirtyDancing pic.twitter.com/GJpRUZxUe8', 'Only just catching up and @AshleyBanjo you are gorgeous #FlirtyDancing', 'Loved working on this. Always a pleasure getting to assist the wonderful @kendrahorsburgh on @ashleybanjogram wonderful new show !! #flirtydancing pic.twitter.com/URMjUcgmyi', 'Just watching #FlirtyDancing & \n@AshleyBanjo what an amazing way to meet someone.. It made my heart all warm & fuzzy for these people! both couples meet back up.. pic.twitter.com/iwCLRmAi5n',]


def iter_tokens(line):
    for word in line.split():
        if not any(word.startswith(prefix) for prefix in prefixes):
            yield word

for line in testlist:
    row = list(iter_tokens(line))
    print(' '.join(row))

This produces the following result:

python test.py 
Just caught up with Just so cute! Loved it.
After work drinks with this one no dancing tonight though
Only just catching up and you are gorgeous
Loved working on this. Always a pleasure getting to assist the wonderful on wonderful new show !!
Just watching & what an amazing way to meet someone.. It made my heart all warm & fuzzy for these people! both couples meet back up..
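As a side note, str.startswith() also accepts a tuple of prefixes, so the any(...) check can be collapsed into a single call. A minimal sketch of that variant follows; the names PREFIXES and strip_prefixed_words are made up for this illustration and are not part of the answer above.

```python
# Hypothetical names for illustration; only str.split and str.startswith are standard.
PREFIXES = ('pic.twitter.com', '#', '@')  # startswith() needs a tuple, not a list

def strip_prefixed_words(line, prefixes=PREFIXES):
    """Drop every whitespace-separated word that starts with one of the prefixes."""
    kept = [word for word in line.split() if not word.startswith(prefixes)]
    return ' '.join(kept)

sample = 'Only just catching up and @AshleyBanjo you are gorgeous #FlirtyDancing'
print(strip_prefixed_words(sample))
# Only just catching up and you are gorgeous
```

Because it relies on split() and join(), this version also normalizes runs of whitespace and embedded newlines to single spaces, just like the generator version.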

You can use a regular expression to specify the kinds of words to replace, and apply it with re.sub:

import re

testlist = ['Just caught up with #FlirtyDancing. Just so cute! Loved it. ', 'After work drinks with this one @MrLukeBenjamin no dancing tonight though @flirtydancing @AshleyBanjo #FlirtyDancing pic.twitter.com/GJpRUZxUe8', 'Only just catching up and @AshleyBanjo you are gorgeous #FlirtyDancing', 'Loved working on this. Always a pleasure getting to assist the wonderful @kendrahorsburgh on @ashleybanjogram wonderful new show !! #flirtydancing pic.twitter.com/URMjUcgmyi', 'Just watching #FlirtyDancing & \n@AshleyBanjo what an amazing way to meet someone.. It made my heart all warm & fuzzy for these people! both couples meet back up.. pic.twitter.com/iwCLRmAi5n',]
regexp = r'pic\.twitter\.com\S+|@\S+|#\S+'

res = [re.sub(regexp, '', sent) for sent in testlist]
print(*res, sep='\n')  # one cleaned tweet per line, matching the output below

Output:

Just caught up with  Just so cute! Loved it. 
After work drinks with this one  no dancing tonight though    
Only just catching up and  you are gorgeous 
Loved working on this. Always a pleasure getting to assist the wonderful  on  wonderful new show !!  
Just watching  & 
 what an amazing way to meet someone.. It made my heart all warm & fuzzy for these people! both couples meet back up.. 
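The substitution leaves behind the spaces that surrounded each removed word, which is why the output contains doubled and trailing spaces. If that matters, one option is to collapse whitespace runs afterwards. A minimal sketch, assuming it is acceptable to turn embedded newlines into single spaces as well; clean_tweet_regex and PATTERN are hypothetical names introduced here.

```python
import re

# Same pattern as above: a prefix followed by the rest of the non-space run.
PATTERN = re.compile(r'pic\.twitter\.com\S+|@\S+|#\S+')

def clean_tweet_regex(text):
    """Remove prefixed words, then collapse leftover whitespace (including newlines)."""
    without_tags = PATTERN.sub('', text)
    return re.sub(r'\s+', ' ', without_tags).strip()

sample = ('After work drinks with this one @MrLukeBenjamin no dancing tonight '
          'though @flirtydancing #FlirtyDancing pic.twitter.com/GJpRUZxUe8')
print(clean_tweet_regex(sample))
# After work drinks with this one no dancing tonight though
```

Note that the pattern is not anchored to the start of a word, so an @ or # in the middle of a token (an e-mail address, for instance) would also be cut from that point on.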
