<pre><code>prefixes = {'pic.twitter.com', '#', '@'}  # use a set for fast membership tests

def clean_tweet(tweet):
    # Keep a word only if it starts with neither 'pic.twitter.com' (15 chars) nor '#' / '@'
    return " ".join(word for word in tweet.split()
                    if word[:15] not in prefixes and word[0] not in prefixes)
</code></pre>
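<p>The same filter can also be written with <code>str.startswith</code>, which accepts a tuple of prefixes and avoids the manual slicing. This is only a sketch: the function name <code>strip_prefixed_words</code> and the sample tweet are made up for illustration.</p>

```python
PREFIXES = ('pic.twitter.com', '#', '@')  # str.startswith accepts a tuple of prefixes

def strip_prefixed_words(tweet):  # hypothetical name, used only for this sketch
    """Return the tweet without words that start with any of PREFIXES."""
    return " ".join(w for w in tweet.split() if not w.startswith(PREFIXES))

print(strip_prefixed_words("@bob loving the #sunset tonight pic.twitter.com/abc123"))
# loving the tonight
```

<p>A tuple is used here instead of a set because <code>startswith</code> only accepts a string or a tuple of strings.</p>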
<p>Alternatively, take a look at:</p>
<p><a href="https://www.nltk.org/api/nltk.tokenize.html" rel="nofollow noreferrer">https://www.nltk.org/api/nltk.tokenize.html</a></p>
<p>TweetTokenizer can handle a lot of these problems.</p>
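<p>As a rough sketch of how that might look, assuming NLTK is installed: <code>TweetTokenizer</code> takes a <code>strip_handles</code> option that removes @mentions, and it keeps each hashtag as a single token, so hashtags can be filtered by prefix afterwards. The helper name <code>clean_tweet_nltk</code> and the sample tweet are illustrative, not part of NLTK.</p>

```python
from nltk.tokenize import TweetTokenizer

# strip_handles=True makes the tokenizer drop @mentions on its own
tokenizer = TweetTokenizer(strip_handles=True)

def clean_tweet_nltk(tweet):  # hypothetical helper name, not part of NLTK
    tokens = tokenizer.tokenize(tweet)
    # Hashtags come back as single tokens such as '#nlp', so filter them by prefix
    return " ".join(t for t in tokens if not t.startswith('#'))

print(clean_tweet_nltk("@remy: This is great #nlp"))
```

<p>Note that the result is a space-joined token list, so punctuation ends up separated from the words it was attached to.</p>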