Creating a list of strings instead of a list of words

[{"is_sarcastic": 1, "headline": "thirtysomething scientists unveil doomsday clock of hair loss", "article_link": "https://www.theonion.com/thirtysomething-scientists-unveil-doomsday-clock-of-hai-1819586205"}, {"is_sarcastic": 0, "headline": "dem rep. totally nails why congress is falling short on gender, racial equality", "article_link": "https://www.huffingtonpost.com/entry/donna-edwards-inequality_us_57455f7fe4b055bb1170b207"} ]

stop_words = ["a", "about", "above", "after", "again", "..."]
_corpus, _result = [], []
for text in data:
    text_clean = [word for word in re.split('\W+', text['headline']) if word.lower() not in stop_words and len(word) > 2]
    _corpus.append(' '.join(text_clean))
    _result.append(text['is_sarcastic'])

_corpus, _result = map(list, zip( *[(''.join(word), text['is_sarcastic']) for text in data for word in re.split('\W+', text['headline']) if word.lower() not in stop_words and len(word) > 2]))

1 answer

User

Answer 1 · Posted 2024-09-29 02:22:37

You want to turn the following snippet into a list comprehension:

stop_words = ["a", "about", "above", "after", "again", "..."]
_corpus, _result = [], []
for text in data:
    text_clean = [word for word in re.split('\W+', text['headline']) if word.lower() not in stop_words and len(word) > 2]
    _corpus.append(' '.join(text_clean))
    _result.append(text['is_sarcastic'])

That is not a good idea, because the code is already hard to read! You should start with a function:

def clean(headline):
    # Raw string so that \W is not treated as a string escape sequence
    return [word for word in re.split(r'\W+', headline) if word.lower() not in stop_words and len(word) > 2]

_corpus, _result = [], []
for text in data:
    _corpus.append(' '.join(clean(text['headline'])))
    _result.append(text['is_sarcastic'])
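To make the behaviour of clean concrete, here is a minimal self-contained sketch that runs it on the first headline from the question. The stop-word list is shortened to the five entries that are actually visible in the question (the rest is elided there), so this is an illustration under that assumption rather than the asker's real configuration.

```python
import re

# Only the stop words visible in the question; the full list is elided there.
stop_words = ["a", "about", "above", "after", "again"]

def clean(headline):
    """Return the words of headline that are longer than 2 characters and not stop words."""
    return [word for word in re.split(r'\W+', headline)
            if word.lower() not in stop_words and len(word) > 2]

words = clean("thirtysomething scientists unveil doomsday clock of hair loss")
sentence = ' '.join(words)
print(words)     # a list of words
print(sentence)  # one string per headline, which is what _corpus should hold
```

Note that "of" disappears here because of the len(word) > 2 check, not because it is in the shortened stop-word list.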

If you want a list comprehension, use a single list to store the pairs:

_ret = []
for text in data:
    _ret.append((' '.join(clean(text['headline'])), text['is_sarcastic']))
# [('thirtysomething scientists unveil doomsday clock hair loss', 1), ('dem rep totally nails why congress falling short gender racial equality', 0)]

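This loop is easy to convert into a list comprehension. To get the final result, use zip to regroup the elements of the pairs into two tuples: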

_corpus, _result = zip(*_ret)
# ('thirtysomething scientists unveil doomsday clock hair loss', 'dem rep totally nails why congress falling short gender racial equality') (1, 0)

Or, as you did:

_corpus, _result = map(list, zip(*_ret))
# ['thirtysomething scientists unveil doomsday clock hair loss', 'dem rep totally nails why congress falling short gender racial equality'] [1, 0]
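The zip(*...) idiom transposes a list of pairs into two sequences. The sketch below shows it on made-up sample pairs, plus one edge case worth knowing: with an empty input, zip yields nothing and the two-name unpacking raises ValueError. The helper unzip_pairs is a hypothetical name introduced here for illustration only.

```python
# Made-up sample pairs of (sentence, label), for illustration only.
pairs = [("first headline", 1), ("second headline", 0)]

# zip(*pairs) is zip(pairs[0], pairs[1]): it groups first items and second items.
corpus, result = zip(*pairs)                        # two tuples
corpus_list, result_list = map(list, zip(*pairs))   # two lists

def unzip_pairs(pairs):
    """Hypothetical helper: split pairs into two lists, tolerating empty input."""
    if not pairs:
        # zip(*[]) yields nothing, so unpacking into two names would fail.
        return [], []
    first, second = zip(*pairs)
    return list(first), list(second)

print(corpus, result)
print(unzip_pairs([]))
```

If data can never be empty, the plain one-liner from the answer is enough.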

Full code:

import re

# `data` is the list of dicts shown in the question
stop_words = ["a", "about", "above", "after", "again", "..."]

def clean(headline):
    return [word for word in re.split(r'\W+', headline) if word.lower() not in stop_words and len(word) > 2]

_ret = [(' '.join(clean(text['headline'])), text['is_sarcastic']) for text in data]
_corpus, _result = map(list, zip(*_ret))
print(_corpus, _result)
# ['thirtysomething scientists unveil doomsday clock hair loss', 'dem rep totally nails why congress falling short gender racial equality'] [1, 0]

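That is not far from what you wrote, but text['is_sarcastic'] was in the wrong place. In your version the pair is built inside the loop over words, so you get one (word, label) pair per word instead of one (sentence, label) pair per headline, and ''.join(word) simply returns the word unchanged.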
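The difference between the two comprehensions can be checked side by side. This is a sketch that assumes the two records from the question (article links omitted) and only the five stop words visible there; per_word and per_headline are names made up for the comparison.

```python
import re

# The two sample records from the question, without the article links.
data = [
    {"is_sarcastic": 1,
     "headline": "thirtysomething scientists unveil doomsday clock of hair loss"},
    {"is_sarcastic": 0,
     "headline": "dem rep. totally nails why congress is falling short on gender, racial equality"},
]
stop_words = ["a", "about", "above", "after", "again"]  # visible entries only

# Attempt from the question: the pair is built once per word.
per_word = [(''.join(word), text['is_sarcastic'])
            for text in data
            for word in re.split(r'\W+', text['headline'])
            if word.lower() not in stop_words and len(word) > 2]

# Fix: the words are joined in an inner expression, the pair is built once per headline.
per_headline = [(' '.join(word for word in re.split(r'\W+', text['headline'])
                          if word.lower() not in stop_words and len(word) > 2),
                 text['is_sarcastic'])
                for text in data]

print(len(per_word), per_word[:2])
print(len(per_headline), per_headline[0])
```

The nested form works, but it is harder to read than calling a small clean function, which is the reason the answer extracts one first.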
