How to fix the "string indices must be integers" error when expanding contractions in text

import re

# Dictionary of English contractions
contractions_dict = {
    "ain't": "are not", "'s": " is", "aren't": "are not", "can't": "cannot",
    "can't've": "cannot have", "'cause": "because", "could've": "could have",
    "couldn't": "could not", "couldn't've": "could not have", "didn't": "did not",
    "doesn't": "does not", "don't": "do not", "hadn't": "had not",
    "hadn't've": "had not have", "hasn't": "has not", "haven't": "have not",
    "he'd": "he would", "he'd've": "he would have", "he'll": "he will",
    "he'll've": "he will have", "how'd": "how did", "how'd'y": "how do you",
    "how'll": "how will", "I'd": "I would", "I'd've": "I would have",
    "I'll": "I will", "I'll've": "I will have", "I'm": "I am", "I've": "I have",
    "isn't": "is not", "it'd": "it would", "it'd've": "it would have",
    "it'll": "it will", "it'll've": "it will have", "let's": "let us",
    "ma'am": "madam", "mayn't": "may not", "might've": "might have",
    "mightn't": "might not", "mightn't've": "might not have", "must've": "must have",
    "mustn't": "must not", "mustn't've": "must not have", "needn't": "need not",
    "needn't've": "need not have", "o'clock": "of the clock", "oughtn't": "ought not",
    "oughtn't've": "ought not have", "shan't": "shall not", "sha'n't": "shall not",
    "shan't've": "shall not have", "she'd": "she would", "she'd've": "she would have",
    "she'll": "she will", "she'll've": "she will have", "should've": "should have",
    "shouldn't": "should not", "shouldn't've": "should not have", "so've": "so have",
    "that'd": "that would", "that'd've": "that would have", "there'd": "there would",
    "there'd've": "there would have", "they'd": "they would",
}

# Regular expression for finding contractions
contractions_re = re.compile('(%s)' % '|'.join(contractions_dict.keys()))

# Function for expanding contractions
def expand_contractions(text, contractions_dict=contractions_dict):
    def replace(match):
        return contractions_dict[match.group(0)]
    # Substitute every matched contraction and return the expanded text
    return contractions_re.sub(replace, text)

# Expanding contractions in the reviews
dataset['entitas bernama'] = dataset['entitas bernama'].apply(lambda x: expand_contractions(x))

1 answer

User

#1 · Posted on 2024-09-29 01:35:46

This is how to replace values in a pandas Series:

pandas.Series.replace(to_replace=contractions_dict, inplace=True, value=None, regex=True)

From https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.replace.html:

Dicts can be used to specify different replacement values for different existing values. For example, {'a': 'b', 'y': 'z'} replaces the value 'a' with 'b' and 'y' with 'z'. To use a dict in this way the value parameter should be None.
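A minimal sketch of the dict-based replacement described in the quote. The three-entry dictionary and the sample strings are hypothetical stand-ins for the full `contractions_dict` and the real data; `value` is simply left at its default here.

```python
import pandas as pd

# Hypothetical, shortened mapping; the full dictionary is in the question.
contractions_dict = {"I'll": "I will", "can't": "cannot", "don't": "do not"}

# Hypothetical sample data standing in for the real column.
tweets = pd.Series([
    "I'll make you smile",
    "i just graduated",
    "can't sleep, don't care",
])

# With regex=True each key is treated as a regular expression, so matches
# inside a longer string are substituted, not only whole-cell matches.
expanded = tweets.replace(to_replace=contractions_dict, regex=True)
print(expanded.tolist())
```

Without `regex=True`, a dict passed to `replace` only matches cells whose entire value equals a key, which is why the flag matters for free text.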

Example

contractions_dict = {...} # redacted

In []: twt = pd.read_csv('twitter4000.csv')
Out[]:
                                                      tweets    sentiment
       0    is bored and wants to watch a movie any sugge...    0
       1                back in miami. waiting to unboard ship  0
       2    @misskpey awwww dnt dis brng bak memoriessss, ...   0
       3                    ughhh i am so tired blahhhhhhhhh    0
       4    @mandagoforth me bad! It's funny though. Zacha...   0
    ...     ...     ...
    3995                                    i just graduated    1
    3996            Templating works; it all has to be done     1
    3997                    mommy just brought me starbucks     1
    3998    @omarepps watching you on a House re-run...lov...   1
    3999    Thanks for trying to make me smile I'll make y...   1

    4000 rows × 2 columns

# Note: among the rows displayed above (first 5 and last 5), contractions are visible in row 4 ("It's") and row 3999 ("I'll")

In []: # check which rows have contractions
       twt[twt.tweets.str.contains('|'.join(contractions_dict.keys()), regex=True)]
Out[]:
                                                       tweets   sentiment
       2    @misskpey awwww dnt dis brng bak memoriessss, ...   0
       4    @mandagoforth me bad! It's funny though. Zacha...   0
       5    brr, i'm so cold. at the moment doing my assig...   0
       6    @kevinmarquis haha yep but i really need to sl...   0
       7    eating some ice-cream while I try to see @pete...   0
    ...     ...     ...
    3961                                gonna cousin's b.day.   1
    3968    @kat_n Got to agree it's a risk to put her thr...   1
    3983    About to watch the Lakers win game duece. I'm ...   1
    3986    @countroshculla yeah..needed to get up early.....   1
    3999    Thanks for trying to make me smile I'll make y...   1

    937 rows × 2 columns
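The row check above can be sketched on a small scale as follows. The two-entry dictionary and the three sample tweets are hypothetical, and escaping the keys with `re.escape` is an added precaution that the original call does not use.

```python
import re
import pandas as pd

# Hypothetical, shortened mapping and sample data for illustration only.
contractions_dict = {"I'll": "I will", "it's": "it is"}
twt = pd.DataFrame({"tweets": ["i just graduated", "I'll be there", "it's a risk"]})

# Build one alternation pattern from the keys; re.escape guards against
# keys that happen to contain regex metacharacters.
pattern = "|".join(re.escape(key) for key in contractions_dict)

# Boolean mask: True for rows whose text contains at least one contraction.
mask = twt["tweets"].str.contains(pattern, regex=True)

print(twt[mask])
print(int(mask.sum()))  # number of rows containing a contraction
```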

In []: twt.tail(5).tweets.replace(to_replace=contractions_dict, value=None, regex=True)

Out[]:
    3995                                    i just graduated 
    3996            Templating works; it all has to be done  
    3997                     mommy just brought me starbucks 
    3998    @omarepps watching you on a House re-run...lov...
    3999    Thanks for trying to make me smile I will make...

    Name: tweets, dtype: object

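Pass inplace=True to Series.replace to avoid assigning the result back to the DataFrame, i.e. twt.tweets = twt.tweets.replace(...). Note that with pandas Copy-on-Write enabled, an in-place call on a column selected from a DataFrame may not update the DataFrame itself, so the explicit reassignment is the safer form.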
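For comparison, the function-based approach from the question can be sketched with the standard library alone. This is an illustrative version under stated assumptions, not the asker's exact code: the dictionary is a hypothetical subset, and escaping the keys, trying longer keys first, and skipping non-string values are additions.

```python
import re

# Hypothetical, shortened mapping for illustration only.
contractions_dict = {"I'll": "I will", "can't": "cannot", "it's": "it is"}

# Escape the keys and try longer keys first so that a short key cannot
# shadow a longer one that begins with the same characters.
pattern = re.compile(
    "|".join(re.escape(k) for k in sorted(contractions_dict, key=len, reverse=True))
)

def expand_contractions(text, mapping=contractions_dict):
    """Return text with every matched contraction replaced by its expansion."""
    if not isinstance(text, str):
        return text  # leave missing or non-string values untouched
    return pattern.sub(lambda m: mapping[m.group(0)], text)

print(expand_contractions("I'll go because it's late and I can't wait"))
```

A function like this can be passed to `Series.apply`; the dict-based `Series.replace` call shown in the answer reaches the same result without a helper function.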
