Logistic regression analysis on text data

Posted 2024-10-04 01:28:16


I am preprocessing the following data:

Name   Nickname   Age    Country    Reg_Date     Text 
Matt   LeBron    63     Canada     24-12-2008   I'm in a happy mood today. I go to beach
Chris  Severine  54     U.S.       15-07-2009   I stand in solidarity with #ows
Lucas  Daly      47     Ireland    01-05-2020   Trump is working for next politician...
Clash  Lynch     24     U.S.       13-11-2008   What a wonderful day!
...

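What I need is to build a bag-of-words or some other feature representation before splitting the dataset into training and test sets and using it for logistic regression.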

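At the moment I am trying to extract additional information from the raw dataset above (the number of characters in the tweet, the use of punctuation, and so on):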

Name   Nickname   Age    Country    Reg_Date     Text 
Matt   LeBron    63     Canada     24-12-2008   I'm in a happy mood today. I go to beach
Chris  Severine  54     U.S.       15-07-2009   I stand in solidarity with #ows
Lucas  Daly      47     Ireland    01-05-2020   Trump is working with Putin... 
Clash  Lynch     24     U.S.       13-11-2008   What a wonderful day!
...
Lulu   Lulu22    18     Poland     02-09-2019   I hate Maths!!!! >(


Punctuation   Positive Words     Negative Words
[.]          [happy]              []
[#]          [solidarity]         []
[...]        []                   []
[!]          [wonderful]          []
[>,(]        []                   [hate]

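Now I would really like to understand how to convert the punctuation information, the positive words, the negative words, and the text into a form that a model (for example, a logistic regression model) can "read".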

If you could give me some useful tips or provide an example, I would be very grateful.


1 answer

User

#1 · Posted 2024-10-04 01:28:16

Use one-hot encoding or word embeddings.
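As a rough illustration of the one-hot idea applied to the list-valued columns in the question, here is a minimal sketch in plain Python. It assumes that `[>,(]` means the two symbols `>` and `(`, and the names `rows`, `build_vocab`, `multi_hot`, and `encode` are made up for this example. Punctuation becomes one 0/1 column per distinct symbol, and the positive and negative word lists are reduced to simple counts; that reduction is one possible choice, not the only one.

```python
# Sketch only: row values are copied from the question's second table.
rows = [
    {"punct": ["."], "pos": ["happy"], "neg": []},
    {"punct": ["#"], "pos": ["solidarity"], "neg": []},
    {"punct": ["..."], "pos": [], "neg": []},
    {"punct": ["!"], "pos": ["wonderful"], "neg": []},
    {"punct": [">", "("], "pos": [], "neg": ["hate"]},  # assumed reading of [>,(]
]


def build_vocab(rows, key):
    """Return the sorted distinct items seen in one list-valued column."""
    return sorted({item for row in rows for item in row[key]})


def multi_hot(items, vocab):
    """Return 1 for each vocabulary item present in `items`, else 0."""
    present = set(items)
    return [1 if v in present else 0 for v in vocab]


punct_vocab = build_vocab(rows, "punct")


def encode(row):
    """One numeric vector per row: punctuation flags plus two word counts."""
    return multi_hot(row["punct"], punct_vocab) + [len(row["pos"]), len(row["neg"])]


X = [encode(r) for r in rows]
feature_names = [f"punct_{p}" for p in punct_vocab] + ["n_positive", "n_negative"]

for name, value in zip(feature_names, X[4]):
    print(name, value)
```

Each row of `X` is now a fixed-length list of numbers, which is the shape a logistic regression implementation generally expects. If the identity of each positive or negative word matters, the same `multi_hot` helper could be applied to those columns instead of taking counts.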

For more on NLP, read the notes from Stanford's CS224N course. More specifically, this
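For the Text column itself, a bag-of-words representation can be sketched the same way. The example below is only an illustration under a few assumptions: the choice of which tweets go into the training and test lists is arbitrary, the tokenizer is a deliberately simple regular expression, and `tokenize`, `fit_vocab`, and `transform` are hypothetical helper names. It builds the vocabulary from the training texts only, which is a common way to keep information from the test set out of the features.

```python
import re
from collections import Counter

# Example tweets taken from the question; the train/test assignment is arbitrary.
train_texts = [
    "I'm in a happy mood today. I go to beach",
    "I stand in solidarity with #ows",
    "What a wonderful day!",
]
test_texts = ["I hate Maths!!!! >("]


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, '#' and apostrophes."""
    return re.findall(r"[a-z0-9#']+", text.lower())


def fit_vocab(texts):
    """Map each distinct token in the given texts to a column index."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(tokens)}


def transform(texts, vocab):
    """Turn texts into count vectors; tokens not in the vocabulary are ignored."""
    matrix = []
    for text in texts:
        row = [0] * len(vocab)
        for tok, n in Counter(tokenize(text)).items():
            if tok in vocab:
                row[vocab[tok]] = n
        matrix.append(row)
    return matrix


vocab = fit_vocab(train_texts)          # learned from training texts only
X_train = transform(train_texts, vocab)
X_test = transform(test_texts, vocab)   # reuses the training vocabulary
```

The resulting count vectors can be concatenated with other numeric features (character counts, punctuation flags) before fitting a model. In practice a library such as scikit-learn offers `CountVectorizer` and `LogisticRegression` for these steps; the hand-written version here is only meant to show what the transformation does.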
