Python utility package
Detailed description of the chitti Python project
Utility functions for Python
pip install chitti
Pretty print
from chitti import pprint, pprint_nl
brands = ['apple', 'samsung', 'pixel', 'one plus']
pprint(brands)
OUT:
apple
samsung
pixel
one plus
pprint_nl(brands)
OUT:
apple
samsung
pixel
one plus
Color words in text
from chitti import color_words_in_text
text = 'camera is awesome and battery is good'
words = ['camera', 'battery']
color_words_in_text(text, words, color='green', text_color='white')
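For readers who want to see the idea without installing the package, here is a minimal sketch of word highlighting in a terminal. It is not chitti's implementation: the function name highlight_words, the color tables, and the use of ANSI escape codes are all assumptions made for illustration.

```python
import re

# ANSI escape codes; these tables are this sketch's own choices (assumption).
BACKGROUNDS = {"green": "\033[42m", "red": "\033[41m", "yellow": "\033[43m"}
FOREGROUNDS = {"white": "\033[97m", "black": "\033[30m"}
RESET = "\033[0m"


def highlight_words(text, words, color="green", text_color="white"):
    """Wrap each whole-word match in ANSI background and foreground codes."""
    if not words:
        return text
    # Escape the words so they are matched literally, on word boundaries.
    pattern = r"\b(" + "|".join(re.escape(w) for w in words) + r")\b"
    style = BACKGROUNDS[color] + FOREGROUNDS[text_color]
    return re.sub(pattern, lambda m: style + m.group(0) + RESET, text)


text = 'camera is awesome and battery is good'
print(highlight_words(text, ['camera', 'battery']))
```

In a terminal that supports ANSI colors, the two listed words appear as white text on a green background and the rest of the sentence is left untouched.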
Train and validation split
Split a DataFrame into train and val DataFrames
Each class is split into 80% train and 20% val
import pandas as pd
from chitti.train_test_split import train_val_split
path = 'data.csv'
df = pd.read_csv(path)
text_col='Article_clean'
target_col='NewsType'
train_df, val_df = train_val_split(df, text_col=text_col, target_col=target_col, size=0.8)
print(train_df[target_col].value_counts())
print(val_df[target_col].value_counts())
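The per-class 80/20 split described above can be sketched with plain pandas. This is an illustration of the technique, not chitti's code: the helper name stratified_split, the seed parameter, and the toy DataFrame are assumptions.

```python
import pandas as pd


def stratified_split(df, target_col, size=0.8, seed=0):
    """Split df so each class keeps roughly `size` of its rows in train."""
    # Sample `size` of the rows within every class independently.
    train_df = df.groupby(target_col).sample(frac=size, random_state=seed)
    # Every row not drawn for training goes to validation.
    val_df = df.drop(train_df.index)
    return train_df, val_df


# Toy data standing in for data.csv (assumption): 10 sports, 5 business rows.
df = pd.DataFrame({
    "Article_clean": [f"doc {i}" for i in range(15)],
    "NewsType": ["sports"] * 10 + ["business"] * 5,
})
train_df, val_df = stratified_split(df, target_col="NewsType")
print(train_df["NewsType"].value_counts())
print(val_df["NewsType"].value_counts())
```

Sampling inside each group keeps the class proportions the same in both parts, which a plain random split does not guarantee for small or imbalanced classes. The sketch relies on the DataFrame index being unique.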
Download pretrained word vectors
Supported vectors:
- GloVe.6B.50d
- GloVe.6B.100d
- GloVe.6B.200d
- GloVe.6B.300d
- GloVe.42B.300d
- GloVe.840B.300d
- GloVe.twitter.25d
- GloVe.twitter.50d
- GloVe.twitter.100d
- GloVe.twitter.200d
This downloads the specified vectors and creates two files:
- word_index.pkl => word2index dictionary
- embedding_matrix.npy => NumPy matrix of shape (vocab_size, embedding_size)
from chitti.nlp import download_pretrained_vectors, download_pretrained_vectors_
download_pretrained_vectors('GloVe.6B.50d')
download_pretrained_vectors_('glove.6B.50d.txt')
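Once the two files exist, they can be read back with pickle and NumPy. The sketch below is a hedged example: the helper name load_vectors is hypothetical, the tiny stand-in files are written to a temporary directory so the code runs without a download, and it assumes that the value stored for a word in word_index.pkl is its row in embedding_matrix.npy.

```python
import os
import pickle
import tempfile

import numpy as np


def load_vectors(directory):
    """Load the word2index dict and the embedding matrix from `directory`."""
    with open(os.path.join(directory, "word_index.pkl"), "rb") as f:
        word_index = pickle.load(f)
    embedding_matrix = np.load(os.path.join(directory, "embedding_matrix.npy"))
    return word_index, embedding_matrix


# Stand-in files so the sketch runs without downloading anything (assumption).
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "word_index.pkl"), "wb") as f:
    pickle.dump({"camera": 0, "battery": 1}, f)
np.save(os.path.join(workdir, "embedding_matrix.npy"),
        np.arange(6.0).reshape(2, 3))

word_index, embedding_matrix = load_vectors(workdir)
# Assumption: the index stored for a word is its row in the matrix.
battery_vector = embedding_matrix[word_index["battery"]]
print(embedding_matrix.shape, battery_vector)
```

With the real files, the same lookup would give the pretrained vector for any word present in the dictionary; words missing from word_index need separate handling.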
Text cleaning utilities
from chitti.nlp import stem_words, lemmatize_words
from chitti.nlp import remove_punctuation, remove_stopwords, space_punctuation
text = 'i, love. you ..... ,,, !!! ?? ?> >> '
print(remove_punctuation(text))
OUT:
'i love you'
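A comparable cleanup can be written with the standard library alone. The sketch below is not chitti's remove_punctuation; the name strip_punctuation is hypothetical, and it assumes that "punctuation" means the ASCII set in string.punctuation and that leftover whitespace should be collapsed.

```python
import string


def strip_punctuation(text):
    """Drop ASCII punctuation and collapse the leftover whitespace."""
    # Translation table that deletes every character in string.punctuation.
    table = str.maketrans("", "", string.punctuation)
    # split() with no arguments also trims leading and trailing spaces.
    return " ".join(text.translate(table).split())


text = 'i, love. you ..... ,,, !!! ?? ?> >> '
print(repr(strip_punctuation(text)))
```

On the sample text this prints 'i love you'. Non-ASCII punctuation such as curly quotes is left in place by this sketch.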