Data science library
nami
nami is a Python package.
Installation
pip install --upgrade nami
Features (nami-1.2.0.1)
Get a dataset
```python
from nami.datasets.ImageNet import get_dataset

dataset = get_dataset(noun='STR', dimension=(INT, INT), max=INT, timeout=FLOAT, save_to='STR')
```
Gets an image dataset of `noun` from ImageNet, with images of dimension INT×INT (`dimension`).
- `timeout`: [0.1 to 1.0] maximum request time for each image URL.
- `max`: number of images in the dataset.
- `save_to`: where to save the dataset, in `.npy` format.
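Because `save_to` stores the dataset as a `.npy` file, it can be read back with NumPy's `np.load`. The sketch below assumes the saved file holds a single image array. The array shape `(N, H, W, C)` and the file name `cats.npy` are hypothetical stand-ins and are not taken from nami. A random array replaces a real download.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for a downloaded dataset: 5 RGB images of 64x64.
# The real layout written by nami's save_to is an assumption here.
images = np.random.randint(0, 256, size=(5, 64, 64, 3), dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cats.npy")  # hypothetical file name
    np.save(path, images)                 # write in .npy format
    loaded = np.load(path)                # read the whole array into memory

print(loaded.shape, loaded.dtype)  # (5, 64, 64, 3) uint8
```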
Load a dataset (KME)
Tokenizer class
```python
from nami.AI.kme_tokenize import Tokenizer

tokenizer = Tokenizer()
text_arr = ['methyl methanoate', 'ethane', '(hydroxymethylamino)oxy-methoxymethanol']
tokenizer.fit_on_text(text_arr)
```
`fit_on_text(sentences)`
- `sentences`: a list of strings used to build the vocabulary (bag of words: `word2index` and `index2word`).
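The following minimal sketch shows what a `word2index`/`index2word` pair looks like. It is not nami's implementation, because it splits on whitespace instead of using nami's chemical-name tokens. The helper `build_vocab` is hypothetical. Reserving index 0 for padding and index 1 for `<unk>` is also an assumption.

```python
def build_vocab(sentences):
    """Build word2index / index2word maps from a list of strings.

    Simplified sketch: tokens are whitespace-separated words, index 0 is
    reserved for padding and index 1 for unknown tokens (both assumptions).
    """
    word2index = {"<pad>": 0, "<unk>": 1}
    for sentence in sentences:
        for token in sentence.split():
            if token not in word2index:
                word2index[token] = len(word2index)  # next free index
    # Invert the mapping so indices can be decoded back to tokens.
    index2word = {i: w for w, i in word2index.items()}
    return word2index, index2word


word2index, index2word = build_vocab(["methyl methanoate", "ethane"])
print(word2index)
# {'<pad>': 0, '<unk>': 1, 'methyl': 2, 'methanoate': 3, 'ethane': 4}
```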
```python
train_seq = tokenizer.text_to_sequences(text_arr, method_pad='pre')
```
`text_to_sequences(sentences, method_pad='post')`
- `sentences`: a list of strings to convert from text to numeric sequences.
- `method_pad`: `'pre'` or `'post'`; adds zero padding before (`'pre'`) or after (`'post'`) each sequence.
```python
train_seq
# [[ 0 0 0 0 0 0 0 4 5 6 4 7 8]
#  [ 0 0 0 0 0 0 0 0 0 0 0 9 10]
#  [11 12 4 5 13 14 15 16 4 15 4 7 17]]
```
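A small NumPy function can reproduce the effect of `method_pad`. This is a sketch and not nami's code. `pad_sequences` is a hypothetical helper that left-pads (`'pre'`) or right-pads (`'post'`) each sequence with zeros to the length of the longest one. That behavior matches the `'pre'` output above.

```python
import numpy as np


def pad_sequences(seqs, method_pad="post"):
    """Zero-pad integer sequences to a common length (hypothetical helper)."""
    if method_pad not in ("pre", "post"):
        raise ValueError("method_pad must be 'pre' or 'post'")
    maxlen = max(len(s) for s in seqs)
    out = np.zeros((len(seqs), maxlen), dtype=int)
    for row, seq in enumerate(seqs):
        if method_pad == "pre":
            out[row, maxlen - len(seq):] = seq  # zeros before the tokens
        else:
            out[row, :len(seq)] = seq          # zeros after the tokens
    return out


seqs = [[4, 5, 6], [9, 10]]
print(pad_sequences(seqs, "pre"))
# [[ 4  5  6]
#  [ 0  9 10]]
print(pad_sequences(seqs, "post"))
# [[ 4  5  6]
#  [ 9 10  0]]
```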
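```python
test_arr = ['2-(4-methoxyphenyl)-2-oxoacetic acid']
# Convert new text to index sequences using the fitted vocabulary
test_seq = tokenizer.text_to_sequences(test_arr)
# [[11, 14, 18, 13, 14, 4, 22, 3, 5, 21, 14, 11, 14, 3, 3, 3]]

# Map the indices back to tokens; tokens not in the vocabulary appear as '<unk>'
test_text = tokenizer.sequences_to_text(test_seq)
# [['2', '-', '(', '4', '-', 'meth', 'oxy', '<unk>', 'yl', ')', '-', '2', '-', '<unk>', '<unk>', '<unk>']]
```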
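Decoding, as `sequences_to_text` does, is an inverse lookup through `index2word`. Here is a minimal sketch under several assumptions. The vocabulary is hypothetical, index 0 is treated as padding and dropped, and any index missing from `index2word` is shown as `<unk>`. nami's own index assignments and padding handling may differ.

```python
def sequences_to_text(sequences, index2word, pad_index=0, unk="<unk>"):
    """Map index sequences back to token lists (hypothetical sketch).

    Padding indices are skipped; indices missing from index2word
    become the unknown token.
    """
    return [
        [index2word.get(i, unk) for i in seq if i != pad_index]
        for seq in sequences
    ]


# Hypothetical vocabulary; not nami's real index assignment.
index2word = {1: "<unk>", 2: "meth", 3: "yl", 4: "oxy"}
decoded = sequences_to_text([[0, 0, 2, 4, 7, 2, 3]], index2word)
print(decoded)
# [['meth', 'oxy', '<unk>', 'meth', 'yl']]
```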