Adding specific characters to each list of strings in Python


I wrote a script that splits every word in a sentence into segments. For example:
<pre><code>"geldigim" -> "gel" "di" "g" "i" "m"
</code></pre>
Some words split into several segments as above, and others split into just two:
<pre><code>"bildi" -> "bil" "di"
</code></pre>
Some words are not split at all:
<pre><code>"kos" -> "kos"
</code></pre>
How a word is split is decided entirely by the segmentation function.

What I want to produce is:
<pre><code>geldigim -> /gel* *di* *g* *i* *m/
bildi -> /bil* *di/
kos -> /kos/
</code></pre>
What I have done so far: I have a corpus of 37251512 sentences, and I wrote the following script:
<pre><code>import codecs

import morfessor

if __name__ == "__main__":
    io = morfessor.MorfessorIO()

    print("Importing corpus ...")
    # Reads the whole corpus into memory as a list of lines.
    f = codecs.open("corpus/corpus_tr_en/corpus.tr", encoding="utf-8").readlines()

    print("Importing morphology model ...")
    model = io.read_binary_model_file('seg/tr/model.bin')

    corpus = codecs.open('dataset/dataset_tr_en/full_segmented.tr', 'w', encoding='utf-8')
    for a in range(len(f)):
        print(str(a) + ' : ' + str(len(f)))  # progress: one line per sentence
        words = f[a].replace('\n', '').split()
        line_str = ''
        for word in words:
            # viterbi_segment returns a (segments, cost) pair; keep the segments.
            segmentation = model.viterbi_segment(word)[0]
            if len(segmentation) == 1:
                line_str = '/' + segmentation[0] + '/'
            if len(segmentation) == 2:
                line_str = '/' + segmentation[0] + '* *' + segmentation[1] + '/'
            if len(segmentation) > 2:
                line_str = ''
                for b in range(len(segmentation)):
                    if (b == 0):
                        line_str = line_str + '/' + segmentation[b] + '*'
                    if (b != 0) and (b != (len(segmentation) - 1)):
                        line_str = line_str + ' *' + segmentation[b] + '*'
                    if (b == (len(segmentation) - 1)):
                        line_str = line_str + ' *' + segmentation[b] + '/'
            # Each formatted word is written immediately, followed by a space.
            line_str = line_str + ' '
            corpus.write(line_str)
        corpus.write('\n')
    corpus.close()
</code></pre>
The script loops over every sentence and every word in it, and splits each word into segments with <code>model.viterbi_segment</code>. The model itself is loaded once with <code>io.read_binary_model_file</code>.

But this is too expensive and too slow for me. Can you suggest a way to make the process much faster?

Thank you.
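One possible direction, offered as a sketch rather than a measured fix: the three length cases in the script all reduce to a single `'* *'.join(...)` wrapped in slashes, and since a corpus of this size presumably repeats many words, the result for each distinct word can be cached so the segmenter is called once per word instead of once per occurrence. The names `format_segments`, `make_word_formatter`, and `segment_lines` below are hypothetical helpers, and the toy dictionary stands in for the trained model. With Morfessor, the `segment` argument could be something like `lambda w: model.viterbi_segment(w)[0]`. Whether caching helps in practice depends on how often words repeat in the corpus, which is an assumption here.

```python
from functools import lru_cache


def format_segments(segments):
    """Wrap segments as /a* *b* *c/; a single segment becomes /a/."""
    return '/' + '* *'.join(segments) + '/'


def make_word_formatter(segment, cache_size=None):
    """Return a memoized function mapping a word to its marked-up form.

    `segment` is any callable that maps a word to a list of segments.
    cache_size=None means the cache is unbounded; pass an integer to cap it.
    """
    @lru_cache(maxsize=cache_size)
    def format_word(word):
        return format_segments(segment(word))
    return format_word


def segment_lines(lines, format_word):
    """Lazily yield one formatted output line per input line."""
    for line in lines:
        yield ' '.join(format_word(word) for word in line.split())


if __name__ == "__main__":
    # Hypothetical stand-in for the trained model: a fixed lookup table.
    toy = {"geldigim": ["gel", "di", "g", "i", "m"], "bildi": ["bil", "di"]}
    fmt = make_word_formatter(lambda w: toy.get(w, [w]))
    for out in segment_lines(["geldigim bildi kos"], fmt):
        print(out)  # /gel* *di* *g* *i* *m/ /bil* *di/ /kos/
```

Because `segment_lines` is a generator, it can be fed an open file object directly, so the corpus is processed line by line instead of being loaded with `readlines()`. Unlike the script in the question, this version does not leave a trailing space at the end of each line, and it drops the per-sentence progress print, which is itself a noticeable cost over tens of millions of lines.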
