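I am preparing text for a word cloud, but I am stuck.
I need to remove all digits and all symbols such as . , - ? = / ! @ and so on, but I don't know how. I don't want to call replace over and over again. Is there a way to do this?
Here is what I have in mind and what I have so far: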
abstracts_list = open('new', 'r')
abstracts = []
allab = ''
for ab in abstracts_list:
    abstracts.append(ab)
for ab in abstracts:
    allab += ab
Lower = allab.lower()
Sample text:
MicroRNAs (miRNAs) are a class of noncoding RNA molecules approximately 19 to 25 nucleotides in length that downregulate the expression of target genes at the post-transcriptional level by binding to the 3'-untranslated region (3'-UTR). Epstein-Barr virus (EBV) generates at least 44 miRNAs, but the functions of most of these miRNAs have not yet been identified. Previously, we reported BRUCE as a target of miR-BART15-3p, a miRNA produced by EBV, but our data suggested that there might be other apoptosis-associated target genes of miR-BART15-3p. Thus, in this study, we searched for new target genes of miR-BART15-3p using in silico analyses. We found a possible seed match site in the 3'-UTR of Tax1-binding protein 1 (TAX1BP1). The luciferase activity of a reporter vector including the 3'-UTR of TAX1BP1 was decreased by miR-BART15-3p. MiR-BART15-3p downregulated the expression of TAX1BP1 mRNA and protein in AGS cells, while an inhibitor against miR-BART15-3p upregulated the expression of TAX1BP1 mRNA and protein in AGS-EBV cells. Mir-BART15-3p modulated NF-κB activity in gastric cancer cell lines. Moreover, miR-BART15-3p strongly promoted chemosensitivity to 5-fluorouracil (5-FU). Our results suggest that miR-BART15-3p targets the anti-apoptotic TAX1BP1 gene in cancer cells, causing increased apoptosis and chemosensitivity to 5-FU.
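I would probably try str.isalpha(), keeping only the characters for which it returns True (and, presumably, the whitespace between words).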
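A minimal sketch of that idea, assuming the text is already in a single string. The helper name `keep_letters` and the sample sentence (taken from the abstract above) are only for illustration. Note that dropped characters are removed outright, so a hyphenated name such as "Epstein-Barr" is fused into one word.

```python
def keep_letters(text):
    """Keep alphabetic characters and whitespace; drop digits and symbols."""
    return ''.join(ch for ch in text if ch.isalpha() or ch.isspace())

sample = "Epstein-Barr virus (EBV) generates at least 44 miRNAs."
cleaned = keep_letters(sample.lower())

# split() ignores the extra space left where "44" used to be
words = cleaned.split()
print(words)
```

Because str.isalpha() is Unicode-aware, non-ASCII letters such as the κ in "NF-κB" are kept by this approach.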
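To convert uppercase characters to lowercase, store the text in a string variable, for example STRING, and call its lower() method, assigning the result back to the variable.
The string then contains no uppercase letters.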
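As a small sketch of the lowercasing step, assuming the text is held in a variable named STRING (the sample value is a fragment of the abstract above):

```python
STRING = "MiR-BART15-3p downregulated the expression of TAX1BP1 mRNA"

# str.lower() returns a new string, so the result has to be reassigned
STRING = STRING.lower()
print(STRING)
```

Strings are immutable in Python, so calling STRING.lower() without the assignment would leave STRING unchanged.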
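To remove the special characters as well, the re module can help through its sub function, which replaces every match of a pattern in a single call, for example a character class covering everything other than letters and whitespace.
After that, the string contains no special characters.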
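One possible way to write this with re.sub, shown as a sketch rather than the only option. The pattern `[^a-z\s]` is an assumption that the text has already been lowercased and that only ASCII letters should survive; the sample sentence is taken from the abstract above.

```python
import re

STRING = "moreover, mir-bart15-3p strongly promoted chemosensitivity to 5-fluorouracil (5-fu)."

# Replace every character that is not a lowercase ASCII letter or whitespace
# with a space, so that hyphenated terms are split instead of fused together
STRING = re.sub(r'[^a-z\s]', ' ', STRING)

# Collapse the resulting runs of whitespace into single spaces
STRING = re.sub(r'\s+', ' ', STRING).strip()
print(STRING)
```

This pattern also discards non-ASCII letters such as the κ in "NF-κB". If those should be kept, a pattern like `[\W\d_]` (anything that is not a Unicode letter) could be used instead.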
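To determine the word frequencies, you can use the collections module, from which you need to import Counter.
Then use the following command to get how often each word occurs: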
Counter(STRING.split()).most_common()
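Put together with the import, the counting step might look like the following sketch. The sample value of STRING is made up for illustration and stands in for the cleaned text.

```python
from collections import Counter

# Hypothetical cleaned text: lowercase, no digits or symbols
STRING = "mirnas target genes and mirnas regulate genes in cells"

# split() breaks the text on whitespace; most_common() returns
# (word, count) pairs sorted from most to least frequent
word_counts = Counter(STRING.split()).most_common()

for word, count in word_counts[:3]:
    print(word, count)
```

The resulting list of (word, count) pairs can be turned back into a dictionary with dict(word_counts) if a word-cloud library expects a frequency mapping.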