Removing characters/symbols from a string

Posted 2024-05-20 19:36:08


I'm preparing text for a word cloud, but I'm stuck.

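I need to strip out all digits and all symbols such as . , - ? = / ! @ and so on, but I don't know how. I don't want to call replace over and over again. Is there a way to do this?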

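Here is my plan and how far I have got:

  • Concatenate the text into one string
  • Convert the characters to lowercase <--- I am here
  • Now I want to remove specific symbols and split the text into words (a list)
  • Count word frequencies
  • Then run a stopwords script...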
abstracts = []
with open('new', 'r') as abstracts_list:  # the file is closed automatically
    for ab in abstracts_list:
        abstracts.append(ab)
allab = ''.join(abstracts)  # concatenate all lines into one string
Lower = allab.lower()

Sample text:

MicroRNAs (miRNAs) are a class of noncoding RNA molecules approximately 19 to 25 nucleotides in length that downregulate the expression of target genes at the post-transcriptional level by binding to the 3'-untranslated region (3'-UTR). Epstein-Barr virus (EBV) generates at least 44 miRNAs, but the functions of most of these miRNAs have not yet been identified. Previously, we reported BRUCE as a target of miR-BART15-3p, a miRNA produced by EBV, but our data suggested that there might be other apoptosis-associated target genes of miR-BART15-3p. Thus, in this study, we searched for new target genes of miR-BART15-3p using in silico analyses. We found a possible seed match site in the 3'-UTR of Tax1-binding protein 1 (TAX1BP1). The luciferase activity of a reporter vector including the 3'-UTR of TAX1BP1 was decreased by miR-BART15-3p. MiR-BART15-3p downregulated the expression of TAX1BP1 mRNA and protein in AGS cells, while an inhibitor against miR-BART15-3p upregulated the expression of TAX1BP1 mRNA and protein in AGS-EBV cells. Mir-BART15-3p modulated NF-κB activity in gastric cancer cell lines. Moreover, miR-BART15-3p strongly promoted chemosensitivity to 5-fluorouracil (5-FU). Our results suggest that miR-BART15-3p targets the anti-apoptotic TAX1BP1 gene in cancer cells, causing increased apoptosis and chemosensitivity to 5-FU.


2 answers

I would probably try using str.isalpha():

abstracts = []
with open('new', 'r') as abstracts_list:
    for ab in abstracts_list:  # each iteration yields one line of text
        # keep letters and whitespace (so words stay separable);
        # drop digits and symbols
        ab = ''.join(c for c in ab if c.isalpha() or c.isspace())
        abstracts.append(ab.lower())
# now, assuming you want the text in one big string like allab was
long_string = ''.join(abstracts)
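Since the question asks for a way to avoid repeated replace calls, another option is a single pass with str.translate. The following is only a minimal sketch: the function name strip_digits_and_punctuation and the sample sentence are made up for illustration, and it assumes that only ASCII digits and ASCII punctuation (string.digits and string.punctuation) need to go.

```python
import string

def strip_digits_and_punctuation(text):
    """Lowercase text and replace ASCII digits and punctuation with spaces."""
    unwanted = string.digits + string.punctuation
    # Map every unwanted character to a space so that words joined by a
    # symbol (e.g. "Epstein-Barr") split apart instead of fusing together.
    table = str.maketrans(unwanted, ' ' * len(unwanted))
    return text.lower().translate(table)

sample = "Epstein-Barr virus (EBV) generates at least 44 miRNAs."
words = strip_digits_and_punctuation(sample).split()
print(words)
# ['epstein', 'barr', 'virus', 'ebv', 'generates', 'at', 'least', 'mirnas']
```

Mapping to a space rather than deleting the character is a design choice: it keeps hyphenated terms as separate words, which may or may not be what you want for a word cloud.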

To convert uppercase characters to lowercase, store the text in a string variable, for example STRING, and then run:

STRING = STRING.lower()  # str.lower() alone is enough; no regex is needed

Your string now contains no uppercase letters.

To then remove special characters, use the sub function of the re module:

import re
# replace every character except letters, digits, -, _, * and . with a space
STRING = re.sub(r'[^a-zA-Z0-9\-_*.]', ' ', STRING)

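After these commands, the only non-alphanumeric characters left in your string are the ones the character class keeps (-, _, * and .), plus the spaces that were inserted.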
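The pattern above keeps digits and a few symbols, whereas the question asks to drop those as well. A stricter variant might look like the sketch below. The function name keep_letters_only and the sample sentence are illustrative only, and the pattern assumes that only ASCII letters a-z should survive (so a letter such as κ would also be replaced).

```python
import re

def keep_letters_only(text):
    """Lowercase text and replace each run of non-letters with one space."""
    # [^a-z]+ matches one or more characters that are not ASCII lowercase
    # letters: digits, punctuation, whitespace and non-ASCII characters.
    return re.sub(r'[^a-z]+', ' ', text.lower()).strip()

sample = "We found a seed match site in the 3'-UTR of TAX1BP1."
print(keep_letters_only(sample))
# we found a seed match site in the utr of tax bp
```

Note that identifiers containing digits are split into fragments (TAX1BP1 becomes "tax" and "bp"); whether that is acceptable depends on the word cloud you are after.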

To determine word frequencies, you can use the collections module, from which you need to import Counter.

Then use the following command to find out how often each word occurs:

from collections import Counter
Counter(STRING.split()).most_common()
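Putting the steps from the question together (lowercase, strip symbols, split, count, drop stopwords) could look roughly like this. It is a sketch under assumptions: word_frequencies is a hypothetical helper, and STOPWORDS is a tiny hand-written set used purely for illustration, not a real stopword list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set for illustration only.
STOPWORDS = {"the", "of", "to", "in", "a", "and", "that", "by"}

def word_frequencies(text, stopwords=STOPWORDS):
    """Return a Counter of lowercase ASCII words in text, minus stopwords."""
    # findall extracts runs of letters, so digits and symbols never
    # reach the counter and no separate cleaning step is needed.
    words = re.findall(r'[a-z]+', text.lower())
    return Counter(w for w in words if w not in stopwords)

sample = "The expression of target genes and the target of the miRNA."
freq = word_frequencies(sample)
print(freq.most_common(1))
# [('target', 2)]
```

The resulting Counter behaves like a dictionary mapping each word to its count, so it can be inspected or filtered further before building the word cloud.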
