Concatenating trigrams stored as tuples in a dictionary

Posted 2024-10-03 02:40:50


OK, I'm working on an assignment for a course in my linguistics bachelor's degree, where we use Python to process text. This is what I need to do:

Create a script that counts trigrams frequencies

  • Do not add dummy tokens
  • Lowercase every token and concatenate trigram units with an underscore
  • What are the missing values in the output box?
  • Bonus: Try to solve the task by storing trigrams as tuples in the dictionary

This is how I approached the problem:

lyrics = "Do you remember 21st night of September ? Love was changing the mind of pretenders While chasing the clouds away Our hearts were ringing In the key that our souls were singing As we danced in the night Remember how the stars stole the night away yeah yeah yeah Hey hey hey Ba de ya say do you remember ? Ba de ya dancing in September Ba de ya never was a cloudy day Ba duda ba duda ba duda badu Ba duda badu ba duda badu Ba duda badu ba duda yeah My thoughts are with you Holding hands with your heart to see you Only blue talk and love Remember how we knew love was here to stay Now December Found the love we shared in September Only blue talk and love Remember the true love we share today Hey hey hey Ba de ya say do you remember ? Ba de ya dancing in September Ba de ya never was a cloudy day There was a Ba de ya say do you remember ? Ba de ya dancing in September Ba de ya golden dreams were shiny days Now our bell was ringing aha Our souls was singing Do you remember every cloudy day yau There was a Ba de ya say do you remember ? Ba de ya dancing in September Ba de ya never was a cloudy day There was a Ba de ya say do you remember ? Ba de ya dancing in September Ba de ya golden dreams were shiny days Ba de ya de ya de ya Ba de ya de ya de ya Ba de ya de ya de ya de ya Ba de ya de ya de ya Ba de ya de ya de ya Ba de ya de ya de ya de ya"

lyric = lyrics.lower()
listText = lyric.split(" ")
freq = {}



while len(listText) > 2:
    trigram = (listText[0], listText[1], listText[2])
    if trigram in freq.keys():
        freq[trigram] += 1
    else:
        freq[trigram] = 1
    listText.pop(0)

sorted_data = sorted(freq.items() , key=lambda x: x[1], reverse = True) 

for entry in sorted_data:
    print(str(entry[0])+"\t"+str(entry[1]))

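The only part I'm missing is concatenating the trigram units with an underscore. This should be simple, but for the life of me I can't find a way to do it. The output should be the concatenated trigram followed by its frequency. The teacher said this was easy to solve, but I can't figure it out. That's funny, because everything else I did here was quick and easy (relatively speaking).

I have tried a lot of things, but for some reason I can't get it to work.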


2 Answers

If it's only a matter of joining them, you can use str.join:

trigram = (listText[0], listText[1], listText[2])
c_trigram = '_'.join(trigram)  # join takes one iterable, so pass the tuple itself (no * unpacking)

You can see an example, in a shameless self-plug, here
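As a minimal sketch of the call above: str.join accepts exactly one iterable of strings, so the tuple is passed as-is. Unpacking it with `*` would hand join three separate arguments and raise a TypeError. The sample trigram below is only an illustration, not taken from the asker's output.

```python
# Hypothetical sample trigram; any tuple of strings behaves the same way.
trigram = ("do", "you", "remember")

# Pass the tuple itself: join iterates over it and glues the items with "_".
c_trigram = "_".join(trigram)

print(c_trigram)
```

Keeping the tuple as the dictionary key and joining only when printing satisfies the bonus part of the assignment as well.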

You can use the string join method. Just call '_'.join on the trigram tuple when printing.

print(str('_'.join(entry[0]))+"\t"+str(entry[1]))

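Other notes:

(1) You can be more Pythonic and build your listText with a list comprehension: listText = [word.lower() for word in lyrics.split()]

(2) You can use the dictionary's setdefault instead of the if/else to initialize and increment a trigram: call freq.setdefault(trigram, 0) and then freq[trigram] += 1, with no if/else block. Also, your if statement currently tests trigram in freq.keys(). In Python 3 that is a constant-time lookup on average (equivalent to trigram in freq), but in Python 2 keys() builds a list, so the test takes linear time.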
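Putting both notes together, the whole task might look like the sketch below. The function name count_trigrams and the short sample string are hypothetical, and the zip over three offset slices is a swapped-in way of walking the tokens; it replaces the question's pop(0) loop and is not something the answers above prescribe.

```python
def count_trigrams(text):
    """Count trigram frequencies, keyed by tuples of lowercased tokens."""
    # split() with no argument splits on any run of whitespace
    tokens = [word.lower() for word in text.split()]
    freq = {}
    # Zipping three offset slices yields each run of three consecutive tokens
    for trigram in zip(tokens, tokens[1:], tokens[2:]):
        freq.setdefault(trigram, 0)  # initialize to 0 only if the key is missing
        freq[trigram] += 1
    return freq


# Hypothetical short sample, not the full lyrics from the question
sample = "Ba de ya de ya de ya"
freq = count_trigrams(sample)

# Sort by frequency, highest first, and join each tuple only when printing
for trigram, count in sorted(freq.items(), key=lambda x: x[1], reverse=True):
    print("_".join(trigram) + "\t" + str(count))
```

The dictionary keys stay tuples throughout, and the underscore form exists only in the printed output.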
