Finding the distance between phrases in a text

Posted 2024-05-18 10:18:24


I have a question: how can I count the number of words between phrases in a text? For example, I have the following text:

Elon Musk is a technology entrepreneur and investor. He is the founder, CEO, and lead designer of SpaceX. Elon Musk has stated that the goals of SpaceX, Tesla, and SolarCity revolve around his vision to change the world and humanity.

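I want to count how many words lie between "Elon Musk" and "SpaceX", return something like a list of numbers, and then find the average word distance. For example, [15, 6].

I know that for single words we can split the text on words. But how do I deal with phrases?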


3 answers

There is some logic you have not specified yet, but something like the following might do the trick:

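def find_distance(sentence, word1, word2):
    """Return the word counts between each word1 and the next word2."""
    distances = []
    while sentence:
        # Drop everything up to and including the next occurrence of word1.
        _, _, sentence = sentence.partition(word1)
        # Take the text before the next word2; found is "" if word2 is absent.
        text, found, _ = sentence.partition(word2)
        # Without the found check, a trailing word1 with no word2 after it
        # would add the length of the rest of the text as a bogus distance.
        if found and text:
            distances.append(len(text.split()))
    return distances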

If you call it with your text (stored in phrase here), you get the result you want, [15, 6]:

print(find_distance(phrase, "Elon Musk", "SpaceX"))

Note that the behavior for cases like Elon Musk is a technology Elon Musk entrepreneur ... has to be defined. Which occurrence do you want to use, the first or the second?
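One possible answer to that question is to pair each second phrase with the nearest preceding first phrase. The sketch below is only an illustration of that choice, not the original answer's method; the function name phrase_gaps and the pairing rule are assumptions made for this example.

```python
import re


def phrase_gaps(text, first, second):
    """Count words between each `second` and the nearest preceding `first`.

    Assumption for this sketch: a repeated `first` replaces the earlier one,
    and each `first` is paired with at most one `second`.
    """
    # Record (start, kind, end) for every occurrence of either phrase.
    events = [(m.start(), "first", m.end())
              for m in re.finditer(re.escape(first), text)]
    events += [(m.start(), "second", m.end())
               for m in re.finditer(re.escape(second), text)]
    events.sort()

    gaps = []
    last_first_end = None
    for start, kind, end in events:
        if kind == "first":
            last_first_end = end  # a later occurrence wins
        elif last_first_end is not None:
            gaps.append(len(text[last_first_end:start].split()))
            last_first_end = None  # wait for the next `first`
    return gaps


tricky = "Elon Musk is a technology Elon Musk entrepreneur at SpaceX."
print(phrase_gaps(tricky, "Elon Musk", "SpaceX"))  # [2]
```

With this rule the repeated "Elon Musk" shortens the measured gap to the two words "entrepreneur at". Pairing with the first occurrence instead would only require keeping last_first_end unchanged when it is already set.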

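As user Dominique mentioned, there are a lot of small details you have to spell out. I made a simple program that finds the distance between two words. You want the distance between "Elon Musk" and "SpaceX". Why not find the distance between "Musk" and "SpaceX" instead?

Note: this example returns the distance between the first occurrences of the two words. In this program we find the distance between "Musk" (the 2nd word) and "SpaceX" (the 18th word). There are 15 words between them.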

Elon Musk is a technology entrepreneur and investor. He is the founder, CEO, and lead designer of SpaceX. Elon Musk has stated that the goals of SpaceX, Tesla, and SolarCity revolve around his vision to change the world and humanity.

Example (Python 3):

# Initial sentence
phrase = 'Elon Musk is a technology entrepreneur and investor. He is the founder, CEO, and lead designer of SpaceX. Elon Musk has stated that the goals of SpaceX, Tesla, and SolarCity revolve around his vision to change the world and humanity.'

# Removes common punctuation characters
phrase = ''.join(character for character in phrase if character not in ('!', '.' , ':' , ',', '"')) # Insert punctuation you want removed

# Creates a list of split words
word_list = phrase.split()

# Words you want to find the distance between (word_1 comes first in the sentence, then word_2)
word_1 = 'Musk'
word_2 = 'SpaceX'

# Calculates the distance between the first occurrences of word_1 and word_2
distance = word_list.index(word_2) - word_list.index(word_1)

# Prints the number of words between word_1 and word_2
print(f'Distance between "{word_1}" and "{word_2}" is {distance - 1} words.')

Output:

Distance between "Musk" and "SpaceX" is 15 words.
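The same word-index idea can be stretched to multi-word phrases and to every occurrence, which is closer to what the question asks for. The following is a minimal sketch under stated assumptions: the helper names tokenize, find_phrase and phrase_distances are hypothetical, punctuation is stripped only from the ends of each token, and each first phrase is paired with the next second phrase after it.

```python
import string
from statistics import mean


def tokenize(text):
    """Split on whitespace and strip punctuation from the ends of each token."""
    tokens = (word.strip(string.punctuation) for word in text.split())
    return [token for token in tokens if token]


def find_phrase(tokens, phrase_tokens, start=0):
    """Return the index of the first match at or after start, or -1."""
    n = len(phrase_tokens)
    for i in range(start, len(tokens) - n + 1):
        if tokens[i:i + n] == phrase_tokens:
            return i
    return -1


def phrase_distances(text, phrase_1, phrase_2):
    """List the number of words between each phrase_1 and the next phrase_2."""
    tokens = tokenize(text)
    first, second = tokenize(phrase_1), tokenize(phrase_2)
    if not first or not second:
        return []
    distances, pos = [], 0
    while True:
        i = find_phrase(tokens, first, pos)
        if i == -1:
            break
        j = find_phrase(tokens, second, i + len(first))
        if j == -1:
            break
        distances.append(j - (i + len(first)))
        pos = j + len(second)  # continue searching after this pair
    return distances


phrase = ('Elon Musk is a technology entrepreneur and investor. He is the '
          'founder, CEO, and lead designer of SpaceX. Elon Musk has stated '
          'that the goals of SpaceX, Tesla, and SolarCity revolve around his '
          'vision to change the world and humanity.')
distances = phrase_distances(phrase, 'Elon Musk', 'SpaceX')
print(distances)        # [15, 6]
print(mean(distances))  # 10.5
```

Comparing token lists rather than raw substrings means "SpaceX." and "SpaceX," both match "SpaceX", at the cost of ignoring punctuation inside the text entirely.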

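You can split your text on dots, exclamation marks and question marks, but how will your program tell the dots that end a sentence from the dots that mark an abbreviation? Besides that, how will you handle parentheses? Will they be treated as separate phrases or not?

I don't think there is a straightforward answer to your question, unless you impose some serious limitations on your phrases.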
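The abbreviation problem described above is easy to reproduce. The snippet below is a deliberately naive illustration, not a recommended splitter; the function name naive_sentences and the sample sentence are made up for this example.

```python
import re


def naive_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (deliberately naive)."""
    return [part for part in re.split(r"(?<=[.!?])\s+", text.strip()) if part]


sample = "Dr. Smith met Elon Musk (the CEO of SpaceX). They talked."
print(naive_sentences(sample))
# ['Dr.', 'Smith met Elon Musk (the CEO of SpaceX).', 'They talked.']
```

The abbreviation "Dr." is cut off as its own sentence, and the parenthesised remark stays attached to the sentence around it, so any rule for counting words across sentence boundaries has to decide how to treat both cases.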
