How to generate random paragraphs with NLTK

Posted 2024-09-28 22:27:54


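I am trying to build a test unit to stress test a very large publication-management implementation. I was thinking of using NLTK to generate paragraphs on different topics, along with random titles for the articles.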

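Can NLTK do something like this? I want each article to be unique so that I can test different layout sizes. I would like to do the same for the subjects.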

P.S. I need to generate over 1 million articles, which will eventually be used to test many things (performance, search, layout, etc.).

Can anyone offer some advice?


1 Answer
User
#1 · Posted 2024-09-28 22:27:54

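I have used this. It takes phrases from Noam Chomsky and generates random paragraphs. You can change the source text to whatever you want. Of course, the more text you use, the better.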

# List of LEADINs to buy time.
leadins = """To characterize a linguistic level L,
        On the other hand,
        This suggests that
        It appears that
        Furthermore """

# List of SUBJECTs chosen for maximum professorial macho.
subjects = """ the notion of level of grammaticalness
        a case of semigrammaticalness of a different sort
        most of the methodological work in modern linguistics
        a subset of English sentences interesting on quite independent grounds
        the natural general principle that will subsume this case """

# List of VERBs chosen for autorecursive obfuscation.
verbs = """can be defined in such a way as to impose
        delimits
        suffices to account for
        cannot be arbitrary in
        is not subject to """


# List of OBJECTs selected for profound sententiousness.

objects = """ problems of phonemic and morphological analysis.
        a corpus of utterance tokens upon which conformity has been defined by the paired utterance test.
        the traditional practice of grammarians.
        the levels of acceptability from fairly high (e.g. (99a)) to virtual gibberish (e.g. (98d)).
        a stipulation to place the constructions into these various categories.
        a descriptive fact.
        a parasitic gap construction."""

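import random
import textwrap
from itertools import chain, islice

def chomsky(times=1, line_length=72):
    """Return up to `times` random sentences, wrapped to `line_length` columns."""
    parts = []
    for part in (leadins, subjects, verbs, objects):
        # list() is required on Python 3, where map() returns an iterator
        # that random.shuffle() cannot shuffle in place.
        phraselist = list(map(str.strip, part.splitlines()))
        random.shuffle(phraselist)
        parts.append(phraselist)
    # zip() (izip on Python 2) pairs the i-th phrase of each shuffled list
    # into one sentence; it stops at the shortest list.
    output = chain(*islice(zip(*parts), 0, times))
    return textwrap.fill(' '.join(output), line_length)

print(chomsky())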

This gave me back:

This suggests that a case of semigrammaticalness of a different sort is not subject to a corpus of utterance tokens upon which conformity has been defined by the paired utterance test.
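To cover the question's goal of many articles with different sizes, the same idea can be wrapped in a seeded generator. The following is only a sketch under stated assumptions: the names `sentence`, `article`, and `articles`, the shortened phrase lists, and the sentence-count range are all hypothetical and not part of the original answer.

```python
import random
import textwrap

# Shortened phrase lists for illustration only; swap in your own source text.
LEADINS = ["On the other hand,", "This suggests that", "It appears that", "Furthermore,"]
SUBJECTS = ["the notion of level of grammaticalness",
            "a case of semigrammaticalness of a different sort",
            "most of the methodological work in modern linguistics"]
VERBS = ["delimits", "suffices to account for", "is not subject to"]
OBJECTS = ["the traditional practice of grammarians.",
           "a descriptive fact.",
           "a parasitic gap construction."]


def sentence(rng):
    """Build one sentence by picking one phrase from each list."""
    return " ".join(rng.choice(part) for part in (LEADINS, SUBJECTS, VERBS, OBJECTS))


def article(seed, min_sentences=3, max_sentences=40, line_length=72):
    """Return a reproducible article; the seed decides its length and content."""
    rng = random.Random(seed)
    count = rng.randint(min_sentences, max_sentences)
    body = " ".join(sentence(rng) for _ in range(count))
    return textwrap.fill(body, line_length)


def articles(n):
    """Lazily yield n articles so a million of them never sit in memory at once."""
    for seed in range(n):
        yield article(seed)


# Print the first three generated articles, separated by a blank line.
for text in articles(3):
    print(text)
    print()
```

Seeding each article by its index makes a failing test case reproducible. With phrase lists this small, two short articles can still collide, so strict uniqueness would need larger lists or an identifier embedded in each article.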

For a title, you can of course do:

^{pr2}$
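The original snippet for this step is not shown here. As a rough sketch of one way to get a title (an assumption, not necessarily what the answer used), a single subject-verb-object phrase can be built without a lead-in and without the trailing period. The function name `make_title` and the shortened phrase lists are hypothetical.

```python
import random

# Shortened phrase lists for illustration only.
SUBJECTS = ["the notion of level of grammaticalness",
            "a case of semigrammaticalness of a different sort",
            "most of the methodological work in modern linguistics"]
VERBS = ["delimits", "suffices to account for", "is not subject to"]
OBJECTS = ["the traditional practice of grammarians.",
           "a descriptive fact.",
           "a parasitic gap construction."]


def make_title(rng=random):
    """Return a one-line title: subject + verb + object, no trailing period."""
    phrase = " ".join(rng.choice(part) for part in (SUBJECTS, VERBS, OBJECTS))
    phrase = phrase.rstrip(".")
    # Upper-case only the first letter so words like "English" keep their case.
    return phrase[0].upper() + phrase[1:]


print(make_title())
```

Passing a `random.Random(seed)` instance as `rng` makes the title reproducible for a given article.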
