Decompose, transform, and recombine prose into mutated forms.

Detailed description of the prosedecomposer Python project


https://coveralls.io/repos/github/coreybobco/prosedecomposer-py/badge.svg?branch=master
https://badge.fury.io/py/prosedecomposer.svg

What is this?

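There are many ways to write. In Unoriginal Genius, Marjorie Perloff contrasts the notion of "original genius" with a counter-tradition of "unoriginal genius," which includes acts of plagiaristic pastiche (also known as détournement) and patchwriting. According to Kenneth Goldsmith, T.S. Eliot, James Joyce, and Thomas Pynchon are all exemplars of this style: they composed their seminal works with encyclopedias, magazines, newspaper clippings, and world literature lying open in front of them.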

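Today there are countless ways to transform text with software: Markov chains, cut-ups, substituting words with related words, swapping verbs between books, GPT-2, BERT, and so on. Today's cybernetic author can use these as decomposing agents that break down original texts into a chaotic new language, which can then be edited, expanded, or synthesized into an original, meaningful work.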

But what does it do?

This project elaborates on these ideas, allowing a user to:

  • Randomly sample sentences and paragraphs from publicly available literature on Project Gutenberg and Archive.org, or from any text you give it.

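  • Swap words with the same part of speech between two texts. For example, swap all the adjectives of one text with the adjectives of another and all the nouns of one text with the nouns of another, preserving the structure of a narrative or discourse while drastically altering its content. Take this passage from Charles Dickens' Great Expectations: when its nouns and adjectives are replaced with those from a passage of H.P. Lovecraft's story The Shunned House, it becomes surreal horror: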

    “It was then I began to understand that chimney in the eye had stopped, like the enveloping and the head, a human fungus ago. I noticed that Miss Havisham put down the height exactly on the time from which she had taken it up. As Estella dealt the streams, I glanced at the corpse-abhorrent again, and saw that the outline upon it, once few, now diseased, had never been worn. I glanced down at the sight from which the outline was insectoid, and saw that the half stocking on it, once few, now diseased, had been trodden ragged. Without this cosmos of thing, this standing still of all the worse monstrous attentions, not even the withered phosphorescent mist on the collapsed dissolving could have looked so like horror-mockings, or the human hideousness so like a horror.”

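  • Run a single text or a list of texts through a Markov chain, semi-intelligently recombining the words in a more or less chaotic manner depending on the n-gram size (which defaults to 1, the most chaotic).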

    Markov chain based generative algorithms like this one can create prose whose repetitions and permutations lend it a strange rhythm and which appears syntactically and semantically valid at first but eventually turns into nonsense. The Markov chain’s formulaic yet sassy and subversive style is quite similar to Gertrude Stein’s in The Making of Americans, which she explains in detail in the essay Composition as Explanation.

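  • Perform a virtual simulation of the cut-up method pioneered by William S. Burroughs and Brion Gysin by breaking a text into components of random length (bounded by a minimum and maximum number of words) and then randomly rearranging them.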

Installation

Using pip

python3 -m pip install prosedecomposer

For Gutenberg sampling to work, however, you must first populate the Berkeley DB cache:

^{pr2}$

If the Gutenberg cache raises errors after it has been populated, delete the cache directory and populate it again.

Using Docker

docker pull coreybobco/prosedecomposer
docker run -d -t --name prosedecomposer coreybobco/prosedecomposer
docker exec -it prosedecomposer python3

How to use

First, import the library:

from prosedecomposer import *

Extract and clean text from Project Gutenberg or Archive.org:

# From an Archive.org URL:
calvino_text = get_internet_archive_document('https://archive.org/stream/CalvinoItaloCosmicomics/Calvino-Italo-Cosmicomics_djvu.txt')
# From a Project Gutenberg URL:
alice_in_wonderland = get_gutenberg_document('https://www.gutenberg.org/ebooks/11')
# Select a random document from Project Gutenberg
random_gutenberg_text = random_gutenberg_document()

The ParsedText class offers functions for randomly sampling one or more sentences or paragraphs of a given length:

parsed_calvino = ParsedText(calvino_text)
parsed_calvino.random_sentence()   # Returns a random sentence
parsed_calvino.random_sentence(minimum_tokens=25)  # Returns a random sentence of a guaranteed length in tokens
parsed_calvino.random_sentences()  # Returns 5 random sentences
parsed_calvino.random_sentences(num=7, minimum_tokens=25)  # Returns 7 random sentences of a guaranteed length
parsed_calvino.random_paragraph()  # Returns a random paragraph (of at least 3 sentences by default)
parsed_calvino.random_paragraph(minimum_sentences=5)  # Returns a paragraph with at least 5 sentences
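
To make the idea of length-constrained sampling concrete, here is a minimal pure-Python sketch. It is not how ParsedText is implemented: the helper names split_sentences and sample_sentence are hypothetical, the sentence splitter is a naive regular expression, and "tokens" are assumed to be whitespace-separated words.

```python
import random
import re


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sample_sentence(text, minimum_tokens=1, rng=None):
    """Return a random sentence with at least `minimum_tokens` whitespace-separated tokens."""
    rng = rng or random.Random()
    candidates = [s for s in split_sentences(text) if len(s.split()) >= minimum_tokens]
    if not candidates:
        raise ValueError("no sentence is long enough")
    return rng.choice(candidates)


sample_text = "Call me Ishmael. Some years ago I went to sea. It was cold!"
# Only one sentence has five or more tokens, so it is always the one chosen.
print(sample_sentence(sample_text, minimum_tokens=5))  # Some years ago I went to sea.
```

Filtering the candidates first and then choosing among them is what makes the minimum length a guarantee rather than a retry loop.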

To swap words with the same part of speech between texts:

# Swap out adjectives and nouns between two random paragraphs of two random Gutenberg documents
doc1 = ParsedText(random_gutenberg_document())
doc2 = ParsedText(random_gutenberg_document())
swap_parts_of_speech(doc1.random_paragraph(), doc2.random_paragraph())
# Any of spaCy's part-of-speech tag values should work, though: https://spacy.io/api/annotation#pos-tagging
swap_parts_of_speech(doc1.random_paragraph(), doc2.random_paragraph(), parts_of_speech=["VERB", "CONJ"])
# Since NLG has not yet been implemented, expect grammatical errors such as subject-verb disagreement.
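
The core of a part-of-speech swap can be sketched without a tagger. The example below is an illustration only, not the library's swap_parts_of_speech: the function replace_by_pos is hypothetical, it works in one direction only, and it assumes the tokens have already been tagged (for instance by spaCy) and are supplied as (word, tag) pairs.

```python
import random
from collections import defaultdict


def replace_by_pos(target, donor, parts_of_speech=("ADJ", "NOUN"), rng=None):
    """Replace words in `target` with random same-tag words from `donor`.

    Both arguments are lists of (word, tag) pairs; only tags listed in
    `parts_of_speech` are replaced, and words with no donor stay as they are.
    """
    rng = rng or random.Random()
    pool = defaultdict(list)
    for word, tag in donor:
        pool[tag].append(word)
    result = []
    for word, tag in target:
        if tag in parts_of_speech and pool[tag]:
            result.append(rng.choice(pool[tag]))
        else:
            result.append(word)
    return " ".join(result)


# Hand-tagged toy inputs standing in for two tagged passages.
dickens = [("the", "DET"), ("old", "ADJ"), ("clock", "NOUN"), ("stopped", "VERB")]
lovecraft = [("a", "DET"), ("diseased", "ADJ"), ("fungus", "NOUN"), ("glowed", "VERB")]
print(replace_by_pos(dickens, lovecraft))  # the diseased fungus stopped
```

Because only the words change and the tag sequence is untouched, the sentence skeleton survives. This is also why agreement errors creep in: nothing re-inflects the borrowed words.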

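To run text through the Markov chain text-processing algorithm, see below. You may want a bigger n-gram size (2 or 3) if you are processing a lot of text, i.e. one or several books or stories at a time.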

output = markov(text)  # Just one text (defaults to n-gram size of 1 and 5 output sentences)
output = markov(text, ngram_size=3, num_output_sentences=7)  # Bigger n-gram size, more output sentences
output = markov([text1, text2, text3])  # List of texts (defaults to n-gram size of 1 and 5 output sentences)
output = markov([text1, text2, text3], ngram_size=3, num_output_sentences=7)  # Bigger n-gram size, more outputs
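
To see why a larger n-gram size yields less chaotic output, consider this minimal word-level Markov chain. It is a sketch under stated assumptions rather than the library's markov function: build_chain and generate are hypothetical names, tokens are whitespace-separated words, and output length is counted in words instead of sentences.

```python
import random
from collections import defaultdict


def build_chain(words, ngram_size=1):
    """Map each run of `ngram_size` consecutive words to the words seen right after it."""
    chain = defaultdict(list)
    for i in range(len(words) - ngram_size):
        state = tuple(words[i:i + ngram_size])
        chain[state].append(words[i + ngram_size])
    return chain


def generate(text, ngram_size=1, num_words=20, rng=None):
    """Generate up to `num_words` words by walking the chain from a random state."""
    rng = rng or random.Random()
    chain = build_chain(text.split(), ngram_size)
    if not chain:
        raise ValueError("text is too short for this n-gram size")
    output = list(rng.choice(list(chain)))
    while len(output) < num_words:
        followers = chain.get(tuple(output[-ngram_size:]))
        if not followers:  # dead end: this state was never followed by anything
            break
        output.append(rng.choice(followers))
    return " ".join(output[:num_words])


source = "the cat sat on the mat and the dog sat on the log"
print(generate(source, ngram_size=1))  # any word may follow any of its neighbours
print(generate(source, ngram_size=2))  # each word must fit the previous two
```

With an n-gram size of 1 the next word depends only on the current one, so the walk can jump between unrelated contexts. With 2 or 3 the walk can only continue along sequences that actually occur in the source, which needs more text to avoid simply reproducing it.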

To virtually cut up and rearrange a text:

# Cuts up a text into cutouts of between 3 and 7 words and rearranges them randomly (returns a list of cutout strings)
cutouts = cutup(text)
# Cuts up a text into cutouts of between 2 and 10 words and rearranges them randomly (returns a list of cutout strings)
cutouts = cutup(text, min_cutout_words=2, max_cutout_words=10)
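
The cut-up procedure itself is short enough to sketch in plain Python. The function below is a hypothetical cut_up, not the library's cutup. It assumes whitespace-separated words and, as a simplification, lets the final cutout be shorter than the minimum when the text runs out.

```python
import random


def cut_up(text, min_cutout_words=3, max_cutout_words=7, rng=None):
    """Slice text into cutouts of random length and return them in shuffled order."""
    if min_cutout_words < 1 or max_cutout_words < min_cutout_words:
        raise ValueError("need 1 <= min_cutout_words <= max_cutout_words")
    rng = rng or random.Random()
    words = text.split()
    cutouts = []
    i = 0
    while i < len(words):
        size = rng.randint(min_cutout_words, max_cutout_words)
        # The last slice may hold fewer than `min_cutout_words` words.
        cutouts.append(" ".join(words[i:i + size]))
        i += size
    rng.shuffle(cutouts)
    return cutouts


passage = "one two three four five six seven eight nine ten eleven twelve"
for piece in cut_up(passage, min_cutout_words=2, max_cutout_words=4):
    print(piece)
```

Every word of the source appears exactly once in the result. Only the order of the cutouts changes, which is what distinguishes a cut-up from Markov-style recombination.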
