from docx import Document

# Load the source document
doc = Document(r'path\to\file\pride_and_prejudice.docx')

# Collect the text of every paragraph
all_text = []
for para in doc.paragraphs:
    all_text.append(para.text)

# Join the paragraphs into a single string
all_text_str = ''.join(all_text)

clean_text = all_text_str.replace('\n', '')  # Remove line breaks
# Remove pairs of spaces (runs of even length). This usually cleans up stray
# spacing nicely, but you can tweak it accordingly.
clean_text = clean_text.replace('  ', '')

# Write the cleaned text to a new document as a single paragraph
document = Document()
p = document.add_paragraph(clean_text)
document.save(r'path\to\file\pride_and_prejudice_clean.docx')
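The replace calls above only target line breaks and fixed runs of spaces. As a minimal sketch of an alternative, assuming you want every run of whitespace (spaces, tabs, line breaks) reduced to a single space, a regular expression can do the cleanup in one pass. The helper name collapse_whitespace is hypothetical and is not part of the docx library.

```python
import re

def collapse_whitespace(text):
    """Hypothetical helper: reduce every run of whitespace to one space."""
    # \s+ matches spaces, tabs, and line breaks, whatever the run length
    return re.sub(r'\s+', ' ', text).strip()

sample = "It is a truth\nuniversally   acknowledged"
cleaned = collapse_whitespace(sample)
print(cleaned)
```

The returned string could be passed to document.add_paragraph in place of clean_text.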
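I used the docx library, which is not installed by default. You can install it with pip or conda; the package is named python-docx (for example, pip install python-docx).
After installing it, run the docx script above.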