How to get the start and end offsets of paragraphs in a text file in Python 3

```python
import re

# textFile must be the whole file as one str, e.g. open(path).read()
paraStartOffset = []
paraEndOffset = []
for match in re.finditer(r'(?s)((?:[^\n]?)+)', textFile):
    paraStartOffset.append(match.start())
    paraEndOffset.append(match.end())
print("start Offset --> ", paraStartOffset)
print("end Offset --> ", paraEndOffset)
```
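In Python 3, `re.finditer()` over a `str` returns character offsets into the decoded text, and a file opened in text mode has `\r\n` translated to `\n` by default, so those offsets are not necessarily byte positions in the file. If file positions are what you need (for example to `seek()` to a paragraph), one option is to read the file in binary mode and use a bytes pattern. This is a minimal sketch: `paragraph_byte_offsets` is a hypothetical name, and the pattern is an adaptation of the `[^\n][\n]?` idea from the answer so that CRLF blank lines also separate paragraphs.

```python
import re

# Bytes adaptation of the answer's pattern (an assumption, not the answer's
# exact regex): one or more non-line-break bytes, optionally followed by a
# single LF or CRLF, repeated. An empty line (LF or CRLF) ends the match.
# Bare CR line endings are not handled.
PARA_BYTES_RE = re.compile(rb'(?:[^\r\n]+(?:\r?\n)?)+')


def paragraph_byte_offsets(path):
    """Return (start, end) byte offsets of each paragraph in the file at path."""
    # Binary mode: no decoding and no newline translation, so the offsets are
    # positions in the file itself and can be passed to f.seek().
    with open(path, 'rb') as f:
        data = f.read()
    return [(m.start(), m.end()) for m in PARA_BYTES_RE.finditer(data)]
```

To read a paragraph back, open the file in binary mode, call `f.seek(start)` and `f.read(end - start)`, then decode the bytes with the file's encoding.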

1 answer

User

#1 · Posted 2024-10-01 00:23:31

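I think this question/answer covers basically what you are looking for. I tested the code from that answer on text whose paragraphs may also begin with leading whitespace, and it almost works: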

```python
import re
# DATA holds the whole text file as a single str
for match in re.finditer(r'(?s)((?:[^\n][\n]?)+)', DATA):
    print(match.start(), match.end())
```
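Wrapped into a reusable function, the same pattern might look like the sketch below (`paragraph_offsets` is a hypothetical name). The `(?s)` flag is dropped because it only changes how `.` behaves and the pattern contains no `.`. Each end offset includes the paragraph's single trailing newline when it has one, and `text` must be the whole file as one `str`.

```python
import re

# Same pattern as the answer, minus the inert (?s) flag: a non-newline
# character optionally followed by one newline, repeated. Two newlines in a
# row (an empty line) end the match, so each blank-line-separated block of
# text comes back as one match.
PARA_RE = re.compile(r'(?:[^\n][\n]?)+')


def paragraph_offsets(text):
    """Return a list of (start, end) str offsets, one pair per paragraph.

    text[start:end] is the paragraph, including its single trailing newline
    when it has one.
    """
    return [(m.start(), m.end()) for m in PARA_RE.finditer(text)]


# Usage: in practice text would be the whole file, e.g. open(path).read().
sample = "First line\nstill first.\n\nSecond paragraph.\n"
for start, end in paragraph_offsets(sample):
    print(start, end, repr(sample[start:end]))
```

For this sample string the function returns `(0, 24)` and `(25, 43)`.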

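When I run the loop above on my test text (taken from Bram Stoker's Dracula), it returns the results below. The first paragraph is a standard one, the second starts with spaces, and the third starts with a TAB.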

Results (start and end offset of each paragraph):

[output not preserved]

Test text (I couldn't reproduce the exact original formatting, but anyway...):

_3 May. Bistritz._ Left Munich at 8:35 P. M., on 1st May, arriving at
Vienna early next morning; should have arrived at 6:46, but train was an
hour late. Buda-Pesth seems a wonderful place, from the glimpse which I
got of it from the train and the little I could walk through the
streets. I feared to go very far from the station, as we had arrived
late and would start as near the correct time as possible. The
impression I had was that we were leaving the West and entering the
East; the most western of splendid bridges over the Danube, which is
here of noble width and depth, took us among the traditions of Turkish
rule.

  "My Friend. Welcome to the Carpathians. I am anxiously expecting
you. Sleep well to-night. At three to-morrow the diligence will
start for Bukovina; a place on it is kept for you. At the Borgo
Pass my carriage will await you and will bring you to me. I trust
that your journey from London has been a happy one, and that you
will enjoy your stay in my beautiful land.

    Just before I was leaving, the old lady came up to my room and said in a
very hysterical way:
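With the answer's pattern, the start offset of an indented paragraph points at its leading spaces or TAB, and a line containing only spaces or tabs does not end a paragraph, because those characters match `[^\n]`. If you want offsets that skip that whitespace, one possible variant (a sketch, not the answer's method; `stripped_paragraph_offsets` is a hypothetical name) treats whitespace-only lines as separators and trims whitespace from both ends of each paragraph:

```python
import re

# A separator is a newline, then a line holding only spaces/tabs (or nothing),
# then another newline.
BLANK_LINE_RE = re.compile(r'\n[ \t]*\n')


def stripped_paragraph_offsets(text):
    """Return (start, end) offsets with leading/trailing whitespace excluded."""
    offsets = []
    chunk_start = 0
    # Separator spans, plus a zero-width sentinel at the end of the text so
    # the last chunk is processed by the same loop.
    separators = [(m.start(), m.end()) for m in BLANK_LINE_RE.finditer(text)]
    separators.append((len(text), len(text)))
    for sep_start, sep_end in separators:
        chunk = text[chunk_start:sep_start]
        stripped = chunk.strip()
        if stripped:  # skip chunks that are only whitespace
            # Move the start past the whitespace strip() removed on the left.
            start = chunk_start + len(chunk) - len(chunk.lstrip())
            offsets.append((start, start + len(stripped)))
        chunk_start = sep_end
    return offsets


sample = "Plain para\nline two.\n\n  Indented para.\n \n\tTabbed para.\n"
for start, end in stripped_paragraph_offsets(sample):
    print(start, end, repr(sample[start:end]))
```

Here `text[start:end]` is exactly the trimmed paragraph, so the second and third paragraphs of the sample start at their first visible character rather than at the spaces or TAB.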
