How do I use a regular expression to remove list markers such as "(i)", "(ii)", "(iii)" from answers in a pandas DataFrame?

2 Answers

User

Answer #1 · edited 2024-09-30 05:31:40

If the marker can only consist of one or more i characters (so not real Roman numerals), you can use:

\(?i+\)|\b(?:[A-Za-z]|\d+)\.

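The pattern matches:

\(?i+\) matches an optional (, then one or more i characters, followed by )
| or
\b a word boundary, to prevent partial matches inside a word
(?: start of a non-capturing group
- [A-Za-z] matches a single character in the range a-z or A-Z
- | or
- \d+ matches one or more digits
) end of the non-capturing group
\. matches a literal dot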
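
As a quick check of the breakdown above, here is a minimal sketch using only Python's `re` module. The helper name `strip_marker`, the sample strings, the `count=1` limit, and the final `.strip()` are assumptions added for illustration; the pattern itself is copied unchanged from the answer.

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(r"\(?i+\)|\b(?:[A-Za-z]|\d+)\.")


def strip_marker(text: str) -> str:
    """Remove the first marker matched by PATTERN (hypothetical helper)."""
    # count=1 is an assumption: the pattern is not anchored, so without it
    # a later "x." inside the sentence could be removed as well.
    return PATTERN.sub("", text, count=1).strip()


samples = [
    "(i) The cow has four legs.",
    "(iii) Cow gives us milk.",
    "a.The cow eats grass.",
    "12.Cow gives us milk.",
]
cleaned = [strip_marker(s) for s in samples]
print(cleaned)
```

Note that this pattern has no alternative for markers like `a)`, so a string such as "a)The cow" is left as it is.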

Regex demo

If you want to match real Roman numerals, see this post

User

Answer #2 · edited 2024-09-30 05:31:40

Please try:

df.replace(regex=True, inplace=True, to_replace=r'^\(?(?:[ivxlcdm]+|[a-zA-Z]+|[0-9]+)[).]', value='')

Input:

(i) The cow has four legs.
(ii) The cow eats grass.
(iii) Cow gives us milk.
a.The cow has four legs.
b.The cow eats grass.
c.Cow gives us milk.
1.The cow has four legs.
2.The cow eats grass.
3.Cow gives us milk.
a)The cow has four legs.
b)The cow eats grass.
c)Cow gives us milk.

Output:

The cow has four legs.
The cow eats grass.
Cow gives us milk.
The cow has four legs.
The cow eats grass.
Cow gives us milk.
The cow has four legs.
The cow eats grass.
Cow gives us milk.
The cow has four legs.
The cow eats grass.
Cow gives us milk.
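
To make the input and output above reproducible, here is a small sketch. The column name `answer` and the four sample rows are assumptions; the pattern is the one from this answer. It uses the non-in-place form of `DataFrame.replace` so the original frame is left untouched, and adds a `.str.strip()` step, because the pattern does not consume the space that follows a marker like `(i)`.

```python
import pandas as pd

# Hypothetical sample data; "answer" is an assumed column name.
df = pd.DataFrame({
    "answer": [
        "(i) The cow has four legs.",
        "a.The cow eats grass.",
        "3.Cow gives us milk.",
        "c)Cow gives us milk.",
    ]
})

# Pattern from the answer above, unchanged.
pattern = r"^\(?(?:[ivxlcdm]+|[a-zA-Z]+|[0-9]+)[).]"

# Returns a new DataFrame instead of modifying df in place.
cleaned = df.replace(to_replace=pattern, value="", regex=True)

# Added step (not in the answer): drop whitespace left behind by the marker.
cleaned["answer"] = cleaned["answer"].str.strip()

result = cleaned["answer"].tolist()
print(result)
```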

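Explanation of the regex ^\(?(?:[ivxlcdm]+|[a-zA-Z]+|[0-9]+)[).]:

^ matches the start of the string
\(? matches zero or one opening parenthesis
(?:[ivxlcdm]+|[a-zA-Z]+|[0-9]+) matches any one of the following:
- [ivxlcdm]+ matches lowercase Roman numeral letters
- [a-zA-Z]+ matches letters of the alphabet
- [0-9]+ matches digits
[).] matches a closing parenthesis or a dot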
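
The same breakdown can be written as a commented pattern with `re.VERBOSE`, which may be easier to maintain. This is only a sketch: the name `MARKER` and the trailing `\s*` are assumptions added here and are not part of the original answer.

```python
import re

MARKER = re.compile(
    r"""
    ^                 # start of the string
    \(?               # optional opening parenthesis
    (?:
        [ivxlcdm]+    # lowercase Roman numeral letters
      | [a-zA-Z]+     # any run of ASCII letters
      | [0-9]+        # a run of digits
    )
    [).]              # closing parenthesis or a dot
    \s*               # assumption: also consume whitespace after the marker
    """,
    re.VERBOSE,
)

examples = ["(ii) The cow eats grass.", "b)Cow gives us milk.", "The cow."]
stripped = [MARKER.sub("", s) for s in examples]
print(stripped)
```

One caveat that follows from the `[a-zA-Z]+` alternative: a string that starts with a single word directly followed by a dot, such as "Milk.", is treated as a marker and removed entirely, so the pattern is best suited to data where every row really starts with a marker.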
