Python regex: getting items from an org-mode file

** Hardware [0/1]
 - [ ] adapt a programmable motor to a tripod to be used for panning
** Reading - Technology [1/6]
 - [X] Introduction to Networking - Charles Severance
 - [ ] A Tour of C++ - Bjarne Stroustrup
 - [ ] C++ How to Program - Paul Deitel
 - [X] Computer Systems - Randal Bryant
 - [ ] The C programming language - Brian Kernighan
 - [ ] Beginning Linux Programming -Matthew and Stones
** Reading - Health [3/4]
 - [ ] Patrick McKeown - The Oxygen Advantage
 - [X] Total Knee Health - Martin Koban
 - [X] Supple Leopard - Kelly Starrett
 - [X] Convict Conditioning 1 and 2

 - [X] Introduction to Networking - Charles Severance
 - [ ] A Tour of C++ - Bjarne Stroustrup
 - [ ] C++ How to Program - Paul Deitel
 - [X] Computer Systems - Randal Bryant
 - [ ] The C programming language - Brian Kernighan
 - [ ] Beginning Linux Programming -Matthew and Stones
** Reading - Health [3/4]
 - [ ] Patrick McKeown - The Oxygen Advantage
 - [X] Total Knee Health - Martin Koban
 - [X] Supple Leopard - Kelly Starrett
 - [X] Convict Conditioning 1 and 2

3 answers

User

#1 · edited 2024-09-28 21:00:31

If you are sure the character * never appears inside the items, you can use:

re.compile(r"\*\* " + re.escape(head) + r" \[\d+/\d+\]\n([^*]+)\*?")
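A minimal sketch of how this pattern could be used, assuming the file content is already loaded into a string. The names `text` and `items_for` are hypothetical, and `re.escape` is applied to `head` so that headings containing regex metacharacters are matched literally.

```python
import re

# Hypothetical sample data, shortened from the question.
text = """** Hardware [0/1]
 - [ ] adapt a programmable motor to a tripod to be used for panning
** Reading - Health [3/4]
 - [ ] Patrick McKeown - The Oxygen Advantage
 - [X] Total Knee Health - Martin Koban
"""

def items_for(head, text):
    """Return the raw item block under heading `head`, or None if absent."""
    rx = re.compile(r"\*\* " + re.escape(head) + r" \[\d+/\d+\]\n([^*]+)\*?")
    m = rx.search(text)
    return m.group(1) if m else None

print(items_for("Hardware", text))
```

Because the capture group is `[^*]+`, the block stops at the first `*` it meets, which is why this only works when no item contains that character.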

User

#2 · edited 2024-09-28 21:00:31

You can do it with:

import re

string = """
** Hardware [0/1]
 - [ ] adapt a programmable motor to a tripod to be used for panning 
** Reading - Technology [1/6]
 - [X] Introduction to Networking - Charles Severance
 - [ ] A Tour of C++ - Bjarne Stroustrup
 - [ ] C++ How to Program - Paul Deitel
 - [X] Computer Systems - Randal Bryant
 - [ ] The C programming language - Brian Kernighan
 - [ ] Beginning Linux Programming -Matthew and Stones
** Reading - Health [3/4]
 - [ ] Patrick McKeown - The Oxygen Advantage
 - [X] Total Knee Health - Martin Koban
 - [X] Supple Leopard - Kelly Starrett
 - [X] Convict Conditioning 1 and 2  
 """

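def getitems(section):
    # match the heading line for `section`, then capture everything up to the next "**" heading
    rx = re.compile(r'^\*{2} ' + re.escape(section) + r'.+[\n\r](?P<block>(?:(?!^\*{2})[\s\S])+)', re.MULTILINE)
    match = rx.search(string)
    # search() returns None when the section is missing
    return match.group('block') if match else None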

items = getitems('Reading - Technology')
print(items)

See it working on ideone.com.

The core of the code is the (condensed) expression:

^\*{2}.+[\n\r]       # match the beginning of the line, followed by two stars, anything else in between and a newline
(?P<block>           # open group "block"
    (?:              # non-capturing group
        (?!^\*{2})   # a neg. lookahead, making sure no ** follows at the beginning of a line
        [\s\S]       # any character...
    )+               # ...at least once
)                    # close group "block"

In the actual code, the search string is inserted right after the **. See the demo for Reading - Technology on regex101.com.
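The annotated expression can also be compiled as written by using `re.VERBOSE`. The following is only a sketch: `SAMPLE` and `section_block` are hypothetical names, and the literal space after the stars is written as `[ ]` because verbose mode ignores unescaped whitespace.

```python
import re

# Hypothetical sample data, shortened from the question.
SAMPLE = """** Hardware [0/1]
 - [ ] adapt a programmable motor to a tripod to be used for panning
** Reading - Technology [1/6]
 - [X] Introduction to Networking - Charles Severance
 - [ ] A Tour of C++ - Bjarne Stroustrup
** Reading - Health [3/4]
 - [X] Total Knee Health - Martin Koban
"""

def section_block(section, text):
    """Return the text between the heading for `section` and the next heading."""
    rx = re.compile(
        r"""
        ^\*{2}[ ]            # two stars and a space at the start of a line
        """ + re.escape(section) + r"""
        .+[\n\r]             # rest of the heading line and its newline
        (?P<block>           # open group "block"
            (?:
                (?!^\*{2})   # stop before a line that starts with two stars
                [\s\S]       # any character, newlines included
            )+
        )
        """,
        re.MULTILINE | re.VERBOSE,
    )
    m = rx.search(text)
    return m.group("block") if m else None

print(section_block("Reading - Technology", SAMPLE))
```

`re.escape` also escapes spaces and `#`, so the section title stays intact even though verbose mode would otherwise drop whitespace.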

As a follow-up, you can also return only the selected (checked) items, like so:
def getitems(section, selected=None):
    rx = re.compile(r'^\*{2} ' + re.escape(section) + r'.+[\n\r](?P<block>(?:(?!^\*{2})[\s\S])+)', re.MULTILINE)
    try:
        items = rx.search(string).group('block')
        if selected:
            rxi = re.compile(r'^ - \[X\]\ (.+)', re.MULTILINE)
            try:
                selected_items = rxi.findall(items)
                return selected_items
            except:
                return None
        return items
    except:
        return None

items = getitems('Reading - Health', selected=True)
print(items)

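User
#3 · edited 2024-09-28 21:00:31

I'm not sure you need a regex for the whole match. I would use a regex only to match the ** heading line, then return lines until the next ** line shows up.

Something like this: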

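import re

# `head` is the section title to look for; `my_file` is an open file (or any iterable of lines)
pattern = re.compile(r"\*\* " + re.escape(head))

start = False
output = []
for line in my_file:
    if pattern.match(line):
        start = True
        continue
    elif start and line.startswith("**"):  # next heading after our section
        break

    if start:
        output.append(line)

# now `output` should have the lines you want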
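The line-scanning idea can be packaged as a function. This is a sketch under stated assumptions: `section_lines` is a hypothetical name, `io.StringIO` stands in for a real open file, and headings that come before the wanted section are skipped instead of ending the loop.

```python
import io
import re

def section_lines(lines, head):
    """Collect the lines under the heading `head` from an iterable of lines."""
    pattern = re.compile(r"\*\* " + re.escape(head))
    inside = False
    output = []
    for line in lines:
        if line.startswith("**"):
            if inside:
                break  # reached the heading that follows our section
            # Enter the section only if this heading is the one we want.
            inside = pattern.match(line) is not None
            continue
        if inside:
            output.append(line.rstrip("\n"))
    return output

# io.StringIO is used here in place of a real file handle.
my_file = io.StringIO(
    "** Hardware [0/1]\n"
    " - [ ] adapt a programmable motor to a tripod to be used for panning\n"
    "** Reading - Health [3/4]\n"
    " - [ ] Patrick McKeown - The Oxygen Advantage\n"
    " - [X] Total Knee Health - Martin Koban\n"
)
print(section_lines(my_file, "Reading - Health"))
```

Since the file is read line by line and the loop stops at the next heading, this approach never needs the whole file in memory.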
