Using regular expressions in Python to search course listings converted from a PDF

3 Answers

User

Answer 1 · edited 2024-10-01 13:26:42

This is only possible if the parentheses are not nested:

[A-Z]{4} \d{3}(?:(?=([^.()]+))\1|\([^)]*\))+\.
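For anyone who wants to try that pattern from Python, here is a minimal sketch. The sample text is borrowed from the other answers in this thread, and the names `PATTERN`, `sample` and `matches` are only illustrative. One thing to watch for: the pattern contains a capturing group, so `re.findall` would hand back group 1 rather than the whole entry, which is why the sketch uses `finditer` and `group(0)`.

```python
import re

# Answer 1's pattern, unchanged. (?=([^.()]+))\1 emulates an atomic group:
# the lookahead captures a run of characters that are not ".", "(" or ")",
# and \1 then consumes exactly that run without backtracking into it.
# The other alternative, \([^)]*\), swallows one non-nested parenthesised
# remark, periods included.
PATTERN = re.compile(r"[A-Z]{4} \d{3}(?:(?=([^.()]+))\1|\([^)]*\))+\.")

# Sample text copied from the other answers (line-wrapped, as PDF output often is).
sample = """\
ACCT 221 Principles of Accounting II (3) Prerequisite: ACCT 220.
ASTD 485 Issues in East Asian Studies (3) (Intended as a final capstone course to be
taken in a student's last 15 credits.) Prerequisites: ASTD 284 (or ASTD 150) and 285
(or ASTD 160).
"""

# findall() would return only the contents of group 1 here,
# so collect the full match text with finditer() and group(0).
matches = [m.group(0) for m in PATTERN.finditer(sample)]

for entry in matches:
    print(entry)
    print()
```

Because both character classes are negated, they match newlines as well, so no `re.DOTALL` flag is needed for wrapped entries.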

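User

Answer 2 · edited 2024-10-01 13:26:42

You want everything from the four-letter department code through the first period after "Prerequisite", right? Then say exactly that in the pattern.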

>>IN:
import re

txt = """
ACCT 221 Principles of Accounting II (3) Prerequisite: ACCT 220.
ASTD 485 Issues in East Asian Studies (3) (Intended as a final capstone course to be
taken in a student's last 15 credits.) Prerequisites: ASTD 284 (or ASTD 150) and 285
(or ASTD 160).
ASTR 100 Introduction to Astronomy (3) (Not open to students who have taken or are
taking any astronomy course numbered 250 or higher. For students not majoring or
minoring in a science.) Prerequisite: MATH 012 or higher."""

# From a four-letter department code, lazily up to "Prerequisite(s)",
# then lazily up to the next period. re.DOTALL lets "." cross the
# line breaks left over from the PDF conversion.
pat = re.compile(r"[A-Z]{4}.*?Prerequisites?.*?\.", re.DOTALL)
courses = pat.findall(txt)
for course in courses:
    print(course + "\n")

>>OUT:
ACCT 221 Principles of Accounting II (3) Prerequisite: ACCT 220.

ASTD 485 Issues in East Asian Studies (3) (Intended as a final capstone course to be
taken in a student's last 15 credits.) Prerequisites: ASTD 284 (or ASTD 150) and 285
(or ASTD 160).

ASTR 100 Introduction to Astronomy (3) (Not open to students who have taken or are
taking any astronomy course numbered 250 or higher. For students not majoring or
minoring in a science.) Prerequisite: MATH 012 or higher.
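One caveat with this approach: a course that has no "Prerequisite" clause gets merged into the next entry, because the lazy `.*?` keeps scanning until it finds the word. The sketch below shows the effect on a made-up two-entry sample and one possible guard. The guard is my own assumption, not part of the answer above: it presumes every entry starts at the beginning of a line with a code such as `ASTD 380`, and that a wrapped continuation line never starts that way.

```python
import re

# Hypothetical sample: the first course has no prerequisite clause.
sample = """\
ASTD 380 American Relations with China and Japan (3) A study of American relations.
ASTR 100 Introduction to Astronomy (3) Prerequisite: MATH 012 or higher.
"""

# The pattern from the answer above, with "." allowed to cross line breaks.
naive = re.compile(r"[A-Z]{4}.*?Prerequisites?.*?\.", re.DOTALL)

# Guarded variant: before consuming each character, a negative lookahead
# refuses to step onto a line that begins with another course code, so the
# search for "Prerequisite" cannot leave the current entry.
# Note that the trailing ".*?\." is not guarded in the same way.
guarded = re.compile(
    r"^[A-Z]{4} \d{3}\b(?:(?!^[A-Z]{4} \d{3}\b).)*?Prerequisites?.*?\.",
    re.DOTALL | re.MULTILINE,
)

naive_matches = naive.findall(sample)      # one match that swallows both entries
guarded_matches = guarded.findall(sample)  # only the entry that has a prerequisite

print(len(naive_matches), len(guarded_matches))
for entry in guarded_matches:
    print(entry)
```

With the guard in place, courses without prerequisites are simply skipped rather than glued to their neighbour.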

User

Answer 3 · edited 2024-10-01 13:26:42

If you want simpler regular expressions, there is nothing wrong with using two of them:

import re

text = '''\
ACCT 221 Principles of Accounting II (3) Prerequisite: ACCT 220
ASTD 485 Issues in East Asian Studies (3) (Intended as a final capstone course to be taken in a student's last 15 credits.) Prerequisites: ASTD 284 (or ASTD 150) and 285 (or ASTD 160).
ASTR 100 Introduction to Astronomy (3) (Not open to students who have taken or are taking any astronomy course numbered 250 or higher. For students not majoring or minoring in a science.) Prerequisite: MATH 012 or higher.
ASTD 380 American Relations with China and Japan: 1740 to Present (3) (Fulfills the general education requirement in the social sciences.) A study of American political, economic, and cultural relations with China and Japan from the American colonial era to modern times'''

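courses = {}
for line in text.splitlines():
    # First regex: the course code at the start of the line, e.g. "ACCT 221".
    course = re.match(r'([A-Z]{4}\s+\d{3})', line).group(1)
    # Second regex: everything after "Prerequisite:" or "Prerequisites:", if present.
    m = re.search(r'Prerequisites?:\s*(.*)', line)
    if m:
        pre = m.group(1)
    else:
        pre = 'None'
    courses[course] = pre

print('COURSE\t\tPREREQUISITE')

for course in sorted(courses):
    print('{}\t{}'.format(course, courses[course]))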

Prints:

COURSE      PREREQUISITE
ACCT 221    ACCT 220
ASTD 380    None
ASTD 485    ASTD 284 (or ASTD 150) and 285 (or ASTD 160).
ASTR 100    MATH 012 or higher.
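The two-regex approach assumes one course per line, but text converted from a PDF is usually wrapped, as in the sample from answer 2. A rough sketch of one way to regroup wrapped lines first is below. The helper `join_wrapped_entries` and the name `COURSE_START` are hypothetical, and the sketch assumes a continuation line never begins with something that looks like a course code.

```python
import re

# A line that begins with a code such as "ASTD 485" is taken to start a new entry.
COURSE_START = re.compile(r'[A-Z]{4}\s+\d{3}\b')


def join_wrapped_entries(raw):
    """Merge wrapped lines so that each course ends up on a single line."""
    entries = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if COURSE_START.match(line) or not entries:
            entries.append(line)         # start of a new course entry
        else:
            entries[-1] += ' ' + line    # continuation of the previous entry
    return entries


wrapped = """\
ACCT 221 Principles of Accounting II (3) Prerequisite: ACCT 220.
ASTD 485 Issues in East Asian Studies (3) (Intended as a final capstone course to be
taken in a student's last 15 credits.) Prerequisites: ASTD 284 (or ASTD 150) and 285
(or ASTD 160).
ASTR 100 Introduction to Astronomy (3) (Not open to students who have taken or are
taking any astronomy course numbered 250 or higher. For students not majoring or
minoring in a science.) Prerequisite: MATH 012 or higher.
"""

entries = join_wrapped_entries(wrapped)

# Apply the same two regexes as above to each regrouped entry.
prereqs = {}
for entry in entries:
    code = re.match(r'([A-Z]{4}\s+\d{3})', entry).group(1)
    m = re.search(r'Prerequisites?:\s*(.*)', entry)
    prereqs[code] = m.group(1) if m else 'None'

for code in sorted(prereqs):
    print('{}\t{}'.format(code, prereqs[code]))
```

Joining the lines first keeps both regexes as simple as they are in the answer, at the cost of one extra pass over the text.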
