How do I parse a plain-text table? (multi-line)

Posted 2024-06-28 15:13:48


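I want to parse a table that is easy to read visually but lacks any real structure. I would like the result as a Python dictionary, though I will eventually turn it into a DataFrame. There are really six columns, from left to right: course 1 units, course 1 code, course 1 title, course 2 code, course 2 title, course 2 units. But that may be harder to parse. Any help is appreciated.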

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> <!-- $Id: bymaj_report.htm,v 1.9 2008/01/29 17:54:48 adt Exp $ --> <html> <head> <meta content="no-cache" http-equiv="Pragma"/> <meta content="-1" http-equiv="Expires"/> <meta content="NO-CACHE" http-equiv="CACHE-CONTROL"/> <title> ASSIST: By Major Report </title> </head> <body style="margin: 0; padding: 0; background: #FFFEEA; font-family: Arial, Helvetica, sans-serif; font-size: 12px;"> <p align="LEFT"> <pre> Articulation Agreement by Major Effective during the 16-17 Academic Year <b> ====Electrical Engineering &amp; Computer Sciences, Lower Division B.S.==== </b> <b> COLLEGE OF ENGINEERING JUNIOR TRANSFER ADMISSION REQUIREMENTS:</b> Admission to the UC Berkeley <b>College of Engineering </b>is highly competitive. Applicants to the <b>Electrical Engineering and Computer Science </b>major must complete all <u>required</u> core UCB preparation courses in order to be eligible for admission. Only applicants who have completed 100% of these <u>required</u> courses will be considered for admission. Required courses for admission to the major must be completed by the end of the spring semester prior to fall enrollment. <b>A summer 2017 course is not considered to be "work in progress" for the fall 2017 selection process. </b>If a series of courses at a community college is required (e.g., English 1A + 1B + 103 = English R1A and R1B), <u>all</u> the courses in the series <u>must</u> be completed, and <u>must</u> (unless otherwise indicated) be completed at the same community college. Partial completion (e.g., 2 of the 3 required courses) will result in zero credit toward the requirement(s), and the applicant will NOT be considered for admission. <b> </b>Lower division UC Berkeley courses required for graduation (but not admission) are also listed in the major agreements and are strongly recommended to be taken to strengthen one's application. The more of these courses completed, the stronger the application will be. 
Required core courses for admission: (all these courses must be completed to be considered for admission) - UCB Math 1A, 1B - UCB Math 53, 54 - UCB Physics 7A, 7B - UCB English R1A and R1B - One from UCB Astronomy 7A or 7B or Bio 1A &amp; 1AL or Bio 1B or Chem 1A/1AL or Chem 1B or Chem 3A/L or 3B/L or Mcellbi 32 &amp; 32L or Physics 7C Strongly recommended courses: (if your college offers the courses listed below and they are articulated, taking them will strengthen your application) Electrical Engineering 20 and 40 were taught for the final time at UCB in Fall 2015. Electrical Engineering 20 and Electrical Engineering 40 have been replaced with Electrical Engineering 16A and 16B. <b>The curriculum changes are effective for students admitted beginning Fall 15. </b> - UCB Compsci 61A - UCB Compsci 61B - UCB Compsci 61C - UCB El Eng 16A - UCB El Eng 16B - UCB Compsci 70 Admission is primarily based on the completeness of the applicant's lower division preparation and the level of academic achievement reflected in the student's grade point average. The UC applicant essay also plays an important role in the selection process at UC Berkeley. The College reviews the essay for evidence of interest in the student's chosen field and a thoughtful match between the academic program and the student's academic and career objectives. The College of Engineering requires six humanities/social science courses, two of which must be reading and composition. The only non-technical admission requirement for the College of Engineering is the coursework equivalent to UC Berkeley's English R1A and R1B (reading and composition), which must be taken for a letter grade. The College of Engineering <b>does not recognize the Intersegmental General Education Transfer Curriculum (IGETC) and strongly discourages</b> students from following this option due to the number of major-specific technical courses required for engineering transfer admission. 
<b>NOTE:</b> The English R1A and R1B requirements <u>cannot</u> be satisfied by IGETC; applicants <u>must</u> complete the specific courses indicated as English R1A and R1B equivalents to be considered for admission. Failure to complete the exact courses listed will mean the applicant will NOT be considered for admission. The remaining four humanities/social science requirement courses are not considered for admission purposes but are required for graduation. See <a href="http://coe.berkeley.edu/hssreq" target="_blank">http://engineering.berkeley.edu/hssreq</a> for the College of Engineering humanities/social science breadth requirements and courses. Courses which are three semester units or more that appear in the following categories on the "General Education/Breadth" section of <a href="http://assist.org" target="_blank">assist.org</a> may be used to satisfy <b>two of</b> the remaining four humanities/social science course requirements for the College of Engineering. ARTS AND LITERATURE; HISTORICAL STUDIES; INTERNATIONAL STUDIES; PHILOSOPHY AND VALUES; SOCIAL AND BEHAVIORAL SCIENCES. SAT/ACT/A-level test scores and letters of recommendation are NOT considered for admission. <b>NOTE: ALL REQUIRED COURSES AND ALL STRONGLY RECOMMENDED COURSES FOR THE MAJOR MUST BE TAKEN FOR A LETTER GRADE. 
FOR MORE INFORMATION, PLEASE CHECK THE COLLEGE'S WEB SITE FOR THE <u>COLLEGE OF ENGINEERING UNDERGRADUATE GUIDE.</u> For more information: </b><a href="http://engineering.berkeley.edu/admissions/undergraduate-admissions" target="_blank">http://engineering.berkeley.edu/admissions/undergraduate-admissions</a> <b> College of Engineering Undergraduate Guide:</b> <a href="http://engineering.berkeley.edu/academics/undergraduate-guide" target="_blank">http://engineering.berkeley.edu/academics/undergraduate-guide</a><b><a href="http://coe.berkeley.edu/guide " target="_blank"> </a> For more information on Electrical Engineering &amp; Computer Science:</b> <a href="http://www.eecs.berkeley.edu" target="_blank">http://www.eecs.berkeley.edu</a> <b>For more information on admission to UC Berkeley:</b> <a href="http://admissions.berkeley.edu" target="_blank">http://admissions.berkeley.edu</a> <b>For more information on majors at UC Berkeley:</b> <b>Berkeley Academic Guide: </b><a href="http://guide.berkeley.edu/" target="_blank">http://guide.berkeley.edu</a> -------------------------------------------------------------------------------- <b> AP TEST CREDIT</b> For students who have taken Advanced Placement Exams in high school, the College will clear requirements as follows: Biology AP: a score of 4 or 5 satisfies UCB Biology 1A/AL and 1B. Chemistry AP: a score of 3 or better satisfies UCB Chemistry 1A/1AL. English AP (Literature and Composition): a score of 4 or 5 satisfies UCB English R1A. English AP (Language and Composition): a score of 4 or 5 satisfies UCB English R1A. Mathematics AP (AB Exam): a score of 3 or better satisfies UCB Math 1A. Mathematics AP (BC Exam): a score of 3 satisfies UCB Math 1A. Mathematics AP (BC Exam): a score of 4 or 5 satisfies UCB Math 1A and 1B. Physics AP (Mechanics C Exam): a score of 5 satisfies UCB Physics 7A. 
-------------------------------------------------------------------------------- <b>Required Courses for Admission:</b> -------------------------------------------------------------------------------- MATH 1A Calculus (4)|MATH 150 Calculus and Analytic (5) | Geometry I -------------------------------------------------------------------------------- MATH 1B Calculus (4)|MATH 155 Calculus and Analytic (4) | Geometry II -------------------------------------------------------------------------------- MATH 53 Multivariable Calculus (4)|MATH 260 Calculus and Analytic (4) | Geometry III -------------------------------------------------------------------------------- MATH 54 Linear Algebra and (4)|MATH 265 <b><u>&amp;</u></b></pre></p></body></html> Differential Equations (4) Differential Equations |MATH 270 Linear Algebra (4) -------------------------------------------------------------------------------- PHYSICS 7A Physics for Scientists (4)|PHYS 151 Principles of Physics (4) and Engineers | I -------------------------------------------------------------------------------- PHYSICS 7B Physics for Scientists (4)|PHYS 152 Principles of Physics (4) and Engineers | II -------------------------------------------------------------------------------- ENGLISH R1A Reading and (4)|ENGL 100 Composition and (4) Composition | Reading -------------------------------------------------------------------------------- ENGLISH R1B Reading and (4)|ENGL 201 Critical Thinking, (4) Composition | Composition, and | Literature | <b><u>OR</u></b> | |ENGL 201H Critical Thinking, (4) | Composition, and | Literature (Honors) | <b><u>OR</u></b> |ENGL 202 Critical Thinking and (4) | Composition | <b><u>OR</u></b> |ENGL 202H Critical and Thinking (4) | and Composition | (Honors) -------------------------------------------------------------------------------- <b> Natural Science required for admission: </b>One course or course series required from the list below: 
-------------------------------------------------------------------------------- ASTRON 7A Introduction to (4)|NO COURSE ARTICULATED Astrophysics | -------------------------------------------------------------------------------- ASTRON 7B Introduction to (4)|NO COURSE ARTICULATED Astrophysics | -------------------------------------------------------------------------------- BIOLOGY 1A <b><u>&amp;</u></b> General Biology (3)|BIO 202 <b><u>&amp;</u></b> Foundations of Biology: (4) Lecture (Cells, | Evolution, Genetics, Animal Form | Biodiversity and &amp; Function) | Organismal Biology BIOLOGY 1AL <b><u>&amp;</u></b> General Biology (2)|BIO 204 Foundations of Biology: (4) Laboratory | Biochemistry, Cell BIOLOGY 1B General Biology (Plant (4)| Biology, Genetics and Form &amp; Function, | Molecular Biology Ecology, Evolution) | -------------------------------------------------------------------------------- CHEM 1A <b><u>&amp;</u></b> General Chemistry (3)|CHEM 110 <b><u>&amp;</u></b> General Chemistry (5) CHEM 1AL <b><u>&amp;</u></b> General Chemistry (1)|CHEM 111 General Chemistry (5) Laboratory | <b><u>OR</u></b> CHEM 1B General Chemistry (4)|CHEM 110H <b><u>&amp;</u></b> General Chemistry I (5) | (Honors) |CHEM 111H General Chemistry II (5) | (Honors) -------------------------------------------------------------------------------- CHEM 3A <b><u>&amp;</u></b> Chemical Structure and (3)|CHEM 210 Organic Chemistry I (5) Reactivity | <b><u>OR</u></b> CHEM 3AL Organic Chemistry (2)|CHEM 210H Organic Chemistry I (5) Laboratory | (Honors) -------------------------------------------------------------------------------- CHEM 3B <b><u>&amp;</u></b> Chemical Structure and (3)|CHEM 211 Organic Chemistry II (5) Reactivity | <b><u>OR</u></b> CHEM 3BL Organic Chemistry (2)|CHEM 211H Organic Chemistry II (5) Laboratory | (Honors) -------------------------------------------------------------------------------- MCELLBI 32 <b><u>&amp;</u></b> Introduction to Human (3)|BIO 220 Human 
Physiology (4) Physiology | MCELLBI 32L Introduction to Human (2)| Physiology Laboratory | -------------------------------------------------------------------------------- PHYSICS 7C Physics for Scientists (4)|PHYS 253 Principles of Physics (4) and Engineers | III -------------------------------------------------------------------------------- <b>Strongly Recommended Courses</b> (if your college offers courses listed below and they are articulated, taking them will strengthen your application): Electrical Engineering 20 and 40 were taught for the final time at UCB in Fall 2015. Electrical Engineering 20 and Electrical Engineering 40 have been replaced with Electrical Engineering 16A and 16B. <b>The curriculum changes are effective for students admitted beginning Fall 15</b>. If no articulation, students are strongly encouraged to take an introductory course in electronics or circuits AND courses in Java, C++ and Data Structures. -------------------------------------------------------------------------------- COMPSCI 61A The Structure and (4)|NO COURSE ARTICULATED Interpretation of | Computer Programs | -------------------------------------------------------------------------------- COMPSCI 61B Data Structures (4)|CS 112 <b><u>&amp;</u></b> Introduction to Computer (3) | Science II: Java |CS 113 Basic Data Structures (3) | and Algorithms |<b>NOTE:</b> Students must also complete UCB |COMPSCI 47B at Berkeley to satisfy |this requirement. 
-------------------------------------------------------------------------------- COMPSCI 61C Machine Structures (4)|NO COURSE ARTICULATED -------------------------------------------------------------------------------- EL ENG 16A Designing Information (4)|NO COURSE ARTICULATED Devices and Systems I | -------------------------------------------------------------------------------- EL ENG 16B Designing Information (4)|NO COURSE ARTICULATED Devices and Systems II | -------------------------------------------------------------------------------- COMPSCI 70 Discrete Mathematics (4)|NO COURSE ARTICULATED and Probability Theory | -------------------------------------------------------------------------------- <b>IMPORTANT INFORMATION ABOUT THIS MAJOR:</b> <b>- </b>The course/s cited have been officially accepted by this major and approved by both a Berkeley advisor/faculty member and Berkeley's articulation officer. - Consult ASSIST frequently to obtain current information as this articulation agreement is subject to periodic revision. -------------------------------------------------------------------------------- <b>END OF MAJOR</b> <br/>

I want to extract the table at the bottom, the one listing the required courses. I have attached the original URL.

http://web2.assist.org/web-assist/report.do?agreement=aa&reportPath=REPORT_2&reportScript=Rep2.pl&event=19&dir=2&sia=MIRACSTA&ria=UCB&ia=UCB&oia=MIRACSTA&aay=16-17&ay=16-17&dora=EECS


1 answer
User
Reply #1 · posted 2024-06-28 15:13:48

You can parse the table with regular expressions. From this document I made the following observations:

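  1. As far as I can see, there is a separator between courses: a line of dashes (--------).
  2. Between two separators there is a left-hand side and a right-hand side, divided by a pipe |.
  3. In addition, when a module is referenced, the right-hand side contains a number in parentheses, e.g. (1).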
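The first two observations can be sketched roughly as follows. This is only an illustration: the `SAMPLE` text, its line breaks and spacing, and the helper names `split_blocks` and `split_sides` are assumptions, since the original page layout is not preserved in the question.

```python
import re

# Hypothetical sample: line breaks and column spacing are assumed,
# because the question text has lost the page's original layout.
SAMPLE = """\
--------------------------------------------------------------------------------
MATH 1A  Calculus               (4)|MATH 150  Calculus and Analytic   (5)
                                   |          Geometry I
--------------------------------------------------------------------------------
MATH 1B  Calculus               (4)|MATH 155  Calculus and Analytic   (4)
                                   |          Geometry II
--------------------------------------------------------------------------------
"""

# A separator is a line consisting only of dashes (10 or more, by assumption).
SEPARATOR = re.compile(r"^-{10,}\s*$")


def split_blocks(text):
    """Group non-empty lines into blocks delimited by dash-only lines."""
    blocks, current = [], []
    for line in text.splitlines():
        if SEPARATOR.match(line):
            if current:
                blocks.append(current)
            current = []
        elif line.strip():
            current.append(line)
    if current:
        blocks.append(current)
    return blocks


def split_sides(line):
    """Split one line at the first pipe into (left, right)."""
    left, _, right = line.partition("|")
    return left.rstrip(), right.rstrip()


blocks = split_blocks(SAMPLE)
for block in blocks:
    print([split_sides(line) for line in block])
```

Using `str.partition` rather than unpacking the result of a split avoids an error when a line contains no pipe or more than one.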

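Part 1 is easy to implement; parts 2 and 3 can be partly solved with regular expressions.
Below I provide the hardest part of the solution. You can easily implement the rest yourself and, if you like, put the result into a DataFrame.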

import re

# Separate the left-hand and right-hand sides of one table line.
strline = "course name |MATH 123 divide by 0 (1)"
left, right = strline.split("|", 1)  # split at the first pipe only

# Look for the number in parentheses on the right-hand side.
m = re.search(r"\(\d+\)", right)
has_module_name = m is not None

if has_module_name:
    # Capture the module code and description, excluding the number in parentheses.
    m_mod = re.match(r"([a-zA-Z0-9]+ [a-zA-Z0-9]+)([^()]+)", right)

    if m_mod is not None:
        mod_code = m_mod.group(1)
        mod_desc = m_mod.group(2)
        print("module name: {}".format(mod_code))
        print("module first line desc: {}".format(mod_desc))

# TODO: loop over the remaining description lines and collapse repeated whitespace

Output:

module name: MATH 123
module first line desc:  divide by 0
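
One possible way to finish the remaining steps (continuation lines, whitespace cleanup, one record per course pair) is sketched below. It is a minimal sketch under stated assumptions, not a complete parser: the `COURSE` pattern, the helpers `parse_side` and `parse_block`, the `course1_*`/`course2_*` keys, and the sample `block` with its spacing are all hypothetical. It only handles blocks with a single course on each side, so blocks that use OR or & to list several courses would need extra handling.

```python
import re

# Hypothetical pattern: a course code such as "MATH 1A" or "ENGLISH R1A",
# then the start of the title, then the units in parentheses.
COURSE = re.compile(r"^([A-Z][A-Z ]*? [A-Z]?\d+[A-Z]*)\s+(.*?)\s*\((\d+)\)$")


def parse_side(lines):
    """Parse one side of a block.

    The first non-empty line is expected to hold code, title start and units;
    any further lines are treated as a continuation of the title.
    Returns None when the side does not look like a course.
    """
    lines = [ln.strip() for ln in lines if ln.strip()]
    if not lines:
        return None
    m = COURSE.match(lines[0])
    if m is None:
        return None  # e.g. "NO COURSE ARTICULATED"
    code, title, units = m.groups()
    title = " ".join([title] + lines[1:])
    title = re.sub(r"\s+", " ", title).strip()  # collapse repeated whitespace
    return {"code": code, "title": title, "units": int(units)}


def parse_block(block):
    """Turn the lines of one block into a flat record with six fields."""
    lefts, rights = [], []
    for line in block:
        left, _, right = line.partition("|")
        lefts.append(left)
        rights.append(right)
    record = {}
    for prefix, side in (("course1", parse_side(lefts)),
                         ("course2", parse_side(rights))):
        for key in ("code", "title", "units"):
            record[prefix + "_" + key] = side[key] if side else None
    return record


# Hypothetical block: spacing and line breaks are assumed.
block = [
    "MATH 1A  Calculus               (4)|MATH 150  Calculus and Analytic   (5)",
    "                                   |          Geometry I",
]
record = parse_block(block)
print(record)
```

A list of such records can then be passed to `pandas.DataFrame(records)` to get one row per course pair.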
