Check whether a regexp is correct

Posted 2024-09-28 17:30:19


Scenario:

  1. The file student.txt contains multiple entries per student, one for each subject, with the fields Rollno, StudName, Subject, and Marks:

    <Rollno>,<StudName>,<Subject>,<Marks>
    
    101,Santosh,maths,35
    102,Hina,English,41
    101,Santosh,Hindi,30
    
  2. Merge the student.txt data according to the template file (tempstud.txt) below, for each student. The total for each student should be displayed at the end of that student's records.

    Template:

    {Id:<Rollno>, Name:<StudName>, subject:<Subject>, marks:<Marks>}
    

    Santosh.101.txt

    {Id:101, Name:Santosh, subject:maths, marks:35}
    {Id:101, Name:Santosh, subject:Hindi, marks:30}
    
    Total : 65
    
  3. Create a separate file for each student; the name of the output file should be <Name>.<Rollno>.txt

Note: suppose I change the template file as follows:

{ marks:<Marks>-Id#<Rollno>-Name:<StudName>-subject#<Subject>}

I may also change the data file as follows (reordering the column fields):

<Subject>,<RollNo>,<StudName>,<Marks>

maths,101,Santosh,35
English,102,Hina,41
Hindi,101,Santosh,30

Your code should be generic enough to handle the above changes as well.

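Below is the half-finished code I have written. I am stuck on the regexp: the matching does not work correctly.

Please let me know what to change, and whether the code fits the scenario.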


import re
import collections, logging
f = 'student.txt'
r = open(f, 'r')
r = r.read()
r= r.splitlines()
d ={}

for i in r :
    print(i)
    st = re.search(r"(?P<Rollno>\d+)\S+(?P<StudName>\w\D+)\S+(?P<Subject>\w\D+)\S+(?P<Marks>\W\d+)",i,re.I)
    print(st)
    if(st):
        data = st.groupdict()
        print(data)
        Rollno =data['Rollno']
        print('here 1st: ', Rollno)
        StudName = data['StudName']
        print('here 2nd: ', StudName)
        Subject = data['Subject']
        print('here 3rd: ', Subject)
        Marks = data['Marks']
        print('here 4th: ', Marks)
        d ={'Rollno':Rollno,'StudName':StudName,'Subject':Subject,'Marks':Marks}
        print(d)

Output:

<Rollno> <StudName> <Subject> <Marks>
None
101 Santosh maths 35
<re.Match object; span=(0, 20), match='101 Santosh maths 35'>
{'Rollno': '1', 'StudName': '1 Santosh m', 'Subject': 'th', 'Marks': ' 35'}
here 1st:  1
here 2nd:  1 Santosh m
here 3rd:  th
here 4th:   35
{'Rollno': '1', 'StudName': '1 Santosh m', 'Subject': 'th', 'Marks': ' 35'}
102 Hina English 41
<re.Match object; span=(0, 19), match='102 Hina English 41'>
{'Rollno': '1', 'StudName': '2 Hina Eng', 'Subject': 'is', 'Marks': ' 41'}
here 1st:  1
here 2nd:  2 Hina Eng
here 3rd:  is
here 4th:   41
{'Rollno': '1', 'StudName': '2 Hina Eng', 'Subject': 'is', 'Marks': ' 41'}  -----> not matching properly with the regexp
101 Santosh Hindi 30
<re.Match object; span=(0, 20), match='101 Santosh Hindi 30'>
{'Rollno': '1', 'StudName': '1 Santosh H', 'Subject': 'nd', 'Marks': ' 30'}  -----> not matching properly with the regexp
here 1st:  1
here 2nd:  1 Santosh H
here 3rd:  nd
here 4th:   30
{'Rollno': '1', 'StudName': '1 Santosh H', 'Subject': 'nd', 'Marks': ' 30'}  -----> not matching properly with the regexp

1 answer
User
#1 · Posted 2024-09-28 17:30:19

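Your regexp has a few problems.

It is fixed here:

https://regex101.com/r/mEWE9F/1

Check the right-hand panel there for a detailed explanation of the changes.

Basically, you made the following mistakes: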

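  1. \S+ was used to match the whitespace and the CSV separator (comma). \S matches non-whitespace, so you should use \s*,\s* instead, which means: any whitespace to the left of the comma, then the comma, then any whitespace to the right of the comma.
  2. \w\D+ was used to match words. This matches one word character (\w) followed by one or more non-digit characters (\D+), and \D also matches commas and spaces, so the group runs past the end of the word. Plain \w+ will serve you better.
  3. \W\d+ was used to match the marks (grades). \W forces a non-word character in front of the digits; for integer marks, \d+ alone is enough.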
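The three fixes listed above can be put together as follows. This is a minimal sketch, not necessarily the exact pattern behind the regex101 link: the ^ and $ anchors, the name LINE_RE, and the inline sample lines (one with extra spaces around the commas) are assumptions added for illustration. It also assumes comma-separated input as described in the scenario; the printed output in the question suggests the real file may be space-separated, in which case the separators would need \s+ instead.

```python
import re

# Corrected pattern: \s*,\s* for the separators, \w+ for words, \d+ for numbers.
# The ^...$ anchors are an addition so that the header line is rejected.
LINE_RE = re.compile(
    r"^\s*(?P<Rollno>\d+)\s*,\s*(?P<StudName>\w+)\s*,\s*"
    r"(?P<Subject>\w+)\s*,\s*(?P<Marks>\d+)\s*$"
)

# Sample lines standing in for the contents of student.txt.
lines = [
    "<Rollno>,<StudName>,<Subject>,<Marks>",
    "101,Santosh,maths,35",
    "102, Hina , English ,41",
    "101,Santosh,Hindi,30",
]

records = []
for line in lines:
    match = LINE_RE.match(line)
    if match:  # the header line does not match and is skipped
        records.append(match.groupdict())

for record in records:
    print(record)
```

Each captured value is still a string, so Marks needs int() before it can be summed into a total.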

Again, check the detailed explanation in the right-hand panel of the link above.

What you should take away from this is:

\d Matches any decimal digit; this is equivalent to the class [0-9].

\D Matches any non-digit character; this is equivalent to the class [^0-9].

\s Matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v].

\S Matches any non-whitespace character; this is equivalent to the class [^ \t\n\r\f\v].

\w Matches any alphanumeric character; this is equivalent to the class [a-zA-Z0-9_].

\W Matches any non-alphanumeric character; this is equivalent to the class [^a-zA-Z0-9_].
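The definitions above can be checked directly. The sketch below uses a made-up sample string (an assumption, not data from the question) and shows that every character is matched by exactly one member of each lowercase/uppercase pair. Note that for str patterns in Python 3 these classes are Unicode-aware by default; the bracketed equivalents listed above apply exactly when the re.ASCII flag is used, as done here.

```python
import re

sample = "101,Santosh maths_35"  # hypothetical sample string

# Each pair is a class and its complement.
pairs = [(r"\d", r"\D"), (r"\s", r"\S"), (r"\w", r"\W")]

results = {}
for lower, upper in pairs:
    results[lower] = re.findall(lower, sample, re.ASCII)
    results[upper] = re.findall(upper, sample, re.ASCII)
    print(lower, results[lower])
    print(upper, results[upper])
    # Together the two classes account for every character exactly once.
    assert len(results[lower]) + len(results[upper]) == len(sample)
```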

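So, the uppercase letter in a special sequence always selects the opposite of what the corresponding lowercase special sequence selects (emphasized, because this is your main confusion).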
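On the question's requirement that the code stay generic when the column order or the template changes, one possible approach is to build the regexp from the layout line instead of hard-coding it. The sketch below is only an illustration under several assumptions: the helper names (build_line_regex, render, merge) are hypothetical, field names are compared case-insensitively because the question writes both <Rollno> and <RollNo>, every field is matched with \w+, the output name follows the Santosh.101.txt example, and the result is returned as a dictionary rather than written to disk.

```python
import re
from collections import defaultdict

PLACEHOLDER = re.compile(r"<(\w+)>")

def build_line_regex(layout):
    """Turn a layout such as '<Rollno>,<StudName>,<Subject>,<Marks>' into a regex."""
    names = [name.lower() for name in PLACEHOLDER.findall(layout)]
    fields = [rf"(?P<{name}>\w+)" for name in names]
    return re.compile(r"^\s*" + r"\s*,\s*".join(fields) + r"\s*$")

def render(template, record):
    """Fill every <Field> placeholder in the template from the record."""
    return PLACEHOLDER.sub(lambda m: record[m.group(1).lower()], template)

def merge(layout, data_lines, template):
    """Return {filename: text}, one entry per student, ending with the total."""
    line_re = build_line_regex(layout)
    per_student = defaultdict(list)
    for line in data_lines:
        match = line_re.match(line)
        if match:  # lines that do not fit the layout (e.g. the header) are skipped
            record = match.groupdict()
            per_student[(record["studname"], record["rollno"])].append(record)
    files = {}
    for (name, rollno), records in per_student.items():
        body = [render(template, record) for record in records]
        total = sum(int(record["marks"]) for record in records)
        files[f"{name}.{rollno}.txt"] = "\n".join(body) + f"\n\nTotal : {total}\n"
    return files

# The reordered layout and the modified template from the question.
layout = "<Subject>,<RollNo>,<StudName>,<Marks>"
data = ["maths,101,Santosh,35", "English,102,Hina,41", "Hindi,101,Santosh,30"]
template = "{ marks:<Marks>-Id#<Rollno>-Name:<StudName>-subject#<Subject>}"

files = merge(layout, data, template)
print(files["Santosh.101.txt"])
```

Writing each entry of files to disk with open(filename, "w") would complete step 3 of the scenario; non-numeric marks would raise a ValueError in this sketch.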
