Why is this regular expression greedy, and why does the example code repeat forever?

    #! python 3
    #! Phone number and email address scraper

    #take user input for:
    #1. webpage to scrape
    # - user will be prompted to copy a link
    #2. file & location to save to
    #3. back to 1 or exit

    import pyperclip, re, os.path

    #function for locating phone numbers
    def phoneNums(clipboard):
        phoneNums = re.compile(r'^(?:\d{8}(?:\d{2}(?:\d{2})?)?|\(\+?\d{2,3}\)\s?(?:\d{4}[\s*.-]?\d{4}|\d{3}[\s*.-]?\d{3}|\d{2}([\s*.-]?)\d{2}\1\d{2}(?:\1\d{2})?))$')
        #(\+\d{1,4})?            #Optional country code (optional: +, 1-4 digits)
        #(\s)?                   #Optional space
        #(\(\d\))?               #Optional bracketed area code
        #(\d\d(\s)?\d | \d{3})   #3 digits with optional space between
        #(\s)?                   #Optional space
        #(\d{3})                 #3 digits
        #(\s)?                   #Optional space
        #(\d{4})                 #Last four
        #)
        #)', re.VERBOSE)
        #nos = phoneNums.search(clipboard) #ignore for now. Failed test of .group()
        return phoneNums.findall(clipboard)

    #function for locating email addresses
    def emails(clipboard):
        emails = re.compile(r'''(
            [a-z0-9._%+-]*   #username
            @                #@ sign
            [a-z0-9.-]+      #domain name
            )''', re.I | re.VERBOSE)
        return emails.findall(clipboard)

    #function for copying email addresses and numbers from webpage to a file
    def scrape(fileName, saveLoc):
        newFile = os.path.join(saveLoc, fileName + ".txt")
        #file = open(newFile, "w+")
        #add phoneNums(currentText) +
        print(currentText)
        print(emails(currentText))
        print(phoneNums(currentText))
        #file.write(emails(currentText))
        #file.close()

    url = ''
    currentText = ''
    file = ''
    location = ''

    while True:
        print("Please paste text to scrape. Press ENTER to exit.")
        currentText = str(pyperclip.waitForNewPaste())
        #print("Filename?")
        #file = str(input())
        #print("Where shall I save this? Defaults to C:")
        #location = str(input())
        scrape(file, location)

    def phoneNums(clipboard):
        phoneNums = re.compile(r'''(
            (\+\d{1,4})?             #Optional country code (optional: +, 1-4 digits)
            (\s)?                    #Optional space
            (\(\d\))?                #Optional bracketed area code
            (\d\d(\s)?\d | \d{3})    #3 digits with optional space between
            (\s)?                    #Optional space
            (\d{3})                  #3 digits
            (\s)?                    #Optional space
            (\d{4})                  #Last four
            )+?''', re.VERBOSE)

1 answer

Anonymous user

#1 · Posted on 2024-10-03 19:23:28

As you noted, the regular expression works.

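The input fragment "+30 210 458 6600" matches once, and findall returns a single tuple holding every capturing group: ('+30 210 458 6600', '+30', ' ', '', '210', '', ' ', '458', ' ', '6600'). Optional groups that did not take part in the match appear as empty strings.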

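Note that the first element of the tuple is the whole match, because the outermost pair of parentheses is itself a capturing group.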

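If you make every group non-capturing by inserting ?: right after its opening parenthesis, no capturing groups remain, and findall returns only the full match as a str: '+30 210 458 6600'.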

    phoneNums = re.compile(r'''
        (?:\+\d{1,4})?                   #Optional country code (optional: +, 1-4 digits)
        (?:\s)?                          #Optional space
        (?:\(\d\))?                      #Optional bracketed area code
        (?:\d\d(?:\s)?\d | \d{3})        #3 digits with optional space between
        (?:\s)?                          #Optional space
        (?:\d{3})                        #3 digits
        (?:\s)?                          #Optional space
        (?:\d{4})                        #Last four
        ''', re.VERBOSE)
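To see the difference side by side, here is a small sketch that runs both patterns over the same sample string. The variable names (`sample`, `capturing`, `non_capturing`) are only illustrative, and the capturing pattern assumes the bracketed area code in the question was meant to be `(\(\d\))?`.

```python
import re

sample = "+30 210 458 6600"

# Capturing version from the question: one outer group plus nine inner groups.
capturing = re.compile(r'''(
    (\+\d{1,4})?             # optional country code
    (\s)?                    # optional space
    (\(\d\))?                # optional bracketed area code (assumed form)
    (\d\d(\s)?\d | \d{3})    # 3 digits with optional space between
    (\s)?                    # optional space
    (\d{3})                  # 3 digits
    (\s)?                    # optional space
    (\d{4})                  # last four
    )+?''', re.VERBOSE)

# Non-capturing version from this answer: every "(" becomes "(?:".
non_capturing = re.compile(r'''
    (?:\+\d{1,4})?
    (?:\s)?
    (?:\(\d\))?
    (?:\d\d(?:\s)?\d | \d{3})
    (?:\s)?
    (?:\d{3})
    (?:\s)?
    (?:\d{4})
    ''', re.VERBOSE)

tuples = capturing.findall(sample)       # list of tuples, one slot per group
strings = non_capturing.findall(sample)  # list of plain strings

print(tuples)
print(strings)
```

With capturing groups, findall yields one tuple per match; without them it yields the matched text itself, which is usually what a scraper wants to print or write to a file.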

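The code "repeats forever" because the while True: block is an infinite loop: nothing inside it ever changes the condition or leaves the block. If you want to stop after one iteration, put a break statement at the end of the block.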

    while True:
        currentText = str(pyperclip.waitForNewPaste())
        scrape(file, location)
        break
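If you would rather keep scraping until the user asks to stop, the break can be made conditional instead. The following is only a sketch: `run_until_blank`, `read_text` and `handle` are hypothetical names, and canned strings stand in for the clipboard so the example runs without pyperclip.

```python
def run_until_blank(read_text, handle):
    """Call handle(text) for each text from read_text() until a blank one arrives.

    Returns the number of texts that were handled.
    """
    count = 0
    while True:
        text = read_text()
        if not text.strip():   # blank input: leave the loop
            break
        handle(text)
        count += 1
    return count


# Demo with canned input instead of the clipboard (illustrative data only).
pastes = iter(["+30 210 458 6600", "someone@example.com", ""])
seen = []
processed = run_until_blank(lambda: next(pastes), seen.append)

print(processed)
print(seen)
```

In the original script, `read_text` could be `input`, so that pressing ENTER at the prompt returns an empty string and ends the loop, and `handle` could be a function that stores the text and calls scrape.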
