Parsing multiple strings with the same code

Posted 2024-10-04 03:21:14


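I am trying to print all the tennis players listed at the URL below. However, the split call only prints one player's name, even though the other players' names can also be retrieved with names1.split('">')[1].split('</a><a href="').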

import time
import urllib2
from urllib2 import urlopen
import datetime

def Tennis():
    try:
        australianOpen = urllib2.urlopen('http://www.ausopen.com/en_AU/players/profiles.html').read()

        names1 =australianOpen.split('</div><div id="section_A" class="sectionHeading"><div class="men">A</div><div class="women">A</div></div><div class="section"><div class="men">')[1].split('</a></div></div></div></div>')[0]

        for Eachnames in names1 :

            Eachnames = names1.split('">')[1].split('</a><a href="')[0]


            print Eachnames


    except Exception,e:
        print str(e)

Tennis()

2 Answers

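The problem is in the line Eachnames = names1.split('">')[1].split('</a><a href="')[0]. It splits the string into a list of substrings and then uses [1] to select the element at index 1. Because names1 is never modified after the first name is found, the same name is selected on every pass through the loop. A simple modification is: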

import sys
import urllib.request

def Tennis():
    try:
        australianOpen = urllib.request.urlopen('http://www.ausopen.com/en_AU/players/profiles.html').read().decode('utf-8')

        names1 = australianOpen.split('</div><div id="section_A" class="sectionHeading"><div class="men">A</div><div class="women">A</div></div><div class="section"><div class="men">')[1].split('</a></div></div></div></div>')[0]

        # Split once, then visit every piece instead of always taking index 1
        the_names = names1.split('">')
        for name in the_names:
            print(name.split('</a><a href="'))

    except Exception:
        print("Exception", sys.exc_info()[0])


Tennis()
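
To see the difference without fetching the page, here is a minimal sketch on a made-up fragment. The sample markup, the player names, and the helper names first_name_only and all_pieces are assumptions for illustration only; the real page may be structured differently. The sketch also cuts each piece at '</a>' rather than at '</a><a href="', so that the last entry is handled too.

```python
# Hypothetical fragment imitating the structure the split strings assume;
# the real page markup may differ.
sample = (
    '<a href="/players/a1.html">Able, Adam</a>'
    '<a href="/players/b2.html">Baker, Bob</a>'
    '<a href="/players/c3.html">Clark, Carl</a>'
)


def first_name_only(text, repeats=3):
    """Repeat the original expression: it always picks index 1."""
    return [text.split('">')[1].split('</a><a href="')[0] for _ in range(repeats)]


def all_pieces(text):
    """Split once, skip the piece before the first marker, and visit the rest."""
    pieces = text.split('">')[1:]
    return [piece.split('</a>')[0] for piece in pieces]


print(first_name_only(sample))  # the same name every time
print(all_pieces(sample))       # one entry per player
```

The first function never advances through the text, which is why the original loop keeps printing the same player.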

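However, the printed output will still be wrong, because the search criteria are off (unless you want gibberish such as half URLs and so on). I think a good, simple solution is to use a regular expression. A simple regexp that captures names containing no special characters: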

the_names = re.findall("\">([A-Za-z]*, [A-Za-z]*)", names1) 

A slightly simplified program using the regexp is:

import urllib.request
import re

def Tennis():
    try:
        australianOpen = urllib.request.urlopen('http://www.ausopen.com/en_AU/players/profiles.html').read().decode('utf-8')

        names1 = australianOpen.split('</div><div id="section_A" class="sectionHeading"><div class="men">A</div><div class="women">A</div></div><div class="section"><div class="men">')[1].split('</a></div></div></div></div>')[0]
    except Exception as e:
        # Without names1 there is nothing to search, so stop here
        print("Exception", e)
        return

    # Capture every "Lastname, Firstname" that follows a closing '">'
    the_names = re.findall("\">([A-Za-z]*, [A-Za-z]*)", names1)
    for name in the_names:
        print(name)

Tennis()
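
The regular expression can also be tried offline on a small string. The following is only a sketch: the sample markup, the names in it, and the helper find_names are made up for illustration. It also shows the limitation mentioned above, since a name containing an apostrophe is not matched by the letters-only pattern.

```python
import re

# Hypothetical fragment; the real page markup may differ.
sample = (
    '<a href="/players/a1.html">Able, Adam</a>'
    '<a href="/players/b2.html">Baker, Bob</a>'
    '<a href="/players/c3.html">O\'Neil, Pat</a>'
)

# Same pattern as in the answer: letters, a comma and a space, then letters.
NAME_PATTERN = re.compile(r'">([A-Za-z]*, [A-Za-z]*)')


def find_names(text):
    """Return every 'Lastname, Firstname' made only of ASCII letters."""
    return NAME_PATTERN.findall(text)


print(find_names(sample))  # the entry with the apostrophe is skipped
```

If names with apostrophes, hyphens, or accented letters matter, the character classes would need to be widened accordingly.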

Hope this helps.

Just add names1 = names1[names1.find(Eachnames)+len(Eachnames):] so that names1 moves past each name once it has been found:

import urllib2

# Python 2, as in the question
def Tennis():
    try:
        australianOpen = urllib2.urlopen('http://www.ausopen.com/en_AU/players/profiles.html').read()

        #print australianOpen
        names1 =australianOpen.split('</div><div id="section_A" class="sectionHeading"><div class="men">A</div><div class="women">A</div></div><div class="section"><div class="men">')[1].split('</a></div></div></div></div>')[0]

        for Eachnames in names1 :

            Eachnames = names1.split('">')[1].split('</a><a href="')[0]
            # Advance names1 past the name that was just found
            names1 = names1[names1.find(Eachnames)+len(Eachnames):]
            # Trim any markup left after the name
            if Eachnames.find('<')!= -1:
                Eachnames=Eachnames[:Eachnames.find('<')]

            print Eachnames

    except Exception,e:
        # Also reached when no names are left and index [1] no longer exists
        print str(e)

Tennis()
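
The same idea of advancing past each name can be sketched in Python 3 on a small string. This is an illustration under assumptions: the sample markup, the names, and the helper names_by_slicing are hypothetical. Unlike the code above, the sketch locates the cut point from the '">' marker rather than from the name itself, and it stops when no marker is left instead of relying on an exception.

```python
# Hypothetical fragment; the real page markup may differ.
sample = (
    '<a href="/players/a1.html">Able, Adam</a>'
    '<a href="/players/b2.html">Baker, Bob</a>'
    '<a href="/players/c3.html">Clark, Carl</a>'
)


def names_by_slicing(text, start='">', end='</a>'):
    """Collect names by repeatedly cutting off the part already processed."""
    names = []
    while start in text:
        # Take what follows the first start marker, up to the end marker.
        name = text.split(start)[1].split(end)[0]
        names.append(name)
        # Drop everything up to and including the marker and the name,
        # so the next pass sees the following entry.
        cut = text.find(start) + len(start) + len(name)
        text = text[cut:]
    return names


print(names_by_slicing(sample))
```

Because every pass removes at least the marker itself, the loop always terminates.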
