Job search with BeautifulSoup

from bs4 import BeautifulSoup
import urllib2

url = 'https://www.smartrecruiters.com/SpectraForce1/'
content = urllib2.urlopen(url).read()
soup = BeautifulSoup(content, 'html.parser')
text = soup.get_text()

if 'GIS ' in text:
    print 'Job Found!'
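One thing to watch in this check: 'GIS ' in text only matches when a space follows "GIS", so a title that ends in "GIS" or is written as "(GIS)" is missed. Below is a minimal Python 3 sketch of a whole-word test using the standard re module instead. mentions_gis is a hypothetical helper name, and the sample titles are made up:

```python
import re

# \b marks a word boundary, so "GIS" matches at the end of the text or
# next to punctuation, but not inside a longer word such as "GISMO".
GIS_PATTERN = re.compile(r"\bGIS\b")


def mentions_gis(text):
    """Return True if `text` contains the standalone word "GIS"."""
    return GIS_PATTERN.search(text) is not None


if __name__ == "__main__":
    samples = ["Senior GIS Analyst", "Analyst (GIS)", "Job title: GIS", "GISMO Corp"]
    for sample in samples:
        print(sample, "->", mentions_gis(sample))
```

Calling mentions_gis(soup.get_text()) in place of the substring test should then also catch titles where "GIS" is followed by punctuation or ends the text.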

3 answers

User

Answer 1 · edited 2024-09-30 12:32:49

Here is my attempt, though it is much the same as the code above:

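from bs4 import BeautifulSoup
from urllib2 import urlopen

def work(url):
    # Download the page and parse it with Python's built-in HTML parser
    soup = BeautifulSoup(urlopen(url).read(), 'html.parser')

    # Check the text of every <a> element for the string "GIS"
    for i in soup.findAll("a", text=True):
        if "GIS" in i.text:
            # Print the link target without the iframe query parameter
            print "Found link " + i.get("href", "").replace("?in_iframe=1", "")

urls = ["https://jobs-challp.icims.com/jobs/search?pr=0&searchKeyword=gis&searchRadius=20&in_iframe=1",
        "https://www.smartrecruiters.com/SpectraForce1/"]

for url in urls:
    work(url)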

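It defines a function, work(), that does the actual work. It fetches the page from the remote server with urlopen(), since it looks like you want to use urllib2 (though I'd recommend Python-Requests instead). It then uses findAll() to find all the <a> elements (links), checks whether each link's text contains "GIS", and if so prints the link's href attribute with the ?in_iframe=1 parameter stripped.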
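For comparison, here is a rough Python 3 sketch of the same link filter that uses only the standard library's html.parser instead of BeautifulSoup, run on a made-up inline HTML snippet rather than a live page. LinkCollector and gis_links are illustrative names, and unlike findAll("a", text=True) this version also collects text from tags nested inside a link:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element in a page."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value may be None
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def gis_links(html):
    """Return hrefs of links whose text contains "GIS", minus "?in_iframe=1"."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [href.replace("?in_iframe=1", "")
            for href, text in parser.links if "GIS" in text]


if __name__ == "__main__":
    sample = ('<a href="/jobs/1?in_iframe=1">GIS Analyst</a>'
              '<a href="/jobs/2">Accountant</a>')
    for href in gis_links(sample):
        print("Found link " + href)
```

Separating parsing from downloading like this also makes the filter easy to try on saved HTML before pointing it at the real job boards.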

The script then defines a list of URLs (only two in this example) and runs work() on each one, passing the URL as the argument.
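As written, the loop stops at the first site that raises an error. Below is a Python 3 sketch of the same per-URL loop that reports and skips failures instead. scrape_all and fetch_page are hypothetical names, decoding pages as UTF-8 is an assumption about the sites, and the fetch parameter can be swapped for a fake so the loop runs without network access:

```python
from urllib.request import urlopen


def fetch_page(url, timeout=10):
    """Download `url` and decode it as UTF-8 (an assumption about the sites)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_all(urls, fetch=fetch_page):
    """Fetch every URL in turn, skipping (not crashing on) sites that fail.

    Returns a dict mapping each URL to its page text, or to None on error.
    """
    pages = {}
    for url in urls:
        try:
            pages[url] = fetch(url)
        except OSError as exc:  # URLError, HTTPError and timeouts all subclass OSError
            print("Skipping {}: {}".format(url, exc))
            pages[url] = None
    return pages


if __name__ == "__main__":
    # A fake fetcher so the loop can be tried without network access.
    fake_site = {"https://example.com/jobs": "<a href='/1'>GIS Analyst</a>"}

    def fake_fetch(url):
        if url not in fake_site:
            raise OSError("site unreachable")
        return fake_site[url]

    print(scrape_all(["https://example.com/jobs", "https://example.com/down"],
                     fetch=fake_fetch))
```

Each page's text could then be passed to the same "GIS" link filter that work() applies.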

User

Answer 2 · edited 2024-09-30 12:32:49

Here you go!

This code finds every link containing the string "GIS". I had to add &in_iframe=1 to get the first URL to work.
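Appending &in_iframe=1 by hand works when the URL already has a query string. Below is a small Python 3 sketch that sets the parameter with urllib.parse instead, so it works with or without an existing query string and never adds the key twice. with_query_param is a hypothetical helper name:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(url, key, value):
    """Return `url` with query parameter `key` set to `value`.

    Other parameters (including blank ones) keep their order; any existing
    `key` is dropped and the new value is appended at the end.
    """
    parts = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != key]
    params.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    base = "https://jobs-challp.icims.com/jobs/search?ss=1&searchKeyword=gis"
    print(with_query_param(base, "in_iframe", "1"))
```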

import urllib2
from bs4 import BeautifulSoup

urls = ['https://jobs-challp.icims.com/jobs/search?ss=1&searchKeyword=gis&searchCategory=&searchLocation=&latitude=&longitude=&searchZip=&searchRadius=20&in_iframe=1',
        'https://www.smartrecruiters.com/SpectraForce1/']

for url in urls:
    soup = BeautifulSoup(urllib2.urlopen(url), 'html.parser')
    print 'Scraping {}'.format(url)
    # Report the text and target of every link that mentions "GIS"
    for link in soup.find_all('a'):
        if 'GIS' in link.text:
            print ' > TEXT: ' + link.text.strip()
            print ' > URL:  ' + link.get('href', '')
            print ''

Output: for each URL, a "Scraping ..." line followed by the TEXT and URL of every matching link.

User

Answer 3 · edited 2024-09-30 12:32:49

Here's one way:

from bs4 import BeautifulSoup
import urllib2
import re

url = 'https://www.smartrecruiters.com/SpectraForce1/'
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')

# Job titles and their links, taken from the target="_blank" anchors
anchors = soup.findAll('a', {'target': '_blank'})

# Collapse runs of whitespace in each title into single spaces
jobs = [re.sub(r'\s+', ' ', a.get_text()) for a in anchors]
links = [a.get('href') for a in anchors]

for link, job in zip(links, jobs):
    if 'GIS' in job:
        print link

If you run this now, it prints the href of every job link whose title contains "GIS".
