How to extract information from this table using Python (preferably BeautifulSoup)

Posted 2024-10-04 03:23:57


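I am trying to collect information from the following page: http://www.gatesfoundation.org/How-We-Work/Quick-Links/Grants-Database#q/page=2

In particular, I am trying to use BeautifulSoup to collect the information in the table. I have the following code: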

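import urllib2
from bs4 import BeautifulSoup

pagelink = 'http://www.gatesfoundation.org/How-We-Work/Quick-Links/Grants-Database#q/page=2'
page = urllib2.urlopen(pagelink)
soup = BeautifulSoup(page, 'html.parser')
# prettify() returns a formatted string; it does not modify soup in place
print(soup.prettify())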

When I do this, the contents of the table (inside the "tablebody" tag) do not show up. Why is that, and how can I extract the information from this table?


3 Answers

You can get it with dryscrape like this:

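import dryscrape
from bs4 import BeautifulSoup

# dryscrape drives a headless WebKit browser, so the page's JavaScript runs
ses = dryscrape.Session()
ses.visit("http://www.gatesfoundation.org/How-We-Work/Quick-Links/Grants-Database#q/page=2")
# ses.body() returns the rendered HTML, including the JavaScript-filled table
s = BeautifulSoup(ses.body(), "html.parser")
s2 = s.select("table.table.push-bottom")[0]
print(s2)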

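You won't be able to use BeautifulSoup4 on its own as you expect, because the page is rendered by JavaScript.

You can use dryscrape or Selenium. In my opinion dryscrape is more user-friendly, but it is not officially supported on Windows.

Also, take a look at avis's excellent answer on this topic: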

https://stackoverflow.com/a/26440563/1429776

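The content you are looking for does not come from that URL.

Basically, when you browse a page manually in a modern web browser such as Chrome, what you see usually does not come entirely from the URL you requested. The whole process is: fetch the content from the originally requested URL -> parse the content -> load CSS/JavaScript/images (most of the time from different URLs) -> lay out the page and make additional requests as the CSS/JavaScript requires. It may look as if all you got was the URL you typed into the address bar, but the browser actually does a lot of work behind the scenes to fully render the page.

Now, back to the page you are looking at: the content of the table is actually filled in by JavaScript. The browser first parses the JavaScript, then makes an additional request to fetch the content, and renders everything as a complete page.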
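
To make this concrete, here is a minimal sketch of what a non-browser client ends up parsing. The HTML string is a hypothetical stand-in for the server's initial response (it is not the real page source), and the standard-library html.parser is used instead of BeautifulSoup only to keep the sketch free of dependencies. Since nothing executes the script, the table body stays empty:

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the initial HTML: the table body is empty and a
# script is expected to fill it in later (this is NOT the real page source).
STATIC_HTML = """
<table class="table push-bottom">
  <thead><tr><th>Grantee</th><th>Amount</th></tr></thead>
  <tbody></tbody>
</table>
<script>/* JavaScript would request the rows and insert them here */</script>
"""


class BodyRowCounter(HTMLParser):
    """Count <tr> elements that appear inside <tbody>."""

    def __init__(self):
        super().__init__()
        self.in_tbody = False
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.in_tbody = True
        elif tag == "tr" and self.in_tbody:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False


def count_body_rows(html):
    """Return the number of table rows present in the static HTML."""
    counter = BodyRowCounter()
    counter.feed(html)
    counter.close()
    return counter.rows


print(count_body_rows(STATIC_HTML))
```

The script prints 0: the header row sits in the thead, and the tbody has no rows until a browser runs the JavaScript.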

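You can use a tool such as Fiddler or Charles to capture the whole process and analyze all the traffic to find out what happens behind the scenes. In this case, the request that fetches the table's content is: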

POST http://www.gatesfoundation.org/services/gfo/search.ashx HTTP/1.1
Host: www.gatesfoundation.org
Connection: keep-alive
Content-Length: 209
Accept: */*
Origin: http://www.gatesfoundation.org
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36
Content-Type: application/json; charset=UTF-8;
Referer: http://www.gatesfoundation.org/How-We-Work/Quick-Links/Grants-Database
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.8
Cookie: gfo#lang=en; ASP.NET_SessionId=bdgjkbuyxxxcmfm40ejl2j1j; s_vnum=1641950372052%26vn%3D1; s_vi=[CS]v1|2C3C15910519363E-60000611E0003318[CE]; _vwo_uuid_v2=226610E3774AD35E29B29E7C20948349|f180edd6ae6830ab3de2432cd15b0bd4; __atuvc=3%7C2; __atuvs=58782b230157ce4a002; s_cc=true; s_nr=1484270424338; s_lv=1484270424339; s_lv_s=First%20Visit; s_invisit=true; gpv_p14=Awarded%20Grants; gpv_p19=How%20We%20Work; gpv_p21=no%20value; s_ppn=Awarded%20Grants; s_ppvl=Awarded%2520Grants%2C39%2C39%2C638%2C1366%2C638%2C1366%2C768%2C1%2CP; s_sq=%5B%5BB%5D%5D; s_ppv=Awarded%2520Grants%2C67%2C67%2C638%2C1366%2C638%2C1366%2C768%2C1%2CP

{"freeTextQuery":"","fieldQueries":"(@gfomediatype==\"Grant\")","facetsToRender":["gfocategories","gfotopics","gfoyear","gforegions"],"page":"2","resultsPerPage":"12","sortBy":"gfodate","sortDirection":"desc"}
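
The captured request can be replayed from Python without a browser. The following is a rough sketch using only the standard library (Python 3); the endpoint, headers, and payload are copied from the capture above, while the function names are made up for illustration. Whether the service still accepts this request, or works without the cookies, is an assumption.

```python
import json
import urllib.request

# Endpoint and payload are copied from the captured request; whether the
# service still accepts them is an assumption.
SEARCH_URL = "http://www.gatesfoundation.org/services/gfo/search.ashx"


def build_payload(page, results_per_page=12):
    """Return the JSON body for one page of grant results."""
    return {
        "freeTextQuery": "",
        "fieldQueries": '(@gfomediatype=="Grant")',
        "facetsToRender": ["gfocategories", "gfotopics", "gfoyear", "gforegions"],
        "page": str(page),
        "resultsPerPage": str(results_per_page),
        "sortBy": "gfodate",
        "sortDirection": "desc",
    }


def build_request(page):
    """Build (but do not send) the POST request for the given page."""
    body = json.dumps(build_payload(page)).encode("utf-8")
    headers = {
        "Content-Type": "application/json; charset=UTF-8",
        "X-Requested-With": "XMLHttpRequest",
        "Referer": "http://www.gatesfoundation.org/How-We-Work/Quick-Links/Grants-Database",
    }
    return urllib.request.Request(SEARCH_URL, data=body, headers=headers, method="POST")


def fetch_page(page, timeout=30):
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(page), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

If the endpoint still behaves as it did when the request was captured, fetch_page(2) should return a dictionary with the keys "results", "facets", and "totalCount".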

The response is in JSON format:

{
  "topResults": [],
  "results": [
    {
      "amount": 648140,
      "categories": [
        "Global Health"
      ],
      "date": "2016-12-19T08:00:00",
      "description": "to validate biomarkers of growth stunting and environmental enteric dysfunction for the purpose of better understanding and diagnosing these related disease states",
      "grantee": "Stanford University",
      "iconUrl": "",
      "languageCode": "en",
      "mediaType": "Grant",
      "regions": [
        ""
      ],
      "subtitle": null,
      "thumbnailAltText": "",
      "thumbnailUrl": "",
      "title": "Stanford University",
      "topics": [
        "Enteric Diseases and Diarrhea"
      ],
      "url": "/How-We-Work/Quick-Links/Grants-Database/Grants/2016/12/OPP1161946",
      "year": "2016"
    },
    {
      "amount": 550000,
      "categories": [
        "Global Development"
      ],
      "date": "2016-12-15T08:00:00",
      "description": "to provide vital life-saving and sustaining support to populations most affected by conflict in Syria",
      "grantee": "World Vision",
      "iconUrl": "",
      "languageCode": "en",
      "mediaType": "Grant",
      "regions": [
        ""
      ],
      "subtitle": null,
      "thumbnailAltText": "",
      "thumbnailUrl": "",
      "title": "World Vision",
      "topics": [
        "Emergency Response"
      ],
      "url": "/How-We-Work/Quick-Links/Grants-Database/Grants/2016/12/OPP1169747",
      "year": "2016"
    },
    {
      "amount": 3315475,
      "categories": [
        "Global Development"
      ],
      "date": "2016-12-15T08:00:00",
      "description": "to fund activities focused on generating political will and building momentum for investment in nutrition at country level and supporting the development and implementation of the nutrition...",
      "grantee": "African Development Bank",
      "iconUrl": "",
      "languageCode": "en",
      "mediaType": "Grant",
      "regions": [
        ""
      ],
      "subtitle": null,
      "thumbnailAltText": "",
      "thumbnailUrl": "",
      "title": "African Development Bank",
      "topics": [
        "Nutrition"
      ],
      "url": "/How-We-Work/Quick-Links/Grants-Database/Grants/2016/12/OPP1158425",
      "year": "2016"
    },
    {
      "amount": 500,
      "categories": [
        "Special Projects"
      ],
      "date": "2016-12-14T08:00:00",
      "description": "to provide for general operating support",
      "grantee": "City Club",
      "iconUrl": "",
      "languageCode": "en",
      "mediaType": "Grant",
      "regions": [
        ""
      ],
      "subtitle": null,
      "thumbnailAltText": "",
      "thumbnailUrl": "",
      "title": "City Club",
      "topics": [
        "Community Grants"
      ],
      "url": "/How-We-Work/Quick-Links/Grants-Database/Grants/2016/12/OPP1169105",
      "year": "2016"
    },
    {
      "amount": 78522,
      "categories": [
        "Global Health"
      ],
      "date": "2016-12-12T08:00:00",
      "description": "to make the first description of specific histo-blood group antigens (HBGAs) in Zambian children and to assess their influence on immunogenicity of rotavirus vaccines.",
      "grantee": "CIDRZ",
      "iconUrl": "",
      "languageCode": "en",
      "mediaType": "Grant",
      "regions": [
        ""
      ],
      "subtitle": null,
      "thumbnailAltText": "",
      "thumbnailUrl": "",
      "title": "CIDRZ",
      "topics": [
        "Enteric Diseases and Diarrhea",
        "Vaccine Delivery",
        "Vaccine Development"
      ],
      "url": "/How-We-Work/Quick-Links/Grants-Database/Grants/2016/12/OPP1162810",
      "year": "2016"
    },
    {
      "amount": 300000,
      "categories": [
        "US Program"
      ],
      "date": "2016-12-09T08:00:00",
      "description": "to provide matching i3 funds with the goal of building professional capacity through effective professional development for teacher leaders and principals to improve college ready outcomes...",
      "grantee": "Leading Educators Inc",
      "iconUrl": "",
      "languageCode": "en",
      "mediaType": "Grant",
      "regions": [
        ""
      ],
      "subtitle": null,
      "thumbnailAltText": "",
      "thumbnailUrl": "",
      "title": "Leading Educators Inc",
      "topics": [
        "K-12 Education"
      ],
      "url": "/How-We-Work/Quick-Links/Grants-Database/Grants/2016/12/OPP1169456",
      "year": "2016"
    },
    {
      "amount": 85330,
      "categories": [
        "Global Health"
      ],
      "date": "2016-12-09T08:00:00",
      "description": "to collect and analyze existing data from multiple data streams from Asian and African sites to characterize early burden of rotavirus disease, which is less-well characterized than...",
      "grantee": "Emory University",
      "iconUrl": "",
      "languageCode": "en",
      "mediaType": "Grant",
      "regions": [
        ""
      ],
      "subtitle": null,
      "thumbnailAltText": "",
      "thumbnailUrl": "",
      "title": "Emory University",
      "topics": [
        "Enteric Diseases and Diarrhea",
        "Vaccine Delivery",
        "Vaccine Development"
      ],
      "url": "/How-We-Work/Quick-Links/Grants-Database/Grants/2016/12/OPP1163272",
      "year": "2016"
    },
    {
      "amount": 13000,
      "categories": [
        "US Program"
      ],
      "date": "2016-12-08T08:00:00",
      "description": "to support LearnLaunch Across Boundaries Conference",
      "grantee": "LearnLaunch Institute",
      "iconUrl": "",
      "languageCode": "en",
      "mediaType": "Grant",
      "regions": [
        ""
      ],
      "subtitle": null,
      "thumbnailAltText": "",
      "thumbnailUrl": "",
      "title": "LearnLaunch Institute",
      "topics": [
        "K-12",
        "K-12 Education"
      ],
      "url": "/How-We-Work/Quick-Links/Grants-Database/Grants/2016/12/OPP1169222",
      "year": "2016"
    },
    {
      "amount": 250000,
      "categories": [
        "US Program"
      ],
      "date": "2016-12-08T08:00:00",
      "description": "to improve outcomes for English Language Learners in Seattle and South King County",
      "grantee": "OneAmerica",
      "iconUrl": "",
      "languageCode": "en",
      "mediaType": "Grant",
      "regions": [
        ""
      ],
      "subtitle": null,
      "thumbnailAltText": "",
      "thumbnailUrl": "",
      "title": "OneAmerica",
      "topics": [
        "Community Grants"
      ],
      "url": "/How-We-Work/Quick-Links/Grants-Database/Grants/2016/12/OPP1164859",
      "year": "2016"
    },
    {
      "amount": 85000,
      "categories": [
        "Global Health"
      ],
      "date": "2016-12-08T08:00:00",
      "description": "to fund cholera / enteric researchers (travel costs) to attend the 51st US-Japan Cholera Conference that they would otherwise not be able to afford to contribute to.",
      "grantee": "International Vaccine Institute",
      "iconUrl": "",
      "languageCode": "en",
      "mediaType": "Grant",
      "regions": [
        ""
      ],
      "subtitle": null,
      "thumbnailAltText": "",
      "thumbnailUrl": "",
      "title": "International Vaccine Institute",
      "topics": [
        "Enteric Diseases and Diarrhea"
      ],
      "url": "/How-We-Work/Quick-Links/Grants-Database/Grants/2016/12/OPP1168711",
      "year": "2016"
    },
    {
      "amount": 6000,
      "categories": [
        "Special Projects"
      ],
      "date": "2016-12-07T08:00:00",
      "description": "to provide for general operating support",
      "grantee": "Center for US Global Leadership",
      "iconUrl": "",
      "languageCode": "en",
      "mediaType": "Grant",
      "regions": [
        ""
      ],
      "subtitle": null,
      "thumbnailAltText": "",
      "thumbnailUrl": "",
      "title": "Center for US Global Leadership",
      "topics": [
        "Community Grants"
      ],
      "url": "/How-We-Work/Quick-Links/Grants-Database/Grants/2016/12/OPP1167614",
      "year": "2016"
    },
    {
      "amount": 3000000,
      "categories": [
        "US Program"
      ],
      "date": "2016-12-07T08:00:00",
      "description": "to support the Center on Education and the Workforce's research and policy agenda to better align postsecondary education and the workforce, with an emphasis on inequalities in the...",
      "grantee": "Georgetown University",
      "iconUrl": "",
      "languageCode": "en",
      "mediaType": "Grant",
      "regions": [
        ""
      ],
      "subtitle": null,
      "thumbnailAltText": "",
      "thumbnailUrl": "",
      "title": "Georgetown University",
      "topics": [
        "Postsecondary Success"
      ],
      "url": "/How-We-Work/Quick-Links/Grants-Database/Grants/2016/12/OPP1165028",
      "year": "2016"
    }
  ],
  "facets": [
    {
      "field": "gfocategories",
      "items": [
        {
          "name": "US Program",
          "count": 5859
        },
        {
          "name": "Global Development",
          "count": 4441
        },
        {
          "name": "Global Health",
          "count": 3719
        },
        {
          "name": "Communications",
          "count": 1149
        },
        {
          "name": "Global Policy & Advocacy",
          "count": 879
        },
        {
          "name": "Special Projects",
          "count": 465
        }
      ]
    },
    {
      "field": "gfotopics",
      "items": [
        {
          "name": "Community Grants",
          "count": 2393
        },
        {
          "name": "K-12 Education",
          "count": 2007
        },
        {
          "name": "Global Policy & Advocacy",
          "count": 1507
        },
        {
          "name": "Communications",
          "count": 1246
        },
        {
          "name": "Discovery and Translational Sciences",
          "count": 1227
        },
        {
          "name": "Agricultural Development",
          "count": 866
        },
        {
          "name": "K-12",
          "count": 862
        },
        {
          "name": "HIV",
          "count": 690
        },
        {
          "name": "Global Libraries",
          "count": 671
        },
        {
          "name": "Vaccine Delivery",
          "count": 655
        },
        {
          "name": "Postsecondary Success",
          "count": 645
        },
        {
          "name": "Family Health: Family Planning",
          "count": 625
        },
        {
          "name": "Family Health: Nutrition",
          "count": 530
        },
        {
          "name": "Family Health: Maternal, Newborn, and Child Health",
          "count": 433
        },
        {
          "name": "Community Relations",
          "count": 420
        },
        {
          "name": "Vaccine Development",
          "count": 393
        },
        {
          "name": "Not Available",
          "count": 383
        },
        {
          "name": "Malaria",
          "count": 377
        },
        {
          "name": "Water, Sanitation, and Hygiene",
          "count": 374
        },
        {
          "name": "Emergency Response",
          "count": 368
        },
        {
          "name": "Enteric Diseases and Diarrhea",
          "count": 359
        },
        {
          "name": "Family Interest Grants",
          "count": 313
        },
        {
          "name": "Pneumonia",
          "count": 286
        },
        {
          "name": "Nutrition",
          "count": 284
        },
        {
          "name": "Financial Services for the Poor",
          "count": 277
        },
        {
          "name": "Tuberculosis",
          "count": 277
        },
        {
          "name": "Libraries",
          "count": 262
        },
        {
          "name": "Charitable Sector Support",
          "count": 224
        },
        {
          "name": "Pacific Northwest: Family Homelessness",
          "count": 223
        },
        {
          "name": "College Ready",
          "count": 205
        },
        {
          "name": "Research & Development",
          "count": 195
        },
        {
          "name": "Polio",
          "count": 188
        },
        {
          "name": "Pacific Northwest: Early Learning",
          "count": 182
        },
        {
          "name": "Integrated Delivery",
          "count": 172
        },
        {
          "name": "Table Sponsorships",
          "count": 164
        },
        {
          "name": "Integrated Development",
          "count": 119
        },
        {
          "name": "Strategic Partnerships",
          "count": 117
        },
        {
          "name": "India",
          "count": 116
        },
        {
          "name": "Neglected Tropical Diseases",
          "count": 115
        },
        {
          "name": "Africa",
          "count": 89
        },
        {
          "name": "Special Initiatives (Active projects are now part of other strategies)",
          "count": 67
        },
        {
          "name": "Neglected and Infectious Diseases",
          "count": 66
        },
        {
          "name": "China",
          "count": 43
        },
        {
          "name": "Scholarships",
          "count": 39
        },
        {
          "name": "Tobacco",
          "count": 33
        },
        {
          "name": "Europe",
          "count": 22
        },
        {
          "name": "Special Initiatives",
          "count": 22
        },
        {
          "name": "Philanthropic Partnerships",
          "count": 17
        },
        {
          "name": "Europe Office",
          "count": 4
        }
      ]
    },
    {
      "field": "gfoyear",
      "items": [
        {
          "name": "2009 and earlier",
          "count": 6608
        },
        {
          "name": "2015",
          "count": 1652
        },
        {
          "name": "2016",
          "count": 1546
        },
        {
          "name": "2013",
          "count": 1473
        },
        {
          "name": "2014",
          "count": 1472
        },
        {
          "name": "2012",
          "count": 1260
        },
        {
          "name": "2011",
          "count": 1240
        },
        {
          "name": "2010",
          "count": 921
        },
        {
          "name": "2017",
          "count": 3
        }
      ]
    },
    {
      "field": "gforegions",
      "items": [
        {
          "name": "North America",
          "count": 5817
        },
        {
          "name": "Sub-Saharan Africa",
          "count": 1546
        },
        {
          "name": "Asia",
          "count": 1192
        },
        {
          "name": "Middle East, North Africa, and Greater Arabia",
          "count": 223
        },
        {
          "name": "South America",
          "count": 152
        },
        {
          "name": "Europe",
          "count": 130
        },
        {
          "name": "Central America and the Caribbean",
          "count": 110
        },
        {
          "name": "Australia and Oceania",
          "count": 29
        }
      ]
    }
  ],
  "totalCount": 16175
}

With the built-in json module you can easily extract the information you need.
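
As an illustration, here is a small sketch of that extraction step in Python 3. SAMPLE_RESPONSE is a trimmed excerpt of the response above (one result, most fields omitted), and the helper names extract_grants and page_count are hypothetical. It also assumes totalCount and resultsPerPage together determine how many pages there are.

```python
import json
import math

# A trimmed excerpt of the response shown above (one result, other fields omitted).
SAMPLE_RESPONSE = """
{
  "results": [
    {
      "amount": 648140,
      "date": "2016-12-19T08:00:00",
      "grantee": "Stanford University",
      "topics": ["Enteric Diseases and Diarrhea"],
      "url": "/How-We-Work/Quick-Links/Grants-Database/Grants/2016/12/OPP1161946"
    }
  ],
  "totalCount": 16175
}
"""


def extract_grants(response_text):
    """Return one (grantee, amount, date, topics) tuple per result."""
    data = json.loads(response_text)
    return [
        (item["grantee"], item["amount"], item["date"][:10], ", ".join(item["topics"]))
        for item in data["results"]
    ]


def page_count(response_text, results_per_page=12):
    """Number of pages needed to cover totalCount results."""
    total = json.loads(response_text)["totalCount"]
    return math.ceil(total / results_per_page)


for row in extract_grants(SAMPLE_RESPONSE):
    print(row)
print(page_count(SAMPLE_RESPONSE))
```

With 16175 results at 12 per page, page_count gives 1348, which is how many requests it would take to walk through the whole table.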
