Python scraping: an existing connection was forcibly closed by the remote host

Published 2024-06-26 01:38:53


I am scraping https://www.bi.go.id/id/statistik/informasi-kurs/transaksi-bi/Default.aspx to get the exchange-rate (kurs) table, and it looks like I have been blocked by that site. [screenshots] I am using Selenium and bs4, but when I try to fetch the table I get the following error:

ChunkedEncodingError: ("Connection broken: ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None)", ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))
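Error 10054 means the remote server dropped the TCP connection mid-response, which often happens when a plain HTTP request looks automated. One common mitigation (my suggestion, not part of the original code) is to send the request through a `requests.Session` configured with browser-like headers and automatic retries with backoff; the helper below is a minimal sketch of that idea:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(retries: int = 5, backoff: float = 1.0) -> requests.Session:
    """Build a Session that retries transient connection failures with backoff."""
    session = requests.Session()
    retry = Retry(
        total=retries,
        backoff_factor=backoff,  # waits ~1s, 2s, 4s, ... between attempts
        status_forcelist=[429, 500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) "
                      "Gecko/20100101 Firefox/80.0",
    })
    return session
```

Retries will not help if the site has actively blocked the client, but they do smooth over transient resets.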

Here is my code:

import time

import pandas as pd
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select, WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.bi.go.id/id/statistik/informasi-kurs/transaksi-bi/Default.aspx")

wait = WebDriverWait(driver, 10)
driver.implicitly_wait(10)  # seconds


# choose the period type, then the currency
book = wait.until(EC.element_to_be_clickable((By.ID, "selectPeriod")))
sel = Select(book)
sel.select_by_value("range")

bookk = wait.until(EC.element_to_be_clickable((By.ID, "ctl00_PlaceHolderMain_g_6c89d4ad_107f_437d_bd54_8fda17b556bf_ctl00_ddlmatauang1")))
sel = Select(bookk)
sel.select_by_value("USD  ")  # the option value on the site contains trailing spaces

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

# find_element_by_id was removed in Selenium 4; use find_element(By.ID, ...)
start_date = driver.find_element(By.ID, "ctl00_PlaceHolderMain_g_6c89d4ad_107f_437d_bd54_8fda17b556bf_ctl00_txtFrom")
start_date.send_keys("20-Nov-15")
end_date = driver.find_element(By.ID, "ctl00_PlaceHolderMain_g_6c89d4ad_107f_437d_bd54_8fda17b556bf_ctl00_txtTo")
end_date.send_keys(time.strftime("%d-%b-%y"))  # match the site's date format, e.g. 20-Nov-15

time.sleep(5)
buttons = driver.find_elements(By.XPATH, "//input[@value='Cari']")
buttons[1].click()

src = driver.page_source  # HTML source of the page Selenium already rendered
headers = {
    #"Referer": "https://id.investing.com/commodities/gold-historical-data",
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0",
    "X-Requested-With": "XMLHttpRequest"
}
parser = BeautifulSoup(src, "lxml")  # parse the Selenium page source

# NOTE: this second, plain requests.get is what raises the ChunkedEncodingError
# shown above; its result (html) is never actually used below
url = "https://www.bi.go.id/id/statistik/informasi-kurs/transaksi-bi/Default.aspx"
r = requests.get(url, headers=headers)
html = r.text

table = parser.find("table", attrs={"class": "table1"})  # the results table
rows = table.find_all('tr')
data = []
for row in rows[1:]:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])
    
result = pd.DataFrame(data, columns=['nilai', 'kurs_jual', 'kurs_beli', 'tanggal'])
result.to_csv("kurs1.csv", index=False)

df = pd.read_csv("kurs1.csv")
pd.set_option('display.max_rows', df.shape[0]+1)
print(df)
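Since Selenium has already rendered the page, the second `requests.get` in the code above is unnecessary: the table can be parsed straight out of `driver.page_source`. A minimal sketch of that parsing step (the `table1` class and column names are taken from the original code):

```python
import pandas as pd
from bs4 import BeautifulSoup


def table_to_dataframe(html: str, columns: list[str]) -> pd.DataFrame:
    """Parse the first <table class="table1"> in html into a DataFrame,
    skipping the header row, as in the original scraper."""
    soup = BeautifulSoup(html, "lxml")
    table = soup.find("table", attrs={"class": "table1"})
    data = []
    for row in table.find_all("tr")[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if cells:  # skip rows with no <td> cells (e.g. header-only rows)
            data.append(cells)
    return pd.DataFrame(data, columns=columns)
```

Used as `table_to_dataframe(driver.page_source, ['nilai', 'kurs_jual', 'kurs_beli', 'tanggal'])`, this avoids hitting the server a second time without a browser session.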

What should I do? Please help me. A month ago this actually worked, but then the element IDs on that site changed, so I had to update all of them, and when I ran it again I got this connection error. I have been stuck on it for weeks! Thanks in advance. The CSV file should look like this: [screenshot]


Tags: csv, the, to, https, none, id, by, driver