Web scraping in Python

3 Answers

User

Answer 1 · edited 2024-09-27 19:30:59

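The thing is, I have written the code and tried it. It runs, but it does not produce the answer to the challenge. Fetching the characters from the links and joining them together doesn't work. I have tried a lot of things and I'm still working on it myself. My advice is to solve it yourself: it is more rewarding and will probably help in future competitions. Also, if you were thinking of stripping all the 'a's out of the code, that doesn't work either. I tried.

To answer the Stack Overflow question, here is the code (you need to install the 'requests' Python module first):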

import requests

# The five clock-part pages differ only in the number after "clock-pt".
base = "https://assess.joincyberdiscovery.com/challenge-files/clock-pt{}?verify=4VjvSgWQQ8yhhiYD9cePtg%3D%3D"

# Fetch each part in order (pt1 to pt5) and join the response bodies.
parts = [requests.get(base.format(n)).text for n in range(1, 6)]
print("".join(parts))

However, this approach does not solve Challenge 14.

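User

Answer 2 · edited 2024-09-27 19:30:59

I know the answer to this challenge, but since I worked it out myself, I'll point you toward a way of finding it rather than just handing over finished code.

When you asked this question, you completely forgot that there is a sixth link: https://assess.joincyberdiscovery.com/challenge-files/get-flag?verify=j7fPvtmWLDY5qeYFuJtmKw%3D%3D&string=%3Cclock%20pts%3E

Notice that the end of this link says "clock pts", while all the other links contain something like clock-pt1 or clock-pt4. What if "clock pts" refers to all of those links together? That is, you build one string from the contents of all the previous links, then replace "clock pts" in the string part of this link with that string. Would that give you the code that completes the level?

Below is the code I used to get the answer. It needs the requests module, in case you want to use it. (Also, I'm not 100% sure it will always work: the challenge is timer-based, and the program may not fetch all the strings before the clock changes, so make sure you run it just after the timer resets.)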
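As a rough illustration of that idea, the sketch below swaps the percent-encoded `<clock pts>` placeholder in the sixth link for a joined string. The helper name `fill_clock_string` and the sample parts are made up for illustration, and it assumes the server wants the bare string without angle brackets, which the third answer's final edit also suggests.

```python
from urllib.parse import quote

# The sixth link exactly as quoted in this answer.
FLAG_URL_TEMPLATE = (
    "https://assess.joincyberdiscovery.com/challenge-files/get-flag"
    "?verify=j7fPvtmWLDY5qeYFuJtmKw%3D%3D&string=%3Cclock%20pts%3E"
)


def fill_clock_string(template, parts):
    """Replace the encoded <clock pts> placeholder with the joined parts.

    Hypothetical helper: the name and signature are not from the challenge.
    """
    placeholder = quote("<clock pts>")        # "%3Cclock%20pts%3E"
    joined = quote("".join(parts), safe="")   # percent-encode anything unusual
    return template.replace(placeholder, joined)


# Sample parts are invented; real ones come from the five clock-pt pages.
print(fill_clock_string(FLAG_URL_TEMPLATE, ["ab", "cd", "ef", "gh", "ij"]))
```

Percent-encoding the joined string is a precaution in case a part contains a character that is not URL-safe; for plain letters and digits it changes nothing.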

    import requests

    # URL template for the five clock-part pages (pt1 to pt5).
    part_url = "https://assess.joincyberdiscovery.com/challenge-files/clock-pt{}?verify=4VjvSgWQQ8yhhiYD9cePtg%3D%3D"

    # Fetch the parts in order and join them into one string.
    code = "".join(requests.get(part_url.format(n)).text for n in range(1, 6))

    # Append the joined string to the get-flag link in place of <clock pts>.
    page6 = "https://assess.joincyberdiscovery.com/challenge-files/get-flag?verify=j7fPvtmWLDY5qeYFuJtmKw%3D%3D&string=" + code
    page6_content = requests.get(page6)
    print(page6_content.text)
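One possible way to reduce the timing risk mentioned above is to read the clock twice and accept the value only when both reads agree, since a read that straddles a clock change will usually differ from the next one. This is a minimal sketch under that assumption, not part of the original solution; `read_stable` and its parameters are hypothetical, and `fetch_all` stands for any function that returns the five parts joined together.

```python
import time


def read_stable(fetch_all, max_attempts=5, delay=0.0):
    """Return a clock string once two consecutive reads agree.

    fetch_all: zero-argument callable returning the joined clock string.
    Raises RuntimeError if no two consecutive reads match.
    """
    previous = fetch_all()
    for _ in range(max_attempts):
        time.sleep(delay)
        current = fetch_all()
        if current == previous:
            return current
        previous = current
    raise RuntimeError("clock kept changing between reads")


# Demo with a fake fetcher: the first read is torn, later reads agree.
fake_reads = iter(["aaXb", "bbbb", "bbbb"])
print(read_stable(lambda: next(fake_reads)))  # prints bbbb
```

With the real pages, `fetch_all` could be a small function that requests the five links and joins their text. The final request still has to go out before the clock changes again, so this narrows the race rather than removing it.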

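User

Answer 3 · edited 2024-09-27 19:30:59

I did something very similar, but the final result was no good. I left it running for a while, though, and noticed that the clock follows a pattern. A while ago the clock read "aaaaaaaaaaaaaaaa", then "abaaaafaa2aa3a" and "adafaaaajaala". I'm going to wait for a complete list and try submitting the next clock sequence in the final URL. If that works I'll report back; just a thought.

Also, for help with installing and importing modules, I suggest: https://programminghistorian.org/lessons/installing-python-modules-pip and https://docs.python.org/3/installing/index.html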

import time
import requests

# Replace these placeholders with your five clock-part links.
urls = [
    'your first link',
    'your second link',
    'your third link',
    'your fourth link',
    'your fifth link',
]

last_seen = ""
while True:
    # Fetch all five parts and join them into one clock string.
    text = "".join(requests.get(url).text for url in urls)

    # For the final URL, the verify link's "clock pts" part is replaced with text.
    if text != last_seen:
        # Print only when the clock value has changed.
        print(text)
        last_seen = text
    time.sleep(1)  # added pause so the loop does not hammer the server
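To wait for "a complete list" as described above, one option is to record each new clock value in order and stop when the first value shows up again. The sketch below is only an illustration of that idea: `collect_cycle` is a hypothetical helper, `read_clock` stands for any function returning the current joined clock string, and it assumes the first value does not recur inside a single cycle.

```python
def collect_cycle(read_clock, max_reads=1000):
    """Collect distinct consecutive clock values until the first one recurs."""
    seen = []
    for _ in range(max_reads):
        value = read_clock()
        if seen and value == seen[-1]:
            continue  # still on the same tick as the previous read
        if seen and value == seen[0]:
            return seen  # wrapped around to the start of the cycle
        seen.append(value)
    return seen  # gave up before the cycle closed


# Demo with fake ticks instead of real requests.
ticks = iter(["a", "a", "b", "c", "c", "a"])
print(collect_cycle(lambda: next(ticks)))  # prints ['a', 'b', 'c']
```

In real use the reads would be spaced out with a short sleep, and the returned list gives the order in which the clock values follow one another.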

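Edit: The clock runs on a 15-minute cycle with 90 codes in total. I don't know how that helps, but I'm just posting my thoughts. I had to make some changes to get clean output from the code, so here is my improved version (it's very messy, sorry):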

^{pr2}$

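Final edit: It took me far too long, and far too much work, to figure out what was going on. When submitting the final URL, don't just swap your solution in for the section inside the brackets, and don't enclose it in <>, so your URL should be https://assess.joincyberdiscovery.com/challenge-files/get-flag?verify=*this is an identifiere*&string=*The string you get*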
