I'm new to web scraping, and I'm using Python and bash scripts to get the information I need. I run everything under WSL (Windows Subsystem for Linux), and for some reason the scripts are run with Git Bash.
I'm trying to create a bash script that downloads a web page's HTML and passes it to a Python script, which returns two txt files containing links to other web pages. The bash script then loops over the links in one of the txt files and downloads each page's HTML into a file named after a specific part of the link. But this last loop doesn't work.
If I write the curl command with a link by hand, it works. But when I try to run it from the script, it fails.
Here is the bash script:
#!/bin/bash
curl http://mythicspoiler.com/sets.html |
cat >> mainpage.txt
python creatingAListOfAllExpansions.py # returns two txt files containing the expansion links and the commander decks' links
rm mainpage.txt
# get the pages from the links
cat commanderDeckLinks.txt |
while read a; do
    curl $a | ## THIS DOESN'T WORK
    cat >> $(echo $a | cut --delimiter="/" -f4).txt
done
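For context, the file name in the loop comes from the fourth "/"-separated field of each URL. That part of the pipeline can be checked in isolation with one of the links from the file:

```shell
#!/bin/bash
# Splitting on "/", the fields of the URL are:
#   1: "http:"  2: ""  3: "mythicspoiler.com"  4: "cmd"  5: "index.html"
# so field 4 is the set code used as the output file name.
url="http://mythicspoiler.com/cmd/index.html"
echo "$url" | cut --delimiter="/" -f4   # prints: cmd
```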
I've tried several different approaches and run into similar problems each time, but I haven't been able to solve this on my own. It always shows the same error:
curl: (3) URL using bad/illegal format or missing URL
These are the contents of commanderDeckLinks.txt:
http://mythicspoiler.com/cmd/index.html
http://mythicspoiler.com/c13/index.html
http://mythicspoiler.com/c14/index.html
http://mythicspoiler.com/c15/index.html
http://mythicspoiler.com/c16/index.html
http://mythicspoiler.com/c17/index.html
http://mythicspoiler.com/c18/index.html
http://mythicspoiler.com/c19/index.html
http://mythicspoiler.com/c20/index.html
And here is the Python script:
#reads the main page of the website
with open("mainpage.txt") as datafile:
    data = datafile.read()

#gets the content after the first appearance of the introduced string
def getContent(data, x):
    j=0
    content=[]
    for i in range(len(data)):
        if(data[i].strip().startswith(x) and j == 0):
            j=i
        if(i>j and j != 0):
            content.append(data[i])
    return content

#gets the content of the website that is inside the body tag
mainNav = getContent(data.splitlines(), "<!--MAIN NAVIGATION-->")
#gets the content of the website that is inside of the outside center tags
content = getContent(mainNav, "<!--CONTENT-->")

#removes extra content from list
def restrictNoise(data, string):
    content=[]
    for i in data:
        if(i.startswith(string)):
            break
        content.append(i)
    return content

#return only lines which are links
def onlyLinks(data):
    content=[]
    for i in data:
        if(i.startswith("<a")):
            content.append(i)
    return content

#creates a list of the ending of the links to later fetch
def links(data):
    link=[]
    for i in data:
        link.append(i.split('"')[1])
    return link

#adds the rest of the link
def completLinks(data):
    completeLinks=[]
    for i in data:
        completeLinks.append("http://mythicspoiler.com/"+i)
    return completeLinks

#getting the commander decks
commanderDecksAndNoise = getContent(content,"<!---->")
commanderDeck = restrictNoise(commanderDecksAndNoise, "<!---->")
commanderDeckLinks = onlyLinks(commanderDeck)
commanderDecksCleanedLinks = links(commanderDeckLinks)

#creates a txt file and writes in it
def writeInTxt(nameOfFile, restrictions, usedList):
    file = open(nameOfFile, restrictions)
    for i in usedList:
        file.write(i+"\n")
    file.close()

#creating the commander deck text file
writeInTxt("commanderDeckLinks.txt", "w+", completLinks(commanderDecksCleanedLinks))

#getting the expansions
expansionsWithNoise = getContent(commanderDecksAndNoise, "<!---->")
expansionsWithoutNoise = restrictNoise(expansionsWithNoise, "</table>")
expansionsLinksWNoise = onlyLinks(expansionsWithoutNoise)
expansionsCleanedLinks = links(expansionsLinksWNoise)
#creating the expansions text file
writeInTxt("expansionLinks.txt", "w+", completLinks(expansionsCleanedLinks))
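For context, the whole script hinges on getContent, which skips lines until the first one that starts with the given marker and then collects everything after it. A small self-contained check of that behavior with toy input (the sample lines below are made up for illustration, not taken from mythicspoiler.com):

```python
# Same getContent as in the script above: j stays 0 until the marker line is
# found, then every later line is appended to the result.
def getContent(data, x):
    j = 0
    content = []
    for i in range(len(data)):
        if data[i].strip().startswith(x) and j == 0:
            j = i
        if i > j and j != 0:
            content.append(data[i])
    return content

lines = ["header", "<!--CONTENT-->", '<a href="cmd/index.html">CMD</a>', "footer"]
print(getContent(lines, "<!--CONTENT-->"))
# → ['<a href="cmd/index.html">CMD</a>', 'footer']
```

Note that if the marker happens to be the very first line, j stays 0 and nothing is collected, which works here only because the markers never appear at index 0.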
If more information is needed to solve my problem, please let me know. Thanks to everyone trying to help.
The problem here is that bash (Linux) and Windows use different line endings, LF and CRLF respectively (I'm not completely sure, since all of this is new to me). So when I created a file in Python with one item per line, the bash script couldn't read it properly: the file was written with CRLF endings, while the bash script only expects LF, which made the URLs useless because each one ended in a CR that shouldn't be there.

I didn't know how to fix this from the bash side, so what I did instead was have Python create a file in which the items are separated by underscores "_", with a "\n" added only after the last item, so I never had to deal with line endings at all. Then in bash I ran a for loop over the underscore-separated items, skipping the last one. That solved the problem.
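For reference, the stray CR can also be stripped in bash itself, without changing the file format. A minimal sketch, assuming commanderDeckLinks.txt has CRLF line endings (the strip_cr helper is hypothetical, not part of the original script):

```shell
#!/bin/bash
# strip_cr removes a trailing carriage return from its argument, so a URL
# read from a CRLF-terminated file no longer ends with an invisible CR.
strip_cr() {
    printf '%s' "${1%$'\r'}"
}

# Used in the original loop it would look like this (downloads not run here):
# while IFS= read -r a; do
#     a=$(strip_cr "$a")
#     curl "$a" > "$(echo "$a" | cut --delimiter="/" -f4).txt"
# done < commanderDeckLinks.txt
```

Another common option is to normalize the file once, e.g. with `tr -d '\r' < commanderDeckLinks.txt > fixed.txt` or the `dos2unix` tool, before the loop runs.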