My code treats a list of dictionaries as a string: TypeError: string indices must be integers

def getcommentsforpost(subredditname,postid,):
    # here we make the request to reddit, and create a python dictionary
    # from the resulting json code
    reditpath = '/r/' + subredditname + '/comments/' + postid
    redditusual = 'https://www.reddit.com'
    parameters = '.json?'
    totalpath = redditusual + reditpath + parameters
    p = requests.get(totalpath, headers = {'User-agent' : 'Chrome'})
    result = p.json()

    # we are going to be looping a lot through dictionaries, to extract
    # the comments and their replies, thus, a list where we will insert
    # them.
    totallist = []

    # the result object is a list with two dictionaries, one with info
    # on the post, and the second one with all the info regarding the
    # comments and their respective replies, because of this, we first
    # process the posts info located in result[0]
    a = result[0]["data"]["children"][0]["data"]
    abody = a["selftext"]
    aauthor = a["author"]
    ascore = a["score"]
    adictionary = {"commentauthor" : aauthor , "comment" : abody , "Type" : "Post",
                   "commentscore" : ascore}
    totallist.append(adictionary)

    # and now, we start processing the comments, located in result[1]
    for i in result[1]["data"]["children"]:
        ibody = i["data"]["body"]
        iauthor = i["data"]["author"]
        iscore = i["data"]["score"]
        idictionary = {"commentauthor" : iauthor , "comment" : ibody , "Type" : "post_comment",
                       "commentscore" : iscore}
        totallist.append(idictionary)

        # to clarify, until here, the code works perfectly. No problem
        # whatsoever, its exactly in the following section where the
        # error happens.
        # we create a new object, called replylist,
        # that contains a list of dictionaries in every interaction of
        # the loop.
        replylists = i["data"]["replies"]["data"]["children"]

        # we are going to loop through them, in every comment we extract
        for j in replylists:
            jauthor = j["data"]["author"]
            jbody = j["data"]["body"]
            jscore = j["data"]["score"]
            jdictionary = {"commentauthor" : jauthor , "comment" : jbody , "Type" : "comment_reply" ,
                           "commentscore" : jscore }
            totallist.append(jdictionary)

    # just like we did with the post info and the normal comments,
    # we extract and put it in totallist.
    finaldf = pd.DataFrame(totallist)
    return(finaldf)

getcommentsforpost("Python","a7zss0")

2 Answers

User

#1 · edited 2024-10-01 04:45:24

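Here is how I solved it: I added an if statement that checks whether i["data"]["replies"] is a dictionary. If it is, the reply-processing code runs; if it is not, the loop simply moves on to the next comment.

This is what it looks like. Thanks again to Aditya and 高约: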

import pandas as pd
import requests


def getcommentsforpost(subredditname, postid):
    reditpath = '/r/' + subredditname + '/comments/' + postid
    redditusual = 'https://www.reddit.com'
    parameters = '.json?'
    totalpath = redditusual + reditpath + parameters
    p = requests.get(totalpath, headers={'User-agent': 'Chrome'})
    result = p.json()

    totallist = []

    # the result object is a list with two dictionaries, one with info on the post,
    # and the second one with all the info regarding the comments and their replies
    a = result[0]["data"]["children"][0]["data"]
    abody = a["selftext"]
    aauthor = a["author"]
    ascore = a["score"]
    adictionary = {"commentauthor": aauthor, "comment": abody, "Type": "Post",
                   "commentscore": ascore}
    totallist.append(adictionary)

    for i in result[1]["data"]["children"]:
        ibody = i["data"]["body"]
        iauthor = i["data"]["author"]
        iscore = i["data"]["score"]
        idictionary = {"commentauthor": iauthor, "comment": ibody, "Type": "post_comment",
                       "commentscore": iscore}
        totallist.append(idictionary)

        # "replies" is a dict when the comment has replies and an empty string
        # otherwise, so only descend into it when it is a dict
        replies = i["data"]["replies"]
        if isinstance(replies, dict):
            for j in replies["data"]["children"]:
                jauthor = j["data"]["author"]
                jbody = j["data"]["body"]
                jscore = j["data"]["score"]
                jdictionary = {"commentauthor": jauthor, "comment": jbody,
                               "Type": "comment_reply", "commentscore": jscore}
                totallist.append(jdictionary)

    finaldf = pd.DataFrame(totallist)
    return finaldf
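
The function above only looks one level below each comment, while replies can themselves carry replies. As a rough sketch of how the same isinstance check could be applied recursively, the helper below walks a hand-made structure shaped like result[1]["data"]["children"]. The name collect_comments, the sample data, and the rule of skipping entries that have no "body" key are all assumptions made for illustration; none of them come from the original post.

```python
def collect_comments(children, comment_type="post_comment"):
    """Flatten comment objects and their nested replies into row dictionaries."""
    rows = []
    for child in children:
        data = child["data"]
        if "body" not in data:
            # Assumption: entries without a body are placeholders, so skip them.
            continue
        rows.append({
            "commentauthor": data["author"],
            "comment": data["body"],
            "Type": comment_type,
            "commentscore": data["score"],
        })
        replies = data.get("replies")
        if isinstance(replies, dict):
            # Replies are assumed to use the same {"data": {"children": [...]}} layout.
            rows.extend(collect_comments(replies["data"]["children"], "comment_reply"))
    return rows


# Hand-made sample data, not real Reddit output.
sample_children = [
    {"data": {"author": "a", "body": "top", "score": 3,
              "replies": {"data": {"children": [
                  {"data": {"author": "b", "body": "nested", "score": 1, "replies": ""}},
              ]}}}},
    {"data": {"author": "c", "body": "no replies", "score": 2, "replies": ""}},
]

rows = collect_comments(sample_children)
print([row["Type"] for row in rows])
```

The returned list could be passed to pd.DataFrame in the same way as totallist.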

User

#2 · edited 2024-10-01 04:45:24

I did some digging: I copied your code into a local environment and did some debugging, mainly this:

try:
    replylists = i["data"]["replies"]["data"]["children"]
except TypeError:
    # print every key of the comment that failed, then stop
    for point in i['data']:
        print(point)
    exit()

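From this I could see that i["data"] does have values (57 of them, in fact), and one of those 57 is replies. When I looked more closely, though, the replies value was empty:

'replies': '' is what I saw when I printed i directly at the point where the code broke.

All hope is not lost, however: you simply forgot to skip the iterations where replies is empty (''). I also ran a check to see how many iterations actually failed, and some succeeded while others failed (for the reason given above).

With that in mind, I suggest using try and except to debug when you hit an error like this (it is a useful skill). As for your actual question, work out what you want to happen when replies is empty.

I wish you all the best and hope this helps.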
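
To see why an empty string leads to the error in the title, here is a minimal reproduction. The comment dictionary is hand-made for illustration and is not real Reddit output. The exact wording of the message may differ slightly between Python versions.

```python
# Hypothetical stand-in for one element of result[1]["data"]["children"].
comment_without_replies = {
    "data": {"author": "someone", "body": "hi", "score": 1, "replies": ""}
}

try:
    reply_children = comment_without_replies["data"]["replies"]["data"]["children"]
    outcome = "ok"
except TypeError as exc:
    # "replies" is the str "", and indexing a str with the key "data" raises TypeError.
    outcome = f"TypeError: {exc}"

print(outcome)
```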
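
A check of that kind might look roughly like the sketch below. The function name count_reply_shapes and the sample list are made up for illustration; in the real script the list would be result[1]["data"]["children"].

```python
def count_reply_shapes(comments):
    """Count comments whose "replies" is a dict versus anything else (such as "")."""
    with_replies = 0
    without_replies = 0
    for comment in comments:
        if isinstance(comment["data"]["replies"], dict):
            with_replies += 1
        else:
            without_replies += 1
    return with_replies, without_replies


# Hand-made sample: one comment with a replies dict, two with an empty string.
sample_comments = [
    {"data": {"replies": {"data": {"children": []}}}},
    {"data": {"replies": ""}},
    {"data": {"replies": ""}},
]

counts = count_reply_shapes(sample_comments)
print(counts)
```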
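
One possible answer to that question is to treat an empty replies value as "no replies" and return an empty list, so the loop over replies simply does nothing. The sketch below uses try and except for this; the helper name reply_children and the sample dictionaries are assumptions for illustration only.

```python
def reply_children(comment):
    """Return the reply objects of a comment, or [] when it has none."""
    try:
        return comment["data"]["replies"]["data"]["children"]
    except TypeError:
        # "replies" was a string such as "", so there is nothing to iterate over.
        return []


# Hand-made sample comments, not real Reddit output.
with_reply = {"data": {"replies": {"data": {"children": [{"data": {"body": "a reply"}}]}}}}
without_reply = {"data": {"replies": ""}}

print(len(reply_children(with_reply)), len(reply_children(without_reply)))
```

Catching only TypeError keeps other problems, such as a missing key, visible instead of hiding them.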
