My code treats a list of dictionaries as a string: TypeError: string indices must be integers

Posted 2024-10-01 04:45:24


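So, I'm using the Reddit API, and for reasons that aren't relevant here I don't want to use a Reddit wrapper in this case. The code is actually very simple: it pulls the comments and the first-level replies from a specific post in a subreddit.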

Here is the function's code:

import requests
import pandas as pd


def getcommentsforpost(subredditname, postid):

    # here we make the request to Reddit and build a Python object
    # from the resulting JSON
    reditpath = '/r/' + subredditname + '/comments/' + postid
    redditusual = 'https://www.reddit.com'
    parameters = '.json?'
    totalpath = redditusual + reditpath + parameters
    p = requests.get(totalpath, headers={'User-agent': 'Chrome'})
    result = p.json()

    # we loop through a lot of dictionaries to extract the comments
    # and their replies, so we collect them in this list
    totallist = []

    # result is a list with two dictionaries: the first holds the post's
    # info, the second holds all the info about the comments and their
    # replies, so we process the post's info in result[0] first
    a = result[0]["data"]["children"][0]["data"]
    abody = a["selftext"]
    aauthor = a["author"]
    ascore = a["score"]
    adictionary = {"commentauthor": aauthor, "comment": abody, "Type": "Post",
                   "commentscore": ascore}
    totallist.append(adictionary)

    # now we process the comments, located in result[1]
    for i in result[1]["data"]["children"]:
        ibody = i["data"]["body"]
        iauthor = i["data"]["author"]
        iscore = i["data"]["score"]
        idictionary = {"commentauthor": iauthor, "comment": ibody, "Type": "post_comment",
                       "commentscore": iscore}
        totallist.append(idictionary)

        # To clarify: up to here the code works perfectly, no problem
        # whatsoever. The error happens exactly in the following section.

        # replylists holds a list of dictionaries (the replies to
        # comment i) on every iteration of the loop
        replylists = i["data"]["replies"]["data"]["children"]

        # loop through the replies of every comment we extract and, just
        # like with the post info and the top-level comments, put them
        # in totallist
        for j in replylists:
            jauthor = j["data"]["author"]
            jbody = j["data"]["body"]
            jscore = j["data"]["score"]
            jdictionary = {"commentauthor": jauthor, "comment": jbody, "Type": "comment_reply",
                           "commentscore": jscore}
            totallist.append(jdictionary)

    # build the DataFrame once, after the loop has collected every row
    finaldf = pd.DataFrame(totallist)
    return finaldf

getcommentsforpost("Python","a7zss0")

But the code fails when it loops over the replies. It raises the error "string indices must be integers", pointing at the variable replylists, but when I run code like this outside the loop:

result[1]["data"]["children"][4]["data"]["replies"]["data"]["children"][0]

it works fine, and it should do exactly the same thing. I believe Python is treating replylists as a string rather than a list (which is its class).
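A minimal offline reproduction, assuming (as the answers below find) that Reddit sends replies as an empty string for comments that have no replies. The payload is a hand-made stand-in, not real API output, and probe_replies is a hypothetical helper: the same expression succeeds for one comment and fails for the next, and the failure is the error from the question (recent Python versions append ", not 'str'" to the message).

```python
# Hand-made stand-in for result[1]["data"]["children"] (NOT real Reddit output):
# the first comment has a reply Listing, the second has replies == "".
sample_children = [
    {"data": {"author": "alice",
              "replies": {"data": {"children": [{"data": {"author": "bob"}}]}}}},
    {"data": {"author": "carol", "replies": ""}},
]


def probe_replies(children):
    """Run the question's failing expression on each comment and record the outcome."""
    outcomes = []
    for child in children:
        try:
            replylists = child["data"]["replies"]["data"]["children"]
            outcomes.append((child["data"]["author"], len(replylists)))
        except TypeError as err:
            # ""["data"] indexes a str with a str key, which raises TypeError
            outcomes.append((child["data"]["author"], str(err)))
    return outcomes


for author, outcome in probe_replies(sample_children):
    print(author, "->", outcome)
```

So a check like result[1]["data"]["children"][4] only proves that comment 4 has replies; it says nothing about the comments the loop visits later.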

Things I have tried:

I checked that replylists is a list with the type() function, and it does return "list", but the loop still fails with the same error after five iterations.

I tried looping over indices with for ja in range(0, len(replylists)) and then setting j = replylists[ja]. It raised the same error.

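I've been debugging this for two hours. Without that snippet the function works fine (it doesn't include the replies in the final DataFrame, of course, but it works). Why is this happening? replylists is a list of dictionaries, not a string, yet it throws this strange error.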

Here is the Reddit documentation for the endpoint we are using: https://www.reddit.com/dev/api#GET_comments_{article}
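For reference, a sketch of building the same endpoint URL (/r/{subreddit}/comments/{article}.json, the path the question's code concatenates by hand) with the standard library instead of string concatenation. The function name comments_url and the optional params argument are hypothetical; check the linked docs for the query parameters the endpoint actually accepts.

```python
from urllib.parse import quote, urlencode


def comments_url(subreddit, post_id, params=None):
    """Build the .json comments URL that the question's code concatenates by hand."""
    url = "https://www.reddit.com/r/{}/comments/{}.json".format(
        quote(subreddit, safe=""), quote(post_id, safe="")
    )
    if params:
        # optional query parameters (hypothetical; see the Reddit API docs)
        url += "?" + urlencode(params)
    return url


print(comments_url("Python", "a7zss0"))
# https://www.reddit.com/r/Python/comments/a7zss0.json
```

The returned string can be passed to requests.get(url, headers=...) exactly as in the question.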

Libraries to import: requests, pandas as pd, json

To repeat: recommending a wrapper is not a solution; I want to solve this with JSON and REST.

Working on this with: Python 3.6.5 | Anaconda 5.2.0, Jupyter Notebook 5.5.0

Thanks in advance. I hope you find it interesting; I'll keep working on it from here.


2 answers

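Here is how I solved it: I added an if statement that checks whether ["data"]["replies"] is a dictionary. If it is, the reply-processing code runs; if not, the loop moves on to the next comment.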

This is what it looks like. Thanks again to Aditya and 高约:

import requests
import pandas as pd


def getcommentsforpost(subredditname, postid):
    reditpath = '/r/' + subredditname + '/comments/' + postid
    redditusual = 'https://www.reddit.com'
    parameters = '.json?'
    totalpath = redditusual + reditpath + parameters
    p = requests.get(totalpath, headers={'User-agent': 'Chrome'})
    result = p.json()

    totallist = []

    # the result object is a list with two dictionaries, one with info on the post, and the second one
    # with all the info regarding the comments and their respective replies
    a = result[0]["data"]["children"][0]["data"]
    abody = a["selftext"]
    aauthor = a["author"]
    ascore = a["score"]
    adictionary = {"commentauthor": aauthor, "comment": abody, "Type": "Post",
                   "commentscore": ascore}
    totallist.append(adictionary)

    for i in result[1]["data"]["children"]:
        ibody = i["data"]["body"]
        iauthor = i["data"]["author"]
        iscore = i["data"]["score"]
        idictionary = {"commentauthor": iauthor, "comment": ibody, "Type": "post_comment",
                       "commentscore": iscore}
        totallist.append(idictionary)

        # "replies" is a Listing dictionary only when the comment has replies;
        # for comments without replies it is an empty string
        if isinstance(i["data"]["replies"], dict):
            replylists = i["data"]["replies"]["data"]["children"]

            for j in replylists:
                jauthor = j["data"]["author"]
                jbody = j["data"]["body"]
                jscore = j["data"]["score"]
                jdictionary = {"commentauthor": jauthor, "comment": jbody, "Type": "comment_reply",
                               "commentscore": jscore}
                totallist.append(jdictionary)

        elif isinstance(i["data"]["replies"], str):
            # replies == "": nothing to add, move on to the next comment
            continue

    finaldf = pd.DataFrame(totallist)
    return finaldf
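The same isinstance fix, sketched as a function that works on the already-decoded JSON (result) so it can be tried offline without hitting Reddit. The name extract_rows is hypothetical, it keeps the row keys used above, and the sample payload is a hand-made stand-in containing only the fields the function reads, not real API output. Building the DataFrame is left to the caller, e.g. pd.DataFrame(rows).

```python
def extract_rows(result):
    """Turn the decoded comments JSON into rows: the post, its comments, and level-1 replies."""
    post = result[0]["data"]["children"][0]["data"]
    rows = [{"commentauthor": post["author"], "comment": post["selftext"],
             "Type": "Post", "commentscore": post["score"]}]

    for child in result[1]["data"]["children"]:
        data = child["data"]
        rows.append({"commentauthor": data["author"], "comment": data["body"],
                     "Type": "post_comment", "commentscore": data["score"]})

        replies = data.get("replies")
        # a comment without replies carries "" here instead of a Listing dict
        if not isinstance(replies, dict):
            continue
        for reply in replies["data"]["children"]:
            r = reply["data"]
            rows.append({"commentauthor": r["author"], "comment": r["body"],
                         "Type": "comment_reply", "commentscore": r["score"]})
    return rows


# Hand-made stand-in for the decoded response (NOT real Reddit output).
sample_result = [
    {"data": {"children": [{"data": {"author": "op", "selftext": "post body", "score": 10}}]}},
    {"data": {"children": [
        {"data": {"author": "alice", "body": "first", "score": 3,
                  "replies": {"data": {"children": [
                      {"data": {"author": "bob", "body": "reply", "score": 1}}]}}}},
        {"data": {"author": "carol", "body": "second", "score": 2, "replies": ""}},
    ]}},
]

rows = extract_rows(sample_result)
print([row["Type"] for row in rows])
# ['Post', 'post_comment', 'comment_reply', 'post_comment']
```

Separating the parsing from requests.get keeps the tricky part (the "" versus dict check) testable with small hand-written payloads.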

I did some digging: I copied your code into my local environment and did some debugging, mainly with this:

try:
    replylists = i["data"]["replies"]["data"]["children"]
except TypeError:
    # print every key in the failing comment's data, then stop with the traceback
    for point in i['data']:
        print(point)
    raise

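With this I saw that i["data"] does have values (57 of them, in fact), and one of those 57 is replies. But when I looked more closely, I found that the content of replies is empty: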

'replies': '' is what I saw when I printed i directly at the point where it broke.

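However, all is not lost: you simply forgot to skip the iterations where the replies content is empty (''). I also ran a check to see how many iterations actually failed: some succeeded and some failed (for the reason described above).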
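The check for how many iterations fail could look like the following sketch. Instead of stopping at the first error, it records the index and the raw replies value of every comment whose reply lookup raises TypeError. The function name and the sample data are hypothetical.

```python
def find_failing_comments(children):
    """Return (index, raw replies value) for every comment where the reply lookup fails."""
    failing = []
    for index, child in enumerate(children):
        try:
            child["data"]["replies"]["data"]["children"]
        except TypeError:
            # replies is a str (""), so indexing it with "data" raises TypeError
            failing.append((index, child["data"]["replies"]))
    return failing


# Hand-made stand-in for result[1]["data"]["children"] (NOT real Reddit output).
sample_children = [
    {"data": {"replies": {"data": {"children": []}}}},
    {"data": {"replies": ""}},
    {"data": {"replies": ""}},
]

failing = find_failing_comments(sample_children)
print(len(failing), "of", len(sample_children), "failed:", failing)
# 2 of 3 failed: [(1, ''), (2, '')]
```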

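With that, I suggest using try/except for debugging when you run into an error like this (it's a useful skill). As for the subject of your question, work out what you want to happen when the replies content is empty.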
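If the answer to "what should happen when replies is empty" is "treat it as no replies", try/except can be the fix itself rather than only a debugging aid. A minimal sketch with a hypothetical helper name; it keeps the try body as narrow as possible so that an unrelated TypeError elsewhere in the loop is not silently swallowed.

```python
def first_level_replies(comment):
    """Return a comment's level-1 reply objects, or [] when replies is ""."""
    replies = comment["data"]["replies"]
    try:
        return replies["data"]["children"]
    except TypeError:
        # replies is the empty string: indexing a str with "data" fails
        return []


with_replies = {"data": {"replies": {"data": {"children": [{"data": {"author": "bob"}}]}}}}
without_replies = {"data": {"replies": ""}}

print(len(first_level_replies(with_replies)), len(first_level_replies(without_replies)))
# 1 0
```

In the original loop this would replace the failing line with for j in first_level_replies(i):, which behaves the same as the isinstance check for the two shapes seen here (a dict or an empty string).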

I wish you all the best and hope this helps.
