Converting a dictionary to a DataFrame, but the DataFrame does not display the columns correctly


I'm working on converting a dictionary to a DataFrame, but the DataFrame does not display the columns correctly. I want it to look like Image1 (another example I made), but the data shows up as in Image2.

In the first example (Image1) I use a single URL as the news source. In the second example (Image2) I have a for loop that parses multiple URLs from the news source.

I also noticed that the dictionary in the second example has two pairs of "[]", while the first one has only one pair of [].
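Roughly, the shape difference looks like this (my own sketch, values elided):

[ {...}, {...} ]        # Image1: a flat list of dictionaries -- one pair of []
[ [{...}], [{...}] ]    # Image2: a list of lists of dictionaries -- two pairs of []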

I can provide more details if needed. Please help if you can.

Thanks in advance, everyone.

Image1 - dictionary to pandas DataFrame, output shows up fine

Image2 - dictionary to pandas DataFrame, output does NOT show up correctly


Here is the code of the extractEntities function:

import json
import requests

def extractEntities(url):
    endpoint_watson = "https://gateway.watsonplatform.net/natural-language-understanding/api/v1/analyze"
    params = {
        'version': '2020-09-12',
    }
    headers = {
        'Content-Type': 'application/json',
    }
    watson_options = {
        "url": url,
        "features": {
            "entities": {
                "sentiment": True,
                "emotion": True,
                "limit": 100
            }
        }
    }
    username = "apikey"
    password = "<<myAPIKeyinfo>>"

    resp = requests.post(endpoint_watson,
                         data=json.dumps(watson_options),
                         headers=headers,
                         params=params,
                         auth=(username, password)
                        )
    results = resp.json()
    article_dict = []  # despite the name, this is a list of per-entity dicts
    if "entities" in results:
        for i in results.get('entities'):
            initial_dict = {}
            initial_dict['entity'] = i['text']
            initial_dict['url'] = url
            initial_dict['source'] = url.split('.')[1]
            initial_dict['relevance'] = i['relevance']
            initial_dict['sentiment'] = i['sentiment']['score']
            article_dict.append(initial_dict)
    # return moved out of the if-block so the function returns [] (not None)
    # when the response contains no entities
    return article_dict

Then I fetch some news articles to extract entities from:

s3 = 'the-wall-street-journal'
allurls3 = getNews(s3)  # list of article URLs for this source
allurls3
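getNews isn't shown in the question; for a self-contained run, here is a hypothetical stand-in (the name comes from the question, the body is entirely an assumption) that just returns a fixed list of article URLs for the given source id:

def getNews(source):
    # Hypothetical stub: the real function presumably queries a news API
    # for recent articles from the given source (e.g. 'the-wall-street-journal')
    # and returns their URLs as a list of strings.
    return [
        'https://www.wsj.com/articles/hurricane-sally-barrels-into-alabama-11600252305',
    ]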

Below is the code that calls the extractEntities function. It contains another for loop:

dict1 = []
for u in range(len(allurls3)):
  data3 = []
  url3 = allurls3[u]
  data3 = extractEntities(url3)
  dict1.append(data3)
dict1

1 Answer

Thanks for posting your code. In the future, please do not upload images of code/errors when asking a question, and try to make it a Minimal, Reproducible Example. I don't have a Watson API key, so I can't fully reproduce your example, but the gist of it is as follows:

In extractEntities(url), you make an API call to the Watson NLP service and, for each entity found in the response, build a dictionary with its relevance, sentiment, and so on. At the end you return a list of all these dictionaries. Based on the code you provided, let's create a dummy function that simulates this, so I can try to reproduce the problem you're running into:

import random
import pandas as pd

def extractEntities(url):
    article_dict = []  # actually a list, not a dict!!
    for entity in ('Senate', 'CNN', 'Hillary Clinton', 'Bill Clinton'):
        initial_dict = {}
        initial_dict['entity'] = entity
        initial_dict['url'] = url
        initial_dict['source'] = url.split('.')[1]
        initial_dict['relevance'] = random.random()
        initial_dict['sentiment'] = random.random()
        article_dict.append(initial_dict)
    return article_dict  # returns a list of dictionaries

The example output is a list of dictionaries:

>>> extractEntities('https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html')
[{'entity': 'Senate',
  'relevance': 0.4000160139190754,
  'sentiment': 0.012884391182820587,
  'source': 'cnn',
  'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'},
 {'entity': 'CNN',
  'relevance': 0.44921272670354884,
  'sentiment': 0.40996636370319894,
  'source': 'cnn',
  'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'},
 {'entity': 'Hillary Clinton',
  'relevance': 0.4892046288027784,
  'sentiment': 0.5424038672663258,
  'source': 'cnn',
  'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'},
 {'entity': 'Bill Clinton',
  'relevance': 0.7237361288162582,
  'sentiment': 0.8269245953553733,
  'source': 'cnn',
  'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'}]

Now, you have a list of URLs in allurls3, and you do the following:

  • You create an empty list called dict1
  • You loop over the URLs in allurls3
  • You call extractEntities on each URL; data3 now holds a list of dictionaries (see above)
  • You append that list of dictionaries to the list dict1. The end result is that dict1 is a list of lists of dictionaries:
    >>> allurls3 = ['https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html', 'https://www.wsj.com/articles/hurricane-sally-barrels-into-alabama-11600252305']
    >>> dict1 = []
    >>> for u in range(len(allurls3)):
    ...     data3 = []
    ...     url3 = allurls3[u]
    ...     data3 = extractEntities(url3)
    ...     dict1.append(data3)
    >>> dict1
    [[{'entity': 'Senate',
       'relevance': 0.19115763152061027,
       'sentiment': 0.557935869111337,
       'source': 'cnn',
       'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'},
      {'entity': 'CNN',
       'relevance': 0.9259134250004917,
       'sentiment': 0.8605677705216526,
       'source': 'cnn',
       'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'},
      {'entity': 'Hillary Clinton',
       'relevance': 0.6071084891165042,
       'sentiment': 0.04296592154310419,
       'source': 'cnn',
       'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'},
      {'entity': 'Bill Clinton',
       'relevance': 0.9558183603396242,
       'sentiment': 0.42813857092335783,
       'source': 'cnn',
       'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'}],
     [{'entity': 'Senate',
       'relevance': 0.5060582500660554,
       'sentiment': 0.9240451580369043,
       'source': 'wsj',
       'url': 'https://www.wsj.com/articles/hurricane-sally-barrels-into-alabama-11600252305'},
      {'entity': 'CNN',
       'relevance': 0.03956002547473547,
       'sentiment': 0.5337343576461046,
       'source': 'wsj',
       'url': 'https://www.wsj.com/articles/hurricane-sally-barrels-into-alabama-11600252305'},
      {'entity': 'Hillary Clinton',
       'relevance': 0.6706912125534789,
       'sentiment': 0.7721987482202004,
       'source': 'wsj',
       'url': 'https://www.wsj.com/articles/hurricane-sally-barrels-into-alabama-11600252305'},
      {'entity': 'Bill Clinton',
       'relevance': 0.37377943134631464,
       'sentiment': 0.7114485187747178,
       'source': 'wsj',
       'url': 'https://www.wsj.com/articles/hurricane-sally-barrels-into-alabama-11600252305'}]]

Finally, you wrap this list of lists dict1 in yet another list and convert it to a DataFrame:

>>> pd.set_option('max_colwidth', 800)
>>> articles_df1 = pd.DataFrame([dict1])
>>> articles_df1

[Image: the resulting DataFrame -- a single row whose cells each contain an entire list of dictionaries, instead of proper columns]
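A quick inspection (assuming the two mock URLs above) shows what went wrong: the frame has a single row and one column per URL, with each cell holding an entire list of dictionaries:

>>> articles_df1.shape
(1, 2)
>>> type(articles_df1.iloc[0, 0])
<class 'list'>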

OK, now that I've been able to reproduce your error, I can tell you how to fix it. As your first posted image shows, you need to feed pd.DataFrame a list of dictionaries, not a list of lists of dictionaries as you're doing now.
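As a side note, if you'd rather keep your loop as-is, you could also flatten the nested list one level before building the frame; a minimal sketch using itertools.chain:

>>> from itertools import chain
>>> articles_df1 = pd.DataFrame(list(chain.from_iterable(dict1)))

The rewrite below avoids the nesting in the first place, which is cleaner.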

Also, naming a list dict1 is very confusing, so don't. Do the following instead; the key difference is using list.extend instead of list.append:

>>> entities = []
>>> for url3 in allurls3:
...     data3 = extractEntities(url3)
...     entities.extend(data3)
>>> entities
[{'entity': 'Senate',
  'relevance': 0.11594421982738612,
  'sentiment': 0.2917557430217993,
  'source': 'cnn',
  'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'},
 {'entity': 'CNN',
  'relevance': 0.5741596155387597,
  'sentiment': 0.7743716765722405,
  'source': 'cnn',
  'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'},
 {'entity': 'Hillary Clinton',
  'relevance': 0.2535272395046557,
  'sentiment': 0.2570270764910251,
  'source': 'cnn',
  'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'},
 {'entity': 'Bill Clinton',
  'relevance': 0.2275111369786037,
  'sentiment': 0.03312536097047081,
  'source': 'cnn',
  'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'},
 {'entity': 'Senate',
  'relevance': 0.8197309413723833,
  'sentiment': 0.9492436947284604,
  'source': 'wsj',
  'url': 'https://www.wsj.com/articles/hurricane-sally-barrels-into-alabama-11600252305'},
 {'entity': 'CNN',
  'relevance': 0.7317312596198684,
  'sentiment': 0.5052344447199512,
  'source': 'wsj',
  'url': 'https://www.wsj.com/articles/hurricane-sally-barrels-into-alabama-11600252305'},
 {'entity': 'Hillary Clinton',
  'relevance': 0.3572239446181651,
  'sentiment': 0.056131606725058014,
  'source': 'wsj',
  'url': 'https://www.wsj.com/articles/hurricane-sally-barrels-into-alabama-11600252305'},
 {'entity': 'Bill Clinton',
  'relevance': 0.761777835912902,
  'sentiment': 0.28138007550393573,
  'source': 'wsj',
  'url': 'https://www.wsj.com/articles/hurricane-sally-barrels-into-alabama-11600252305'}]

Now you have a flat list of dictionaries that you can use to create the DataFrame:

>>> pd.set_option('max_colwidth', 800)
>>> articles_df1 = pd.DataFrame(entities)
>>> articles_df1

[Image: the resulting DataFrame with proper entity, url, source, relevance, and sentiment columns -- one row per entity]
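As a final sanity check (with the two mock URLs and four entities each used above), the frame now has one row per entity and properly named columns:

>>> articles_df1.shape
(8, 5)
>>> list(articles_df1.columns)
['entity', 'url', 'source', 'relevance', 'sentiment']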
