Parsing a Python list into a pandas.DataFrame using keywords

Posted 2024-09-29 23:28:18


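I have a list of countries that should be turned into a DataFrame. The problem is that every word of a country name, and every data value, is a separate element of the list. For example: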

[
 'Viet',
 'Nam',
 '0',
 '12.3',
 '0',
 'Brunei',
 'Darussalam',
 '12',
 '1.1',
 '0',
 'Bosnia',
 'and',
 'Herzegovina',
 '2',
 '2.1',
 '0',
 'Not',
 'applicable',
 'Turkey',
 '4',
 '4.3',
 '0',
 'Only',
 'partial',
 'coverage'
...
]

How can I convert this into [['Viet Nam', '0', '12.3', '0'], ['Brunei Darussalam', '12', '1.1', ...], ...] or into a pd.DataFrame:

             country  coef1  coef2  grade
0           Viet Nam      0   12.3      0
1  Brunei Darussalam     12    1.1      0

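Note: some country names are a single word, such as China or France, and some are three or more words, such as Republic of Korea. Also, the series of numbers is sometimes followed by a remark.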


1 answer
User
#1 · Posted 2024-09-29 23:28:18

Try this:

Here data_in is the data you want to parse, and countries is a list of all the countries in the world.

import re

import pandas as pd

# Names of all countries (truncated here; the full list must be filled in).
countries = [
    "Afghanistan", "Albania", "Algeria", "Andorra", "Angola",
    "Antigua and Barbuda", "Argentina", "Armenia",
    # ... remaining country names go here
]

data_in = [
    'Viet', 'Nam', '0', '12.3', '0', 'Brunei', 'Darussalam', '12', '1.1', '0', 'Bosnia', 'and', 'Herzegovina', '2', '2.1', '0', 'Not', 'applicable', 'Turkey', '4', '4.3', '0'
]

data_out = []

def is_country(elem):
    # True if elem is a substring of any known country name (case-insensitive).
    # This is a loose test: a short word such as "and" matches too.
    return any(elem.lower() in country.lower() for country in countries)

def is_num(elem):
    # True if elem contains at least one digit.
    return re.search(r'\d', elem) is not None

idx = 0
while idx < len(data_in):
    if not is_country(data_in[idx]):
        # Remark word or other unrecognised token: skip it.
        idx += 1
        continue
    # Collect the words of the country name, up to the first number.
    name_parts = []
    while idx < len(data_in) and not is_num(data_in[idx]):
        name_parts.append(data_in[idx])
        idx += 1
    # Collect the numbers that follow the name.
    elements = []
    while idx < len(data_in) and is_num(data_in[idx]):
        elements.append(data_in[idx])
        idx += 1
    # idx already points at the next unprocessed token, so it is not advanced
    # again here (an extra increment would skip the next country's first word).
    data_out.append([" ".join(name_parts)] + elements)

df = pd.DataFrame(data_out, columns=['country', 'coef1', 'coef2', 'grade'])
print(df)

DataFrame output (assuming countries contains all four of the names in data_in):

                  country coef1 coef2 grade
0                Viet Nam     0  12.3     0
1       Brunei Darussalam    12   1.1     0
2  Bosnia and Herzegovina     2   2.1     0
3                  Turkey     4   4.3     0
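
The values in the output above are still strings, because the tokens are copied straight from the input list, while the table in the question shows numbers. A minimal sketch of converting them with pd.to_numeric, assuming the column names from the question (coef1, coef2, grade); the rows list is just sample data taken from the question:

```python
import pandas as pd

# Sample rows in the shape the parser produces (every value is a string).
rows = [
    ["Viet Nam", "0", "12.3", "0"],
    ["Brunei Darussalam", "12", "1.1", "0"],
]
df = pd.DataFrame(rows, columns=["country", "coef1", "coef2", "grade"])

# Convert the data columns to numbers. errors="coerce" turns any text that
# cannot be parsed into NaN instead of raising an exception.
num_cols = ["coef1", "coef2", "grade"]
df[num_cols] = df[num_cols].apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
```

Whether to coerce or to raise on bad values is a judgment call; coercing is convenient when a stray remark may end up in a data column.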

Nonstandard solution, but it works
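
If the substring test turns out to be too loose (a remark word that happens to occur inside some country name would start a new row), one possible variation is to match whole country names word by word, longest name first. This is only a sketch, not the answer's method: COUNTRIES is a hypothetical, deliberately tiny list, and match_country, is_number and parse are illustrative names.

```python
import pandas as pd

# Hypothetical sample list; a real one must contain every country name that
# can appear in the input, spelled the same way.
COUNTRIES = ["Viet Nam", "Brunei Darussalam", "Bosnia and Herzegovina", "Turkey"]

# Each name as a tuple of lowercase words, longest first, so a multi-word
# name is tried before any shorter name.
NAME_TOKENS = sorted(
    (tuple(name.lower().split()) for name in COUNTRIES), key=len, reverse=True
)


def is_number(token):
    """Return True if the token parses as a float."""
    try:
        float(token)
    except ValueError:
        return False
    return True


def match_country(tokens, start):
    """Return how many tokens the country name at `start` spans, or 0."""
    for name in NAME_TOKENS:
        window = tuple(t.lower() for t in tokens[start:start + len(name)])
        if window == name:
            return len(name)
    return 0


def parse(tokens):
    """Group tokens into [country, number, number, ...] rows; skip remarks."""
    rows = []
    i = 0
    while i < len(tokens):
        n = match_country(tokens, i)
        if n == 0:
            i += 1  # remark word or unknown token
            continue
        country = " ".join(tokens[i:i + n])
        i += n
        values = []
        while i < len(tokens) and is_number(tokens[i]):
            values.append(tokens[i])
            i += 1
        rows.append([country] + values)
    return rows


data_in = [
    'Viet', 'Nam', '0', '12.3', '0', 'Brunei', 'Darussalam', '12', '1.1', '0',
    'Bosnia', 'and', 'Herzegovina', '2', '2.1', '0', 'Not', 'applicable',
    'Turkey', '4', '4.3', '0', 'Only', 'partial', 'coverage',
]
df = pd.DataFrame(parse(data_in), columns=['country', 'coef1', 'coef2', 'grade'])
print(df)
```

With this sample input, parse returns four rows, and the remarks "Not applicable" and "Only partial coverage" are dropped. The trade-off is that a country missing from the list, or spelled differently, is silently skipped.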
