Convert or format a nested list with month names into a new list in Python

Published 2024-09-30 18:17:16

I have a nested list like this:

data = [[[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [['covid-19'], 'October'],
 [['covid-19'], 'October'],
 [[], 'October'],
 [['covid-19'], 'October'],
 [[], 'October'],
 [['tiktok', 'tenaga kesehatan'], 'October'],
 [[], 'October'],
 [['covid-19'], 'October'],
 [['kanker'], 'October'],
 [['covid-19'], 'October'],
 [[], 'October'],
 [[], 'October'],
 [['covid-19'], 'October'],
 [[], 'October'],
 [['jantung'], 'October'],
 [['covid-19'], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [['covid-19'], 'October'],
 [['covid-19'], 'October'],
 [['covid-19'], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [['covid-19'], 'October'],
 [[], 'October'],
 [['jantung'], 'October'],
 [['covid-19'], 'October'],
 [['covid-19'], 'October'],
 [['covid-19'], 'October'],
 [['covid-19'], 'October'],
 [['covid-19'], 'October'],
 [['covid-19', 'covid-19'], 'October'],
 [['covid-19'], 'October'],
 [[], 'September'],
 [['covid-19'], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [[], 'September'],
 [['covid-19', 'covid-19'], 'September'],
 [['jantung'], 'September'],
 [['jantung'], 'September'],
 [['covid-19'], 'September'],
 [['covid-19'], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [['covid-19'], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [['covid-19'], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [[], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [[], 'August'],
 [[], 'August'],
 [[], 'August'],
 [['covid-19'], 'August'],
 [[], 'August'],
 [[], 'August'],
 [['covid-19'], 'August'],
 [['jantung'], 'August'],
 [['covid-19'], 'August'],
 [['covid-19'], 'August'],
 [[], 'August'],
 [['covid-19'], 'August'],
 [['covid-19'], 'August'],
 [['covid-19'], 'August'],
 [['covid-19'], 'August'],
 [[], 'August'],
 [['covid-19'], 'August'],
 [[], 'August'],
 [['covid-19'], 'August'],
 [['covid-19'], 'August'],
 [[], 'August'],
 [['covid-19'], 'August'],
 [['covid-19'], 'August'],
 [[], 'August'],
 [['covid-19'], 'August'],
 [['covid-19', 'covid-19'], 'August'],
 [['covid-19'], 'August'],
 [['covid-19'], 'July']]

I want to count every token ('covid-19', 'jantung', and so on) by month name, so that I get each token's frequency per month.

Here is my expected output:

result = [
    ['covid-19',0,0,0,0,0,0,1,19,17,21,0,0],
    ['tiktok',0,0,0,0,0,0,0,0,0,1,0,0],
    ['jantung',0,0,0,0,0,0,0,1,2,2,0,0],
    ['kanker',0,0,0,0,0,0,0,0,0,1,0,0],
    ['tenaga kesehatan',0,0,0,0,0,0,0,0,0,1,0,0],   
]

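Note: '0,0,0,0,0,0,1,19,17,21,0,0' is ordered from January to December, and each value is the total count of that token in that month. Please suggest a way to convert the nested token list into the result list.

Any ideas?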


3 answers

Here is one possible solution:

import calendar

data = [[[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [['covid-19'], 'October'],
 [['covid-19'], 'October'],
 [[], 'October'],
 [['covid-19'], 'October'],
 [[], 'October'],
 [['tiktok', 'tenaga kesehatan'], 'October'],
 [[], 'October'],
 [['covid-19'], 'October'],
 [['kanker'], 'October'],
 [['covid-19'], 'October'],
 [[], 'October'],
 [[], 'October'],
 [['covid-19'], 'October'],
 [[], 'October'],
 [['jantung'], 'October'],
 [['covid-19'], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [['covid-19'], 'October'],
 [['covid-19'], 'October'],
 [['covid-19'], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [['covid-19'], 'October'],
 [[], 'October'],
 [['jantung'], 'October'],
 [['covid-19'], 'October'],
 [['covid-19'], 'October'],
 [['covid-19'], 'October'],
 [['covid-19'], 'October'],
 [['covid-19'], 'October'],
 [['covid-19', 'covid-19'], 'October'],
 [['covid-19'], 'October'],
 [[], 'September'],
 [['covid-19'], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [[], 'September'],
 [['covid-19', 'covid-19'], 'September'],
 [['jantung'], 'September'],
 [['jantung'], 'September'],
 [['covid-19'], 'September'],
 [['covid-19'], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [['covid-19'], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [['covid-19'], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [[], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [[], 'August'],
 [[], 'August'],
 [[], 'August'],
 [['covid-19'], 'August'],
 [[], 'August'],
 [[], 'August'],
 [['covid-19'], 'August'],
 [['jantung'], 'August'],
 [['covid-19'], 'August'],
 [['covid-19'], 'August'],
 [[], 'August'],
 [['covid-19'], 'August'],
 [['covid-19'], 'August'],
 [['covid-19'], 'August'],
 [['covid-19'], 'August'],
 [[], 'August'],
 [['covid-19'], 'August'],
 [[], 'August'],
 [['covid-19'], 'August'],
 [['covid-19'], 'August'],
 [[], 'August'],
 [['covid-19'], 'August'],
 [['covid-19'], 'August'],
 [[], 'August'],
 [['covid-19'], 'August'],
 [['covid-19', 'covid-19'], 'August'],
 [['covid-19'], 'August'],
 [['covid-19'], 'July']]

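# Index 0 of calendar.month_abbr is '', so 'Jan' sits at index 1 and lines up
# with the count columns that follow the token name in each row.
month_abbr = list(calendar.month_abbr)

final = []
for el in data:
    for key in el[0]:
        # Add a row [token, 0 x 12] the first time a token appears
        if key not in [sub[0] for sub in final]:
            final.append([key] + [0]*12)
        # Increment the column for this entry's month
        for sub in final:
            if sub[0] == key:
                sub[month_abbr.index(el[-1][:3])] += 1

print(final)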

The output will be:

[['covid-19', 0, 0, 0, 0, 0, 0, 1, 17, 15, 19, 0, 0], ['tiktok', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], ['tenaga kesehatan', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], ['kanker', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], ['jantung', 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0]]

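Note: as others have mentioned, a different data structure for the result is probably a good idea. A dictionary would be more convenient and would let you write a more linear solution.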
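One way the dictionary idea could look, as a minimal sketch rather than the answerer's own code: a month-name lookup is built once, and each token maps to a list of twelve counts. `MONTH_INDEX`, `count_by_month`, and the shortened `sample` list are names introduced here for illustration, and the lookup assumes English month names.

```python
import calendar

# Map full month names to 0-based positions (January -> 0).
# calendar.month_name is locale-dependent; this assumes an English locale.
MONTH_INDEX = {name: i - 1 for i, name in enumerate(calendar.month_name) if name}


def count_by_month(rows):
    """Return {token: [12 monthly counts]} for rows shaped like [[tokens], month]."""
    counts = {}
    for tokens, month in rows:
        for token in tokens:
            # setdefault creates the 12-slot list the first time a token is seen
            counts.setdefault(token, [0] * 12)[MONTH_INDEX[month]] += 1
    return counts


# Shortened sample in the same shape as `data` (illustrative only)
sample = [
    [[], 'October'],
    [['covid-19'], 'October'],
    [['covid-19', 'covid-19'], 'September'],
    [['jantung'], 'August'],
]
counts = count_by_month(sample)
print(counts['covid-19'])
```

Calling `count_by_month(data)` on the full list from the question should work the same way, since it only relies on the `[[tokens], month]` shape.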

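Although others have written very good answers, I find that solving this with pandas is easier to maintain, even if it is more verbose. pandas objects are also really easy to work with.

First, the imports: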

import pandas as pd
import calendar
from pprint import pprint

Here is the main body of the code:

df = pd.DataFrame(data, columns=["lists", "month"])
names = list(set([y for x in df["lists"] for y in x]))
df[names] = 0


def func(row):
    for n in names:
        for k in row["lists"]:
            if k == n:
                row[n] += 1
    return row


df = df.apply(func, axis=1)
df.drop(["lists"], inplace=True, axis=1)

new_df = df.groupby(by="month").sum().T.reset_index()
new_df.columns.name = None # Just for my taste to remove the "month" label of groupby result

months = list(calendar.month_name)[1:]  # month names; index 0 is an empty string
new_df[[m for m in months if m not in new_df.columns]] = 0  # add zero columns for months not in the data
new_df = new_df[["index"] + months]  # order the columns January..December
print(new_df)
pprint(new_df.values.tolist())

The output will be:

              index  January  February  ...  October  November  December
0            kanker        0         0  ...        1         0         0
1          covid-19        0         0  ...       19         0         0
2           jantung        0         0  ...        2         0         0
3            tiktok        0         0  ...        1         0         0
4  tenaga kesehatan        0         0  ...        1         0         0

[5 rows x 13 columns]


[['kanker', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 ['covid-19', 0, 0, 0, 0, 0, 0, 1, 17, 15, 19, 0, 0],
 ['jantung', 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0],
 ['tiktok', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 ['tenaga kesehatan', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]]

Because `names` is built from a `set`, the row order is not guaranteed and may differ on another run:

              index  January  February  ...  October  November  December
0  tenaga kesehatan        0         0  ...        1         0         0
1          covid-19        0         0  ...       19         0         0
2            kanker        0         0  ...        1         0         0
3           jantung        0         0  ...        2         0         0
4            tiktok        0         0  ...        1         0         0

[5 rows x 13 columns]


[['tenaga kesehatan', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 ['covid-19', 0, 0, 0, 0, 0, 0, 1, 17, 15, 19, 0, 0],
 ['kanker', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 ['jantung', 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0],
 ['tiktok', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]]
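
If a stable row order matters, one option is to deduplicate with `dict.fromkeys`, which keeps first-seen order, instead of `set`. A minimal sketch with an illustrative `rows` sample (not the full `data`):

```python
# Illustrative rows in the same [[tokens], month] shape as `data`
rows = [
    [['covid-19'], 'October'],
    [['tiktok', 'tenaga kesehatan'], 'October'],
    [['covid-19', 'kanker'], 'September'],
]

# dict keys preserve insertion order (Python 3.7+), so duplicates are dropped
# while the first-seen order of the tokens is kept
names = list(dict.fromkeys(token for tokens, _ in rows for token in tokens))
print(names)  # ['covid-19', 'tiktok', 'tenaga kesehatan', 'kanker']
```

In the pandas code above, an expression like this could stand in for the `list(set(...))` line.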


You really shouldn't store mixed kinds of data in a list like that. How about a dictionary like this instead:

{'covid-19': [0, 0, 0, 0, 0, 0, 1, 17, 15, 19, 0, 0],
 'jantung': [0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0],
 'kanker': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 'tenaga kesehatan': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 'tiktok': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]}

Here is a snippet that builds this dictionary:

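import datetime
from collections import defaultdict

# Each new token starts with twelve zero counts (January..December)
result = defaultdict(lambda: [0]*12)
for i in data:
    for j in i[0]:
        # "%B" parses the full month name; .month is 1-based, hence the - 1
        result[j][datetime.datetime.strptime(i[1], "%B").month - 1] += 1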
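
If the nested-list layout from the question is still needed, the dictionary can be flattened afterwards. A minimal sketch; the small `result` literal is a stand-in with the same token-to-counts shape, and `as_rows` is a name introduced here for illustration:

```python
# Stand-in for the dictionary built above: token -> 12 monthly counts
result = {
    'covid-19': [0, 0, 0, 0, 0, 0, 1, 17, 15, 19, 0, 0],
    'jantung': [0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0],
}

# Prepend each token to its counts to get [token, Jan, ..., Dec] rows
as_rows = [[token, *counts] for token, counts in result.items()]
print(as_rows[0])
```

The same comprehension works on a `defaultdict`, since it only iterates over the existing items.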
