Converting or reformatting a nested list with month names into a new list in Python

User
Reply #1 · edited 2024-09-30 18:17:16

Here is one possible solution:

import calendar

data = [
    [[], 'October'], [[], 'October'], [[], 'October'], [['covid-19'], 'October'], [['covid-19'], 'October'],
    [[], 'October'], [['covid-19'], 'October'], [[], 'October'], [['tiktok', 'tenaga kesehatan'], 'October'],
    [[], 'October'], [['covid-19'], 'October'], [['kanker'], 'October'], [['covid-19'], 'October'],
    [[], 'October'], [[], 'October'], [['covid-19'], 'October'], [[], 'October'], [['jantung'], 'October'],
    [['covid-19'], 'October'], [[], 'October'], [[], 'October'], [[], 'October'], [[], 'October'],
    [[], 'October'], [[], 'October'], [[], 'October'], [[], 'October'], [[], 'October'],
    [['covid-19'], 'October'], [['covid-19'], 'October'], [['covid-19'], 'October'], [[], 'October'],
    [[], 'October'], [[], 'October'], [[], 'October'], [['covid-19'], 'October'], [[], 'October'],
    [['jantung'], 'October'], [['covid-19'], 'October'], [['covid-19'], 'October'], [['covid-19'], 'October'],
    [['covid-19'], 'October'], [['covid-19'], 'October'], [['covid-19', 'covid-19'], 'October'],
    [['covid-19'], 'October'],
    [[], 'September'], [['covid-19'], 'September'], [['covid-19'], 'September'], [[], 'September'],
    [[], 'September'], [['covid-19', 'covid-19'], 'September'], [['jantung'], 'September'],
    [['jantung'], 'September'], [['covid-19'], 'September'], [['covid-19'], 'September'],
    [['covid-19'], 'September'], [[], 'September'], [['covid-19'], 'September'], [[], 'September'],
    [['covid-19'], 'September'], [[], 'September'], [['covid-19'], 'September'], [['covid-19'], 'September'],
    [[], 'September'], [['covid-19'], 'September'], [[], 'September'], [['covid-19'], 'September'],
    [['covid-19'], 'September'], [[], 'September'], [[], 'September'], [['covid-19'], 'September'],
    [[], 'September'],
    [[], 'August'], [[], 'August'], [[], 'August'], [['covid-19'], 'August'], [[], 'August'],
    [[], 'August'], [['covid-19'], 'August'], [['jantung'], 'August'], [['covid-19'], 'August'],
    [['covid-19'], 'August'], [[], 'August'], [['covid-19'], 'August'], [['covid-19'], 'August'],
    [['covid-19'], 'August'], [['covid-19'], 'August'], [[], 'August'], [['covid-19'], 'August'],
    [[], 'August'], [['covid-19'], 'August'], [['covid-19'], 'August'], [[], 'August'],
    [['covid-19'], 'August'], [['covid-19'], 'August'], [[], 'August'], [['covid-19'], 'August'],
    [['covid-19', 'covid-19'], 'August'], [['covid-19'], 'August'],
    [['covid-19'], 'July'],
]

final = []
for el in data:
    if len(el[0]) > 0:
        for key in el[0]:
            # Add a row [name, 0 x 12] the first time a keyword is seen.
            if key not in [sub[0] for sub in final]:
                final.append([key] + [0]*12)
            for sub in final:
                if sub[0] == key:
                    # month_abbr[0] is '', so 'Jan' maps to index 1, right after the name.
                    sub[list(calendar.month_abbr).index(el[-1][:3])] += 1
print(final)
The output will be:
[['covid-19', 0, 0, 0, 0, 0, 0, 1, 17, 15, 19, 0, 0], ['tiktok', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], ['tenaga kesehatan', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], ['kanker', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], ['jantung', 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0]]
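Note: as others have mentioned, it is probably a good idea to store the result in a different data structure. A dictionary would certainly be more convenient, and it would also let you write a more linear solution.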
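As a rough illustration of that note, here is a minimal sketch of a dictionary-based count. It is not taken from the original answer: the names `sample`, `month_pos`, `counts` and `rows` are made up for this example, and `sample` is a small hypothetical stand-in for `data`.

```python
import calendar

# Hypothetical sample in the same [keywords, month_name] shape as `data`.
sample = [
    [[], "October"],
    [["covid-19"], "October"],
    [["tiktok", "tenaga kesehatan"], "October"],
    [["covid-19", "covid-19"], "September"],
    [["jantung"], "August"],
]

# Map full month names to 0-based positions: "January" -> 0, ..., "December" -> 11.
# list(calendar.month_name) starts with an empty string, hence the [1:].
month_pos = {name: i for i, name in enumerate(list(calendar.month_name)[1:])}

counts = {}  # keyword -> list of 12 monthly counts
for keywords, month in sample:
    for key in keywords:
        counts.setdefault(key, [0] * 12)[month_pos[month]] += 1

# Convert back to [name, Jan, ..., Dec] rows if the nested-list shape is needed.
rows = [[key, *per_month] for key, per_month in counts.items()]
print(rows)
```

Each record is visited once and the keyword lookup is a dictionary access, so there is no need to rescan the result list for every keyword.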

User
Reply #2 · edited 2024-09-30 18:17:16

Although others have written very good answers, I find that solving this with pandas is easier to maintain, if somewhat more verbose. On top of that, pandas objects are really easy to work with.
First, the imports:

import pandas as pd
import calendar
from pprint import pprint

Here is the main body of the code:

df = pd.DataFrame(data, columns=["lists", "month"])
names = list(set([y for x in df["lists"] for y in x]))
df[names] = 0

def func(row):
    for n in names:
        for k in row["lists"]:
            if k == n:
                row[n] += 1
    return row

df = df.apply(func, axis=1)
df.drop(["lists"], inplace=True, axis=1)
new_df = df.groupby(by="month").sum().T.reset_index()
new_df.columns.name = None  # Just for my taste to remove the "month" label of groupby result
months = list(calendar.month_name)[1:]  # list of months. There's an empty string at index 0.
new_df[[m for m in months if m not in new_df.columns]] = 0  # Creating columns for unseen months
new_df = new_df[["index"] + months]  # sorting the months
print(new_df)
pprint(new_df.values.tolist())
The output will be:

              index  January  February  ...  October  November  December
0            kanker        0         0  ...        1         0         0
1          covid-19        0         0  ...       19         0         0
2           jantung        0         0  ...        2         0         0
3            tiktok        0         0  ...        1         0         0
4  tenaga kesehatan        0         0  ...        1         0         0

[5 rows x 13 columns]
[['kanker', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 ['covid-19', 0, 0, 0, 0, 0, 0, 1, 17, 15, 19, 0, 0],
 ['jantung', 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0],
 ['tiktok', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 ['tenaga kesehatan', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]]
Because `names` is built from a `set`, the row order is not fixed and can differ between runs. Another run gave the same counts in a different order:

              index  January  February  ...  October  November  December
0  tenaga kesehatan        0         0  ...        1         0         0
1          covid-19        0         0  ...       19         0         0
2            kanker        0         0  ...        1         0         0
3           jantung        0         0  ...        2         0         0
4            tiktok        0         0  ...        1         0         0

[5 rows x 13 columns]
[['tenaga kesehatan', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 ['covid-19', 0, 0, 0, 0, 0, 0, 1, 17, 15, 19, 0, 0],
 ['kanker', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 ['jantung', 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0],
 ['tiktok', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]]

User
Reply #3 · edited 2024-09-30 18:17:16

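You really shouldn't store different kinds of data together in a list like that. How about a dictionary like this instead, where each keyword maps to its 12 monthly counts (index 0 is January):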

{'covid-19': [0, 0, 0, 0, 0, 0, 1, 17, 15, 19, 0, 0],
 'jantung': [0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0],
 'kanker': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 'tenaga kesehatan': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 'tiktok': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]}

Here is a code snippet that builds this dictionary:

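import datetime
from collections import defaultdict

# Each keyword maps to 12 monthly counters (index 0 = January).
result = defaultdict(lambda: [0]*12)
for i in data:
    if i[0]:
        for j in i[0]:
            # "%B" parses the full month name; .month is 1-based, hence the -1.
            result[j][datetime.datetime.strptime(i[1],"%B").month - 1] += 1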
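One caveat worth noting: `%B` in `strptime` matches the full month name of the current locale, so English names may fail to parse under a non-English locale. The sketch below is one possible way around that, using a fixed English lookup table instead. The names `MONTHS`, `MONTH_POS`, `count_by_month` and `sample` are hypothetical and not part of the original answer.

```python
from collections import defaultdict

# Fixed English month names, so the lookup does not depend on the process locale.
MONTHS = ("January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December")
MONTH_POS = {name: i for i, name in enumerate(MONTHS)}


def count_by_month(records):
    """Return {keyword: [12 monthly counts]} for [keywords, month_name] records."""
    result = defaultdict(lambda: [0] * 12)
    for keywords, month in records:
        pos = MONTH_POS[month]  # raises KeyError on an unexpected month name
        for key in keywords:
            result[key][pos] += 1
    return dict(result)


# Hypothetical sample in the same shape as `data`.
sample = [
    [["covid-19"], "July"],
    [["covid-19", "jantung"], "October"],
    [[], "October"],
]
print(count_by_month(sample))
```

Returning a plain `dict` at the end keeps later lookups of unknown keywords from silently creating new all-zero entries.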
