Answers to the question: Python, converting a list of data to a DataFrame


1 answer

Anonymous, 1 day ago

Another approach that avoids regex (though it is not as neat as <a href="https://stackoverflow.com/a/51221288/7505395">Roman's answer</a>) is to clean the data with a list comprehension, put it into a dict, and create the DataFrame from that: <pre><code>import pandas as pd

data = ['2018\t \t7,107\t4,394\t2,713',
        '2017\t \t16,478\t10,286\t6,192',
        '2016\t \t15,944\t9,971\t5,973',
        '2015\t \t15,071\t9,079\t5,992',
        '2014\t \t14,415\t8,596\t5,819',
        '2013\t \t14,259\t8,269\t5,990',
        '2012\t \t14,010\t8,143\t5,867',
        '2011\t \t14,149\t8,126\t6,023',
        '2010\t \t14,505\t7,943\t6,562',
        '2009\t \t14,632\t8,022\t6,610',
        '2008\t \t14,207\t7,989\t6,218',
        '2007\t \t14,400\t8,085\t6,315',
        '2006\t \t14,750\t8,017\t6,733',
        '2005\t \t14,497\t7,593\t6,904',
        '2004\t \t14,155\t7,150\t7,005',
        '2003\t \t13,285\t6,457\t6,828',
        '2002\t \t12,821\t6,190\t6,631',
        '2001\t \t12,702\t6,080\t6,622',
        '2000\t \t11,942\t5,985\t5,957',
        '1999\t \t10,872\t5,824\t5,048',
        '1998\t \t10,362\t5,793\t4,569',
        '1997\t \t9,546\t5,479\t4,067',
        '1996\t \t9,222\t5,418\t3,804',
        '1995\t \t8,859\t5,363\t3,496',
        '1994\t \t8,203\t5,099\t3,104',
        '1993\t \t7,766\t4,861\t2,905',
        '1992\t \t7,091\t4,520\t2,571',
        '1991\t \t6,953\t4,526\t2,427',
        '1990\t \t6,632\t4,509\t2,123',
        '1989\t \t5,929\t4,011\t1,918',
        '1988\t \t5,909\t4,080\t1,829']

# split each string on tabs and drop the empty / whitespace-only parts
cleaned = [[x.strip() for x in year.split("\t") if x.strip()] for year in data]

# make a dict: the first item (the year) is the key, the remaining items are the values
dataCleaned = {x: y for x, *y in cleaned}
print(dataCleaned)

df = pd.DataFrame(dataCleaned)
print(df)
</code></pre> Output (the column order may differ depending on the pandas version): <pre><code># the dict
{'2018': ['7,107', '4,394', '2,713'],
 '2017': ['16,478', '10,286', '6,192'],
 '2016': ['15,944', '9,971', '5,973'],
 '2015': ['15,071', '9,079', '5,992'],
 '2014': ['14,415', '8,596', '5,819'],
 '2013': ['14,259', '8,269', '5,990'],
 '2012': ['14,010', '8,143', '5,867'],
 '2011': ['14,149', '8,126', '6,023'],
 '2010': ['14,505', '7,943', '6,562'],
 '2009': ['14,632', '8,022', '6,610'],
 '2008': ['14,207', '7,989', '6,218'],
 '2007': ['14,400', '8,085', '6,315'],
 '2006': ['14,750', '8,017', '6,733'],
 '2005': ['14,497', '7,593', '6,904'],
 '2004': ['14,155', '7,150', '7,005'],
 '2003': ['13,285', '6,457', '6,828'],
 '2002': ['12,821', '6,190', '6,631'],
 '2001': ['12,702', '6,080', '6,622'],
 '2000': ['11,942', '5,985', '5,957'],
 '1999': ['10,872', '5,824', '5,048'],
 '1998': ['10,362', '5,793', '4,569'],
 '1997': ['9,546', '5,479', '4,067'],
 '1996': ['9,222', '5,418', '3,804'],
 '1995': ['8,859', '5,363', '3,496'],
 '1994': ['8,203', '5,099', '3,104'],
 '1993': ['7,766', '4,861', '2,905'],
 '1992': ['7,091', '4,520', '2,571'],
 '1991': ['6,953', '4,526', '2,427'],
 '1990': ['6,632', '4,509', '2,123'],
 '1989': ['5,929', '4,011', '1,918'],
 '1988': ['5,909', '4,080', '1,829']}
</code></pre> <pre><code># the dataframe
    1988   1989   1990   1991   1992   1993   1994   1995   1996   1997  \
0  5,909  5,929  6,632  6,953  7,091  7,766  8,203  8,859  9,222  9,546
1  4,080  4,011  4,509  4,526  4,520  4,861  5,099  5,363  5,418  5,479
2  1,829  1,918  2,123  2,427  2,571  2,905  3,104  3,496  3,804  4,067

   ...     2009    2010    2011    2012    2013    2014    2015    2016  \
0  ...   14,632  14,505  14,149  14,010  14,259  14,415  15,071  15,944
1  ...    8,022   7,943   8,126   8,143   8,269   8,596   9,079   9,971
2  ...    6,610   6,562   6,023   5,867   5,990   5,819   5,992   5,973

     2017   2018
0  16,478  7,107
1  10,286  4,394
2   6,192  2,713

[3 rows x 31 columns]
</code></pre> <hr/> After the edit (the years now repeat, so a dict keyed by year would keep only the last row for each year; the cleaned rows go straight into the DataFrame instead): <pre><code>import pandas as pd

data = ['2018\t \t7,107\t4,394\t2,713',
        '2017\t \t16,478\t10,286\t6,192',
        '2016\t \t15,944\t9,971\t5,973',
        '2015\t \t15,071\t9,079\t5,992',
        '2014\t \t14,415\t8,596\t5,819',
        '2013\t \t14,259\t8,269\t5,990',
        '2012\t \t14,010\t8,143\t5,867',
        '2011\t \t14,149\t8,126\t6,023',
        '2010\t \t14,505\t7,943\t6,562',
        '2009\t \t14,632\t8,022\t6,610',
        '2008\t \t14,207\t7,989\t6,218',
        '2007\t \t14,400\t8,085\t6,315',
        '2006\t \t14,750\t8,017\t6,733',
        '2005\t \t14,497\t7,593\t6,904',
        '2004\t \t14,155\t7,150\t7,005',
        '2003\t \t13,285\t6,457\t6,828',
        '2002\t \t12,821\t6,190\t6,631',
        '2001\t \t12,702\t6,080\t6,622',
        '2000\t \t11,942\t5,985\t5,957',
        '1999\t \t10,872\t5,824\t5,048',
        '2018\t \t10,362\t5,793\t4,569',
        '2017\t \t9,546\t5,479\t4,067',
        '2016\t \t9,222\t5,418\t3,804',
        '2015\t \t8,859\t5,363\t3,496',
        '2014\t \t8,203\t5,099\t3,104',
        '2013\t \t7,766\t4,861\t2,905',
        '2012\t \t7,091\t4,520\t2,571',
        '2011\t \t6,953\t4,526\t2,427',
        '2010\t \t6,632\t4,509\t2,123',
        '2009\t \t5,929\t4,011\t1,918',
        '2008\t \t5,909\t4,080\t1,829']

# split each string on tabs and drop the empty / whitespace-only parts
cleaned = [[x.strip() for x in year.split("\t") if x.strip()] for year in data]

# one row per input string, with explicit column names
df = pd.DataFrame(cleaned, columns=['year', 'data1', 'data2', 'data3'])
print(df)
</code></pre> Output after the edit: <pre><code>    year   data1   data2  data3
0   2018   7,107   4,394  2,713
1   2017  16,478  10,286  6,192
2   2016  15,944   9,971  5,973
3   2015  15,071   9,079  5,992
4   2014  14,415   8,596  5,819
5   2013  14,259   8,269  5,990
6   2012  14,010   8,143  5,867
7   2011  14,149   8,126  6,023
8   2010  14,505   7,943  6,562
9   2009  14,632   8,022  6,610
10  2008  14,207   7,989  6,218
11  2007  14,400   8,085  6,315
12  2006  14,750   8,017  6,733
13  2005  14,497   7,593  6,904
14  2004  14,155   7,150  7,005
15  2003  13,285   6,457  6,828
16  2002  12,821   6,190  6,631
17  2001  12,702   6,080  6,622
18  2000  11,942   5,985  5,957
19  1999  10,872   5,824  5,048
20  2018  10,362   5,793  4,569
21  2017   9,546   5,479  4,067
22  2016   9,222   5,418  3,804
23  2015   8,859   5,363  3,496
24  2014   8,203   5,099  3,104
25  2013   7,766   4,861  2,905
26  2012   7,091   4,520  2,571
27  2011   6,953   4,526  2,427
28  2010   6,632   4,509  2,123
29  2009   5,929   4,011  1,918
30  2008   5,909   4,080  1,829
</code></pre> <hr/> Edit: <pre><code>cleaned = [[x.strip() for x in year.split("\t") if x.strip()] for year in data]
</code></pre> is roughly equivalent to: <pre><code>alsoCleaned = []
for year in data:
    part = []                          # collect all parts of one string
    for x in year.split("\t"):         # split the string on tabs
        partCleaned = x.strip()        # remove surrounding whitespace from x
        if partCleaned:                # keep it only if content remains
            part.append(partCleaned)   # add to part
    alsoCleaned.append(part)           # all parts done, so add to the big list
print(alsoCleaned)
</code></pre> ==&gt; <pre><code>[['2018', '7,107', '4,394', '2,713'],
 ['2017', '16,478', '10,286', '6,192'],
 # .... and so on ....,
 ['2008', '5,909', '4,080', '1,829']]
</code></pre>
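The DataFrames above still hold every value as a string such as '7,107'. If numeric columns are wanted, one possible follow-up is to strip the thousands separators while cleaning. The sketch below is only an illustration: the helper names `parse_row` and `to_dataframe` are hypothetical (they are not part of the original answer), and it assumes every non-empty field is an integer written with "," as the thousands separator.

```python
def parse_row(line):
    """Split one tab-separated line and convert every field to int.

    Whitespace-only fields are dropped, and the thousands separator
    (",") is removed before conversion. Assumes all fields are integers.
    """
    fields = [f.strip() for f in line.split("\t") if f.strip()]
    return [int(f.replace(",", "")) for f in fields]


def to_dataframe(lines, columns=("year", "data1", "data2", "data3")):
    """Build a DataFrame with integer columns from the raw lines.

    The column names are the ones used in the answer above.
    """
    import pandas as pd  # imported lazily; parse_row itself does not need pandas

    return pd.DataFrame([parse_row(line) for line in lines],
                        columns=list(columns))


# two lines taken from the data in the answer
sample = ['2018\t \t7,107\t4,394\t2,713', '2017\t \t16,478\t10,286\t6,192']
rows = [parse_row(line) for line in sample]
print(rows)  # [[2018, 7107, 4394, 2713], [2017, 16478, 10286, 6192]]
```

Calling `to_dataframe(data)` on the full list should then give the same four columns as the edited answer, but with integer values, so sums or sorting by value behave numerically rather than alphabetically.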
