I have an SAP-generated file with many columns and some unwanted rows. How can I read it directly into pandas?

Posted 2024-09-28 05:28:22


My table:

Table To Be Searched MSEG
Number of hits                                                            273208
Maximum No. of Entri                                                           0
Runtime                00:24:17

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|Mat. Doc. |MatYr|MvT|Material |Plnt|SLoc|Batch     |Customer|  Amount in LC|        Amount|    Quantity|BUn|    Qty in UnE|EUn|PO        |MatYr|Mat. Doc. |Order    |Profit Ctr|SLED/BBD  |Pstng Date|Entry Date|Time    |User name  |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|4912693062|2015 |551|100062   |HDC2|0001|5G30MC1A11|        |         9.03 |         9.06 |      0.083 |CS |            2 |EA |          |     |          |         |IN1165B085|26.01.2016|01.08.2015|01.08.2015|01:13:16|O33462     |
|4912693063|2015 |501|166      |HDC2|0004|          |        |         0.00 |         0.00 |          2 |EA |            2 |EA |          |     |          |         |IN1165B085|          |01.08.2015|01.08.2015|01:13:17|O33462     |
|4912693320|2015 |551|101343   |HDC2|0001|5G28MC1A11|        |        53.73 |        53.72 |      0.500 |CS |           12 |EA |          |     |          |         |IN1165B085|25.01.2016|01.08.2015|01.08.2015|01:16:30|O33462     |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Table To Be Searched MSEG
Number of hits                                                            273208
Maximum No. of Entri                                                           0
Runtime                00:24:17

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|Mat. Doc. |MatYr|MvT|Material |Plnt|SLoc|Batch     |Customer|  Amount in LC|        Amount|    Quantity|BUn|    Qty in UnE|EUn|PO        |MatYr|Mat. Doc. |Order    |Profit Ctr|SLED/BBD  |Pstng Date|Entry Date|Time    |User name  |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|4912696602|2015 |344|100399   |HMH3|0001|5G29MH3S11|        |         0.00 |         0.00 |      9,061 |CS |        9,061 |CS |          |     |          |         |IN1165B074|26.01.2016|01.08.2015|01.08.2015|01:54:15|A70475     |

It contains more than 1 million rows. Previously, I used the following Python code to convert this file to CSV first:

[code not preserved]

Is there a better way to convert this kind of file to CSV, or how can I parse it directly into pandas?


1 answer
User
#1 · Posted 2024-09-28 05:28:22

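You can use itertools.ifilter (the built-in filter in Python 3) to keep only the table data lines, and csv.reader to parse them, as follows: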

import csv


def is_header(line):
    # Column-header rows: "|" followed by a letter, e.g. "|Mat. Doc. |MatYr|..."
    return len(line) > 2 and line[0] == '|' and line[1].isalpha()


def is_data(line):
    # Data rows: "|" followed by neither "-" (a "|-----" rule) nor a letter (a header)
    return len(line) > 2 and line[0] == '|' and line[1] != '-' and not line[1].isalpha()


# First pass: take the column names from the first header row only
with open('input.txt', newline='') as f_input:
    header_row = next(csv.reader(filter(is_header, f_input), delimiter='|'))
header = [col.strip() for col in header_row[1:-1]]

# Second pass: write every data row, dropping the empty fields outside the outer "|"
with open('input.txt', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(header)
    for row in csv.reader(filter(is_data, f_input), delimiter='|'):
        # Trim padding and drop "," thousands separators ("9,061" -> "9061");
        # "." is kept so decimals (9.03) and dates (26.01.2016) stay intact
        csv_output.writerow([col.strip().replace(',', '') for col in row[1:-1]])

This gives you an output CSV file like the following:

[output not preserved]
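If you keep the CSV route, note that a plain pd.read_csv('output.csv') would turn codes such as SLoc 0001 into the integer 1 and store Material as a number. A minimal sketch that keeps identifier-like columns as text is shown below; TEXT_COLUMNS and load_converted_csv are hypothetical names, and the column list is an assumption based on the header above, so adjust it to the columns you actually treat as labels. Because the header repeats MatYr and Mat. Doc., pandas renames the second copies to MatYr.1 and Mat. Doc..1.

```python
import pandas as pd

# Hypothetical choice: columns that are codes or keys rather than quantities,
# read as text so leading zeros ("0001") are preserved
TEXT_COLUMNS = ["Mat. Doc.", "MatYr", "MvT", "Material", "Plnt", "SLoc",
                "Batch", "Customer", "PO", "Order", "Profit Ctr", "User name"]


def load_converted_csv(path_or_buffer):
    """Read the converted CSV, keeping ID-like columns as strings."""
    return pd.read_csv(path_or_buffer, dtype={col: str for col in TEXT_COLUMNS})
```

Usage would be df = load_converted_csv('output.csv'); the remaining columns are left to pandas' normal type inference.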
