Copy and paste parts of tab-type data to create a new document

1CURRENT DATE: XXX     AGE,SEX, RACE AND ETHNICITY OF PERSONS     PAGE 1
BEGINNING DATE FOR DATA TOTALS: 01/83     COUNTY 001
ENDING DATE FOR DATA TOTALS: 12/83        RECORD COUNT 36
          Gender  Age_20  Age_21  Age_22  Age_23  Asian  Hispanic  White
Robbery   F       1       2       2       2       3      3         3
          M       3       3       2       2       4      3         3
Fraud     F       1       2       2       2       3      3         2
          M       2       3       2       2       4      3         3
Arson     F       1       2       2       2       3      3         3
          M       4       3       2       2       4      3         4
1CURRENT DATE: XXX     AGE,SEX, RACE AND ETHNICITY OF PERSONS     PAGE 4
BEGINNING DATE FOR DATA TOTALS: 01/83     COUNTY 002
ENDING DATE FOR DATA TOTALS: 12/83        RECORD COUNT 36
          Gender  Age_20  Age_21  Age_22  Age_23  Asian  Hispanic  White
Robbery   F       1       2       2       2       3      3         3
          M       2       3       2       2       4      4         3
Fraud     F       1       2       2       2       3      3         2
          M       2       3       2       2       4      6         3
Arson     F       1       2       2       2       3      3         3
          M       4       3       2       2       4      3         4
1CURRENT DATE: XXX     AGE,SEX, RACE AND ETHNICITY OF PERSONS     PAGE 7
BEGINNING DATE FOR DATA TOTALS: 01/83     COUNTY 003
ENDING DATE FOR DATA TOTALS: 12/83        RECORD COUNT 36
          Gender  Age_20  Age_21  Age_22  Age_23  Asian  Hispanic  White
Robbery   F       1       2       2       2       3      3         3
          M       3       3       2       2       4      3         3
Fraud     F       1       2       1       4       3      3         2
          M       2       3       2       2       4      3         3
Arson     F       1       2       4       2       3      3         3
          M       4       3       2       2       4      3         4

          Gender  Age_20  Age_21  Age_22  Age_23  Asian  Hispanic  White  County
Robbery   F       1       2       2       2       3      2         3      001
Robbery   F       1       2       2       2       2      3         3      002
Robbery   F       1       2       2       2       3      3         3      003

1 Answer

User

#1 · Posted 2024-09-28 20:46:06

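You might want to take a look at pandas. The details will vary with the exact format, but getting the data into a cleaner shape doesn't take long. There are prettier, less hard-coded ways to do the following, but here is an almost stream-of-consciousness example: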

import pandas as pd

# read in a fixed-width file: one 14-character column, then eight 10-character columns
df = pd.read_fwf("crime.tsv", widths=[14] + [10]*8, header=None)
# clean up the strings (Python 3: str replaces basestring; Series.map avoids the deprecated applymap)
df = df.apply(lambda col: col.map(lambda x: x.strip() if isinstance(x, str) else x))

# make a new column
df["County"] = None
# move over the county information (.loc avoids chained assignment)
df.loc[df[5] == "COUNTY", "County"] = df[6]
# fill the county info forwards into the empty places
df["County"] = df["County"].ffill()

# fill the crime information forwards
df[0] = df[0].ffill()

# reset the columns from one of the heading rows (row 3); .iloc replaces the removed .ix
df.columns = ["Crime"] + list(df.iloc[3, 1:-1]) + ["County"]
# get rid of any of the headings left in the table
df = df[~(df["Gender"] == "Gender")]

# toss anything which still has empty cells (this drops the page-header rows)
df = df.dropna()

# reset the index, and fix the types
df = df.set_index(["Crime", "Gender", "County"]).astype(int)
df = df.reset_index()

which produces

>>> df
      Crime Gender County  Age_20  Age_21  Age_22  Age_23  Asian  Hispanic  White
0   Robbery      F    001       1       2       2       2      3         3      3
1   Robbery      M    001       3       3       2       2      4         3      3
2     Fraud      F    001       1       2       2       2      3         3      2
3     Fraud      M    001       2       3       2       2      4         3      3
4     Arson      F    001       1       2       2       2      3         3      3
5     Arson      M    001       4       3       2       2      4         3      4
6   Robbery      F    002       1       2       2       2      3         3      3
7   Robbery      M    002       2       3       2       2      4         4      3
8     Fraud      F    002       1       2       2       2      3         3      2
9     Fraud      M    002       2       3       2       2      4         6      3
10    Arson      F    002       1       2       2       2      3         3      3
11    Arson      M    002       4       3       2       2      4         3      4
12  Robbery      F    003       1       2       2       2      3         3      3
13  Robbery      M    003       3       3       2       2      4         3      3
14    Fraud      F    003       1       2       1       4      3         3      2
15    Fraud      M    003       2       3       2       2      4         3      3
16    Arson      F    003       1       2       4       2      3         3      3
17    Arson      M    003       4       3       2       2      4         3      4

after which we can do all sorts of neat things.
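For comparison, the same carry-forward idea can be sketched without pandas and without fixed column widths. This is not the method used above, only a rough alternative: it assumes every data row ends with a gender code (F or M) followed by integer counts, and that the county number is the token right after the word COUNTY. The function name `parse_report` and the `SAMPLE` string (a trimmed excerpt of the data in the question) are made up for illustration.

```python
# Trimmed excerpt of the report from the question (two counties only).
SAMPLE = """\
1CURRENT DATE: XXX     AGE,SEX, RACE AND ETHNICITY OF PERSONS     PAGE 1
BEGINNING DATE FOR DATA TOTALS: 01/83     COUNTY 001
ENDING DATE FOR DATA TOTALS: 12/83        RECORD COUNT 36
          Gender  Age_20  Age_21  Age_22  Age_23  Asian  Hispanic  White
Robbery   F       1       2       2       2       3      3         3
          M       3       3       2       2       4      3         3
Fraud     F       1       2       2       2       3      3         2
          M       2       3       2       2       4      3         3
1CURRENT DATE: XXX     AGE,SEX, RACE AND ETHNICITY OF PERSONS     PAGE 4
BEGINNING DATE FOR DATA TOTALS: 01/83     COUNTY 002
ENDING DATE FOR DATA TOTALS: 12/83        RECORD COUNT 36
          Gender  Age_20  Age_21  Age_22  Age_23  Asian  Hispanic  White
Robbery   F       1       2       2       2       3      3         3
          M       2       3       2       2       4      4         3
"""


def parse_report(text):
    """Hypothetical helper: turn the paged report into a list of row dicts."""
    rows = []
    county = crime = header = None
    for line in text.splitlines():
        tokens = line.split()
        if "COUNTY" in tokens:
            # the county number follows the word COUNTY on the page header
            county = tokens[tokens.index("COUNTY") + 1]
        elif tokens and tokens[0] == "Gender":
            # column-heading row: remember the column names
            header = tokens
        elif header and len(tokens) >= len(header):
            values = tokens[-len(header):]
            is_data = values[0] in ("F", "M") and all(v.isdigit() for v in values[1:])
            if not is_data:
                continue  # some other page-header line
            if len(tokens) > len(header):
                # a leading crime label starts a new group; otherwise carry it forward
                crime = " ".join(tokens[:-len(header)])
            row = {"Crime": crime, "County": county, "Gender": values[0]}
            row.update((name, int(v)) for name, v in zip(header[1:], values[1:]))
            rows.append(row)
    return rows


rows = parse_report(SAMPLE)

# keep only the rows the question asks for: Robbery, female, one row per county
robbery_f = [r for r in rows if r["Crime"] == "Robbery" and r["Gender"] == "F"]
for r in robbery_f:
    print(r)
```

Splitting on whitespace trades the hard-coded widths for an assumption about what a data row looks like, so it would break if a count column were ever left blank; the fixed-width approach above does not have that weakness.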
