How to split the text in a one-column data frame into the required format (a three-column data frame)

Posted 2024-09-29 19:09:52


Given the data below, I need to split it into three columns of a data frame: Name, Date, and Type.

Data:

ANNAPOLIS INDUSTRIAL LOAN CO - Aug-2002 - Non-Procurable Miscellaneous Non-Procurable Royalties Royalties

PERRY & CO - Apr-2016 - Non-Procurable Miscellaneous Non-Procurable Royalties Royalties

ASSOCIATED BANC-CORP - Jun-2008 - Corporate Services Human Resources Contingent Labor/Temp Labor Contingent Labor/Temp Labor

L-3 COMMUNICATIONS TITAN CORP - Dec-2014 - Store Construction General Contractor General Requirements Final Site Clean Up

AMERACE CORP 1967 QUAL STK OPT PL & 1972 QUAL-NON-QUAL STK O - Jun-2002 - Store Construction Fixtures Store Fixtures Store Fixtures

ASSOCIATED BANC-CORP - Jun-2008 - Corporate Services Human Resources Contingent Labor/Temp Labor Contingent Labor/Temp Labor

AETNA VARIABLE FUND - Apr-2002 - Store Management Real Estate Real Estate Services Real Estate General (Search, Appraisal, Realtor Commission)


FAIRCHILD CORP - Nov-2001 - Store Management Real Estate Real Estate Services Real Estate General (Search, Appraisal, Realtor Commission

CALIFORNIA REAL ESTATE INVESTMENT TRUST - Mar-2013 - Store Management Real Estate Real Estate Services Real Estate General (Search, Appraisal, Realtor Commission)

EDO CORP - Jul-2008 - Store Management Real Estate Real Estate Services Real Estate General (Search, Appraisal, Realtor Commission)

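How can I use regex to convert this data into three columns?

I have only just started learning regular expressions, so I don't know how to go about this.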


1 answer
User
#1 · Posted 2024-09-29 19:09:52

The date is easy to find with this pattern: -\s([A-Z][a-z]{2}-[0-9]{4})\s-

Then you only need to take the part of each line before the date pattern and the part after it to get the names and the types.
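As a minimal sketch of that idea on a single line, the position of the date match can be used to slice out the text before and after it. The variable names `line` and `date_re` are illustrative and not part of the original answer; the sample line is taken from the question's data.

```python
import re

# Sample line copied from the question's data (illustrative choice).
line = "PERRY & CO - Apr-2016 - Non-Procurable Miscellaneous Non-Procurable Royalties Royalties"

# Date pattern from the answer: "- Mon-YYYY -" with the date captured.
date_re = re.compile(r"-\s([A-Z][a-z]{2}-[0-9]{4})\s-")

m = date_re.search(line)
name = line[:m.start()].strip()   # text before the date separator
date = m.group(1)                 # the captured date itself
type_ = line[m.end():].strip()    # text after the date separator

print(name, date, type_, sep=" | ")
```

Slicing around a single match keeps the three parts aligned per line, which can be easier to reason about than three separate searches.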

Here is the code (using the re module):

# Import modules
import re
import pandas as pd
# Read file
with open("temp.txt") as f:
    text = f.read()

# Apply regex rules (strip the spaces captured next to each " - " separator)
names = [n.strip() for n in re.findall(r"(.*?)-\s[A-Z][a-z]{2}-[0-9]{4}\s-", text)]
dates = re.findall(r"-\s([A-Z][a-z]{2}-[0-9]{4})\s-", text)
types = [t.strip() for t in re.findall(r"-\s[A-Z][a-z]{2}-[0-9]{4}\s-([^\n]*)", text)]

# Create dataframes
df = pd.DataFrame({"Name": names,
                    "Date": dates,
                    "Type": types})

print(df)
#                                                Name      Date                                               Type
# 0                      ANNAPOLIS INDUSTRIAL LOAN CO   Aug-2002   Non-Procurable Miscellaneous Non-Procurable R...
# 1                                        PERRY & CO   Apr-2016   Non-Procurable Miscellaneous Non-Procurable R...
# 2                              ASSOCIATED BANC-CORP   Jun-2008   Corporate Services Human Resources Contingent...
# 3                     L-3 COMMUNICATIONS TITAN CORP   Dec-2014   Store Construction General Contractor General...
# 4  AMERACE CORP 1967 QUAL STK OPT PL & 1972 QUAL-...  Jun-2002   Store Construction Fixtures Store Fixtures St...
# 5                              ASSOCIATED BANC-CORP   Jun-2008   Corporate Services Human Resources Contingent...
# 6                               AETNA VARIABLE FUND   Apr-2002   Store Management Real Estate Real Estate Serv...
# 7                                    FAIRCHILD CORP   Nov-2001   Store Management Real Estate Real Estate Serv...
# 8           CALIFORNIA REAL ESTATE INVESTMENT TRUST   Mar-2013   Store Management Real Estate Real Estate Serv...
# 9                                          EDO CORP   Jul-2008   Store Management Real Estate Real Estate Serv...
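
If the text is already in a one-column data frame, as the question title suggests, the same pattern could probably be applied without going through a file. The sketch below uses pandas' `Series.str.extract` with named groups. The column name `raw` and the frame `df_raw` are assumptions for illustration; only three rows from the question's data are included.

```python
import pandas as pd

# Hypothetical one-column frame; "raw" is an assumed column name.
df_raw = pd.DataFrame({"raw": [
    "ANNAPOLIS INDUSTRIAL LOAN CO - Aug-2002 - Non-Procurable Miscellaneous Non-Procurable Royalties Royalties",
    "ASSOCIATED BANC-CORP - Jun-2008 - Corporate Services Human Resources Contingent Labor/Temp Labor Contingent Labor/Temp Labor",
    "L-3 COMMUNICATIONS TITAN CORP - Dec-2014 - Store Construction General Contractor General Requirements Final Site Clean Up",
]})

# One pattern with three named groups; the group names become column names.
pattern = r"^(?P<Name>.*?)\s-\s(?P<Date>[A-Z][a-z]{2}-[0-9]{4})\s-\s(?P<Type>.*)$"
df = df_raw["raw"].str.extract(pattern)

print(df)
```

Because the name group is lazy and the separator must be followed by a date, hyphens inside company names such as "L-3" or "BANC-CORP" stay in the Name column.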
