Answers to the question: Matching a regular expression in Python against lines from an Excel file


I am using a regular expression to match lines from the Excel file below, and I am struggling to work out how to split each row into the timestamp <code>[0:00:48]</code>, the ID <code>20052A</code> and the content <code>content (more content)</code>.

This is one Excel row (one of many, so the ID, the timestamp and the content can all differ from row to row):

<code>[0:00:48] 20052A: content (more content)</code>

I get the error <code>AttributeError: 'NoneType' object has no attribute 'group'</code> from the pattern I use to match my ID:

<code>(r"^(.+:)(.+)|(r(\w+)?\s*\[(.*)\]\s*(\w+))", c)</code>

Keep in mind that the ID sometimes looks like this:

<code>[0:00:33] 30091aA: (content) </code>

My full code is (with the database connection details removed):

<pre><code>import os
import re
import pymysql
pymysql.install_as_MySQLdb()
import pandas as pd
import sqlalchemy


def insert_or_update(engine, pd_table, table_name):
    inserts = 0
    updates = 0
    for i in range(len(pd_table)):
        vals_with_quotes = ["'" + str(x) + "'" for x in pd_table.loc[i, :].values]
        # print(vals_with_quotes)
        update_pairs = [str(c) + " = '" + str(v) + "'" for c, v in zip(pd_table.columns, pd_table.loc[i, :])]
        query = f"INSERT INTO {table_name} ({', '.join(list(pd_table.columns.values))}) " \
                f"VALUES ({', '.join(vals_with_quotes)}) " \
                f"ON DUPLICATE KEY UPDATE {', '.join(update_pairs)}"
        print(query)
        result = engine.execute(query)
        if result.lastrowid == 0:
            updates += 1
        else:
            inserts += 1
    print(f"Inserted {inserts} rows and updated {updates} rows.")


schema = '---'
alchemy_connect = "---"
engine = sqlalchemy.create_engine(alchemy_connect)  # connect to server
engine.execute(f"USE {schema}")  # select new db
# engine.execute("SET NAMES UTF8MB4;")
query = "SELECT * FROM .... where ...=..."
pm = pd.read_sql(query, engine)

rootpath = "path/"

for root, dirs, files in os.walk(rootpath):
    for file in files:
        print(root, dirs, files, file)
        d = pd.read_excel(root + file, header=None)
        d.drop(columns=[0], inplace=True)
        d.rename(columns={1: "content"}, inplace=True)

        participants = []
        for ix, row in d.iterrows():
            c = row["content"]
            match = re.search(r"^(.+:)(.+)|(r(\w+)?\s*\[(.*)\]\s*(\w+))", c)
            prefix = match.group(1)
            only_content = match.group(2)
            try:
                timestamp = re.search(r"\[(\d{1,2}:\d{1,2}:\d{1,2})\]", prefix).group(1)
            except:
                timestamp = "-99"
            # print(timestamp)
            if re.search(r"\s(Versuchsleiter|ersuchsleiter|Versuchsleit|Versuch):", prefix):
                id_code = "Versuchsleiter"
            else:
                starting_digits = re.search(r"^(\d+)", prefix)
                id_code = re.search(r"(\d{2,4}.{1,3}):", prefix).group(1)
                if hasattr(starting_digits, 'group'):
                    id_code = starting_digits.group(1) + id_code
            # # get pid
            participant = pm.loc[pm["id_code"] == id_code, "pid"]
            try:
                pid = participant.values[0]
            except:
                pid = "Versuchsleiter"
            # print(ix, pid, id_code, only_content, timestamp)
            if pid and pid not in participants and pid != "Versuchsleiter":
                participants.append(pid)
            d.loc[ix, "pid"] = pid
            d.loc[ix, "timestamp"] = timestamp
            d.loc[ix, "content"] = only_content.strip()
            d.loc[ix, "is_participant"] = 0 if pid == "Versuchsleiter" else 1

        d = d[["pid", "is_participant", "content", "timestamp"]]
        d.loc[(d['pid'] == "Versuchsleiter"), "pid"] = participants[0]
        d.loc[(d['pid'] == None), "pid"] = participants[0]
        insert_or_update(engine, d, "table of sql")
</code></pre>

I need "Versuchsleiter" since some of the IDs are "Versuchsleiter". Thank you!
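For reference, <code>re.search</code> returns <code>None</code> when a pattern does not match, so calling <code>.group(...)</code> on the result without checking it raises exactly this <code>AttributeError</code>. Below is a minimal sketch of one possible way to split a row into its three parts with a single pattern and an explicit <code>None</code> check. It assumes every row has the shape <code>[H:MM:SS] speaker: content</code>, as in the two sample rows above; the names <code>LINE_RE</code> and <code>parse_line</code> are hypothetical and not part of the original code.

```python
import re

# Assumed row shape (taken from the two sample rows in the question):
#   "[0:00:48] 20052A: content (more content)"
# LINE_RE and parse_line are illustrative names, not from the original code.
LINE_RE = re.compile(
    r"^\s*\[(?P<timestamp>\d{1,2}:\d{1,2}:\d{1,2})\]\s*"  # [0:00:48]
    r"(?P<speaker>[^:]+?)\s*:\s*"                          # 20052A or Versuchsleiter
    r"(?P<content>.*\S)?\s*$",                             # rest of the row, trimmed
    re.DOTALL,  # let the content span line breaks inside a cell
)


def parse_line(text):
    """Return (timestamp, speaker, content), or None if the row does not fit."""
    if not isinstance(text, str):
        return None  # e.g. NaN produced by an empty Excel cell
    m = LINE_RE.match(text)
    if m is None:
        return None  # caller decides how to handle rows that do not match
    return m.group("timestamp"), m.group("speaker"), m.group("content") or ""


for sample in [
    "[0:00:48] 20052A: content (more content)",
    "[0:00:33] 30091aA: (content) ",
    "a row without a timestamp",
]:
    print(parse_line(sample))
```

Inside the <code>iterrows</code> loop, a row for which <code>parse_line</code> returns <code>None</code> could be skipped or logged instead of crashing, which also shows which rows deviate from the expected shape.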


1 answer

Anonymous, 1 day ago

Skilled in: python, mysql, java
