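<p>I am using a regular expression to parse the rows of the Excel file described below, and I am struggling to split each row into its parts:</p>
<p>the timestamp <code>[0:00:48]</code>,
the ID <code>20052A</code>,
and the content <code>content (more content)</code></p>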
<p>This is one Excel row (one of many, so the ID, the timestamp, and the content can all differ from row to row):</p>
<p><code>[0:00:48] 20052A: content (more content)</code></p>
<p>I get this error</p>
<p><code>AttributeError: 'NoneType' object has no attribute 'group'</code></p>
<p>from the pattern I use to match my ID:</p>
<p><code>(r"^(.+:)(.+)|(r(\w+)?\s*\[(.*)\]\s*(\w+))", c)</code></p>
<p>Keep in mind that the ID sometimes looks like this:</p>
<p><code>[0:00:33] 30091aA: (content) </code></p>
<p>My full code (with the database connection details removed):</p>
<pre><code>import os
import re
import pymysql
pymysql.install_as_MySQLdb()
import pandas as pd
import sqlalchemy
def insert_or_update(engine, pd_table, table_name):
    inserts = 0
    updates = 0
    for i in range(len(pd_table)):
        vals_with_quotes = ["'" + str(x) + "'" for x in pd_table.loc[i, :].values]
        # print(vals_with_quotes)
        update_pairs = [str(c) + " = '" + str(v) + "'" for c, v in zip(pd_table.columns, pd_table.loc[i, :])]
        query = f"INSERT INTO {table_name} ({', '.join(list(pd_table.columns.values))}) " \
                f"VALUES ({', '.join(vals_with_quotes)}) " \
                f"ON DUPLICATE KEY UPDATE {', '.join(update_pairs)}"
        print(query)
        result = engine.execute(query)
        if result.lastrowid == 0:
            updates += 1
        else:
            inserts += 1
    print(f"Inserted {inserts} rows and updated {updates} rows.")
schema = '---'
alchemy_connect = "---"
engine = sqlalchemy.create_engine(alchemy_connect) # connect to server
engine.execute(f"USE {schema}") # select new db
# engine.execute("SET NAMES UTF8MB4;")
query = "SELECT * FROM .... where ...=..."
pm = pd.read_sql(query, engine)
rootpath = "path/"
for root, dirs, files in os.walk(rootpath):
    for file in files:
        print(root, dirs, files, file)
        d = pd.read_excel(root + file, header=None)
        d.drop(columns=[0], inplace=True)
        d.rename(columns={1: "content"}, inplace=True)
        participants = []
        for ix, row in d.iterrows():
            c = row["content"]
            match = re.search(r"^(.+:)(.+)|(r(\w+)?\s*\[(.*)\]\s*(\w+))", c)
            prefix = match.group(1)
            only_content = match.group(2)
            try:
                timestamp = re.search(r"\[(\d{1,2}:\d{1,2}:\d{1,2})\]", prefix).group(1)
            except:
                timestamp = "-99"
            # print(timestamp)
            if re.search(r"\s(Versuchsleiter|ersuchsleiter|Versuchsleit|Versuch):", prefix):
                id_code = "Versuchsleiter"
            else:
                starting_digits = re.search(r"^(\d+)", prefix)
                id_code = re.search(r"(\d{2,4}.{1,3}):", prefix).group(1)
                if hasattr(starting_digits, 'group'):
                    id_code = starting_digits.group(1) + id_code
            # get pid
            participant = pm.loc[pm["id_code"] == id_code, "pid"]
            try:
                pid = participant.values[0]
            except:
                pid = "Versuchsleiter"
            # print(ix, pid, id_code, only_content, timestamp)
            if pid and pid not in participants and pid != "Versuchsleiter":
                participants.append(pid)
            d.loc[ix, "pid"] = pid
            d.loc[ix, "timestamp"] = timestamp
            d.loc[ix, "content"] = only_content.strip()
            d.loc[ix, "is_participant"] = 0 if pid == "Versuchsleiter" else 1
        d = d[["pid", "is_participant", "content", "timestamp"]]
        d.loc[(d['pid'] == "Versuchsleiter"), "pid"] = participants[0]
        d.loc[(d['pid'] == None), "pid"] = participants[0]
        insert_or_update(engine, d, "table of sql")
</code></pre>
<p>I need to handle "Versuchsleiter" because some of the IDs are "Versuchsleiter".</p>
<p>Thank you!</p>
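<p>For reference, <code>re.search</code> returns <code>None</code> when a row does not match, and calling <code>.group()</code> on that <code>None</code> raises exactly this <code>AttributeError</code>. Below is a minimal sketch of one possible way to split a row into timestamp, ID, and content while guarding against non-matching rows. The names <code>LINE_RE</code> and <code>split_row</code> are hypothetical, and the pattern assumes every row has the shape <code>[h:mm:ss] ID: content</code> shown above, with "Versuchsleiter" appearing in the ID position; rows that deviate from that shape would need a different pattern.</p>

```python
import re

# Assumed row shape: "[h:mm:ss] ID: content".
# Group 1 = timestamp, group 2 = ID (no colon or whitespace), group 3 = content.
LINE_RE = re.compile(
    r"^\s*\[(\d{1,2}:\d{1,2}:\d{1,2})\]\s*([^:\s]+)\s*:\s*(.*)$",
    re.DOTALL,
)


def split_row(cell):
    """Return (timestamp, id_code, content), or None if the cell does not match."""
    # Empty Excel cells come back from pandas as NaN (a float), not a string.
    if not isinstance(cell, str):
        return None
    match = LINE_RE.match(cell)
    if match is None:
        # No match: report it to the caller instead of calling .group() on None.
        return None
    timestamp, id_code, content = match.groups()
    return timestamp, id_code, content.strip()


if __name__ == "__main__":
    rows = [
        "[0:00:48] 20052A: content (more content)",
        "[0:00:33] 30091aA: (content) ",
        "a row without timestamp or ID",
    ]
    for row in rows:
        print(split_row(row))
```

<p>In the loop, a row for which <code>split_row</code> returns <code>None</code> could be skipped or logged, which makes it easier to see which cells in the spreadsheet do not follow the expected format.</p>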