Python正则表达式将充满数据的大字符串插入SQL数据库

2024-06-29 01:14:53 发布

您现在位置:Python中文网/ 问答频道 /正文

我想取一个巨大的字符串,将其切碎并按顺序放入SQL表中。你知道吗

到目前为止,我已经尝试使用regex来拆分字符串,获得所需的值,并尝试像这样将它们插入表中。你知道吗

conn = sqlite3.connect('PP.DB')
c = conn.cursor()
c.execute('''CREATE TABLE apps (DisplayName, DisplayVersion, Publisher, InstallDate, PSCOmputerName, RunspaceId)''')

# Split up string based on new lines
bigStringLines = re.split(r'\\r\\n', myBigString)

for line in bigStringLines:
    values = re.split(":", line)
    stmt = "INSERT INTO mytable (\"" + values[0] + "\") VALUES (\"" + values[1] + "\");"
    c.execute(stmt)

但是在SQL数据库中看起来是这样的

DisplayName          DisplayVersion   Publisher        InstallDate  PSComputerName   RunspaceId
Installed program 1  
                     1.2.3.123        
                                      CyberSoftware    
                                                       20121115     
                                                                    Computer1        
                                                                                     b37da93e9c05
Installed program 2
                      4.5.6.456        
                                        MicroSoftware    
                                                       20160414     
                                                                    Computer2        
                                                                                     b37da93e9c06

理想情况下,我希望它在数据库中看起来像这样:

DisplayName          DisplayVersion   Publisher        InstallDate  PSComputerName   RunspaceId
Installed program 1  1.2.3.123        CyberSoftware    20121115     Computer1        b37da93e9c05
Installed program 2  4.5.6.456        MicroSoftware    20160414     Computer2        b37da93e9c06

下面是字符串的主要结构:

DisplayName    : Installed program 1
DisplayVersion : 1.2.3.123
Publisher      : CyberSoftware
InstallDate    : 20121115
PSComputerName : Computer1
RunspaceId     : 38ff5be0-da11-4664-97b1-b37da93e9c05

DisplayName    : Installed program 2
DisplayVersion : 2.2.2.147
Publisher      : CyberSoftware
InstallDate    : 20140226
PSComputerName : Computer1
RunspaceId     : 38ff5be0-da11-4664-97b1-b37da93e9c05

只需要一点额外的背景信息,这将是一个更大的程序的一部分,该程序可以查询在一大组计算机上安装了哪些应用程序。为了进行测试,我只使用SQLite,但我计划将来将其迁移到MySQL。你知道吗

如果有人知道我做错了什么或有什么建议,我将不胜感激。你知道吗


Tags: installed字符串sqlconnprogrampublishervaluesdisplayname
1条回答
网友
1楼 · 发布于 2024-06-29 01:14:53

您正在为文本文件中的每一行执行插入操作,而不是为文件中的每一条记录执行插入操作。只对每条记录做一次插入。如果这是一致的,则填充变量并在填充RunSpaceId或空行后插入,然后清除所有变量(或使用字典,可能更容易)并迭代到下一条记录。比如:

conn = sqlite3.connect('PP.DB')
c = conn.cursor()
c.execute('''CREATE TABLE apps (DisplayName, DisplayVersion, Publisher, InstallDate, PSCOmputerName, RunspaceId)''')

# Split up string based on new lines
bigStringLines = re.split(r'\\r\\n', myBigString)
record = {}

for line in bigStringLines:
    if line.startswith("DisplayName"):
        record["DisplayName"] = re.split(":", line)[1]  # or find index of colon and use negative slice notation from end of string
    elif line.startswith("DisplayVersion"):
        record["DisplayVersion"] = re.split(":", line)[1]
    # and so on for all values....
    elif line.strip() == "":  # blank line = end of record (or use `RunSpaceId as trigger once populated)
        stmt = "INSERT INTO mytable (DisplayName, DisplayVersion, Publisher, InstallDate, PSCOmputerName, RunspaceId) VALUES ({DisplayName}, {DisplayVersion}, {Publisher}, {InstallDate}, {PSCOmputerName}, {RunspaceId});".format(**record)  # adjust as needed depending on python version
        c.execute(stmt)
        record = {}  # reset for next record

另外,如果这是在一个文本文件中,则完全可以不用RegEx来完成(我建议这样做)。如果是本地平面文件,则没有理由将整个文件读入内存。你知道吗

相关问题 更多 >