Python: generate an SQL WHERE/IN clause from a list of strings

Posted 2024-09-29 08:25:08


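I'm given a Python list of arbitrary length containing arbitrary strings. In particular, it can contain strings with embedded single quotes and/or double quotes. I have no control over the input, so I have to accept whatever I get.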

For example:

    valueList = [ "hello'world", 'foo"bar', 'my\'name"is', "see\'you\"soon" ]

    Python shell:
        >>> valueList = [ "hello'world", 'foo"bar', 'my\'name"is', "see\'you\"soon" ]
        >>>
        >>> valueList
        ["hello'world", 'foo"bar', 'my\'name"is', 'see\'you"soon']
        >>>
        >>> valueList[0]
        "hello'world"
        >>>
        >>> valueList[1]
        'foo"bar'
        >>>
        >>> valueList[2]
        'my\'name"is'
        >>>
        >>> valueList[3]
        'see\'you"soon'

From this, I need to generate an SQL string such as:

    SELECT * FROM myTable as mt
        WHERE mt."colName" IN ("hello'world", 'foo"bar', 'my\'name"is', 'see\'you"soon')

Any solution must work with both SQLite and Postgres.

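I tried using Python's join to generate the (…) part of the clause, but it just produced one big string with all the single quotes embedded in it. For example: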

    Python shell:
        >>> values = "','".join(valueList)
        >>> values
        'hello\'world\',\'foo"bar\',\'my\'name"is\',\'see\'you"soon'

        >>> values = "'" + "','".join(valueList) + "'"
        >>> values
        '\'hello\'world\',\'foo"bar\',\'my\'name"is\',\'see\'you"soon\''

Additional info: the code I inherited uses SQLAlchemy and Pandas:

        import pandas as pd
        ...cut...cut...cut...
        my_df = pd.read_sql(sql, my_conn);

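I don't want to do the filtering in Pandas. In fact, the task I was assigned is to remove the existing Pandas filtering and replace it with an explicit SQL WHERE/IN filter to improve speed.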

For example, replace this:

    my_df = pd.read_sql("SELECT * FROM myTable", my_conn)  # can return tens of thousands of rows
    my_df = my_df[my_df.loc[:, 'colName'].isin(myList)]    # ends up with a handful of rows

with this:

    my_df = pd.read_sql("""SELECT * FROM myTable as mt WHERE mt."colName" IN ("hello'world", 'foo"bar', ...)""", my_conn)

SQL injection protection would be a bonus, but at this point I'd be happy with any solution that works.


2 Answers

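OK: per the SQL standard, string literals are delimited by single quotes, and to include a single quote inside a string literal you double it (you can check the SQLite and PostgreSQL syntax documentation to confirm they follow the standard). Here is my attempt: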

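    value_list = [ "hello'world", 'foo"bar', """my'name"is""", """see'you"soon""" ]
    # Wrap each value in single quotes, doubling any embedded single quote (standard SQL escaping).
    value_list_escaped = [f"""'{x.replace("'", "''")}'""" for x in value_list]
    # Quote "colName" so PostgreSQL keeps its mixed case (unquoted identifiers are folded to lower case).
    query_template = 'SELECT * FROM myTable as mt WHERE mt."colName" IN ({})'
    query = query_template.format(", ".join(value_list_escaped))
    print(query)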

Is this what you're looking for?
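One way to sanity-check the quote-doubling approach is to run the generated query against an in-memory SQLite database using only the standard-library sqlite3 module. The table myTable and column colName come from the question; the sample rows (the four values plus one non-matching row) are invented for illustration. Because only single quotes are special inside a standard SQL string literal, the double quotes in the values pass through unchanged. This sketch assumes PostgreSQL runs with its default standard_conforming_strings = on, and note that an empty list would produce IN (), which PostgreSQL rejects.

```python
import sqlite3


def quote_literal(value: str) -> str:
    """Wrap value in single quotes, doubling embedded single quotes (standard SQL)."""
    return "'" + value.replace("'", "''") + "'"


value_list = ["hello'world", 'foo"bar', 'my\'name"is', "see'you\"soon"]
query = 'SELECT "colName" FROM myTable AS mt WHERE mt."colName" IN ({})'.format(
    ", ".join(quote_literal(v) for v in value_list)
)

# Hypothetical test data: the four target values plus one row that should not match.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE myTable ("colName" TEXT)')
conn.executemany(
    'INSERT INTO myTable ("colName") VALUES (?)',
    [(v,) for v in value_list + ["unrelated"]],
)

# Only the four quoted values should come back.
rows = [r[0] for r in conn.execute(query)]
print(query)
print(rows)
```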

Here is a code snippet of the solution that worked for my problem.

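This function is very specific to my problem, but it demonstrates the parameter injection (bound parameter) technique. It also shows how to handle the different placeholder syntaxes that SQLite and Postgres expect.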

    def whereInjection(valueList, sqlDict):
        # sqlDict starts with just a "paramCount" key set to an initial value (typically 0, but it can be any number).
        # For each value, this function generates a parameter placeholder and adds a key/value pair
        # of the form {"p_#": value} to sqlDict, where # is the current "paramCount" and value is that parameter's value.
        #
        # Starting from {"paramCount": 0}, a valueList of ["aaa", "bbb", "ccc'ddd", 'eee"fff'] produces:
        #   injectStr = "(:p_0,:p_1,:p_2,:p_3)"
        #       Note: for Postgres it has to be "(%(p_0)s,%(p_1)s,...)"
        #   sqlDict = {
        #       "paramCount": 4,
        #       "p_0": "aaa",
        #       "p_1": "bbb",
        #       "p_2": "ccc'ddd",
        #       "p_3": 'eee"fff'
        #   }
        #
        # db_type is a global defined elsewhere in my code: 'SQLite', or anything else for Postgres.
        # An empty valueList yields "()", which Postgres rejects, so callers should check for that first.

        # Accept the values as a list, tuple, set, single int, or single string.
        if isinstance(valueList, list):
            vList = valueList
        elif isinstance(valueList, (tuple, set)):
            vList = list(valueList)
        elif isinstance(valueList, (int, str)):
            vList = [valueList]
        else:
            vList = list(valueList)  # any other iterable

        sz = len(vList)
        pc = sqlDict["paramCount"]
        if db_type == 'SQLite':
            # sqlite3 named style: :p_0, :p_1, ...
            injectStr = "(" + ",".join((":p_" + str(i + pc)) for i in range(sz)) + ")"
        else:  # assume Postgres (psycopg2 pyformat style: %(p_0)s, %(p_1)s, ...)
            injectStr = "(" + ",".join(("%(p_" + str(i + pc) + ")s") for i in range(sz)) + ")"
        valueDict = {('p_' + str(i + pc)): vList[i] for i in range(sz)}

        sqlDict.update(valueDict)  # add the values just generated
        sqlDict["paramCount"] += sz  # advance paramCount past the parameters just added

        return injectStr
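For comparison, here is a minimal sketch of the same idea written as a pure function: instead of mutating a shared dict and reading a global db_type, it takes the driver's paramstyle and a starting index and returns the placeholder text together with the parameter dict. The name build_in_clause and its arguments are hypothetical. "named" (:p_0) is one of the placeholder styles the standard-library sqlite3 module accepts, and "pyformat" (%(p_0)s) is the style psycopg2 uses; both are PEP 249 paramstyle names.

```python
from typing import Any, Dict, Iterable, Tuple


def build_in_clause(
    values: Iterable[Any], paramstyle: str = "named", start: int = 0
) -> Tuple[str, Dict[str, Any]]:
    """Return ("(<placeholders>)", {name: value}) for an SQL IN list.

    paramstyle "named" gives ":p_0" (sqlite3); "pyformat" gives "%(p_0)s" (psycopg2).
    """
    # Treat a lone string or int as a one-element list, as whereInjection does.
    if isinstance(values, (str, int)):
        values = [values]
    values = list(values)
    if not values:
        # PostgreSQL rejects an empty IN () list, so fail early instead.
        raise ValueError("IN list needs at least one value")

    names = [f"p_{start + i}" for i in range(len(values))]
    if paramstyle == "named":
        placeholders = [f":{n}" for n in names]
    elif paramstyle == "pyformat":
        placeholders = [f"%({n})s" for n in names]
    else:
        raise ValueError(f"unsupported paramstyle: {paramstyle!r}")

    return "(" + ", ".join(placeholders) + ")", dict(zip(names, values))
```

Returning the dict rather than updating one in place makes it easy to build several IN lists for one query: pass start=len(params) for the second list and merge the dicts before executing.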

The calling code looks like this. It assumes you already know how to create an engine connection to the database.

    sqlDict = {"paramCount": 0}  # start with a dict holding only the starting parameter count of 0
    sql = """SELECT * FROM myTable as mt WHERE mt."aColName" IN {0}""".format(whereInjection(itemList, sqlDict))
    my_df = pd.read_sql(sql, engine_connection, params=sqlDict)  # the driver performs the actual parameter binding
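A minimal, self-contained way to watch the bound-parameter approach work end to end, assuming the standard-library sqlite3 driver and a throwaway in-memory table (the sample rows are invented). With pandas, the same sql string and params dict would go to pd.read_sql(sql, conn, params=params), which hands them to the driver; for psycopg2 the placeholders would be %(p_0)s instead of :p_0.

```python
import sqlite3

value_list = ["hello'world", 'foo"bar', 'my\'name"is', "see'you\"soon"]

# One named placeholder per value (:p_0, :p_1, ...) plus the matching parameter dict.
params = {f"p_{i}": v for i, v in enumerate(value_list)}
placeholders = ", ".join(f":{name}" for name in params)
sql = f'SELECT "colName" FROM myTable AS mt WHERE mt."colName" IN ({placeholders})'

# Hypothetical in-memory table standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE myTable ("colName" TEXT)')
conn.executemany(
    'INSERT INTO myTable ("colName") VALUES (?)',
    [(v,) for v in value_list + ["no match"]],
)

# The driver binds each value separately, so quotes inside the values need no escaping.
rows = [r[0] for r in conn.execute(sql, params)]
print(sql)
print(rows)
```

Since the values never become part of the SQL text, this also gives the SQL injection protection the question mentions as a bonus.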
