SQLAlchemy如何在多个列上计数不同

SELECT count(*) AS count_1 FROM (SELECT DISTINCT values.`Serial` AS `values_Serial`, values.`DatumOrig` AS `values_DatumOrig`, values.`Glucose` AS `values_Glucose` FROM values) AS anon_1

1条回答

网友

1楼 · 发布于 2024-09-30 08:32:41

首先，我认为COUNT(DISTINCT)支持多个表达式是一个MySQL扩展。例如，您可以通过使用ROW值在PostgreSQL中实现相同的功能，但对于NULL，其行为不同。在MySQL中，如果任何值表达式的计算结果为NULL，则该行不符合条件。这也导致了问题中两个查询的不同之处：

如果Serial、DatumOrig或Glucose中的任何一个在COUNT(DISTINCT)查询中为空，则该行不符合条件，或者换句话说不计算
COUNT(*)是子查询的基数anon_1，或者换句话说是行数SELECT DISTINCT Serial, DatumOrig, Glucose将包括（不同的）空行

查看这两个查询的EXPLAIN输出，子查询似乎会导致MySQL使用临时表。这可能会导致性能差异，尤其是在磁盘上实现时

在SQLAlchemy中生成多值COUNT(DISTINCT)查询有点棘手，因为^{}是一个通用函数，实现更接近SQL标准。它只接受单个表达式作为其（可选）位置参数，^{}也是如此。如果所有其他操作都失败，则始终可以恢复为^{}片段，如本例所示：

# NOTE: text() fragments are included in the query as is, so if the text originates
# from an untrusted source, the query cannot be trusted.
session.query(func.count(distinct(text("`Serial`, `DatumOrig`, `Glucose`")))).\
    select_from(Value).\
    scalar()

这远不是可读和可维护的代码，但现在已经完成了任务。另一种选择是编写一个实现MySQL扩展的自定义构造，或者按照您的尝试重写查询。形成生成所需SQL的自定义构造的一种方法是：

from itertools import count
from sqlalchemy import func, distinct as _distinct

def _comma_list(exprs):
    # NOTE: Magic number alert, the precedence value must be large enough to avoid
    # producing parentheses around the "comma list" when passed to distinct()
    ps = count(10 + len(exprs), -1)
    exprs = iter(exprs)
    cl = next(exprs)
    for p, e in zip(ps, exprs):
        cl = cl.op(',', precedence=p)(e)

    return cl

def distinct(*exprs):
    return _distinct(_comma_list(exprs))

session.query(func.count(distinct(
    Value.Serial, Value.DatumOrig, Value.Glucose))).scalar()

相关问题更多 >

编程相关推荐

热门问题

热门文章