How to fetch the rows with the latest update date using GROUP BY and HAVING with SQLAlchemy and PostgreSQL

class SectionStatusModel(db.Model):
    __tablename__ = "sectionstatus"
    _id = db.Column(db.Integer, primary_key=True)
    update_datetime = db.Column(db.DateTime, nullable=False)
    status = db.Column(db.Integer, nullable=False, default=0)
    section_id = db.Column(db.Integer, db.ForeignKey("sections._id"), nullable=False)
    __table_args__ = (
        UniqueConstraint("section_id", "update_datetime", name="section_time"),
    )

    @classmethod
    def find_recent_by_section_id_list(
        cls, section_id_list: List
    ) -> List["SectionStatusModel"]:
        return (
            cls.query.filter(cls.section_id.in_(section_id_list))
            .group_by(cls.section_id)
            .having(func.max(cls.update_datetime) == cls.update_datetime)
        )

E sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) column "sectionstatus._id" must appear in the GROUP BY clause or be used in an aggregate function
E LINE 1: SELECT sectionstatus._id AS sectionstatus__id, sectionstatus...
E                ^
E
E [SQL: SELECT sectionstatus._id AS sectionstatus__id, sectionstatus.update_datetime AS sectionstatus_update_datetime, sectionstatus.status AS sectionstatus_status, sectionstatus.section_id AS sectionstatus_section_id
E FROM sectionstatus
E WHERE sectionstatus.section_id IN (%(section_id_1)s, %(section_id_2)s) GROUP BY sectionstatus.section_id
E HAVING max(sectionstatus.update_datetime) = sectionstatus.update_datetime]
E [parameters: {'section_id_1': 1, 'section_id_2': 2}]
E (Background on this error at: http://sqlalche.me/e/f405)

1 answer

Answer 1 · posted by a user on 2024-10-03 19:20:08

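The query is allowed in SQLite because SQLite allows SELECT list items to refer to ungrouped columns outside of aggregate functions, even when those columns are not functionally dependent on the grouping expressions. The non-aggregate values are picked from an arbitrary row in the group.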

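In addition, a sidenote in the SQLite documentation describes special processing of "bare" columns in an aggregate query when the aggregate is min() or max() [1]: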

When the min() or max() aggregate functions are used in an aggregate query, all bare columns in the result set take values from the input row which also contains the minimum or maximum.

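This only works for simple queries. There is again ambiguity if more than one row has the same min/max value, or if the query contains more than one call to min()/max().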
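The bare-column rule quoted above can be observed directly with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions. The table mirrors the question's "sectionstatus" columns, the sample rows are made up, and timestamps are stored as ISO-8601 TEXT so that string comparison matches chronological order. The query selects `_id` and `status` without grouping or aggregating them, which PostgreSQL rejects. SQLite accepts it, and because max() is the only aggregate, it takes the bare values from the row that holds the maximum:

```python
import sqlite3

# In-memory table mirroring the question's columns; sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sectionstatus ("
    " _id INTEGER PRIMARY KEY,"
    " update_datetime TEXT NOT NULL,"  # ISO-8601 text sorts chronologically
    " status INTEGER NOT NULL DEFAULT 0,"
    " section_id INTEGER NOT NULL,"
    " UNIQUE (section_id, update_datetime))"
)
conn.executemany(
    "INSERT INTO sectionstatus (_id, update_datetime, status, section_id)"
    " VALUES (?, ?, ?, ?)",
    [
        (1, "2024-01-01 10:00:00", 0, 1),
        (2, "2024-01-02 10:00:00", 1, 1),
        (3, "2024-01-01 09:00:00", 2, 2),
        (4, "2024-01-03 09:00:00", 3, 2),
        (5, "2024-01-02 09:00:00", 4, 2),
    ],
)

# "_id" and "status" are bare columns: neither grouped nor aggregated.
# SQLite fills them from the row that produced max(update_datetime).
rows = conn.execute(
    "SELECT _id, status, section_id, max(update_datetime)"
    " FROM sectionstatus WHERE section_id IN (1, 2)"
    " GROUP BY section_id ORDER BY section_id"
).fetchall()
print(rows)
```

Each result row carries the `_id` and `status` of the newest entry for its section. This only holds because there is exactly one max() call and no ties on the maximum, as noted above.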

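This makes SQLite non-conforming in this respect, at least with regard to the SQL:2003 standard (I am fairly sure this has not changed much in newer versions):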

7.12 <query specification>
Function
Specify a table derived from the result of a <table expression>.
Format
<query specification> ::=
    SELECT [ <set quantifier> ] <select list> <table expression>
...
Conformance Rules
...
3) Without Feature T301, “Functional dependencies”, in conforming SQL language, if T is a grouped table, then in each <value expression> contained in the <select list>, each <column reference> that references a column of T shall reference a grouping column or be specified in an aggregated argument of a <set function specification>.

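Most other SQL DBMSs, such as PostgreSQL, follow the standard much more strictly in this respect. They require the SELECT list of an aggregate query to consist only of grouping expressions and aggregate expressions, or else require any ungrouped columns to be functionally dependent on the grouped columns.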
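A query that obeys this rule may select only the grouping column and aggregates. A common portable workaround, which is not the approach the answer goes on to recommend, computes the per-section maximum in a conforming subquery and joins it back to the table to recover the full rows. The sketch below runs against SQLite via Python's sqlite3 module with hypothetical sample data, and the SQL itself should also be accepted by PostgreSQL. The question's UNIQUE (section_id, update_datetime) constraint guarantees at most one matching row per section:

```python
import sqlite3

# Hypothetical sample data; timestamps stored as ISO-8601 TEXT.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sectionstatus (_id INTEGER PRIMARY KEY,"
    " update_datetime TEXT NOT NULL, status INTEGER NOT NULL DEFAULT 0,"
    " section_id INTEGER NOT NULL, UNIQUE (section_id, update_datetime))"
)
conn.executemany(
    "INSERT INTO sectionstatus VALUES (?, ?, ?, ?)",
    [
        (1, "2024-01-01 10:00:00", 0, 1),
        (2, "2024-01-02 10:00:00", 1, 1),
        (3, "2024-01-01 09:00:00", 2, 2),
        (4, "2024-01-03 09:00:00", 3, 2),
        (5, "2024-01-02 09:00:00", 4, 2),
    ],
)

# The inner query selects only the grouping column and an aggregate, so it
# satisfies the conformance rule. Joining on (section_id, max date) brings
# back the remaining columns of the newest row per section.
rows = conn.execute(
    "SELECT s._id, s.status, s.section_id, s.update_datetime"
    " FROM sectionstatus AS s"
    " JOIN (SELECT section_id, max(update_datetime) AS max_dt"
    "       FROM sectionstatus WHERE section_id IN (1, 2)"
    "       GROUP BY section_id) AS latest"
    "   ON s.section_id = latest.section_id"
    "  AND s.update_datetime = latest.max_dt"
    " ORDER BY s.section_id"
).fetchall()
print(rows)
```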

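In PostgreSQL, a different approach is needed to fetch this kind of greatest-n-per-group result. There are many great posts covering this subject, but here is a summary of one PostgreSQL-specific approach. Using the DISTINCT ON extension together with ORDER BY, you can achieve the same result: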

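@classmethod
def find_recent_by_section_id_list(
        cls, section_id_list: List) -> List["SectionStatusModel"]:
    return (
        cls.query
        .filter(cls.section_id.in_(section_id_list))
        # DISTINCT ON (section_id): keep only the first row of each section,
        # where "first" is decided by the ORDER BY below
        .distinct(cls.section_id)
        # ORDER BY must start with the DISTINCT ON expression; newest first,
        # using _id as a tie breaker in order to avoid non-determinism
        .order_by(cls.section_id, cls.update_datetime.desc(), cls._id)
        # Execute the query so the result matches the List return annotation
        .all()
    )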
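To make the DISTINCT ON semantics concrete, consider what PostgreSQL does with them. It sorts the rows by the ORDER BY list and then keeps the first row of each set of rows whose DISTINCT ON expressions are equal. This is why the leftmost ORDER BY expression must be section_id. The following plain-Python model is only an illustration of that behavior, not how PostgreSQL executes the query. It uses hypothetical sample rows represented as dicts:

```python
from datetime import datetime
from typing import Any, Dict, List

Row = Dict[str, Any]


def distinct_on_latest(rows: List[Row]) -> List[Row]:
    """Model of: DISTINCT ON (section_id) ... ORDER BY section_id, update_datetime DESC, _id."""
    # Python's sort is stable (also with reverse=True), so sort by the least
    # significant key first: _id ascending, then update_datetime descending,
    # then section_id ascending.
    ordered = sorted(rows, key=lambda r: r["_id"])
    ordered.sort(key=lambda r: r["update_datetime"], reverse=True)
    ordered.sort(key=lambda r: r["section_id"])

    result: List[Row] = []
    seen = set()
    for row in ordered:
        # Keep only the first row of each section_id group, like DISTINCT ON.
        if row["section_id"] not in seen:
            seen.add(row["section_id"])
            result.append(row)
    return result


# Hypothetical sample rows mirroring the sectionstatus columns.
rows = [
    {"_id": 1, "section_id": 1, "update_datetime": datetime(2024, 1, 1, 10), "status": 0},
    {"_id": 2, "section_id": 1, "update_datetime": datetime(2024, 1, 2, 10), "status": 1},
    {"_id": 3, "section_id": 2, "update_datetime": datetime(2024, 1, 1, 9), "status": 2},
    {"_id": 4, "section_id": 2, "update_datetime": datetime(2024, 1, 3, 9), "status": 3},
    {"_id": 5, "section_id": 2, "update_datetime": datetime(2024, 1, 2, 9), "status": 4},
]
latest = distinct_on_latest(rows)
print([r["_id"] for r in latest])  # [2, 4]
```

With the question's UNIQUE (section_id, update_datetime) constraint, two rows of one section can never share a timestamp. The `_id` tie breaker therefore only matters if that constraint is dropped.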

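Of course, this will break in SQLite, since SQLite does not support DISTINCT ON. If you need a solution that works in both, use the row_number() window function approach (window functions are supported in SQLite 3.25.0 and later).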
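The answer names the row_number() approach without showing it, so here is a minimal sketch in raw SQL, run through Python's sqlite3 module. It assumes the bundled SQLite is 3.25.0 or newer, and the sample data is hypothetical. Rows are numbered within each section using the same ordering as the DISTINCT ON version, and only row number 1 is kept. The same SQL should run unchanged on PostgreSQL:

```python
import sqlite3

# Hypothetical sample data; timestamps stored as ISO-8601 TEXT.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sectionstatus (_id INTEGER PRIMARY KEY,"
    " update_datetime TEXT NOT NULL, status INTEGER NOT NULL DEFAULT 0,"
    " section_id INTEGER NOT NULL, UNIQUE (section_id, update_datetime))"
)
conn.executemany(
    "INSERT INTO sectionstatus VALUES (?, ?, ?, ?)",
    [
        (1, "2024-01-01 10:00:00", 0, 1),
        (2, "2024-01-02 10:00:00", 1, 1),
        (3, "2024-01-01 09:00:00", 2, 2),
        (4, "2024-01-03 09:00:00", 3, 2),
        (5, "2024-01-02 09:00:00", 4, 2),
    ],
)

# Number rows within each section, newest first (_id breaks ties),
# then keep only the first row of each partition.
rows = conn.execute(
    "SELECT _id, status, section_id, update_datetime FROM ("
    "  SELECT *, row_number() OVER ("
    "    PARTITION BY section_id ORDER BY update_datetime DESC, _id"
    "  ) AS rn"
    "  FROM sectionstatus WHERE section_id IN (1, 2)"
    ") AS ranked"
    " WHERE rn = 1 ORDER BY section_id"
).fetchall()
print(rows)
```

In SQLAlchemy, the window column can be built with `func.row_number().over(partition_by=..., order_by=...)` inside a subquery that is then filtered on the row number. The exact query wiring depends on the SQLAlchemy version in use.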

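[1] Note that this means your HAVING clause does not actually filter much, since the ungrouped value is always picked from the row that contains the maximum. It is the mere presence of max(update_datetime) that does the trick.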

