ANSI SQL equivalent of pandas `factorize()`? (Answers)


1 answer

Anonymous, 1 day ago

Consider the two options below. Note that I am using a slightly modified sample of the data; you will see why (I hope) from the outputs. The sample is defined as a CTE named <code>`project.dataset.table`</code>, which the queries below read from:

<pre><code>with `project.dataset.table` as (
  select '2021-01-01 00:01:00' sent, 'email4@example.com' recipient union all
  select '2021-01-01 00:02:00', 'email2@example.com' union all
  select '2021-01-01 00:03:00', 'email4@example.com' union all
  select '2021-01-01 00:04:00', 'email3@example.com' union all
  select '2021-01-01 00:05:00', 'email4@example.com' union all
  select '2021-01-01 00:06:00', 'email2@example.com'
)
</code></pre>

Option 1: Use this when the emails must be put in a defined order before the unique_id is assigned, for example by the <code>sent</code> column. In that case consider the following:

<pre><code>#standardSQL
create temp function factorize(item string, list any type) as ((
  select unique_id
  from (
    -- number each distinct recipient from 0, ordered by its earliest sent value
    select as struct recipient, row_number() over(order by min(sent)) - 1 unique_id
    from unnest(list)
    group by recipient
  )
  where recipient = item
));
select t.*,
  -- the whole table is passed to the UDF as an array of (recipient, sent) structs
  factorize(recipient, array_agg(struct(recipient, sent)) over()) unique_id
from `project.dataset.table` t
</code></pre>

with output <a href="https://i.stack.imgur.com/Nf9Q2.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/Nf9Q2.png" alt="enter image description here"/></a>

Option 2: If the order is not that important and alphabetical order is acceptable, consider the simpler query below, which uses the built-in <code>range_bucket</code> function.

<pre><code>#standardSQL
create temp function factorize(item string, list any type) as (
  -- position of item in the sorted list, shifted to start at 0
  range_bucket(item, list) - 1
);
with all_recipients as (
  -- one row holding the distinct recipients in alphabetical order
  select array_agg(recipient order by recipient) recipients
  from (
    select recipient
    from `project.dataset.table`
    group by recipient
  )
)
select t.*, factorize(recipient, recipients) unique_id
from `project.dataset.table` t, all_recipients
</code></pre>

with output <a href="https://i.stack.imgur.com/ymW8i.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/ymW8i.png" alt="enter image description here"/></a>

Obviously, in this case you can skip the UDF and simply call <code>range_bucket</code> directly in the final select:

<pre><code>select t.*, range_bucket(recipient, recipients) - 1 unique_id
</code></pre>
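
Because the outputs above are only linked as images, here is a small pure-Python sketch of the numbering each option is expected to produce on the sample data. It is a reference for checking results, not part of the answer's SQL: the helper names `factorize_by_first_sent` and `factorize_alphabetical` are made up for illustration, and the use of `bisect_right` assumes that `range_bucket` on an exact match returns the count of sorted boundaries less than or equal to the item.

```python
import bisect

# Sample rows from the CTE above, as (sent, recipient) pairs.
rows = [
    ("2021-01-01 00:01:00", "email4@example.com"),
    ("2021-01-01 00:02:00", "email2@example.com"),
    ("2021-01-01 00:03:00", "email4@example.com"),
    ("2021-01-01 00:04:00", "email3@example.com"),
    ("2021-01-01 00:05:00", "email4@example.com"),
    ("2021-01-01 00:06:00", "email2@example.com"),
]


def factorize_by_first_sent(rows):
    """Hypothetical helper mirroring option 1: ids follow min(sent) per recipient."""
    first_sent = {}
    for sent, recipient in rows:
        if recipient not in first_sent or sent < first_sent[recipient]:
            first_sent[recipient] = sent
    # Order the distinct recipients by their earliest timestamp, then number from 0.
    ordered = sorted(first_sent, key=first_sent.get)
    ids = {recipient: i for i, recipient in enumerate(ordered)}
    return [ids[recipient] for _, recipient in rows]


def factorize_alphabetical(rows):
    """Hypothetical helper mirroring option 2: ids follow alphabetical order."""
    ordered = sorted({recipient for _, recipient in rows})
    # bisect_right counts the sorted entries <= recipient; subtract 1 for a 0-based id.
    return [bisect.bisect_right(ordered, recipient) - 1 for _, recipient in rows]


print(factorize_by_first_sent(rows))  # [0, 1, 0, 2, 0, 1]
print(factorize_alphabetical(rows))   # [2, 0, 2, 1, 2, 0]
```

In pandas terms, the first list matches what `factorize()` returns by default when the rows are already ordered by `sent` (codes in order of first appearance), and the second matches `factorize(sort=True)`.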
