SQL（或python）从类似行中选择一次值

2条回答

网友

1楼 · 编辑于 2024-09-30 04:27:07

如果您的bucket总是从1开始并且increment=1，那么您可以使用非常简单的mod(row_number, max(bucket))：

select
   line, term, course
  ,1+mod(-1+row_number()over(partition by line order by course),max(bucket)) bucket_n
from t 
group by line, term, course
order by line,course;

示例数据的完整示例：

with t(LINE ,BUCKET ,TERM ,COURSE) as (
   select 1001, 1, 'FA18', 'COURSE 101' from dual union all
   select 1001, 1, 'SP19', 'COURSE 102' from dual union all
   select 1001, 1, 'SP19', 'COURSE 103' from dual union all
   select 1001, 1, 'FA19', 'COURSE 104' from dual union all
   select 1001, 2, 'FA18', 'COURSE 101' from dual union all
   select 1001, 2, 'SP19', 'COURSE 102' from dual union all
   select 1001, 2, 'SP19', 'COURSE 103' from dual union all
   select 1001, 2, 'FA19', 'COURSE 104' from dual union all
   select 2001, 1, 'FA18', 'COURSE 201' from dual union all
   select 2001, 1, 'SP19', 'COURSE 202' from dual union all
   select 2001, 1, 'FA20', 'COURSE 203' from dual union all
   select 2001, 2, 'FA18', 'COURSE 201' from dual union all
   select 2001, 2, 'SP19', 'COURSE 202' from dual union all
   select 2001, 2, 'FA20', 'COURSE 203' from dual union all
   select 2001, 3, 'FA18', 'COURSE 201' from dual union all
   select 2001, 3, 'SP19', 'COURSE 202' from dual union all
   select 2001, 3, 'FA20', 'COURSE 203' from dual
)
select
   line, term, course
  ,1+mod(-1+row_number()over(partition by line order by course),max(bucket)) bucket_n
from t 
group by line, term, course
order by line,course;

结果:

LINE    TERM    COURSE      BUCKET_N
   -    -      -     
1001    FA18    COURSE 101  1
1001    SP19    COURSE 102  2
1001    SP19    COURSE 103  1
1001    FA19    COURSE 104  2
2001    FA18    COURSE 201  1
2001    SP19    COURSE 202  2
2001    FA20    COURSE 203  3

另一个有趣的变体是聚合存储桶并通过位置mod(rownumber, count(buckets))提取值-与上一个解决方案相反，它适用于任何存储桶：

select
   line, term, course
  ,xmlcast(
     xmlelement(
        "buckets",  
        xmlagg(xmlelement("bucket", bucket))
     ).extract('/buckets/*['||
      (1+mod(-1+row_number()over(partition by line order by course),count(bucket)))
     ||']')
    as int) bucket_n_2
from t 
group by line, term, course
order by line,course;

完整测试用例：

with t(LINE ,BUCKET ,TERM ,COURSE) as (
   select 1001, 1, 'FA18', 'COURSE 101' from dual union all
   select 1001, 1, 'SP19', 'COURSE 102' from dual union all
   select 1001, 1, 'SP19', 'COURSE 103' from dual union all
   select 1001, 1, 'FA19', 'COURSE 104' from dual union all
   select 1001, 2, 'FA18', 'COURSE 101' from dual union all
   select 1001, 2, 'SP19', 'COURSE 102' from dual union all
   select 1001, 2, 'SP19', 'COURSE 103' from dual union all
   select 1001, 2, 'FA19', 'COURSE 104' from dual union all
   select 2001, 1, 'FA18', 'COURSE 201' from dual union all
   select 2001, 1, 'SP19', 'COURSE 202' from dual union all
   select 2001, 1, 'FA20', 'COURSE 203' from dual union all
   select 2001, 2, 'FA18', 'COURSE 201' from dual union all
   select 2001, 2, 'SP19', 'COURSE 202' from dual union all
   select 2001, 2, 'FA20', 'COURSE 203' from dual union all
   select 2001, 3, 'FA18', 'COURSE 201' from dual union all
   select 2001, 3, 'SP19', 'COURSE 202' from dual union all
   select 2001, 3, 'FA20', 'COURSE 203' from dual
)
select
   line, term, course
  ,1+mod(-1+row_number()over(partition by line order by course),max(bucket)) bucket_n
  ,xmlcast(
     xmlelement(
        "buckets",  
        xmlagg(xmlelement("bucket", bucket))
     ).extract('/buckets/*['||
      (1+mod(-1+row_number()over(partition by line order by course),count(bucket)))
     ||']')
    as int) bucket_n_2
from t 
group by line, term, course
order by line,course;

xmlagg(xmlelement("bucket", bucket))聚合所有存储桶编号
extract('/buckets/*[N])-从聚合值中提取N的存储桶
(1+mod(-1+row_number()over(partition by line order by course),count(bucket)))-计算第n个存储桶

结果：BUCKET_N-先前，BUCKET_N_2-新变体：

LINE    TERM    COURSE      BUCKET_N    BUCKET_N_2
1001    FA18    COURSE 101          1           1
1001    SP19    COURSE 102          2           2
1001    SP19    COURSE 103          1           1
1001    FA19    COURSE 104          2           2
2001    FA18    COURSE 201          1           1
2001    SP19    COURSE 202          2           3
2001    FA20    COURSE 203          3           2

网友

2楼 · 编辑于 2024-09-30 04:27:07

基本思想是row_number()。如果您只想对桶进行随机抽样：

select t.*
from (select t.*,
             row_number() over (partition by line, course order by dbms_random.random) as seqnum
      from t
     ) t
where seqnum = 1;

如果您确实希望在存储桶上进行循环（以确保选择了最大数量），那么随机性是不够的：

select t.*
from (select t.*,
             row_number() over (partition by line, course order by seqnum_bucket, dbms_random.random) as seqnum
      from (select t.*
                   row_number() over (partition by line, course, bucket order by dbms_random.random) as seqnum_bucket
            from t
           ) t
     ) t
where seqnum = 1;

相关问题更多 >

编程相关推荐

热门问题

热门文章

SQL（或python）从类似行中选择一次值

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >