Java: how to access a subquery's column from the outer query in Spark SQL

In Spark SQL, I executed the query

select 
  Retailer_country,
  max(sold) 
from (
  select 
    Retailer_country,
    count(*) as sold 
  from ProductInfo 
  where year=2013 and Product_type='Watches' 
  group by Retailer_country
)

against a dataset with the columns

Retailer_country
Order_method_type
Retailer_type
Product_line
Product_type
Product
Year
Quarter
Revenue
Quantity
Gross_margin

It returns an exception:

 org.apache.spark.sql.AnalysisException: grouping expressions sequence is empty, and '__auto_generated_subquery_name.`Retailer_country`' is not an aggregate function. Wrap '(max(__auto_generated_subquery_name.`solds`) AS `max(solds)`)' in windowing function(s) or wrap '__auto_generated_subquery_name.`Retailer_country`' in first() (or first_value) if you don't care which value you get.;;
 Aggregate [Retailer_country#10, max(solds#77L) AS max(solds)#80L]
+- SubqueryAlias `__auto_generated_subquery_name`
+- Aggregate [Retailer_country#10], [Retailer_country#10, count(1) AS solds#77L]
  +- Filter ((cast(year#16 as int) = 2013) && (Product_type#14 = Watches))
     +- SubqueryAlias `productinfo`
        +- Relation[Retailer_country#10,Order_method_type#11,Retailer_type#12,Product_line#13,Product_type#14,Product#15,Year#16,Quarter#17,Revenue#18,Quantity#19,Gross_margin#20] csv

When I execute the same query in an online compiler against the same table structure, it returns

USA|4

Sample data:

USA|Fax|Outdoors Shop|Camping Equipment|Cooking Gear|TrailChef Deluxe Cook Set|2012|Q1 2012|59628.66|489|0.34754797
USA|Fax|Outdoors Shop|Camping Equipment|Cooking Gear|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
USA|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
USA|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
USA|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
UK|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
UK|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
USA|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
UK|Fax|Outdoors Shop|Camping Equipment|Cooking Gear|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
UK|Fax|Outdoors Shop|Camping Equipment|Cooking Gear|TrailChef Deluxe Cook Set|2012|Q1 2012|59628.66|489|0.34754797

Why do the results differ, and how can I get Retailer_country displayed in Spark? Any help solving this is appreciated.


2 Answers

  1. # Answer 1

    Couldn't you just add "order by sold desc" to Simonare's query above and limit the result to 1 record?

    select 
      Retailer_country,
      count(*) as sold 
    from ProductInfo 
    where year=2013 and Product_type='Watches' 
    group by Retailer_country
    order by sold desc
    limit 1
    
  2. # Answer 2

    Your query has two parts: the subquery and the main query. The query must be formatted as follows

    select 
      Retailer_country,
      max(sold) 
    from (
      select 
        Retailer_country,
        count(*) as sold 
      from ProductInfo 
      where year=2013 and Product_type='Watches' 
      group by Retailer_country)
    group by Retailer_country
    

    However, if we look deeper into your query, the subquery already returns distinct retailer countries because of its GROUP BY aggregation, so you don't need the outer max(sold) aggregation at all. Your final query is actually:

    select 
      Retailer_country,
      count(*) as sold 
    from ProductInfo 
    where year=2013 and Product_type='Watches' 
      group by Retailer_country
    

    Edit: as per the comments provided

    select 
      Retailer_country,
      count(*) as sold 
    from ProductInfo 
    where year=2013 and Product_type='Watches' 
    group by Retailer_country
      order by sold desc
    limit 1
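The logic of this final query can be sanity-checked in plain Python against the sample rows from the question, without a Spark cluster. This is only a sketch of the filter/group/order/limit steps, not Spark's execution; note that the sample data uses the Product_type value 'watch' (lowercase) while the query filters on 'Watches', so the filter below matches what the sample data actually contains:

```python
from collections import Counter

# Pipe-delimited sample rows from the question.
# Field 0 = Retailer_country, field 4 = Product_type, field 6 = Year.
raw = """USA|Fax|Outdoors Shop|Camping Equipment|Cooking Gear|TrailChef Deluxe Cook Set|2012|Q1 2012|59628.66|489|0.34754797
USA|Fax|Outdoors Shop|Camping Equipment|Cooking Gear|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
USA|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
USA|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
USA|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
UK|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
UK|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
USA|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
UK|Fax|Outdoors Shop|Camping Equipment|Cooking Gear|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
UK|Fax|Outdoors Shop|Camping Equipment|Cooking Gear|TrailChef Deluxe Cook Set|2012|Q1 2012|59628.66|489|0.34754797"""

# WHERE year=2013 AND Product_type matches, GROUP BY Retailer_country, COUNT(*)
sold = Counter(
    fields[0]                               # Retailer_country
    for line in raw.splitlines()
    for fields in [line.split("|")]
    if fields[6] == "2013" and fields[4] == "watch"
)
print(sold)                                 # per-country counts: USA=4, UK=2

# ORDER BY sold DESC LIMIT 1 -> the single top country together with its count
country, count = max(sold.items(), key=lambda kv: kv[1])
print(country, count)                       # USA 4, matching the online compiler
```

This also illustrates why the original outer query failed: once you want the country *alongside* the aggregate, you either group/order and take the top row (as here and in the final query), or use a window function, exactly as the AnalysisException suggests.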