SQL 如何计算带数量的数据的中位数 *
有数据库表TBLCARS,数据如下所示:
CARS
PRICE
QUANTITY
A
100
2
B
150
4
C
200
8
需要按QUANTITY扩展PRICE后求中位数。
SQL Server的SQL:
WITH median_dt AS (
SELECT CASE
WHEN sum(quantity) % 2 = 0 THEN sum(quantity) / 2
ELSE sum(quantity) / 2 + 1
END AS lower_limit
, CASE
WHEN sum(quantity) % 2 = 0 THEN sum(quantity) / 2 + 1
ELSE sum(quantity) / 2 + 1
END AS upper_limit
FROM t
),
data AS (
SELECT *, sum(quantity) OVER (ORDER BY price ASC) AS cum_sum
FROM t
),
rnk_val AS (
SELECT *
FROM (
SELECT price, row_number() OVER (ORDER BY d.cum_sum ASC) AS rnk
FROM data d
JOIN median_dt b ON b.lower_limit <= d.cum_sum
) x
WHERE x.rnk = 1
UNION ALL
SELECT *
FROM (
SELECT price, row_number() OVER (ORDER BY d.cum_sum ASC) AS rnk
FROM data d
JOIN median_dt b ON b.upper_limit <= d.cum_sum
) x
WHERE x.rnk = 1
)
SELECT avg(price) AS median
FROM rnk_val
这道题其实很简单,只需要把每条记录的PRICE扩展QUANTITY倍,依次合并后对PRICE求中位数即可。但是在SQL中的集合是无序的,也没有中位数函数可用,只能通过繁琐的办法人为造出序号后,把数据合并后再去求中位数。
用开源集算器的SPL就很容易写:
A |
|
1 |
=connect("oracle") |
2 |
=A1.query@x("SELECT * FROM TBLCARS") |
3 |
=A2.conj([PRICE]*QUANTITY).median() |
SPL是专业的数据计算引擎,基于有序集合设计,提供了完善的集合运算和丰富的数学函数,相当于Java和SQL优势的结合,很容易处理这种动态扩展的数据计算。
https://stackoverflow.com/questions/64105359/sql-how-to-calculate-median-not-based-on-rows
英文版