3.2 非等值分组

 

3.1 等值分组


3.2.1 对位分组

统计员工表指定部门 [‘Administration’, ‘HR’, ‘Marketing’, ‘Sales’] 男员工的人数

SPL

A
1 =file(“EMPLOYEE.csv”).import@tc()
2 =A1.select(GENDER==“M”)
3 [Administration, HR, Marketing, Sales]
4 =A2.align@a(A3,DEPT).new(A3(#):DEPT,~.len():NUM)

SPL 提供了 align@a() 方法来完成对位分组,结果和 group 一致,都返回分组后的子集。

SQL

SELECT d.DEPT, COUNT(GENDER) AS MaleCount
FROM (
    SELECT COLUMN_VALUE AS DEPT 
    FROM TABLE(SYS.ODCIVARCHAR2LIST('Administration', 'HR', 'Marketing', 'Sales'))) d
LEFT JOIN EMPLOYEE e ON d.DEPT = e.DEPT AND e.GENDER = 'M'
GROUP BY d.DEPT;

SQL 只能通过 left join 完成这个任务。

Python

Python 中没有专门的类似对位分组的方法,要实现只能硬编码。

3.2.2 枚举分组

计算公司员工各世代(X 世代:1965-1980 年出生,Y 世代:1980-1995 年出生,Z 世代:1995-2010 年出生)的平均工资

SPL

A B
1 =file(“EMPLOYEE.csv”).import@tc()
2 [?>=date(“1965-01-01”)&&?<date(“1980-01-01”),?>=date(“1980-01-01”)&&?<date(“1995-01-01”),?>=date(“1995-01-01”)&&?<date(“2010-01-01”)] / 分组条件
3 [X,Y,Z]
4 =A1.run(BIRTHDAY=date(BIRTHDAY))
5 =A1.enum(A2,BIRTHDAY).new(A3(#):GENERATION,~.avg(SALARY):SALARY) / 枚举分组
6 [1965-01-01,1980-01-01,1995-01-01,2010-01-01]
7 =A1.align@a(A3,A3(A6.pseg(BIRTHDAY))).new(A3(#):GENERATION,~.avg(SALARY):SALARY) / 对位分组完成枚举分组任务

A2:将不同世代的条件写成字符串构成一个序列,用? 表示将要代入计算的分组键值。

A5:enum() 方法用待分组集合中每个成员的分组键去依次计算这些条件,如果得到 true,则将该成员分到相应的组中。

A6:将世代的节点年份放入序列。

A7:利用 align()和 pseg() 方法完成枚举分组任务。

SQL

WITH t AS (SELECT 'X' AS generation, DATE '1965-01-01' AS s,DATE '1980-01-01' AS e FROM DUAL 
    UNION ALL
        SELECT 'Y' AS generation, DATE '1980-01-01' AS s,DATE '1995-01-01' AS e FROM DUAL
    UNION ALL
        SELECT 'Z' AS generation, DATE '1995-01-01' AS s,DATE '2010-01-01' AS e FROM DUAL)
SELECT generation,AVG(SALARY) FROM t LEFT JOIN EMPLOYEE e on e.BIRTHDAY>=t.s AND e.BIRTHDAY<t.e GROUP BY generation;

Python

Python 中没有专门的类似枚举分组的方法,要实现只能硬编码。

3.2.3 可重复枚举分组

按员工工龄将员工分组并统计每组的员工人数(分组条件重合时,列出所有满足条件的员工,分组的条件是 [工龄 <5 年,5 年 <= 工龄 <10 年,工龄 >=10 年,工龄 >=15 年])

SPL

A
1 =file(“EMPLOYEE.csv”).import@tc()
2 [?<5,?>=5 && ?<10,?>=10,?>=15]
3 [Within five years,Five to ten years,More than ten years,Over fifteen years]
4 =A1.derive(age(HIREDATE):EMPLOYED)
5 =A4.enum@r(A2, EMPLOYED).new(A3(#):EMPLOYEED,~.len():NUM)

enum@r() 中@r 选项允许成员重复出现在不同的子集中。

SQL

WITH t AS (SELECT 'Within five years' AS employeed, 0 AS s,5 AS e FROM DUAL 
    UNION ALL
     SELECT 'Five to ten years' AS employeed, 5 AS s,10 AS e FROM DUAL 
    UNION ALL
        SELECT 'More than ten years' AS employeed, 10 AS s,100 AS e FROM DUAL 
    UNION ALL
        SELECT 'Over fifteen years' AS employeed, 15 AS s,100 AS e FROM DUAL)
SELECT employeed,COUNT(h.HIRED) AS NUM 
FROM t 
LEFT JOIN (
    SELECT TRUNC(MONTHS_BETWEEN(SYSDATE, HIREDATE)/12) AS HIRED 
    FROM EMPLOYEE) h 
ON h.HIRED>=t.s AND h.HIRED <t.e 
GROUP BY employeed;

Python

Python 只能硬编码。


3.3 有序分组
SPL SQL Python 代码示例对比