3.2 非等值分组
3.2.1 对位分组
统计员工表指定部门 [‘Administration’, ‘HR’, ‘Marketing’, ‘Sales’] 男员工的人数
SPL
A | |
---|---|
1 | =file(“EMPLOYEE.csv”).import@tc() |
2 | =A1.select(GENDER==“M”) |
3 | [Administration, HR, Marketing, Sales] |
4 | =A2.align@a(A3,DEPT).new(A3(#):DEPT,~.len():NUM) |
SPL 提供了 align@a() 方法来完成对位分组,结果和 group 一致,都返回分组后的子集。
SQL
SELECT d.DEPT, COUNT(GENDER) AS MaleCount
FROM (
SELECT COLUMN_VALUE AS DEPT
FROM TABLE(SYS.ODCIVARCHAR2LIST('Administration', 'HR', 'Marketing', 'Sales'))) d
LEFT JOIN EMPLOYEE e ON d.DEPT = e.DEPT AND e.GENDER = 'M'
GROUP BY d.DEPT;
SQL 只能通过 left join 完成这个任务。
Python
Python 中没有专门的类似对位分组的方法,要实现只能硬编码。
3.2.2 枚举分组
计算公司员工各世代(X 世代:1965-1980 年出生,Y 世代:1980-1995 年出生,Z 世代:1995-2010 年出生)的平均工资
SPL
A | B | |
---|---|---|
1 | =file(“EMPLOYEE.csv”).import@tc() | |
2 | [?>=date(“1965-01-01”)&&?<date(“1980-01-01”),?>=date(“1980-01-01”)&&?<date(“1995-01-01”),?>=date(“1995-01-01”)&&?<date(“2010-01-01”)] | / 分组条件 |
3 | [X,Y,Z] | |
4 | =A1.run(BIRTHDAY=date(BIRTHDAY)) | |
5 | =A1.enum(A2,BIRTHDAY).new(A3(#):GENERATION,~.avg(SALARY):SALARY) | / 枚举分组 |
6 | [1965-01-01,1980-01-01,1995-01-01,2010-01-01] | |
7 | =A1.align@a(A3,A3(A6.pseg(BIRTHDAY))).new(A3(#):GENERATION,~.avg(SALARY):SALARY) | / 对位分组完成枚举分组任务 |
A2:将不同世代的条件写成字符串构成一个序列,用? 表示将要代入计算的分组键值。
A5:enum() 方法用待分组集合中每个成员的分组键去依次计算这些条件,如果得到 true,则将该成员分到相应的组中。
A6:将世代的节点年份放入序列。
A7:利用 align()和 pseg() 方法完成枚举分组任务。
SQL
WITH t AS (SELECT 'X' AS generation, DATE '1965-01-01' AS s,DATE '1980-01-01' AS e FROM DUAL
UNION ALL
SELECT 'Y' AS generation, DATE '1980-01-01' AS s,DATE '1995-01-01' AS e FROM DUAL
UNION ALL
SELECT 'Z' AS generation, DATE '1995-01-01' AS s,DATE '2010-01-01' AS e FROM DUAL)
SELECT generation,AVG(SALARY) FROM t LEFT JOIN EMPLOYEE e on e.BIRTHDAY>=t.s AND e.BIRTHDAY<t.e GROUP BY generation;
Python
Python 中没有专门的类似枚举分组的方法,要实现只能硬编码。
3.2.3 可重复枚举分组
按员工工龄将员工分组并统计每组的员工人数(分组条件重合时,列出所有满足条件的员工,分组的条件是 [工龄 <5 年,5 年 <= 工龄 <10 年,工龄 >=10 年,工龄 >=15 年])
SPL
A | |
---|---|
1 | =file(“EMPLOYEE.csv”).import@tc() |
2 | [?<5,?>=5 && ?<10,?>=10,?>=15] |
3 | [Within five years,Five to ten years,More than ten years,Over fifteen years] |
4 | =A1.derive(age(HIREDATE):EMPLOYED) |
5 | =A4.enum@r(A2, EMPLOYED).new(A3(#):EMPLOYEED,~.len():NUM) |
enum@r() 中@r 选项允许成员重复出现在不同的子集中。
SQL
WITH t AS (SELECT 'Within five years' AS employeed, 0 AS s,5 AS e FROM DUAL
UNION ALL
SELECT 'Five to ten years' AS employeed, 5 AS s,10 AS e FROM DUAL
UNION ALL
SELECT 'More than ten years' AS employeed, 10 AS s,100 AS e FROM DUAL
UNION ALL
SELECT 'Over fifteen years' AS employeed, 15 AS s,100 AS e FROM DUAL)
SELECT employeed,COUNT(h.HIRED) AS NUM
FROM t
LEFT JOIN (
SELECT TRUNC(MONTHS_BETWEEN(SYSDATE, HIREDATE)/12) AS HIRED
FROM EMPLOYEE) h
ON h.HIRED>=t.s AND h.HIRED <t.e
GROUP BY employeed;
Python
Python 只能硬编码。