4.4 分组聚合

 

4.3 集合聚合


1. 各部门的平均工资
2. 各部门工资最高的成员
3. 各部门工资最高的三个员工
4. 各部门 EID 最小的员工 (EID 有序)

SPL

A B
1 =file(“EMPLOYEE.csv”).import@tc()
2 =A1.groups(DEPT;avg(SALARY):avg_salary) / 各部门的平均工资
3 =A1.groups(DEPT;maxp(SALARY):max_salary_emp) / 各部门工资最高的成员 (1 个成员)
4 =A1.groups(DEPT;maxp@a(SALARY):max_salary_emp) / 各部门工资最高的成员 (多个成员)
5 =A1.groups(DEPT;top(-3;SALARY):max3_age_emp) / 各部门工资最高的三个员工
6 =A1.group@1(DEPT) / 各部门 EID 最小的员工

A6 的 group@1() 方法是获取分组的第一个成员,因为 EID 本来是有序的,所以分组后的第一个成员就是该部门 EID 最小的员工。

SQL

1. 各部门的平均工资

SELECT DEPT, AVG(SALARY) AS AVERAGE_SALARY
FROM EMPLOYEE
GROUP BY DEPT;

2. 各部门工资最高的成员 (1 个成员)

SELECT *
FROM (
    SELECT EMPLOYEE.*,
    	ROW_NUMBER() OVER (PARTITION BY DEPT ORDER BY SALARY DESC) AS rn
    FROM EMPLOYEE) subquery
WHERE rn = 1;

3. 各部门工资最高的成员 (多个成员)

SELECT E.* FROM EMPLOYEE E
INNER JOIN (
    SELECT DEPT, MAX(SALARY) AS MAX_SALARY
    FROM EMPLOYEE
    GROUP BY DEPT) M
ON E.DEPT = M.DEPT AND E.SALARY = M.MAX_SALARY
ORDER BY E.DEPT;

4. 各部门工资最高的三个员工

SELECT * FROM (
    SELECT EMPLOYEE.*,
        ROW_NUMBER() OVER (PARTITION BY DEPT ORDER BY SALARY DESC) AS rn
    FROM EMPLOYEE) subquery
WHERE rn <= 3;

5. 各部门 EID 最小的员工 (EID 有序)

SELECT * FROM (
    SELECT EMPLOYEE.*,
        ROW_NUMBER() OVER (PARTITION BY DEPT ORDER BY NULL) AS rn
    FROM EMPLOYEE) subquery
WHERE rn = 1
ORDER BY EID;

Python

emp = pd.read_csv('../EMPLOYEE.csv')
#各部门的平均工资
avg_salary_by_dept = emp.groupby('DEPT')['SALARY'].mean()
#各部门工资最高的成员(1个成员)
highest_paid_employee_by_dept = emp.groupby('DEPT').apply(
    lambda x: x.nlargest(1, 'SALARY')).reset_index(drop=True) 
#各部门工资最高的成员(多个成员)
highest_paid_employee_by_dept_all = emp.groupby('DEPT').apply(
    lambda x: x.nlargest(1, 'SALARY',keep='all')).reset_index(drop=True)
#各部门工资最高的三个员工
top_3_employees_by_dept = emp.groupby('DEPT').apply(
    lambda x: x.nlargest(3, 'SALARY')).reset_index(drop=True)
#各部门EID最小的员工
first_employee_by_dept = emp.groupby('DEPT').first()

5.1 简单迭代运算
SPL SQL Python 代码示例对比