4.4 分组聚合
1. 各部门的平均工资
2. 各部门工资最高的成员
3. 各部门工资最高的三个员工
4. 各部门 EID 最小的员工 (EID 有序)
SPL
A | B | |
---|---|---|
1 | =file(“EMPLOYEE.csv”).import@tc() | |
2 | =A1.groups(DEPT;avg(SALARY):avg_salary) | / 各部门的平均工资 |
3 | =A1.groups(DEPT;maxp(SALARY):max_salary_emp) | / 各部门工资最高的成员 (1 个成员) |
4 | =A1.groups(DEPT;maxp@a(SALARY):max_salary_emp) | / 各部门工资最高的成员 (多个成员) |
5 | =A1.groups(DEPT;top(-3;SALARY):max3_age_emp) | / 各部门工资最高的三个员工 |
6 | =A1.group@1(DEPT) | / 各部门 EID 最小的员工 |
A6 的 group@1() 方法是获取分组的第一个成员,因为 EID 本来是有序的,所以分组后的第一个成员就是该部门 EID 最小的员工。
SQL
1. 各部门的平均工资
SELECT DEPT, AVG(SALARY) AS AVERAGE_SALARY
FROM EMPLOYEE
GROUP BY DEPT;
2. 各部门工资最高的成员 (1 个成员)
SELECT *
FROM (
SELECT EMPLOYEE.*,
ROW_NUMBER() OVER (PARTITION BY DEPT ORDER BY SALARY DESC) AS rn
FROM EMPLOYEE) subquery
WHERE rn = 1;
3. 各部门工资最高的成员 (多个成员)
SELECT E.* FROM EMPLOYEE E
INNER JOIN (
SELECT DEPT, MAX(SALARY) AS MAX_SALARY
FROM EMPLOYEE
GROUP BY DEPT) M
ON E.DEPT = M.DEPT AND E.SALARY = M.MAX_SALARY
ORDER BY E.DEPT;
4. 各部门工资最高的三个员工
SELECT * FROM (
SELECT EMPLOYEE.*,
ROW_NUMBER() OVER (PARTITION BY DEPT ORDER BY SALARY DESC) AS rn
FROM EMPLOYEE) subquery
WHERE rn <= 3;
5. 各部门 EID 最小的员工 (EID 有序)
SELECT * FROM (
SELECT EMPLOYEE.*,
ROW_NUMBER() OVER (PARTITION BY DEPT ORDER BY NULL) AS rn
FROM EMPLOYEE) subquery
WHERE rn = 1
ORDER BY EID;
Python
emp = pd.read_csv('../EMPLOYEE.csv')
#各部门的平均工资
avg_salary_by_dept = emp.groupby('DEPT')['SALARY'].mean()
#各部门工资最高的成员(1个成员)
highest_paid_employee_by_dept = emp.groupby('DEPT').apply(
lambda x: x.nlargest(1, 'SALARY')).reset_index(drop=True)
#各部门工资最高的成员(多个成员)
highest_paid_employee_by_dept_all = emp.groupby('DEPT').apply(
lambda x: x.nlargest(1, 'SALARY',keep='all')).reset_index(drop=True)
#各部门工资最高的三个员工
top_3_employees_by_dept = emp.groupby('DEPT').apply(
lambda x: x.nlargest(3, 'SALARY')).reset_index(drop=True)
#各部门EID最小的员工
first_employee_by_dept = emp.groupby('DEPT').first()