8.3 混合关联

 

8.2 主键关联


利用订单表,订单明细表,产品信息表,员工信息表,出差信息表,客户信息表,城市信息表,计算出差时间大于 10 天的 90 后销售员在各州销售加利福尼亚州商品的金额。

关联关系如下:

SPL

A
1 =file(“Employee4.csv”).import@tc()
2 =file(“Travel4.csv”).import@tc()
3 =A2.join(empid,A1:empid,birthday)
4 =A3.select((y=year(birthday),y>=1990&&y<2000&&time>=10))
5 =file(“Client4.csv”).import@tc()
6 =file(“City4.csv”).import@tc()
7 =A5.join(city,A6:cityid,state)
8 =file(“Product4.csv”).import@tc()
9 =A8.join(city,A6:cityid,state)
10 =file(“Detail4.csv”).import@tc()
11 =file(“Order4.csv”).import@tc()
12 =A10.join(productid, A9:productid,state:product_state)
13 =A12.group(orderid)
14 =A11.switch(orderid, A13:orderid;saleid, A4:empid;clientid, A7:clientid)
15 =A14.select(saleid).new(saleid.empid:empid,saleid.name:sale_name,clientid.state:sale_location,orderid.select(product_state==“California”).sum(price):price)
16 =A15.groups(empid,sale_name,sale_location;sum(price):price).select(price)

SPL 面对如此复杂的关联关系时,同维、主子、外键关联关系都很明了,需要同时关联两个关联关系时,SPL 也可以同时解决。

SQL

SELECT e.empid,e.name,c2.state,sum(d.price) AS total_amount
FROM Order4 o
JOIN Employee4 e ON o.saleid=e.empid
JOIN Travel4 t ON e.empid = t.empid
JOIN Detail4 d ON o.orderid = d.orderid
JOIN Product4 p ON d.productid = p.productid
JOIN City4 c ON p.city = c.cityid
JOIN Client4 cl ON o.clientid = cl.clientid
JOIN City4 c2 ON cl.city = c2.cityid
WHERE e.birthday >= TO_DATE('1990-01-01', 'YYYY-MM-DD')
AND e.birthday < TO_DATE('2000-01-01', 'YYYY-MM-DD')
AND t.time >= 10 
AND c.state = 'California'
GROUP BY e.empid,e.name,c2.state;

Python

emp4 = pd.read_csv("../Employee4.csv")
trv4 = pd.read_csv("../Travel4.csv")
emp_inf = pd.merge(emp4,trv4,on=["empid","name"])
years = pd.to_datetime(emp_inf.birthday).dt.year
emp_inf_c = emp_inf[(years>=1990) & (years<2000)&(emp_inf.time>=10)]
clt4 = pd.read_csv("../Client4.csv")
city4 = pd.read_csv("../City4.csv")
sale_location = pd.merge(clt4,city4,left_on='city',right_on='cityid')
pdt4 = pd.read_csv("../Product4.csv")
pdt_location = pd.merge(pdt4,city4,left_on='city',right_on='cityid')
detail4 = pd.read_csv("../Detail4.csv")
order4 = pd.read_csv("../Order4.csv")
detail_pdt = pd.merge(detail4,pdt_location,on='productid',how="left")
order_sale_location = pd.merge(order4,sale_location,on='clientid',how="left")
order_sale_location_emp = pd.merge(order_sale_location,emp_inf_c,
    left_on='saleid',right_on='empid',how="left",suffixes=('_c', '_e'))
order_inf = order_sale_location_emp[order_sale_location_emp.empid.notnull()]
order_detail = pd.merge(order_inf,detail_pdt,on='orderid',how="left",suffixes=('_s', '_p'))
order_detail_Hljp = order_detail[order_detail.state_p=="California"]
res = order_detail_Hljp.groupby(['empid','name_e','state_s'],as_index=False).price.sum()

关联关系太复杂时,Python 最好是一次处理一个关联关系,避免出错。


数据分析编程:SQL,Python or SPL?
SPL SQL Python 代码示例对比