分组后成员作为集合两两运算
【问题】
Suppose I have a MySQL database in 1NF with the following tables:
sandwich\_id  |  sandwich\_name  | sandwich_price
\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
0            |  BLT            |  5.5
1            |  Reuben         |  7.0
3            |  Grilled Cheese |  3.75
...
and a separate table that stores all the ingredient values:
sandwich_id |  ingredient
\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
0           |  bacon
0           |  lettuce
0           |  tomato
1           |  corned beef
1           |  swiss cheese
...
How can I compare all the sandwiches by their ingredients to determine which are the most similar?
(Also, is there a technical term for that second table that I’m missing? I want to call it a map table, but I know that’s not quite right, since a map table stores foreign keys for two tables and this one’s more of an offshoot of the first…)
【回答】
把不同 sandwich 的 ingredient 集合两两做交集,计算交集成员数量再排序即可。但 SQL 没有显式集合,代码很难懂。这种数据量不大时用 SPL 更方便:
| A | |
| 1 | $select i.sandwich_id sandwich_id, i.ingredient ingredient, s.sandwich_name sandwich_namefrom ingredientTable i,sandwichTable s where i.sandwich_id=s.sandwich_id | 
| 2 | =A1.group(sandwich_id;sandwich_name,~.(ingredient):collection) | 
| 3 | =xjoin(A2;A2).select(_1.sandwich_id<_2.sandwich_id) | 
| 4 | =A3.new((_1.collection ^ _2.collection).len():sameCount,_1.sandwich_name,_2.sandwich_name) | 
| 5 | =A4.sort(-sameCount) | 
A3: 将 sandwich 两两分为一组。比如 0,1,3 可分为三组:[0,1],[0,3],[1,3]
A4: 求各组交集的成员数量。“^”表示求交集。
A5: 将 A4 的结果集降序排序
关于显式集合请参考【集算器的集合思维】。更多例子可参考【集算器的集合运算举例】
 
            
         
