"问题介绍作为 java 程序员，用代码直接实现类似 SQL 中的交并补差的集合运算，总是要编写大量的代码，如果能有一个专门的外部数据工具，通过写简单类似 SQL 的脚本来实现，在 jav .."

SmilingAngel 北京
乾学院 18 号会员
1 回帖 • 1,808 浏览 • 7 年前

Java 搞定 SQL 集合运算的最简方法

计算＆AI

集算器(536) Java调用(8) 集合运算(9) 有序集合(3) 技术对比(26)

问题介绍

作为 java 程序员，用代码直接实现类似 SQL 中的交并补差的集合运算，总是要编写大量的代码，如果能有一个专门的外部数据工具，通过写简单类似 SQL 的脚本来实现，在 java 中直接调用并可以返回结果集，就再好不过了。Java 版集算器正是解决这一难题的神器，通过 SPL 脚本可以直观自然得写出运算，再使用 java 调用 SPL 脚本，使用起来简单，快捷，高效。另外，虽然 SQL 有集合概念，但对于有序集合运算提供的支持却很有限，经常要采用很费解的思路才可以完成， SPL 基于离散数据集模型，能轻松处理有序集合运算。下面我们就由浅入深，举例说明如何使用。

SPL 实现

和集

示例 1： 求重叠时间段的总天数

MySQL8:

with recursive t(start,end) as (select date'2010-01-07',date'2010-01-9'
    union all select date'2010-01-15',date'2010-01-16'
    union all select date'2010-01-07',date'2010-01-12'
    union all select date'2010-01-08',date'2010-01-11'),
t1(d,end) as (select start,end from t
    union all select d+1,end from t1 where d
select count(distinct d) from t1;

说明：此例先将各时间段转成时间段内所有日子对应的日期，然后再求不同日期的个数

集算器 SPL:

	A
1	=connect("mysql")
2	=A1.query@x("select date'2010-01-07'start,date'2010-01-9'end union all select date'2010-01-15',date'2010-01-16'union all select date'2010-01-07',date'2010-01-12'union all select date'2010-01-08',date'2010-01-11'")
3	=A2.(periods(start,end))
4	=A3.conj()
5	=A4.icount()

A3: 对 A2 中的每一个时间段构造从 start 到 end 的日期序列
A4: 求 A3 中所有日期序列的和
A5: 求 A4 中不重复日期的个数
001png

保存脚本文件SumSet.splx (嵌入 Java 会用到)

差集

示例 1： 列出英语人口和法语人口均超过 5% 的国家

MySQL8:

with t1(lang) as (select 'English' union all select 'French')
    select name from world.country c
    where not exists(select * from t1 where lang not in (select language from world.countrylanguage 
        where percentage>=5 and countrycode=c.code
    )
);

说明：此 SQL 只是演示通过双重否定实现差集为空

集算器 SPL:

	A
1	=connect("mysql")
2	=A1.query("select CountryCode,Name,Language,Percentage from world.countrylanguage cl join world.country c on cl.countrycode=c.code where percentage>5")
3	=A2.group(CountryCode)
4	=A3.select(["English","French"]\~.(Language)==[])
5	=A4.new(~.Name:name)

A4: 选出 [“English”,”French”] 与本组语言集合的差为空的组，意思就是选出语言集合包含 English 和 French 的组
002png

保存脚本文件DifferenceSet.splx (嵌入 Java 会用到)

交集

示例 1： 列出英语人口、法语人口、西班牙语人口分别超过 0.3%、0.2%、0.1% 的国家代码

MySQL8:

with t1 as (select countrycode from world.countrylanguage where language='English' and percentage>0.3),
     t2 as (select countrycode from world.countrylanguage where language='French' and percentage>0.2),
     t3 as (select countrycode from world.countrylanguage where language='Spanish' and percentage>0.1)
select countrycode
from t1 join t2 using(countrycode) join t3 using(countrycode);

说明：此例只是演示如何求解多个集合的交集

集算器 SPL:

	A
1	=connect("mysql")
2	[English,French,Spanish]
3	[0.3,0.2,0.1]
4	=A2.(A1.query@i("select countrycode from world.countrylanguage where language=? and percentage>?",~,A3(#)))
5	>A1.close()
6	=A4.isect()

A3: 按次序依次查询英语人口超 0.3%、法语人口超 0.2%、西班牙语超 0.1% 的国家代码，并转成序列
A5: A3 中所有序列交集
003png

保存脚本文件IntersectionSet.splx (嵌入 Java 会用到)

Java 调用

SPL 嵌入到 Java 应用程序十分方便，通过 JDBC 调用存储过程方式加载，用和集保存的文件SumSet.splx，示例调用如下：

    ...
    Connection con = null;
    Class.forName("com.esproc.jdbc.InternalDriver");
    con= DriverManager.getConnection("jdbc:esproc:local://");
    //调用存储过程，其中SumSet是splx的文件名
    st =con.prepareCall("call SumSet()");
    //执行存储过程
    st.execute();
    //获取结果集
    ResultSet rs = st.getResultSet();
    ...

替换成DifferenceSet.splx或IntersectionSet.splx是同样的道理，只需 call DifferenceSet()或者 call IntersectionSet() 即可。这里只用 Java 片段粗略解释了如何嵌入 SPL，详细步骤请参阅Java 如何调用 SPL 脚本，也非常简单，不再赘述。同时，SPL 也支持 ODBC 驱动，集成到支持 ODBC 的语言，嵌入过程类似。

扩展节选

关于集合运算除了上面讲的和差交运算，还可以获取与行号有关的计算，以及有序集合的对位运算。

根据行号取数据

示例 1： 计算招商银行 (600036) 2017 年第 3 个交易日和倒数第 3 个交易日的交易信息

MySQL8:

with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31')
    select tdate,open,close,volume from t where rn=3
union all
    select tdate,open,close,volume from t where rn=(select max(rn)-2 from t);

集算器 SPL:

	A
1	=connect("mysql")
2	=A1.query@x("select * from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31' order by tdate")
3	=A2(3)\|A2.m(-3)

A3: 第 3 条记录和倒数第 3 条记录的和集
004png

示例 2： 计算招商银行 (600036) 最近 20 个交易日的平均收盘价

MySQL8:

with t as (select *, row_number() over(order by tdate desc) rn from stktrade where sid='600036')
select avg(close) avg20 from t where rn<=20;

集算器 SPL:

	A
1	=connect("mysql")
2	=A1.query@x("select * from stktrade where sid='600036' order by tdate")
3	=A2.m(-20:)
4	=A3.avg(close)

A2: 将 600036 的交易记录按日期排序
A3: 取从倒数 20 条到末尾的所有记录
A4: 求 A3 中所有记录收盘价的平均值
005png

求满足条件的记录的行号

示例 1： 计算招商银行 (600036)2017 年经过多少交易日收盘价达到 25 元

MySQL8:

with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31')
select min(rn) from t where close>=25;

集算器 SPL:

	A
1	=connect("mysql")
2	=A1.query@x("select * from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31' order by tdate")
3	=A2.pselect(close>=25)

A3: 从前往后查找第 1 个收盘价达到 25 元的记录位置
006png

示例 2： 计算格力电器 (000651) 2017 年涨幅 (考虑停牌)

MySQL8:

with t as (select * from stktrade where sid='000651'),
     t1(d) as (select max(tdate) from t where tdate<'2017-01-01'),
     t2(d) as (select max(tdate) from t where tdate<'2018-01-01')
select s2.close/s1.close-1 rise
from (select * from t,t1 where tdate=d) s1,
     (select * from t,t2 where tdate=d) s2;

集算器 SPL:

	A
1	=connect("mysql")
2	=A1.query@x("select * from stktrade where sid='000651' and tdate<'2018-01-01' order by tdate")
3	=A2.pselect@z(tdate < date("2017-01-01"))
4	=A2(A3).close
5	=A2.m(-1).close
6	=A5/A4-1

A2: 数据按交易日从小到大排序
A3: 从后往前查找交易日在 2017-01-01 之前的最后一条记录在序列中的行号
A4: 求 2016 年收盘价
A5: 求 2017 年收盘价，其中 A2.m(-1) 取倒数第 1 条记录，即 2017 年最后一个交易日对应的记录
007png

示例 3： 列出 2017 年信息发展 (300469) 交易量超过 250 万股时的交易信息及各日涨幅(考虑停牌)

MySQL8:

with t as (select *, row_number() over(order by tdate) rn
    from stktrade where sid='300469' and tdate<=date '2017-12-31'),
t1 as (select * from t where tdate>=date'2017-01-01' and volume>=2500000)
    select t1.tdate, t1.close, t.volume, t1.close/t.close-1 rise
    from t1 join t on t1.rn=t.rn+1;

集算器 SPL:

	A
1	=connect("mysql")
2	=A1.query@x("select * from stktrade where sid='300469' and tdate<= date '2017-12-31' order by tdate")
3	=A2.pselect@a(tdate>=date("2017-01-01") && volume>2500000)
4	=A3.new(A2(~).tdate:tdate, A2(~).close:close, A2(~).volume:volume, A2(~).close/A2(~-1).close-1:rise)

A3: 求出 2017 年交易量超 250 万股所有记录的行号
A4: 根据行号计算相应的日期、收盘价、交易量、涨幅
008png

求最大值或最小值所在记录的行号

示例 1： 计算招商银行 (600036) 2017 年最早的最低价与最早的最高价间隔多少交易日

MySQL8:

with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31'),
     t1 as (select * from t where close=(select min(close) from t)),
     t2 as (select * from t where close=(select max(close) from t))
select abs(cast(min(t1.rn) as signed)-cast(min(t2.rn) as signed)) inteval
from t1,t2;

集算器 SPL:

	A
1	=connect("mysql")
2	=A1.query@x("select * from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31' order by tdate")
3	=A2.pmax(close)
4	=A2.pmin(close)
5	=abs(A3-A4)

A3: 从前往后找最大收盘价在序列中的行号
A4: 从前往后找最小收盘价在序列中的行号
009png

示例 2： 计算招商银行 (600036) 2017 年最后的最低价与最后的最高价间隔多少交易日

MySQL8:

with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31'),
     t1 as (select * from t where close=(select min(close) from t)),
     t2 as (select * from t where close=(select max(close) from t))
select abs(cast(max(t1.rn) as signed)-cast(max(t2.rn) as signed)) inteval
from t1,t2;

集算器 SPL:

	A
1	=connect("mysql")
2	=A1.query@x("select * from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31' order by tdate")
3	=A2.pmax@z(close)
4	=A2.pmin@z(close)
5	=abs(A3-A4)

A3: 从后往前找最大收盘价在序列中的行号
A4: 从后往前找最小收盘价在序列中的行号
010png

有序集合间的对位计算

示例 1： 求 2018 年 3 月 6 日到 8 日创业板指 (399006) 对深证成指 (399001) 的每日相对收益率

MySQL8:

with t1 as (select *,close/lag(close) over(order by tdate) rise from stktrade where sid='399006' and tdate between '2018-03-05' and '2018-03-08'),
     t2 as (select *, close/lag(close) over(order by tdate) rise from stktrade where sid='399001' and tdate between '2018-03-05' and '2018-03-08')
select t1.rise-t2.rise
from t1 join t2 using(tdate)
where t1.rise is not null;

集算器 SPL:

	A
1	=connect("mysql")
2	=["399006","399001"].(A1.query("select * from stktrade where sid=? and tdate between '2018-03-05' and '2018-03-08' ",~))
3	>A1.close()
4	=A2.(~.calc(to(2,4),close/close[-1]))
5	=A4(1)--A4(2)

A2: 依次查询 399006 和 399001 从 2018 年 3 月 5 日到 8 日的交易数据
A4: 依次计算 A2 中 2 个序表从第 2 条记录到第 4 条记录的涨幅，也就是 399006 和 399001 从 2018 年 3 月 6 日到 8 日的每天涨幅
A5: 对位相减，即可算出每日相对收益率
011png

SPL 优势

有库写 SQL，没库写 SPL

用 Java 程序直接汇总计算数据，还是比较累的，代码很长，并且不可复用，很多情况数据也不在数据库里，有了 SPL，就能像在 Java 中用 SQL 一样了，十分方便。

开源免费使用

集算器的社区版已经发布了，和 Java 应用程序集成部署到服务器可长期使用

技术文档和社区支持

官方提供的集算器技术文档本身就有很多现成的例子，常规问题从文档里都能找到解决方法。如果获得了入门版，不仅能够使用 SPL 的常规功能，碰到任何问题都可以去乾学院上去咨询，官方通过该社区对入门版用户提供免费的技术支持。

集算器(536) Java调用(8) 集合运算(9) 有序集合(3) 技术对比(26)

Java 搞定 SQL 集合运算的最简方法

问题介绍

SPL 实现

和集

差集

交集

Java 调用

扩展节选

根据行号取数据

求满足条件的记录的行号

求最大值或最小值所在记录的行号

有序集合间的对位计算

SPL 优势

目录