6.1 定位运算

 

5.3 分组迭代


6.1.1 成员定位

1. 获取 5 在序列中的位置

2. 判断 5 是否在序列中

SPL

A B
1 [2,9,7,7,2,2,9,3,3,5]
2 =A1.pos(5) /10
3 =A1.pos(5)>0 /true

SQL

1. 获取 5 在序列中的位置

SELECT position FROM (
    SELECT COLUMN_VALUE AS num, ROWNUM position
    FROM TABLE(SYS.ODCINUMBERLIST(2,9,7,7,2,2,9,3,3,5)))
WHERE num = 5;

2. 判断 5 是否在序列中

SELECT SIGN(COUNT(*)) AS result 
FROM TABLE(SYS.ODCINUMBERLIST(2,9,7,7,2,2,9,3,3,5))
WHERE COLUMN_VALUE = 5;

Python

sequence = np.array([2, 9, 7, 7, 2, 2, 9, 3, 3, 5])
indices = np.where(sequence == 5)[0]        #9
is_present = len(np.where(sequence == 5)[0])!=0         #True

6.1.2 分段定位

定位指定日期属于第几段

SPL

A B
1 [1970-01-01,1980-01-01,1990-01-01,2000-01-01,2010-01-01,2020-01-01]
2 1965-02-03
3 1979-12-31
4 1980-01-01
5 2005-08-09
6 2023-03-23
7 =A1.pseg(A2) /0
8 =A1.pseg(A3) /1
9 =A1.pseg(A4) /2
10 =A1.pseg(A5) /4
11 =A1.pseg(A6) /6
12 =[A2:A6].(A1.pseg(~)) /[0,1,2,4,6]

pos()和 pseg()都不是循环函数,所以本例中的 A12 中的 ~ 是 [A2:A6] 中的成员。

SQL

SQL 没有提供特别便捷的方法。

Python

from datetime import datetime
import bisect
date_list = ["1970-01-01","1980-01-01","1990-01-01","2000-01-01","2010-01-01","2020-01-01"]
date_list = [datetime.strptime(date, "%Y-%m-%d") for date in date_list]
date1 = datetime.strptime("1965-02-03", "%Y-%m-%d")
index1 = bisect.bisect_right(date_list, date1) #0
date2 = datetime.strptime("1979-12-31", "%Y-%m-%d")
index2 = bisect.bisect_right(date_list, date2) #1
date3 = datetime.strptime("1980-01-01", "%Y-%m-%d")
index3 = bisect.bisect_right(date_list, date3) #2
date4 = datetime.strptime("2005-08-09", "%Y-%m-%d")
index4 = bisect.bisect_right(date_list, date4) #4
date5 = datetime.strptime("2023-03-23", "%Y-%m-%d")
index5 = bisect.bisect_right(date_list, date5) #6

6.1.3 条件定位

查看销售部门员工在员工表中的位置

SPL

A
1 =file(“EMPLOYEE.csv”).import@tc()
2 =A1.pselect@a(DEPT==“Sales”)

pselect() 只选出第一个满足条件的成员位置,@a 选项会返回所有满足条件的成员位置。

SQL

SELECT rn FROM (
    SELECT EMPLOYEE.*, ROWNUM AS rn FROM EMPLOYEE) 
WHERE DEPT = 'Sales';

Python

df = pd.read_csv('../EMPLOYEE.csv')
sales_employees_index = np.where(df['DEPT'] == 'Sales')[0]

6.1.4 普通聚合定位

查看工资最高员工在员工表中的位置

SPL

A B
1 =file(“EMPLOYEE.csv”).import@tc()
2 =A1.pmax(SALARY) /1 个成员的位置
3 =A1.pmax@a(SALARY) / 全部成员的位置

SQL

1. 1 个成员的位置

SELECT rn FROM (
    SELECT EMPLOYEE.*, ROWNUM AS rn
    FROM EMPLOYEE
    ORDER BY SALARY DESC)
FETCH FIRST 1 ROWS ONLY;

2. 全部成员的位置

SELECT rn FROM (
    SELECT EMPLOYEE.*, ROWNUM AS rn
    FROM EMPLOYEE)
WHERE SALARY=(SELECT MAX(SALARY) FROM EMPLOYEE);

Python

df = pd.read_csv('../EMPLOYEE.csv')
#1个成员的位置
highest_salary_employee_index_first = np.argmax(df['SALARY'])
#全部成员的位置
highest_salary_employee_index_all = np.where(df['SALARY'] == df['SALARY'].max())[0]

Python 选出 1 个成员位置和多个成员位置是完全不同的方法,选出多个位置时需要遍历两遍,一遍是计算最大值,另一遍是找到索引。

6.1.5 集合聚合定位

查看工资最高的 10 个员工在员工表中的位置

SPL

A
1 =file(“EMPLOYEE.csv”).import@tc()
2 =A1.ptop(-10,SALARY)

SQL

SELECT rn FROM (
    SELECT EMPLOYEE.*, ROWNUM AS rn
    FROM EMPLOYEE
    ORDER BY SALARY DESC)
FETCH FIRST 10 ROWS ONLY;

Python

df = pd.read_csv('../EMPLOYEE.csv')
top_10_positions = np.argsort(-df['SALARY'])[:10]

Pandas 是排序后取前 10 的方法。


6.2 选出运算
SPL SQL Python 代码示例对比