从 SQL 到 SPL:Find consecutive alphabetical characters in string

某库表的字段是字符串。

VALUE

Test

Test1

Tesssst

TTTTest

ABCDTest

Testuvwxyz

现在要找出至少含有 5 个连续按字母表排序的字母的字符串,或者说,连续升序排列的子串中,最长的子串的长度大于等于 5 的字符串。注意,排除掉非字母的字符。

VALUE

ABCDTest

Testuvwxyz

SQL

SELECT value
FROM   table_name t
       CROSS JOIN LATERAL (
         SELECT MIN(lvl) AS start_pos
         FROM   (
           SELECT LEVEL AS lvl,
                  SUBSTR(t.value, LEVEL, 1) AS ch
           FROM   DUAL
           CONNECT BY LEVEL <= LENGTH(t.value)
         )
         MATCH_RECOGNIZE(
           ORDER BY lvl
           MEASURES
             FIRST(lvl) AS lvl
           PATTERN (first_row consecutive{3,})
           DEFINE first_row   AS (ch BETWEEN 'A' AND 'Z' OR ch BETWEEN 'a' AND 'z'),
                  consecutive AS (ch BETWEEN 'A' AND 'Z' OR ch BETWEEN 'a' AND 'z')
                                 AND ASCII(PREV(ch)) + 1 = ASCII(ch)
         )
         HAVING MIN(lvl) > 0
       )

SQL要用嵌套子查询 +CONNECT BY 模拟序列,再用 MATCH_RECOGNIZE 处理序列,代码繁琐难懂。SPL 直接提供了序列计算函数:https://try.esproc.com/splx?4yy


 A

1

$select * from table_name.txt

2

=A1. select(VALUE.split().group@i(~<=~[-1] || !isalpha(~[-1])).max(~.len())>=5)

A1:加载数据。

A2:将字符串按字符拆成序列,对序列分组,当当前成员小于等于上一个成员或者上个成员非字母时,开始新的分组,选出最长的组的长度大于等于 5 的字符串。函数 group 在分组后可以保持分组子集用于后续处理,而不是立即计算,@i 表示按条件分组。

问题来源:https://stackoverflow.com/questions/78151968/find-consecutive-alphabetical-characters-in-string