暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

MATCH_RECOGNIZE - OracleCODE 事件的欺诈演示

原创 fizz 2022-10-31
1207
  • 模块1

    概述

    在数据库 12c 中,有一个新的 SQL 构造MATCH_RECOGNIZE,用于使用正则表达式查找与行序列中的模式匹配的行。

    使用 SQL 的模式匹配基于四个逻辑概念:

    • 对数据/li进行逻辑分区和排序>
    • 使用正则表达式和模式变量/li定义模式>
    • 正则表达式与一系列行/li匹配>
    • 每个模式变量都是使用行和聚合的条件定义的


    业务问题:查找可疑汇款

    在本教程中,我们将探讨如何使用MATCH_RECOGNIZE来查找潜在的欺诈易。

    让我们将账户的可疑汇款模式定义为:


    • 30天内进行3次或更多小额(<2K)汇款
    • 最后一次小额转账后 10 天内的大转账 (>=1M)

    当我们发现这种模式时,我们需要报告以下信息:账户、第一次小额转账的日期、最后一次大额转账的日期和大额转账的金额。

  • 模块2

    设置

    第一步是创建将保存事务日志的 JSON 表。日志将提供时间,帐户/用户ID,事件类型(存款或转账)和交易金额的详细信息。

    Oracle Database 12c 支持在数据库中存储 JSON 文档。使用以下代码创建一个表来存储 JSON 格式的事务日志

    CREATE TABLE json_transactions 
    (transaction_doc CLOB, 
     CONSTRAINT "VALID_JSON" CHECK (transaction_doc IS JSON) ENABLE
    );


  • 模块3

    添加数据

    下一步是使用键值对的常规 JSON 表示法向我们的 JSON 表添加一些数据。

    INSERT INTO json_transactions VALUES ('{"time_id":"01-JAN-12","user_id":"John","event_id":"Deposit","trans_amount":1000000}');
    INSERT INTO json_transactions VALUES ('{"time_id":"02-JAN-12","user_id":"John","event_id":"Transfer","trans_amount":1000}');
    INSERT INTO json_transactions VALUES ('{"time_id":"05-JAN-12","user_id":"John","event_id":"Withdrawal","trans_amount":2000}');
    INSERT INTO json_transactions VALUES ('{"time_id":"10-JAN-12","user_id":"John","event_id":"Transfer","trans_amount":1500}');
    INSERT INTO json_transactions VALUES ('{"time_id":"20-JAN-12","user_id":"John","event_id":"Transfer","trans_amount":1200}');
    INSERT INTO json_transactions VALUES ('{"time_id":"25-JAN-12","user_id":"John","event_id":"Deposit","trans_amount":1200000}');
    INSERT INTO json_transactions VALUES ('{"time_id":"27-JAN-12","user_id":"John","event_id":"Transfer","trans_amount":1000000}');
    INSERT INTO json_transactions VALUES ('{"time_id":"02-FEB-12","user_id":"John","event_id":"Deposit","trans_amount":500000}');
    COMMIT;


  • 模块4

    查看交易数据

    使用新的 JSON SQL 表示法,我们可以使用简单的 SELECT 语句查询数据

    SELECT 
      j.transaction_doc.time_id as time_id, 
      j.transaction_doc.user_id as user_id,
      j.transaction_doc.event_id as event_id,   
      j.transaction_doc.trans_amount as amount
    FROM json_transactions j;


    以下是我们感兴趣的交易列表,以供我们分析。

    SELECT 
           TO_DATE(j.transaction_doc.time_id, 'DD-MON-YYYY') as time_id, 
           j.transaction_doc.user_id as user_id,
           j.transaction_doc.event_id as event_id,   
           to_number(j.transaction_doc.trans_amount) as amount
         FROM json_transactions j
         WHERE  j.transaction_doc.event_id = 'Transfer';


  • 模块5

    查找欺诈交易

    MATCH_RECOGNIZE的输入是一个 SELECT 语句,该语句以标准关系表格式返回 JSON 数据

    SELECT user_id, first_t, last_t, big_amount 
    FROM (SELECT 
           TO_DATE(j.transaction_doc.time_id, 'DD-MON-YYYY') as time_id, 
           j.transaction_doc.user_id as user_id,
           j.transaction_doc.event_id as event_id,   
           j.transaction_doc.transfer_id as transfer_id,   
           TO_NUMBER(j.transaction_doc.trans_amount) as amount
         FROM json_transactions j
         WHERE  j.transaction_doc.event_id = 'Transfer')
    MATCH_RECOGNIZE(
     PARTITION BY user_id 
     ORDER BY time_id
     MEASURES 
       FIRST(x.time_id) AS first_t, 
       LAST(y.time_id) AS last_t, 
       y.amount AS big_amount
     ONE ROW PER MATCH
     PATTERN (X{3,} Y)
     DEFINE
       X as (amount < 2000) AND 
             LAST(time_id) - FIRST(time_id) < 30,
       Y as (amount >= 1000000) AND 
             time_id - LAST(x.time_id) < 10); 


    此声明如何分解?

  • 模块6

    分解查询

    设置 PARTITION BY 和 ORDER BY 子句

    需要对数据进行分组和排序以使模式在行序列中“可见”...,因此,我们需要按每个user_id对数据进行分组和分区,然后按交易日期对每个分区内的数据进行排序。

    定义输出度量

    MEASURES子句允许我们列出将返回的列。这些列可以是输入表/视图和/或计算列中的现有列。在这种情况下,我们需要返回帐户/用户ID以及大额金额的值,并且我们需要计算第一次小额转账的日期和最后一次大额转账的日期

    报告类型

    对于此查询,我们只需要一个摘要报告,因此使用 ONE ROW PER MATCH 子句。

    模式子句

    我们有两个模式变量:X 和 Y。在此示例中,我们搜索三次或更多次出现的 X,然后搜索一次出现的 Y。

    描述每个模式变量

    对于变量 X 上的匹配,我们正在搜索小于 2K 的小额转账,并且所有三次转账都必须在 30 天内发生

    对于变量 Y 上的匹配,我们正在寻找超过 10K 的大传输。大额转账必须在最后一次小额转账后的 10 天内进行。

  • 模块7

    正确应用了检查模式

    我们可以通过使用两个内置度量值来检查 out 模式是否找到了所有相关匹配项:

    • MATCH_NUMBER()
    • 分类

    MATCH_NUMBER()

    在给定的行分区中,您的模式可能有大量匹配项。你如何区分这些比赛?这是通过 MATCH_NUMBER() 函数完成的。行模式分区中的匹配项按找到顺序从 1 开始编号。在每个行模式分区中,此编号从 1 开始,因为行模式分区之间没有链接排序。

    分类器()

    除了知道您看到的是哪个MATCH_NUMBER之外,您可能还想知道模式的哪个组件适用于特定行。这是使用 CLASSIFIER() 函数完成的。行的分类器是行模式匹配映射到的模式变量。CLASSIFIER() 函数返回一个字符串,其值是在 PATTERN 子句中定义的模式变量的名称。

    The last step we need to do is change the amount of information returned from summary to detailed. Therefore, we need to use ALL ROWS PER MATCH WITH UNMATCHED ROW - there is a separate livesql tutorial that covers this topic

    SELECT user_id, time_id, amount, first_t, last_t, first_small_amount, last_small_amount, big_amount, mn, classifier 
    FROM (SELECT 
           TO_DATE(j.transaction_doc.time_id, 'DD-MON-YYYY') as time_id, 
           j.transaction_doc.user_id as user_id,
           j.transaction_doc.event_id as event_id,   
           j.transaction_doc.transfer_id as transfer_id,   
           TO_NUMBER(j.transaction_doc.trans_amount) as amount
         FROM json_transactions j
         WHERE  j.transaction_doc.event_id = 'Transfer')
    MATCH_RECOGNIZE(
     PARTITION BY user_id 
     ORDER BY time_id
     MEASURES 
       FIRST(x.time_id) AS first_t, 
       LAST(y.time_id) AS last_t,
       FIRST(x.amount) AS first_small_amount,
       LAST(x.amount) AS last_small_amount,  
       y.amount AS big_amount,
       match_number() AS mn,
       classifier() AS classifier
     ALL ROWS PER MATCH WITH UNMATCHED ROWS
     PATTERN (X{3,} Y)
     DEFINE
       X as (amount < 2000) AND 
             LAST(time_id) - FIRST(time_id) < 30,
       Y as (amount >= 1000000) AND 
             time_id - LAST(x.time_id) < 10); 


    As you can now see, we have correctly identified the three small transfers on 02-Jan, 10-Jan and 20-Jan, the difference between 02-Jan and 20-Jan is less than 30 days. The large transfer is within our range of 10,000 or greater and the time difference between the last small transfer on Jan 20 and the large transfer on 27-Jan is less than 10 days.

  • Module8

    Changing the Requirements

    Having found some fraudulent transfers the business has now come back with an additional requirement:


    1. 检查不同账户之间的转账情况
    2. 小额转账的总和必须小于20K

    应用程序开发人员已修改 JSON 模型以包含接收传输的人员的 ID,因此让我们使用此新信息更新我们的表:

    TRUNCATE TABLE json_transactions;
    INSERT INTO json_transactions VALUES ('{"time_id":"01-JAN-12","user_id":"John","event_id":"Deposit","trans_amount":1000000}');
    INSERT INTO json_transactions VALUES ('{"time_id":"02-JAN-12","user_id":"John","event_id":"Transfer","transfer_id":"John","trans_amount":100}');
    INSERT INTO json_transactions VALUES ('{"time_id":"02-JAN-12","user_id":"John","event_id":"Transfer","transfer_id":"Bob","trans_amount":1000}');
    INSERT INTO json_transactions VALUES ('{"time_id":"05-JAN-12","user_id":"John","event_id":"Withdrawal","trans_amount":2000}');
    INSERT INTO json_transactions VALUES ('{"time_id":"10-JAN-12","user_id":"John","event_id":"Transfer","transfer_id":"Allen","trans_amount":1500}');
    INSERT INTO json_transactions VALUES ('{"time_id":"20-JAN-12","user_id":"John","event_id":"Transfer","transfer_id":"Tim","trans_amount":1200}');
    INSERT INTO json_transactions VALUES ('{"time_id":"25-JAN-12","user_id":"John","event_id":"Deposit","trans_amount":1200000}');
    INSERT INTO json_transactions VALUES ('{"time_id":"27-JAN-12","user_id":"John","event_id":"Transfer","transfer_id":"Tim","trans_amount":1000000}');
    INSERT INTO json_transactions VALUES ('{"time_id":"02-FEB-12","user_id":"John","event_id":"Deposit","trans_amount":500000}');
    COMMIT;


  • 模块9

    查看新数据集

    和以前一样,我们可以使用新的 JSON SQL 表示法通过简单的 SELECT 语句来查询我们的数据

    SELECT 
      j.transaction_doc.time_id as time_id, 
      j.transaction_doc.user_id as user_id,
      j.transaction_doc.event_id as event_id,   
      j.transaction_doc.transfer_id as transfer_id,   
      j.transaction_doc.trans_amount as amount
    FROM json_transactions j;


    以下是我们感兴趣的分析转移交易列表,其中包括新列:transfer_id

    SELECT 
     TO_DATE(j.transaction_doc.time_id, 'DD-MON-YYYY') as time_id, 
     j.transaction_doc.user_id as user_id,
     j.transaction_doc.event_id as event_id,   
     j.transaction_doc.transfer_id as transfer_id,   
     to_number(j.transaction_doc.trans_amount) as amount
    FROM json_transactions j
    WHERE  j.transaction_doc.event_id = 'Transfer';


  • 模块10

    以前的查询是否仍在运行?

    现在我们已经修改了原始 JSON 模型,现在包含接收传输的人员的 ID,仍然可以运行基于原始业务规则的原始模式匹配查询:

    SELECT user_id, first_t, last_t, big_amount 
    FROM (SELECT 
           TO_DATE(j.transaction_doc.time_id, 'DD-MON-YYYY') as time_id, 
           j.transaction_doc.user_id as user_id,
           j.transaction_doc.event_id as event_id,   
           j.transaction_doc.transfer_id as transfer_id,   
           TO_NUMBER(j.transaction_doc.trans_amount) as amount
         FROM json_transactions j
         WHERE  j.transaction_doc.event_id = 'Transfer')
    MATCH_RECOGNIZE(
     PARTITION BY user_id 
     ORDER BY time_id
     MEASURES 
       FIRST(x.time_id) AS first_t, 
       LAST(y.time_id) AS last_t, 
       y.amount AS big_amount
     ONE ROW PER MATCH
     PATTERN (X{3,} Y)
     DEFINE
       X as (amount < 2000) AND 
             LAST(time_id) - FIRST(time_id) < 30,
       Y as (amount >= 1000000) AND 
             time_id - LAST(x.time_id) < 10); 


    如我们所见,即使我们在 JSON 数据中添加了新字段,我们的原始查询实际上仍在运行。

  • 模块11

    使用新的业务规则

    让我们更改两个模式变量的定义以包含新的业务需求:

    1)检查不同帐户之间的转账

    上一篇(transfer_id) <> transfer_id

    2)小额转账的总和必须小于20K

    总和(x.金额) < 20000)
    SELECT user_id, first_t, last_t, big_amount 
    FROM (SELECT 
           TO_DATE(j.transaction_doc.time_id, 'DD-MON-YYYY') as time_id, 
           j.transaction_doc.user_id as user_id,
           j.transaction_doc.event_id as event_id,   
           j.transaction_doc.transfer_id as transfer_id,   
           TO_NUMBER(j.transaction_doc.trans_amount) as amount
         FROM json_transactions j
         WHERE  j.transaction_doc.event_id = 'Transfer')
    MATCH_RECOGNIZE(
     PARTITION BY user_id 
     ORDER BY time_id
     MEASURES 
       FIRST(x.time_id) AS first_t, 
       LAST(y.time_id) AS last_t, 
       y.amount AS big_amount
     ONE ROW PER MATCH
     PATTERN (X{3,} Y)
     DEFINE
       X as (amount < 2000) AND 
             PREV(transfer_id) <> transfer_id AND
             LAST(time_id) - FIRST(time_id) < 30,
       Y as (amount >= 1000000) AND 
             time_id - LAST(x.time_id) < 10 AND
             SUM(x.amount) < 20000);


  • Module12

    Extracting more value...

    We can extract a lot more information from our pattern using some of the built-in functions. For example we can extract data for each of the three small transfers by using the FIRST() and LAST() functions. Notice that we can use the FIRST(value, x) syntax to extract information about the 2nd transfer.

    Therefore, FIRST(x.transfer_id,1) will return the transfer_id for the second match.

    SELECT user_id, first_t, amount_1, transfer_1, amount_2, transfer_2, amount_3, transfer_3, last_t, big_amount,transfer_4 
    FROM (SELECT 
           TO_DATE(j.transaction_doc.time_id, 'DD-MON-YYYY') as time_id, 
           j.transaction_doc.user_id as user_id,
           j.transaction_doc.event_id as event_id,   
           j.transaction_doc.transfer_id as transfer_id,   
           TO_NUMBER(j.transaction_doc.trans_amount) as amount
         FROM json_transactions j
         WHERE  j.transaction_doc.event_id = 'Transfer')
    MATCH_RECOGNIZE(
     PARTITION BY user_id 
     ORDER BY time_id
     MEASURES 
       FIRST(x.time_id) AS first_t, 
       LAST(y.time_id) AS last_t, 
       FIRST(x.amount) AS amount_1,
       FIRST(x.transfer_id) AS transfer_1,
       FIRST(x.amount,1) AS amount_2,
       FIRST(x.transfer_id,1) AS transfer_2,
       LAST(x.amount) AS amount_3,
       LAST(x.transfer_id) AS transfer_3,
       y.amount AS big_amount,
       y.transfer_id AS transfer_4
     ONE ROW PER MATCH
     PATTERN (X{3,} Y)
     DEFINE
       X as (amount < 2000) AND 
             PREV(transfer_id) <> transfer_id AND
             LAST(time_id) - FIRST(time_id) < 30,
       Y as (amount >= 1000000) AND 
             time_id - LAST(x.time_id) < 10 AND
             SUM(x.amount) < 20000);


  • 模块13

    总结

    本教程介绍了如何使用新的 JSON 功能将数据存储在数据库中,并使用熟悉的 SQL 语法进行查询。

    我们基于一组业务规则设计了欺诈模式匹配SQL语句,然后针对不断变化的业务规则增强了模式匹配SQL语句。

    最后,我们使用了一些内置的方法来验证我们的模式匹配语句是否正常工作(使用 MATCH_NUMBER() 和 CLASSIFIER() 函数)以及 FIRST() 和 LAST() 函数来检索模式中的特定数据点。

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论