暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

使用Oracle构建神经网络机器学习模型

ASKTOM 2021-02-10
653

问题描述

你好!

我对使用Oracle DBMS_DATA_MINING构建NN感兴趣,但是构建此类复杂模型时遇到了一些技术困难。
主要问题是在分析中需要组合多列。

我将提供一个我想要实现的目标的示例:
_

-- Create a base table to score and label data, we will create a NN to Analyse customers purchases in order to find Patterns in data. 
create table customer_group_analysis ( age number(4), City varchar2(24), education_level varchar2(24), total_purchase number(24) );

insert into customer_group_analysis values (18,'London','Student', 400);
insert into customer_group_analysis values (28,'Leeds','Bachelor Degree', 1400);
insert into customer_group_analysis values (54,'Liverpool','Master Degree', 2400);
insert into customer_group_analysis values (16,'London','Student', 150);
insert into customer_group_analysis values (33,'London','PHD', 3400);
insert into customer_group_analysis values (28,'Leeds','Bachelor Degree', 2000);

commit;

-- Create Settings table for NN

create table demo_nn_settings (Setting_name varchar2(30), Setting_value varchar2(4000));

begin
  insert into demo_nn_settings (setting_name, setting_value)
  values (dbms_data_mining.algo_name, dbms_data_mining.algo_neural_network);
  
  insert into demo_nn_settings (setting_name, setting_value)
  values (dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_on);
  
  insert into demo_nn_settings (setting_name, setting_value)
  values (dbms_data_mining.nnet_nodes_per_layer, '10,6,4');
end;
/
commit;

-- Create the Model 

BEGIN
   DBMS_DATA_MINING.CREATE_MODEL(
      model_name          => 'DEMO_NEURAL_NETWORK_MODEL',
      mining_function     =>  dbms_data_mining.classification,
      data_table_name     => 'customer_group_analysis',
      case_id_column_name => 'age,city,education',   -- This will throw an error! but we Need that the case_id_columns contains all required columns for the Analysis 
      target_column_name  => 'total_purchase',
      settings_table_name => 'demo_nn_settings');
END;
/
 
-- Create view to label the data

CREATE OR REPLACE VIEW NN_VIEW
AS
SELECT age,city,education,
       prediction(DEMO_NEURAL_NETWORK_MODEL USING *)  predicted_value,
       prediction_probability(DEMO_NEURAL_NETWORK_MODEL USING *) probability
FROM   customer_group_analysis;

-- Create the confusion Matrix
DECLARE
   v_accuracy NUMBER;
BEGIN
   DBMS_DATA_MINING.COMPUTE_CONFUSION_MATRIX (
      accuracy           => v_accuracy,
      apply_result_table_name      => 'NN_VIEW',
      target_table_name       => 'customer_group_analysis',
      case_id_column_name       => 'age,city,education',
      target_column_name       => 'total_purchase',
      confusion_matrix_table_name => 'NN_VIEW_confusion_matrix',
      score_column_name       => 'PREDICTED_VALUE',
      score_criterion_column_name => 'PROBABILITY',
      cost_matrix_table_name      => null,
      apply_result_schema_name    => null,
      target_schema_name       => null,
      cost_matrix_schema_name     => null,
      score_criterion_type       => 'PROBABILITY');
   DBMS_OUTPUT.PUT_LINE('**** MODEL ACCURACY ****: ' || ROUND(v_accuracy,4));
END;
/
复制


模型的创建将失败,因为我在 “case_id_column_name” 中定义了三列。

不幸的是,我找不到解决这个问题的方法,你能帮我在模型定义中的 “案例 _ id _ 列 _ 名” 中定义多个列吗?

感谢您的帮助!!


问候

专家解答

case_id_column_name是每个客户组的唯一标识符。这在表中不一定是唯一的,但是您希望属于同一案例 (客户组) 的所有行都具有相同的值。

一种简单的方法是创建一个包含rank/dense_rank列的视图,并将其传递给模型而不是基表。

例如:

create or replace view cust_cases as
  select rank() over ( order by age, city, education_level ) case_id,
         c.*
  from   customer_group_analysis c;

select * from cust_cases;

CASE_ID    AGE    CITY         EDUCATION_LEVEL       TOTAL_PURCHASE   
         1     16 London       Student                          150 
         2     18 London       Student                          400 
         3     28 Leeds        Bachelor Degree                 1400 
         3     28 Leeds        Bachelor Degree                 2000 
         5     33 London       PHD                             3400 
         6     54 Liverpool    Master Degree                   2400 
复制


这样做的缺点是,如果您从表中添加/删除行,case_ids将会改变。另一种方法是将case_id列添加到表本身,然后对其进行设置。

由于您有重复的行 (年龄,城市,教育),因此管理起来可能很棘手。

您可以通过将唯一属性拆分为标题表并将其他属性放入子表来简化此操作。

例如:

create table customer_group_header ( 
  case_id integer 
    generated as identity
    primary key,
  age    number(4), City varchar2(24), education_level varchar2(24),
  unique ( age, city, education_level )
);

create table customer_group_details ( 
  case_id references customer_group_header ( case_id ),
  total_purchase number(24) 
);
复制


并再次构建一个将这两个连接起来的视图,并将其传递给模型。
文章转载自ASKTOM,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论