.sig_test.split_test in Boston Housing price regression dataset

[1]:
import numpy as np
import tensorflow as tf
from dnn_inference.sig_test import split_test

Boston house prices dataset

Data Set Characteristics:

Number of Instances: 506

Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

Attribute Information (in order)
    - CRIM     per capita crime rate by town
    - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
    - INDUS    proportion of non-retail business acres per town
    - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
    - NOX      nitric oxides concentration (parts per 10 million)
    - RM       average number of rooms per dwelling
    - AGE      proportion of owner-occupied units built prior to 1940
    - DIS      weighted distances to five Boston employment centres
    - RAD      index of accessibility to radial highways
    - TAX      full-value property-tax rate per USD10,000
    - PTRATIO  pupil-teacher ratio by town
    - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
    - LSTAT    % lower status of the population
    - MEDV     Median value of owner-occupied homes in $1000's
[2]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.boston_housing.load_data(path="boston_housing.npz",
                                                                                  test_split=0.1)
# reshape the targets to column vectors of shape (n, 1)
y_train, y_test = y_train[:,np.newaxis], y_test[:,np.newaxis]

from sklearn import preprocessing
scaler = preprocessing.MinMaxScaler()
x_train = scaler.fit_transform(x_train)
[3]:
n, d = x_train.shape
print('num of samples: %d, dim: %d' %(n, d))
num of samples: 455, dim: 13
[4]:
from tensorflow import keras
from tensorflow.keras import layers

def build_model():
  model = keras.Sequential([
    layers.Dense(8, activation='relu', input_shape=[d]),
    layers.BatchNormalization(),
    layers.Dense(1, activation='relu')
  ])

  optimizer = tf.keras.optimizers.Adam(1e-3)

  model.compile(loss='mse',
                optimizer=optimizer,
                metrics=['mae', 'mse'])
  return model
[5]:
model_null, model_alter = build_model(), build_model()
[6]:
from tensorflow.keras.callbacks import EarlyStopping
es = EarlyStopping(monitor='val_loss', mode='min',
                                    verbose=0, patience=300, restore_best_weights=True)

fit_params = {'callbacks': [es],
                      'epochs': 3000,
                      'batch_size': 32,
                      'validation_split': .2,
                      'verbose': 0}

## testing params
test_params = { 'split': "one-split",
                'inf_ratio': None,
                'perturb': None,
                'cv_num': 2,
                'cp': 'hommel',
                'verbose': 2}

## tuning params
tune_params = { 'num_perm': 100,
                'ratio_grid': [.2, .4, .6, .8],
                'if_reverse': 1,
                'perturb_range': 2.**np.arange(-3,3,.3),
                'tune_ratio_method': 'fuse',
                'tune_pb_method': 'fuse',
                'cv_num': 2,
                'cp': 'hommel',
                'verbose': 2}
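
The `perturb_range` entry in `tune_params` is a geometric grid of candidate perturbation levels, and the levels reported in the tuning log (0.125, 0.154, ..., 2.828) are values from this grid. A minimal sketch to inspect it, where the variable name `grid` is only for illustration:

```python
import numpy as np

# Same expression as tune_params['perturb_range']: powers of two with
# exponents running from -3 towards 3 in steps of 0.3.
grid = 2.0 ** np.arange(-3, 3, 0.3)

# Candidate perturbation levels, rounded to three decimals as in the log.
print(np.round(grid, 3))
```

Judging from the log, the tuning step appears to walk this grid in increasing order and stop at the first level whose estimated Type 1 error is small enough; that reading is inferred from the output, not from the library's documentation.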

[7]:
inf_feats = [np.arange(3), np.arange(5,11)]

cue = split_test(inf_feats=inf_feats, model_null=model_null, model_alter=model_alter, eva_metric='mse')

P_value = cue.testing(x_train, y_train, fit_params, test_params, tune_params)
INFO:tensorflow:Assets written to: ./saved/split_test/06-30_12-53/model_null_init/assets
INFO:tensorflow:Assets written to: ./saved/split_test/06-30_12-53/model_alter_init/assets
==================== one-split test for 0-th Hypothesis ====================
(tuneHP: ratio) Est. Type 1 error: 0.150; inf sample ratio: 0.800
(tuneHP: ratio) Est. Type 1 error: 0.270; inf sample ratio: 0.600
(tuneHP: ratio) Est. Type 1 error: 0.000; inf sample ratio: 0.400
✅ (tuneHP: ratio) Done with inf sample ratio: 0.400
(tuneHP: pb) Est. Type 1 error: 0.000; perturbation level: 0.125
✅ (tuneHP: pb) Done with inf pb level: 0.125
cv: 0; p_value: 0.03247; loss_null: 14.63722(51.76234); loss_alter: 18.41512(42.01610)
cv: 1; p_value: 0.01776; loss_null: 14.65350(46.57168); loss_alter: 21.16354(51.60448)
 🧪 0-th Hypothesis: reject H0 with p_value: 0.049
==================== one-split test for 1-th Hypothesis ====================
(tuneHP: ratio) Est. Type 1 error: 1.000; inf sample ratio: 0.800
(tuneHP: ratio) Est. Type 1 error: 0.000; inf sample ratio: 0.600
✅ (tuneHP: ratio) Done with inf sample ratio: 0.600
(tuneHP: pb) Est. Type 1 error: 0.700; perturbation level: 0.125
(tuneHP: pb) Est. Type 1 error: 0.730; perturbation level: 0.154
(tuneHP: pb) Est. Type 1 error: 0.700; perturbation level: 0.189
(tuneHP: pb) Est. Type 1 error: 0.700; perturbation level: 0.233
(tuneHP: pb) Est. Type 1 error: 0.610; perturbation level: 0.287
(tuneHP: pb) Est. Type 1 error: 0.580; perturbation level: 0.354
(tuneHP: pb) Est. Type 1 error: 0.470; perturbation level: 0.435
(tuneHP: pb) Est. Type 1 error: 0.460; perturbation level: 0.536
(tuneHP: pb) Est. Type 1 error: 0.390; perturbation level: 0.660
(tuneHP: pb) Est. Type 1 error: 0.290; perturbation level: 0.812
(tuneHP: pb) Est. Type 1 error: 0.180; perturbation level: 1.000
(tuneHP: pb) Est. Type 1 error: 0.190; perturbation level: 1.231
(tuneHP: pb) Est. Type 1 error: 0.170; perturbation level: 1.516
(tuneHP: pb) Est. Type 1 error: 0.070; perturbation level: 1.866
(tuneHP: pb) Est. Type 1 error: 0.070; perturbation level: 2.297
(tuneHP: pb) Est. Type 1 error: 0.030; perturbation level: 2.828
✅ (tuneHP: pb) Done with inf pb level: 2.828
cv: 0; p_value: 0.12088; loss_null: 31.87867(120.14606); loss_alter: 34.30483(86.49743)
cv: 1; p_value: 0.83584; loss_null: 33.45124(124.96815); loss_alter: 33.72868(86.60861)
 🧪 1-th Hypothesis: accept H0 with p_value: 0.363
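
Each entry of `inf_feats` is a group of column indices tested jointly, so the two hypotheses above concern two feature groups. The sketch below maps the indices to names, assuming the columns of the Keras copy of the dataset follow the order of the attribute list above; `feature_names` and `groups` are illustrative names, not part of `dnn_inference`.

```python
import numpy as np

# Column names in the order of the attribute list (assumed to match the
# column order of x_train).
feature_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']

# Same index groups as passed to split_test.
inf_feats = [np.arange(3), np.arange(5, 11)]

# Translate each group of indices into feature names.
groups = [[feature_names[i] for i in idx] for idx in inf_feats]

for k, names in enumerate(groups):
    print(f'hypothesis {k}: {names}')
```

Under that assumption, the 0-th hypothesis tests CRIM, ZN and INDUS, and the 1-th hypothesis tests RM through PTRATIO.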
[9]:
P_value
[9]:
[0.048711537430423474, 0.3626513098329711]
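
The returned values combine the two per-fold p-values (`cv_num = 2`) of each hypothesis using the method selected by `'cp': 'hommel'`. The sketch below applies the standard Hommel combining rule to the rounded per-fold values from the log and approximately reproduces the reported results. The helper `hommel_combine` and the significance level `alpha = 0.05` are assumptions for illustration; the library's internal implementation may differ.

```python
import numpy as np

def hommel_combine(p_values):
    """Combine p-values with the Hommel rule (illustrative helper)."""
    p = np.sort(np.asarray(p_values, dtype=float))
    m = len(p)
    ranks = np.arange(1, m + 1)
    c_m = np.sum(1.0 / ranks)            # harmonic number 1 + 1/2 + ... + 1/m
    scaled = c_m * (m / ranks) * p       # C_m * (m / i) * p_(i)
    return float(min(scaled.min(), 1.0))

# Per-fold p-values copied from the log (rounded to five decimals).
cv_p_values = [[0.03247, 0.01776], [0.12088, 0.83584]]
combined = [hommel_combine(p) for p in cv_p_values]

alpha = 0.05  # assumed significance level
for k, p in enumerate(combined):
    decision = 'reject H0' if p < alpha else 'do not reject H0'
    print(f'hypothesis {k}: combined p-value {p:.4f} -> {decision}')
```

With these inputs, the first combined value falls just below 0.05 and the second well above it, which agrees with the decisions printed in the log.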