Python-API

API reference for DnnT class

class dnn_inference.DnnT(inf_feats, model, model_mask, change='mask', alpha=0.05, verbose=0, eva_metric='mse', cp_path='./dnnT_checkpoints')[source]

Class for one-split/two-split test based on deep neural networks.

Parameters:
  • inf_feats (list-like | shape = (num of tests, dim of features)) – List of covariates/features under hypothesis testing, with one element per hypothesis test.
  • model ({keras-defined neural network}) – A neural network for the original (full) dataset.
  • model_mask ({keras-defined neural network}) – A neural network for the masked dataset, in which the features under test are masked or changed.
  • change ({'mask', 'perm'}, default='mask') – How the features under test are changed: 'mask' replaces them with zeros, while 'perm' permutes them across instances.
  • alpha (float (0,1), default=0.05) – The nominal level of the hypothesis test.
  • verbose ({0, 1}, default=0) – Whether to print the testing results: 1 for yes, 0 for no.
  • eva_metric ({'mse', 'zero-one', 'cross-entropy', or custom metric function}) – The evaluation metric: 'mse' is the l2-loss for regression, 'zero-one' is the zero-one loss for classification, and 'cross-entropy' is the log-loss for classification. It can also be a custom metric function called as eva_metric(y_true, y_pred).
  • cp_path (string, default='./dnnT_checkpoints') – The path where model checkpoints are saved.
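
As an illustration of the custom-metric option, a callable with the signature eva_metric(y_true, y_pred) could look like the sketch below. It assumes that the metric should return one loss value per instance and that the inputs are plain sequences of numbers; neither is stated in this reference, and abs_error is a hypothetical name.

```python
def abs_error(y_true, y_pred):
    """Hypothetical custom metric: absolute error for each instance.

    Assumption: the test expects one loss value per instance rather
    than a single averaged score.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return [abs(t - p) for t, p in zip(y_true, y_pred)]


# Per-instance losses for three predictions.
losses = abs_error([1.0, 2.0, 3.0], [1.5, 2.0, 1.0])
```

Under these assumptions, the function would be supplied as eva_metric=abs_error when constructing DnnT.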

adaRatio(X, y, k=0, fit_params={}, perturb=None, split='one-split', perturb_grid=[0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0], ratio_grid=[0.2, 0.4, 0.6, 0.8], if_reverse=0, min_inf=0, min_est=0, ratio_method='fuse', num_perm=100, cv_num=1, cp='hommel', verbose=1)[source]

Return a data-adaptive splitting ratio and perturbation level.

Parameters:
  • X (array-like | shape=(n_samples, dim1, dim2, ..)) – Features.
  • y (array-like | shape=(n_samples, dim)) – Outcomes.
  • k (integer, default = 0) – Index of the hypothesized features in inf_feats (the k-th element).
  • fit_params (dict of fitting parameters) – See keras fit: (https://keras.rstudio.com/reference/fit.html), including batch_size, epochs, callbacks, validation_split, validation_data.
  • perturb (float | default=None) – Perturbation level for the one-split test; if perturb = None, the level is determined by adaptive tuning.
  • split ({'one-split', 'two-split'}) – Whether to use the one-split or two-split test statistic.
  • perturb_grid (list of float | default=[0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0]) – A list of perturbation levels to search over.
  • ratio_grid (list of float (0,1) | default=[0.2, 0.4, 0.6, 0.8]) – A list of estimation/inference ratios to search over.
  • if_reverse ({0,1} | default = 0) – if_reverse = 0 loops over ratio_grid from the smallest ratio to the largest; if_reverse = 1 loops from the largest to the smallest.
  • min_inf (integer | default = 0) – The minimum size of the inference sample.
  • min_est (integer | default = 0) – The minimum size of the estimation sample.
  • ratio_method ({'close', 'fuse'} | default = 'fuse') – The adaptive splitting method used to determine the optimal estimation/inference ratio.
  • num_perm (int | default = 100) – Number of permutations used to determine the splitting ratio.
  • cv_num (int, default=1) – The number of cross-validation shuffles of the estimation/inference samples in adaptive ratio splitting.
  • cp ({'gmean', 'min', 'hmean', 'Q1', 'hommel', 'cauchy'} | default = 'hommel') – A method to combine the p-values obtained from cross-validation; see (https://arxiv.org/pdf/1212.4966.pdf) for more detail.
  • verbose ({0,1} | default=1) – Whether to print the adaptive splitting process.

Returns:

  • n_opt (integer) – A reasonable estimation sample size.
  • m_opt (integer) – A reasonable inference sample size.
  • perturb_opt (float) – A reasonable perturbation level.
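
To make the roles of ratio_grid, min_inf, min_est and if_reverse concrete, the sketch below lists the candidate (estimation, inference) sample sizes that such a search could visit. It assumes that each ratio is the fraction of samples assigned to the inference sample and that candidates violating a minimum size are skipped. The actual search inside adaRatio may differ, and candidate_splits is a hypothetical helper, not part of the package.

```python
def candidate_splits(n_samples, ratio_grid, min_inf=0, min_est=0, if_reverse=0):
    """Hypothetical helper: list (n_est, m_inf) pairs, one per usable ratio.

    Assumption: a ratio r assigns int(r * n_samples) instances to the
    inference sample and the remainder to the estimation sample.
    """
    # if_reverse = 0 visits ratios from smallest to largest, 1 the opposite.
    grid = sorted(ratio_grid, reverse=bool(if_reverse))
    splits = []
    for ratio in grid:
        m_inf = int(ratio * n_samples)
        n_est = n_samples - m_inf
        if m_inf >= min_inf and n_est >= min_est:
            splits.append((n_est, m_inf))
    return splits


# With 1000 samples and at least 300 inference samples required,
# the ratio 0.2 is skipped.
splits = candidate_splits(1000, [0.2, 0.4, 0.6, 0.8], min_inf=300)
```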

mask_cov(X, k=0)[source]

Return instances with masked k-th hypothesized features.

Parameters:
  • X (array-like) – Target instances.
  • k (integer, default = 0) – Index of the hypothesized features in inf_feats (the k-th element).
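
The masking idea can be sketched for tabular data as follows. This is not the package's implementation: it assumes the k-th element of inf_feats is a list of column indices and uses plain lists to stay dependency-free; mask_columns is a hypothetical name.

```python
def mask_columns(X, cols):
    """Return a copy of X with the columns in `cols` set to zero.

    X is a list of rows (one row per instance); `cols` plays the role
    of inf_feats[k] and is assumed to hold column indices.
    """
    cols = set(cols)
    return [[0.0 if j in cols else v for j, v in enumerate(row)] for row in X]


X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
X_masked = mask_columns(X, cols=[0, 2])
```

The original X is left untouched, so the full and masked datasets can be compared side by side.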

perm_cov(X, k=0)[source]

Return instances with permuted k-th hypothesized features.

Parameters:
  • X (array-like) – Target instances.
  • k (integer, default = 0) – Index of the hypothesized features in inf_feats (the k-th element).
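
Permuting features across instances can be sketched in the same spirit. The version below applies one shared row permutation to all selected columns; whether perm_cov permutes the columns jointly or independently is not stated here, so this is an assumption, as are the column-index form of the features and the hypothetical name permute_columns.

```python
import random


def permute_columns(X, cols, seed=None):
    """Return a copy of X whose `cols` are shuffled across instances.

    The same row permutation is applied to every selected column, so
    the selected features stay together while their link to the
    remaining features is broken.
    """
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)
    cols = set(cols)
    return [
        [X[order[i]][j] if j in cols else v for j, v in enumerate(row)]
        for i, row in enumerate(X)
    ]


X = [[1, 10], [2, 20], [3, 30], [4, 40]]
X_perm = permute_columns(X, cols=[1], seed=0)
```

Column 0 keeps its original order, while column 1 holds the same values in shuffled order.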

reset_model()[source]

Reset the full and mask network models of class DnnT.

save_init()[source]

Save the initialization of the full and mask network models of class DnnT.

testing(X, y, fit_params, split_params={}, cv_num=5, cp='hommel', inf_ratio=None)[source]

Return p-values for the hypothesis tests of inf_feats in class DnnT.

Parameters:
  • X ({array-like} of shape (n_samples, dim_features)) –
    Instance matrix/tensor, where n_samples is the number of samples and dim_features is the dimension of the features.
    If X is vectorized features, the shape should be (#samples, dim of features). If X is image/matrix data, the shape should be (#samples, img_rows, img_cols, channel); that is, X must be channel_last image data.
  • y ({array-like} of shape (n_samples,)) – Output vector/matrix relative to X.
  • fit_params ({dict of fitting parameters}) – See keras fit: (https://keras.rstudio.com/reference/fit.html), including batch_size, epochs, callbacks, validation_split, validation_data, and so on.
  • split_params ({dict of splitting parameters}) –
    split: {'one-split', 'two-split'}, default='one-split'
    Whether to use the one-split or two-split test statistic.
    perturb: float, default=None
    Perturbation level for the one-split test; if perturb = None, the level is determined by adaptive tuning.
    num_perm: int, default=100
    Number of permutations used to determine the splitting ratio.
    ratio_grid: list of float (0,1), default=[.2, .4, .6, .8]
    A list of estimation/inference ratios to search over.
    if_reverse: {0,1}, default=0
    if_reverse = 0 loops over ratio_grid from the smallest ratio to the largest; if_reverse = 1 loops from the largest to the smallest.
    perturb_grid: list of float, default=[.01, .05, .1, .5, 1.]
    A list of perturbation levels to search over.
    min_inf: int, default=0
    The minimum size of the inference sample.
    min_est: int, default=0
    The minimum size of the estimation sample.
    ratio_method: {'fuse', 'close'}, default='fuse'
    The adaptive splitting method used to determine the optimal estimation/inference ratio.
    cv_num: int, default=1
    The number of cross-validation shuffles of the estimation/inference samples in adaptive ratio splitting.
    cp: {'gmean', 'min', 'hmean', 'Q1', 'hommel', 'cauchy'}, default='hommel'
    A method to combine the p-values obtained from cross-validation; see (https://arxiv.org/pdf/1212.4966.pdf) for more detail.
    verbose: {0,1}, default=1
    Whether to print the adaptive splitting process.
  • cv_num (int, default=5) – The number of cross-validation shuffles of the estimation/inference samples in testing.
  • cp ({'gmean', 'min', 'hmean', 'Q1', 'hommel', 'cauchy'}, default='hommel') – A method to combine the p-values obtained from cross-validation.
  • inf_ratio (float, default=None) – A pre-specified inference sample ratio; if inf_ratio=None, the ratio is determined by the adaptive splitting method.
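
The two dictionaries taken by testing could be assembled as in the sketch below. The split_params keys and defaults are those listed above; the fit_params values are illustrative only, and "epochs" and "verbose" are standard Keras fit argument names rather than settings prescribed by this reference.

```python
# Arguments forwarded to the Keras fit call (illustrative values).
fit_params = {
    "batch_size": 32,
    "epochs": 50,
    "verbose": 0,
    "validation_split": 0.2,
}

# Splitting options, set to the defaults documented for split_params.
split_params = {
    "split": "one-split",
    "perturb": None,  # None: level chosen by adaptive tuning
    "num_perm": 100,
    "ratio_grid": [0.2, 0.4, 0.6, 0.8],
    "if_reverse": 0,
    "perturb_grid": [0.01, 0.05, 0.1, 0.5, 1.0],
    "min_inf": 0,
    "min_est": 0,
    "ratio_method": "fuse",
    "cv_num": 1,
    "cp": "hommel",
    "verbose": 1,
}

# Assuming `dnnt` is an already constructed DnnT instance and X, y are arrays:
# p_values = dnnt.testing(X, y, fit_params=fit_params, split_params=split_params)
```

Keeping both dictionaries next to each other makes it easier to see which settings affect model fitting and which affect sample splitting.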

Returns:

P_value – The p-values for the target hypothesis tests.

Return type:

array of float [0, 1]
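
For intuition about the cp option, the sketch below combines the p-values from several cross-validation shuffles into one value using textbook formulas for three of the listed rules. Whether dnn_inference uses exactly these formulas is an assumption, and combine_p_values is a hypothetical helper, not a function of the package.

```python
import math


def combine_p_values(p_values, method="hommel"):
    """Combine K p-values into one (illustrative textbook formulas)."""
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if method == "min":
        # Bonferroni-type rule: K times the smallest p-value.
        combined = k * min(p_values)
    elif method == "hommel":
        # Hommel's rule: (sum_{i<=K} 1/i) * min_i (K / i) * p_(i),
        # where p_(1) <= ... <= p_(K) are the sorted p-values.
        const = sum(1.0 / i for i in range(1, k + 1))
        combined = const * min(
            k / i * p for i, p in enumerate(sorted(p_values), start=1)
        )
    elif method == "cauchy":
        # Cauchy combination: average the Cauchy-transformed p-values
        # and map the average back to a p-value.
        stat = sum(math.tan((0.5 - p) * math.pi) for p in p_values) / k
        combined = 0.5 - math.atan(stat) / math.pi
    else:
        raise ValueError(f"unsupported method: {method!r}")
    # A p-value cannot exceed 1.
    return min(1.0, combined)


p_cv = [0.01, 0.04, 0.03, 0.2, 0.5]
p_hommel = combine_p_values(p_cv, method="hommel")
```

The combined value would then be compared with the nominal level alpha to decide whether to reject the hypothesis.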

visual(X, y, plt_params={'alpha': 0.6, 'cmap': 'RdBu'}, plt_mask_params={'alpha': 0.6, 'cmap': 'RdBu'})[source]

Visualization for the inference results based on one illustrative example

Parameters:

API reference for PermT class

class dnn_inference.PermT(inf_feats, model, model_perm, alpha=0.05, num_folds=5, num_perm=100, verbose=0, eva_metric='mse')[source]