Imputation Models

Non-NN Based Imputer

class BaseMLImputer(name: str, model_persistable: bool)

Abstract base class for non-NN based imputers used in the federated imputation environment

Methods

get_imp_model_params — Return model parameters
set_imp_model_params — Set model parameters
initialize — Initialize the imputer (statistics, imputation models, etc.)
fit — Fit imputer to train local imputation models
impute — Impute missing values using an imputation model
get_fit_res
save_model — Save the imputer model
load_model — Load the imputer model

method BaseMLImputer.initialize(X: np.array, missing_mask: np.array, data_utils: dict, params: dict, seed: int) → None

Initialize the imputer (statistics, imputation models, etc.)

Parameters

X : np.array — data with initial imputed values
missing_mask : np.array — missing mask of data
data_utils : dict — data utils dictionary containing information about the data
params : dict — params for initialization
seed : int — seed for randomization

method BaseMLImputer.fit(X: np.array, y: np.array, missing_mask: np.array, params: dict) → dict

Fit imputer to train local imputation models

Parameters

X : np.array — float numpy array of features
y : np.array — numpy array of target
missing_mask : np.array — missing mask
params : dict — parameters for local training

method BaseMLImputer.impute(X: np.array, y: np.array, missing_mask: np.array, params: dict) → np.ndarray

Impute missing values using an imputation model

Parameters

X : np.array — numpy array of features
y : np.array — numpy array of target
missing_mask : np.array — missing mask
params : dict — parameters for imputation

Returns

np.ndarray — imputed data as a numpy array with the same shape as X
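
The call order implied by the methods above (initialize, then fit, then impute, with parameters exchanged through get_imp_model_params and set_imp_model_params) can be sketched with a stand-in class. `ToyColumnMeanImputer` is hypothetical and does not subclass the real BaseMLImputer; it only mirrors the documented argument lists. The argument-free getter, the single-argument setter, the keys of the returned dicts, and the convention that `missing_mask` is True where a value is missing are all assumptions.

```python
import numpy as np


class ToyColumnMeanImputer:
    """Hypothetical stand-in mirroring the documented method signatures."""

    def __init__(self, name: str = "toy_mean", model_persistable: bool = False):
        self.name = name
        self.model_persistable = model_persistable
        self.col_means = None

    def initialize(self, X, missing_mask, data_utils, params, seed):
        # Start from zeros; the real library may initialize differently.
        self.col_means = np.zeros(X.shape[1])

    def get_imp_model_params(self):
        # Assumed format: a plain dict of numpy arrays.
        return {"col_means": self.col_means.copy()}

    def set_imp_model_params(self, model_params):
        self.col_means = model_params["col_means"].copy()

    def fit(self, X, y, missing_mask, params):
        # Column means computed over observed entries only.
        observed = ~missing_mask
        sums = np.where(observed, X, 0.0).sum(axis=0)
        counts = observed.sum(axis=0)
        self.col_means = sums / np.maximum(counts, 1)
        return {"sample_size": int(X.shape[0])}  # assumed result format

    def impute(self, X, y, missing_mask, params):
        # Observed entries are kept; missing entries take the column mean.
        return np.where(missing_mask, self.col_means, X)


X = np.array([[1.0, 10.0], [3.0, 0.0], [0.0, 30.0]])
missing_mask = np.array([[False, False], [False, True], [True, False]])
y = np.array([0, 1, 0])

client_a = ToyColumnMeanImputer()
client_a.initialize(X, missing_mask, data_utils={}, params={}, seed=0)
fit_res = client_a.fit(X, y, missing_mask, params={})
X_imputed = client_a.impute(X, y, missing_mask, params={})

# Parameters can be copied to another instance, as a server might do.
client_b = ToyColumnMeanImputer()
client_b.set_imp_model_params(client_a.get_imp_model_params())
```

The returned array has the same shape as `X`, and only the positions flagged in `missing_mask` differ from the input.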

Mean

class SimpleImputer(strategy: str = 'mean')

Bases : BaseMLImputer

Simple imputer class for imputing missing values using simple strategies such as mean or median.

Attributes

strategy : str — strategy for imputation, e.g. mean or median
mean_params : np.array — mean parameters for imputation
model_type : str — type of the model - numpy or sklearn
model_persistable : bool — whether model is persistable or not
name : str — name of the imputer

Methods

get_imp_model_params
set_imp_model_params
initialize
fit
impute
get_fit_res
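
A minimal numpy sketch of what a strategy-based imputer computes, assuming the statistic is taken per column over observed entries and that `missing_mask` is True where a value is missing. The helper names `column_statistics` and `simple_impute` are hypothetical and are not part of SimpleImputer.

```python
import numpy as np


def column_statistics(X, missing_mask, strategy="mean"):
    """Per-column statistic over observed entries (hypothetical helper)."""
    X_obs = np.where(missing_mask, np.nan, X.astype(float))
    if strategy == "mean":
        return np.nanmean(X_obs, axis=0)
    if strategy == "median":
        return np.nanmedian(X_obs, axis=0)
    raise ValueError(f"unsupported strategy: {strategy}")


def simple_impute(X, missing_mask, stats):
    """Replace missing entries with the column statistic."""
    return np.where(missing_mask, stats, X)


X = np.array([[1.0, 2.0], [2.0, 4.0], [9.0, 0.0], [0.0, 6.0]])
missing_mask = np.array(
    [[False, False], [False, False], [False, True], [True, False]]
)

mean_stats = column_statistics(X, missing_mask, "mean")
median_stats = column_statistics(X, missing_mask, "median")
X_median = simple_impute(X, missing_mask, median_stats)
```

In this toy input the first column has the observed values 1, 2 and 9, so the two strategies disagree (mean 4, median 2); the median is less sensitive to the outlying 9.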

EM

class EMImputer(clip: bool = True, use_y: bool = False)

Bases : BaseMLImputer, ICEImputerMixin

EM imputer class for imputing missing values in data using the Expectation-Maximization (EM) algorithm.

Attributes

clip : bool — whether to clip the imputed values
use_y : bool — whether to use the target variable in imputation
min_values : np.array — minimum values for clipping
max_values : np.array — maximum values for clipping
data_utils_info : dict — information about data
seed : int — seed for randomization
name : str — name of the imputer, defaults to 'em'
model_type : str — type of the imputer (simple or NN based), defaults to 'simple'
mu : np.array — mean of the data
sigma : np.array — covariance matrix of the data
miss : np.array — indices of missing values
obs : np.array — indices of observed values
model_persistable : bool — whether model is persistable or not

Methods

initialize — Initialize the imputer (statistics, imputation models, etc.)
set_imp_model_params
get_imp_model_params
fit — Fit the imputer on the data.
impute — Impute the missing values in the data.
get_fit_res
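
The attributes above (mu, sigma, miss, obs, min_values, max_values) match the textbook EM procedure for a multivariate Gaussian with missing entries. The following is a sketch of that textbook procedure, not of EMImputer's internals: the function name `em_impute`, the iteration count, the small ridge added to the covariance, and the mask convention (True means missing) are assumptions.

```python
import numpy as np


def em_impute(X, missing_mask, n_iter=50, clip=True):
    """Gaussian EM imputation sketch (hypothetical helper)."""
    X = X.astype(float)
    n, d = X.shape
    X_obs = np.where(missing_mask, np.nan, X)
    min_values = np.nanmin(X_obs, axis=0)
    max_values = np.nanmax(X_obs, axis=0)

    # Initial estimates from mean-filled data.
    mu = np.nanmean(X_obs, axis=0)
    X_filled = np.where(missing_mask, mu, X)
    sigma = np.cov(X_filled, rowvar=False, bias=True) + 1e-6 * np.eye(d)

    for _ in range(n_iter):
        cond_cov_sum = np.zeros((d, d))
        # E-step: conditional mean and covariance of the missing part per row.
        for i in range(n):
            miss = np.flatnonzero(missing_mask[i])
            obs = np.flatnonzero(~missing_mask[i])
            if miss.size == 0:
                continue
            if obs.size == 0:
                X_filled[i] = mu
                cond_cov_sum += sigma
                continue
            sigma_oo = sigma[np.ix_(obs, obs)]
            sigma_mo = sigma[np.ix_(miss, obs)]
            # gain = sigma_mo @ inv(sigma_oo), computed via a linear solve.
            gain = np.linalg.solve(sigma_oo, sigma_mo.T).T
            X_filled[i, miss] = mu[miss] + gain @ (X[i, obs] - mu[obs])
            cond_cov_sum[np.ix_(miss, miss)] += (
                sigma[np.ix_(miss, miss)] - gain @ sigma_mo.T
            )
        # M-step: update mean and covariance from the completed data.
        mu = X_filled.mean(axis=0)
        centered = X_filled - mu
        sigma = (centered.T @ centered + cond_cov_sum) / n + 1e-6 * np.eye(d)

    if clip:
        X_filled = np.clip(X_filled, min_values, max_values)
    return X_filled, mu, sigma


rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2.0 * x1 + 0.1 * rng.normal(size=200)
X_true = np.column_stack([x1, x2])
missing_mask = np.zeros(X_true.shape, dtype=bool)
missing_mask[:, 1] = rng.random(200) < 0.3
X_in = np.where(missing_mask, 0.0, X_true)

X_imp, mu, sigma = em_impute(X_in, missing_mask)
```

Clipping to the observed per-column range plays the role that `clip`, `min_values` and `max_values` suggest; observed entries are never modified.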

ICE

class LinearICEImputer(estimator_num: str = 'ridge_cv', estimator_cat: str = 'ridge', mm_model: str = 'logistic', mm_model_params=None, clip: bool = True, use_y: bool = False)

Bases : BaseMLImputer, ICEImputerMixin

Linear ICE imputer class for imputing missing values in data using linear models.

Attributes

estimator_num : str — estimator for numerical columns
estimator_cat : str — estimator for categorical columns
mm_model : str — missing mechanism model
mm_model_params : dict — missing mechanism model parameters
clip : bool — whether to clip the imputed values
use_y : bool — whether to use target variable in imputation
imp_models : list — list of imputation models
mm_model : object — missing mechanism model
data_utils_info : dict — information about data
seed : int — seed for randomization
model_type : str — type of the imputer, defaults to 'sklearn'
model_persistable : bool — whether model is persistable or not, defaults to False
name : str — name of the imputer, defaults to 'linear_ice'

Methods

initialize — Initialize the imputer (statistics, imputation models, etc.)
set_imp_model_params
get_imp_model_params
fit — Fit imputer to train local imputation models
impute — Impute missing values using an imputation model
save_model
load_model
get_fit_res
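
The chained-equations idea behind a linear ICE imputer can be sketched in plain numpy: each column with missing entries is regressed on the other columns, and its missing entries are replaced by the predictions, repeated for several rounds. This sketch covers numerical columns only, uses a closed-form ridge with a fixed penalty instead of 'ridge_cv', and leaves out the missing mechanism model. `ridge_fit` and `ice_impute` are hypothetical names, and the mask convention (True means missing) is an assumption.

```python
import numpy as np


def ridge_fit(A, b, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    A_mean, b_mean = A.mean(axis=0), b.mean()
    A_c = A - A_mean
    gram = A_c.T @ A_c + alpha * np.eye(A.shape[1])
    coef = np.linalg.solve(gram, A_c.T @ (b - b_mean))
    intercept = b_mean - A_mean @ coef
    return coef, intercept


def ice_impute(X, missing_mask, n_rounds=10, alpha=1.0, clip=True):
    """Iterative column-wise imputation with ridge regression."""
    X_obs = np.where(missing_mask, np.nan, X.astype(float))
    min_values = np.nanmin(X_obs, axis=0)
    max_values = np.nanmax(X_obs, axis=0)
    # Initial fill with observed column means.
    X_imp = np.where(missing_mask, np.nanmean(X_obs, axis=0), X.astype(float))

    for _ in range(n_rounds):
        for j in range(X_imp.shape[1]):
            miss_rows = missing_mask[:, j]
            if not miss_rows.any() or miss_rows.all():
                continue
            others = np.delete(X_imp, j, axis=1)
            coef, intercept = ridge_fit(
                others[~miss_rows], X_imp[~miss_rows, j], alpha
            )
            pred = others[miss_rows] @ coef + intercept
            if clip:
                pred = np.clip(pred, min_values[j], max_values[j])
            X_imp[miss_rows, j] = pred
    return X_imp


rng = np.random.default_rng(0)
x0 = rng.normal(size=300)
x2 = rng.normal(size=300)
x1 = 3.0 * x0 + 1.0 + 0.1 * rng.normal(size=300)
X_true = np.column_stack([x0, x1, x2])
missing_mask = np.zeros(X_true.shape, dtype=bool)
missing_mask[:, 1] = rng.random(300) < 0.3
X_in = np.where(missing_mask, 0.0, X_true)

X_imp = ice_impute(X_in, missing_mask)
```

One regression model per column is consistent with `imp_models` being a list; in a federated setting it is these per-column model parameters that would be exchanged.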

MissForest

class MissForestImputer(n_estimators: int = 200, bootstrap: bool = True, n_jobs: int = 2, clip: bool = True, use_y: bool = False)

Bases : BaseMLImputer, ICEImputerMixin

MissForest imputer class for the federated imputation environment

Attributes

n_estimators : int — number of trees in the forest
bootstrap : bool — whether bootstrap samples are used when building trees
n_jobs : int — number of jobs to run in parallel
clip : bool — whether to clip the imputed values
use_y : bool — whether to use target values for imputation
imp_models : list — list of imputation models
mm_model : object — model for missing mask imputation
data_utils_info : dict — data utils information
seed : int — seed for randomization
model_type : str — type of the model, defaults to 'sklearn'
model_persistable : bool — whether the model is persistable, defaults to False
name : str — name of the imputer, defaults to 'missforest'

Methods

initialize
set_imp_model_params
get_imp_model_params
fit
impute
save_model
load_model
get_fit_res

NN Based Imputer

class BaseNNImputer()

Abstract base class for NN based imputers used in the federated imputation environment

Methods

get_imp_model_params — Return model parameters
set_imp_model_params — Set model parameters
initialize — Initialize the imputer (statistics, imputation models, etc.)
configure_model — Fetch model for training
configure_optimizer — Configure optimizer for training
impute — Impute missing values using an imputation model
save_model — Save the imputer model
load_model — Load the imputer model

method BaseNNImputer.initialize(X: np.array, missing_mask: np.array, data_utils: dict, params: dict, seed: int) → None

Initialize the imputer (statistics, imputation models, etc.)

Parameters

X : np.array — data with initial imputed values
missing_mask : np.array — missing mask of data
data_utils : dict — data utils dictionary - contains information about data
params : dict — params for initialization
seed : int — seed for randomization

method BaseNNImputer.configure_model(params: dict, X: np.ndarray, y: np.ndarray, missing_mask: np.ndarray) → Tuple[torch.nn.Module, torch.utils.data.DataLoader]

Fetch model for training

Parameters

params : dict — parameters for training
X : np.ndarray — imputed data
y : np.ndarray — target
missing_mask : np.ndarray — missing mask

Returns

Tuple[torch.nn.Module, torch.utils.data.DataLoader] — model, train_dataloader

method BaseNNImputer.impute(X: np.array, y: np.array, missing_mask: np.array, params: dict) → np.ndarray

Impute missing values using an imputation model

Parameters

X : np.array — numpy array of features
y : np.array — numpy array of target
missing_mask : np.array — missing mask
params : dict — parameters for imputation

Returns

np.ndarray — imputed data as a numpy array with the same shape as X

GAIN

class GAINImputer(h_dim: int = 20, n_layers: int = 2, activation: str = 'relu', initializer: str = 'kaiming', loss_alpha: float = 10, hint_rate: float = 0.9, clip: bool = True, batch_size: int = 256, learning_rate: int = 0.001, weight_decay: int = 0.0001, scheduler: str = 'step', optimizer: str = 'sgd')

Bases : BaseNNImputer, JMImputerMixin

GAIN imputer class for imputing missing values in data using Generative Adversarial Imputation Networks.

Attributes

h_dim : int — dimension of hidden layers
n_layers : int — number of layers
activation : str — activation function
initializer : str — initializer for weights
loss_alpha : float — alpha parameter for loss
hint_rate : float — hint rate for loss
clip : bool — whether to clip the imputed values
batch_size : int — batch size for training
learning_rate : float — learning rate for optimizer
weight_decay : float — weight decay for optimizer
scheduler : str — scheduler for optimizer
optimizer : str — optimizer for training
scheduler_params : dict — scheduler parameters

Methods

initialize
get_imp_model_params
set_imp_model_params
configure_model
configure_optimizer
impute
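
To illustrate what `hint_rate` and `loss_alpha` control, the sketch below follows the published GAIN formulation: a hint matrix reveals part of the mask to the discriminator, and the generator loss adds a reconstruction term on observed entries weighted by alpha. The generator and discriminator networks are replaced by fixed arrays, so this shows only the data flow and the loss arithmetic. The function names are hypothetical, and whether GAINImputer computes its losses exactly this way is an assumption.

```python
import numpy as np


def sample_hint(observed, hint_rate, rng):
    """Hint = Bernoulli(hint_rate) sample multiplied by the observed indicator."""
    b = (rng.random(observed.shape) < hint_rate).astype(float)
    return b * observed


def combine(X, observed, G_out):
    """Keep observed values, take generator output at missing positions."""
    return observed * X + (1.0 - observed) * G_out


def gain_losses(X, observed, G_out, D_prob, loss_alpha, eps=1e-8):
    """Discriminator and generator losses of the GAIN formulation."""
    d_loss = -np.mean(
        observed * np.log(D_prob + eps)
        + (1.0 - observed) * np.log(1.0 - D_prob + eps)
    )
    g_adv = -np.mean((1.0 - observed) * np.log(D_prob + eps))
    mse = np.mean((observed * X - observed * G_out) ** 2) / np.mean(observed)
    return d_loss, g_adv + loss_alpha * mse


rng = np.random.default_rng(0)
X = rng.random((4, 3))
missing_mask = np.array(
    [
        [False, True, False],
        [False, False, False],
        [True, False, False],
        [False, False, True],
    ]
)
observed = (~missing_mask).astype(float)  # 1 where observed, 0 where missing

hint = sample_hint(observed, hint_rate=0.9, rng=rng)
G_out = rng.random(X.shape)          # placeholder for the generator output
D_prob = np.full(X.shape, 0.5)       # placeholder for an undecided discriminator
X_hat = combine(X, observed, G_out)
d_loss, g_loss = gain_losses(X, observed, G_out, D_prob, loss_alpha=10.0)
```

With a discriminator that outputs 0.5 everywhere, the discriminator loss equals log 2, which is the usual sanity check for a binary cross-entropy term.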

MIWAE

class MIWAEImputer(name: str = 'miwae', latent_size: int = 5, n_hidden: int = 16, n_hidden_layers: int = 2, out_dist='studentt', K: int = 20, L: int = 100, activation='tanh', initializer='xavier', clip: bool = True, batch_size: int = 256, learning_rate: int = 0.001, weight_decay: int = 0.0001, scheduler: str = 'step', optimizer: str = 'sgd')

Bases : BaseNNImputer, JMImputerMixin

MIWAE imputer class for imputing missing values in data using the Missing data Importance-Weighted Autoencoder (MIWAE).

Attributes

name : str — name of the imputer
clip : bool — whether to clip the imputed values
latent_size : int — size of the latent space
n_hidden : int — number of hidden units
n_hidden_layers : int — number of hidden layers
out_dist : str — output distribution
K : int — number of samples
L : int — number of MCMC samples
activation : str — activation function
initializer : str — initializer for weights
batch_size : int — batch size for training
learning_rate : float — learning rate for optimizer
weight_decay : float — weight decay for optimizer
scheduler : str — scheduler for optimizer
optimizer : str — optimizer for training

Methods

get_imp_model_params
set_imp_model_params
initialize
configure_model
configure_optimizer
fit
impute
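
In the published MIWAE method, training maximizes an importance-weighted bound built from K samples, and imputation averages decoder outputs with self-normalized importance weights built from L samples. The numpy sketch below shows only that arithmetic, taking the log importance weights and decoder means as given arrays. The function names and array layouts are hypothetical, and mapping them onto the `K` and `L` attributes of MIWAEImputer is an assumption.

```python
import numpy as np


def log_mean_exp(log_w, axis=0):
    """Numerically stable log of the mean of exp(log_w) along an axis."""
    m = log_w.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(log_w - m), axis=axis))


def miwae_bound(log_w):
    """Importance-weighted bound; log_w has shape (K, n_samples)."""
    return log_mean_exp(log_w, axis=0).mean()


def snis_impute(X, missing_mask, log_w, decoder_means):
    """Self-normalized importance sampling imputation.

    log_w has shape (L, n); decoder_means has shape (L, n, d).
    """
    w = np.exp(log_w - log_w.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)
    x_mean = np.einsum("ln,lnd->nd", w, decoder_means)
    return np.where(missing_mask, x_mean, X)


X = np.array([[5.0, 0.0], [0.0, 7.0]])
missing_mask = np.array([[False, True], [True, False]])

# Three samples with equal weights, so imputation is the plain average.
log_w = np.zeros((3, 2))
decoder_means = np.stack([np.full((2, 2), v) for v in (1.0, 2.0, 6.0)])

bound = miwae_bound(log_w)
X_imp = snis_impute(X, missing_mask, log_w, decoder_means)
```

Subtracting the per-sample maximum before exponentiating keeps the weights finite even when the log weights are large in magnitude.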

NOTMIWAE

class NotMIWAEImputer(latent_size: int = 5, n_hidden: int = 16, n_hidden_layers: int = 2, out_dist='studentt', K: int = 20, L: int = 100, activation='tanh', initializer='xavier', mask_net_type: str = 'linear', clip: bool = True, batch_size: int = 256, learning_rate: int = 0.001, weight_decay: int = 0.0001, scheduler: str = 'step', optimizer: str = 'sgd')

Bases : BaseNNImputer, JMImputerMixin

not-MIWAE imputer class for imputing missing values in data using not-MIWAE, a MIWAE variant that also models the missingness mask for data that is missing not at random.

Attributes

name : str — name of the imputer
clip : bool — whether to clip the imputed values
latent_size : int — size of the latent space
n_hidden : int — number of hidden units
n_hidden_layers : int — number of hidden layers
out_dist : str — output distribution
K : int — number of samples
L : int — number of MCMC samples
activation : str — activation function
initializer : str — initializer for weights
batch_size : int — batch size for training
learning_rate : float — learning rate for optimizer
weight_decay : float — weight decay for optimizer
scheduler : str — scheduler for optimizer
optimizer : str — optimizer for training

Methods

get_imp_model_params
set_imp_model_params
initialize
configure_model
configure_optimizer
fit
impute

GNR

class GNRImputer(latent_size: int = 5, n_hidden: int = 16, n_hidden_layers: int = 2, K: int = 20, L: int = 100, activation='tanh', initializer='xavier', loss_coef=10, mr_loss_coef: bool = True, clip: bool = True, batch_size: int = 256, learning_rate: int = 0.001, weight_decay: int = 0.0001, scheduler: str = 'step', optimizer: str = 'sgd')

Bases : BaseNNImputer, JMImputerMixin

GNR imputer class for imputing missing values in data using a deep generative model.

Attributes

name : str — name of the imputer
clip : bool — whether to clip the imputed values
latent_size : int — size of the latent space
n_hidden : int — number of hidden units
n_hidden_layers : int — number of hidden layers
out_dist : str — output distribution
K : int — number of samples
L : int — number of MCMC samples
activation : str — activation function
initializer : str — initializer for weights
batch_size : int — batch size for training
learning_rate : float — learning rate for optimizer
weight_decay : float — weight decay for optimizer
scheduler : str — scheduler for optimizer
optimizer : str — optimizer for training

Methods

get_imp_model_params
set_imp_model_params
initialize
configure_model
configure_optimizer
fit
impute