Imputation Models

Non-NN Based Imputer

fedimpute.execution_environment.imputation.base.base_imputer.BaseMLImputer

class BaseMLImputer(name: str, model_persistable: bool)

Abstract base class for non-NN-based imputers used in the federated imputation environment.

Methods

  • get_imp_model_params — Return model parameters

  • set_imp_model_params — Set model parameters

  • initialize — Initialize the imputer (statistics, imputation models, etc.)

  • fit — Fit imputer to train local imputation models

  • impute — Impute missing values using an imputation model

  • get_fit_res

  • save_model — Save the imputer model

  • load_model — Load the imputer model

fedimpute.execution_environment.imputation.base.base_imputer.BaseMLImputer.initialize

method BaseMLImputer.initialize(X: np.array, missing_mask: np.array, data_utils: dict, params: dict, seed: int) -> None

Initialize the imputer (statistics, imputation models, etc.).

Parameters

  • X : np.array — data with initial imputed values

  • missing_mask : np.array — missing mask of data

  • data_utils : dict — dictionary with information about the data

  • params : dict — params for initialization

  • seed : int — seed for randomization

fedimpute.execution_environment.imputation.base.base_imputer.BaseMLImputer.fit

method BaseMLImputer.fit(X: np.array, y: np.array, missing_mask: np.array, params: dict) -> dict

Fit imputer to train local imputation models

Parameters

  • X : np.array — float numpy array of features

  • y : np.array — target

  • missing_mask : np.array — missing mask

  • params : dict — parameters for local training

fedimpute.execution_environment.imputation.base.base_imputer.BaseMLImputer.impute

method BaseMLImputer.impute(X: np.array, y: np.array, missing_mask: np.array, params: dict) -> np.ndarray

Impute missing values using an imputation model

Parameters

  • X : np.array — numpy array of features

  • y : np.array — numpy array of target

  • missing_mask : np.array — missing mask

  • params : dict — parameters for imputation

Returns

  • np.ndarray — imputed data as a numpy array with the same shape as X
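The initialize/fit/impute contract above can be illustrated with a toy NumPy class that imputes with column means. This is a minimal sketch, not fedimpute code: the class name ToyMeanImputer, the dict returned by fit, the single-argument get_imp_model_params/set_imp_model_params signatures, and the convention that missing_mask is True where a value is missing are all assumptions.

```python
import numpy as np


class ToyMeanImputer:
    """Hypothetical imputer mirroring the BaseMLImputer method contract (not fedimpute code)."""

    def __init__(self):
        self.name = "toy_mean"          # hypothetical name
        self.model_persistable = False
        self.mean_params = None

    def initialize(self, X, missing_mask, data_utils, params, seed):
        # One parameter per column, zero until fit is called.
        self.mean_params = np.zeros(X.shape[1])

    def get_imp_model_params(self):
        # Assumed signature: return a copy of the local parameters.
        return {"mean": self.mean_params.copy()}

    def set_imp_model_params(self, model_params):
        # Assumed signature: overwrite local parameters (e.g. with global ones).
        self.mean_params = np.asarray(model_params["mean"], dtype=float)

    def fit(self, X, y, missing_mask, params):
        # Assumption: missing_mask is True where X is missing.
        observed = np.where(missing_mask, np.nan, X)
        self.mean_params = np.nanmean(observed, axis=0)
        return {"sample_size": X.shape[0]}

    def impute(self, X, y, missing_mask, params):
        # Replace only masked entries; the output has the same shape as X.
        return np.where(missing_mask, self.mean_params, X)


X = np.array([[1.0, 10.0], [3.0, np.nan], [5.0, 30.0]])
mask = np.isnan(X)
imp = ToyMeanImputer()
imp.initialize(X, mask, data_utils={}, params={}, seed=0)
fit_res = imp.fit(X, None, mask, params={})
X_imp = imp.impute(X, None, mask, params={})
```

Only the masked entry is replaced, so observed values pass through unchanged.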

Mean

fedimpute.execution_environment.imputation.imputers.simple_imputer.SimpleImputer

class SimpleImputer(strategy: str = 'mean')

Bases : BaseMLImputer

Simple imputer class for imputing missing values in data using simple strategies such as the mean or median.

Attributes

  • strategy : str — imputation strategy, e.g. mean or median

  • mean_params : np.array — mean parameters for imputation

  • model_type : str — type of the model, numpy or sklearn

  • model_persistable : bool — whether model is persistable or not

  • name : str — name of the imputer

Raises

  • ValueError

Methods

  • get_imp_model_params

  • set_imp_model_params

  • initialize

  • fit

  • impute

  • get_fit_res

  • save_model — Save the imputer model

  • load_model — Load the imputer model
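In the federated setting each client holds its own mean parameters. One plausible way for a server to combine them is a per-column average weighted by the number of observed values, which reproduces the mean of the pooled observed data. This is a sketch under that assumption; the function names are hypothetical and fedimpute's actual aggregation strategy may differ.

```python
import numpy as np


def local_mean_stats(X: np.ndarray, missing_mask: np.ndarray):
    """Client side: per-column mean of observed entries and observed counts."""
    observed = ~missing_mask
    counts = observed.sum(axis=0)
    sums = np.where(observed, X, 0.0).sum(axis=0)
    means = np.divide(sums, counts, out=np.zeros_like(sums, dtype=float), where=counts > 0)
    return means, counts


def aggregate_means(client_means, client_counts):
    """Server side: count-weighted average of the client means."""
    means = np.stack(client_means)
    counts = np.stack(client_counts).astype(float)
    total = counts.sum(axis=0)
    weighted = (means * counts).sum(axis=0)
    return np.divide(weighted, total, out=np.zeros_like(weighted), where=total > 0)


X1 = np.array([[1.0, np.nan], [3.0, 4.0]])
X2 = np.array([[5.0, 6.0], [np.nan, 8.0], [7.0, 10.0]])
stats = [local_mean_stats(X, np.isnan(X)) for X in (X1, X2)]
global_mean = aggregate_means([m for m, _ in stats], [c for _, c in stats])
```

Weighting by observed counts rather than row counts keeps a column with many missing values on one client from being over-represented.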

EM

fedimpute.execution_environment.imputation.imputers.em_imputer.EMImputer

class EMImputer(clip: bool = True, use_y: bool = False)

Bases : BaseMLImputer, ICEImputerMixin

EM imputer class for imputing missing values in data using the Expectation Maximization algorithm.

Attributes

  • clip : bool — whether to clip the imputed values

  • use_y : bool — whether to use the target variable in imputation

  • min_values : np.array — minimum values for clipping

  • max_values : np.array — maximum values for clipping

  • data_utils_info : dict — information about the data

  • seed : int — seed for randomization

  • name : str — name of the imputer, defaults to 'em'

  • model_type : str — type of the imputer, 'simple' or 'nn' (neural-network-based), defaults to 'simple'

  • mu : np.array — mean of the data

  • sigma : np.array — covariance matrix of the data

  • miss : np.array — indices of missing values

  • obs : np.array — indices of observed values

  • model_persistable : bool — whether the model is persistable

Methods

  • get_imp_model_params

  • set_imp_model_params

  • initialize — Initialize the imputer (statistics, imputation models, etc.)

  • fit — Fit the imputer on the data.

  • impute — Impute the missing values in the data.

  • get_fit_res

  • save_model — Save the imputer model

  • load_model — Load the imputer model

  • get_clip_thresholds

  • set_clip_thresholds

  • get_visit_indices

  • _em — Perform the EM step for imputing missing values.

  • _converged — Checks if the EM loop has converged.
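The mu and sigma attributes and the _em step point to the classic EM algorithm for a multivariate Gaussian with missing entries: the E-step replaces each row's missing block with its conditional mean given the observed block, mu_m + Sigma_mo Sigma_oo^-1 (x_o - mu_o), and the M-step re-estimates mu and Sigma, adding the conditional covariance of the missing block so that Sigma is not underestimated. The following is a single-site NumPy sketch of that textbook algorithm, not EMImputer's implementation; the function name, the mean-imputation start, and the convergence rule are assumptions.

```python
import numpy as np


def em_gaussian_impute(X, missing_mask, n_iter=50, tol=1e-6):
    """EM for a multivariate Gaussian with missing entries; returns (X_imputed, mu, sigma)."""
    X = np.where(missing_mask, np.nan, X).astype(float)
    n, d = X.shape
    mu = np.nanmean(X, axis=0)
    X_hat = np.where(missing_mask, mu, X)               # start from mean imputation
    sigma = np.cov(X_hat, rowvar=False) + 1e-6 * np.eye(d)
    for _ in range(n_iter):
        correction = np.zeros((d, d))                   # summed conditional covariances
        for i in range(n):
            m = missing_mask[i]
            if not m.any():
                continue
            o = ~m
            if o.any():
                # E-step: conditional mean of the missing block given the observed block
                s_oo_inv = np.linalg.inv(sigma[np.ix_(o, o)])
                s_mo = sigma[np.ix_(m, o)]
                X_hat[i, m] = mu[m] + s_mo @ s_oo_inv @ (X_hat[i, o] - mu[o])
                cond_cov = sigma[np.ix_(m, m)] - s_mo @ s_oo_inv @ s_mo.T
            else:
                X_hat[i, m] = mu[m]
                cond_cov = sigma[np.ix_(m, m)]
            correction[np.ix_(m, m)] += cond_cov
        # M-step: update the mean and the covariance (with the correction term)
        mu_new = X_hat.mean(axis=0)
        diff = X_hat - mu_new
        sigma_new = (diff.T @ diff + correction) / n
        converged = np.abs(mu_new - mu).max() < tol and np.abs(sigma_new - sigma).max() < tol
        mu, sigma = mu_new, sigma_new
        if converged:
            break
    return X_hat, mu, sigma
```

Observed entries are never modified; only the masked positions in X_hat change between iterations.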

ICE

fedimpute.execution_environment.imputation.imputers.linear_ice_imputer.LinearICEImputer

class LinearICEImputer(estimator_num: str = 'ridge_cv', estimator_cat: str = 'logistic', mm_model: str = 'logistic', mm_model_params=None, clip: bool = True, use_y: bool = False)

Bases : BaseMLImputer, ICEImputerMixin

Linear ICE imputer class for imputing missing values in data using linear models.

Attributes

  • estimator_num : str — estimator for numerical columns

  • estimator_cat : str — estimator for categorical columns

  • mm_model : str — missing mechanism model

  • mm_model_params : dict — missing mechanism model parameters

  • clip : bool — whether to clip the imputed values

  • use_y : bool — whether to use target variable in imputation

  • imp_models : list — list of imputation models

  • data_utils_info : dict — information about data

  • seed : int — seed for randomization

  • model_type : str — type of the imputer model, defaults to 'sklearn'

  • model_persistable : bool — whether model is persistable or not, defaults to False

  • name : str — name of the imputer, defaults to 'linear_ice'

Methods

  • get_imp_model_params

  • set_imp_model_params

  • initialize — Initialize the imputer (statistics, imputation models, etc.)

  • fit — Fit imputer to train local imputation models

  • impute — Impute missing values using an imputation model

  • get_fit_res

  • save_model

  • load_model

  • get_clip_thresholds

  • set_clip_thresholds

  • get_visit_indices
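A linear ICE (chained-equations) round visits each incomplete column, regresses it on the other, currently imputed columns using the rows where it is observed, and overwrites its missing entries with the predictions, optionally clipped to the observed range. The sketch below handles numerical columns only, uses closed-form ridge regression with a fixed alpha instead of the cross-validated 'ridge_cv' estimator, and skips categorical columns and the missing mechanism model; the function names are hypothetical and this is not LinearICEImputer's code.

```python
import numpy as np


def ridge_fit_predict(A_train, b_train, A_test, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    a_mean, b_mean = A_train.mean(axis=0), b_train.mean()
    Ac = A_train - a_mean
    w = np.linalg.solve(Ac.T @ Ac + alpha * np.eye(Ac.shape[1]), Ac.T @ (b_train - b_mean))
    return (A_test - a_mean) @ w + b_mean


def linear_ice(X, missing_mask, n_rounds=5, alpha=1.0, clip=True):
    """Round-robin imputation of numerical columns with ridge regressors."""
    X = np.where(missing_mask, np.nan, X).astype(float)
    lo, hi = np.nanmin(X, axis=0), np.nanmax(X, axis=0)      # clip thresholds from observed data
    X_hat = np.where(missing_mask, np.nanmean(X, axis=0), X)  # initial mean fill
    for _ in range(n_rounds):
        for j in range(X.shape[1]):
            miss_j = missing_mask[:, j]
            if not miss_j.any() or miss_j.all():
                continue
            others = np.delete(X_hat, j, axis=1)
            pred = ridge_fit_predict(others[~miss_j], X_hat[~miss_j, j], others[miss_j], alpha)
            X_hat[miss_j, j] = np.clip(pred, lo[j], hi[j]) if clip else pred
    return X_hat
```

Clipping to the observed minimum and maximum mirrors the role of the clip attribute: a linear model can extrapolate outside the range seen in the data.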

MissForest

fedimpute.execution_environment.imputation.imputers.missforest_imputer.MissForestImputer

class MissForestImputer(n_estimators: int = 200, bootstrap: bool = True, n_jobs: int = 2, clip: bool = True, use_y: bool = False)

Bases : BaseMLImputer, ICEImputerMixin

MissForest imputer class for imputing missing values with random-forest models in the federated imputation environment.

Attributes

  • n_estimators : int — number of trees in the forest

  • bootstrap : bool — whether bootstrap samples are used when building trees

  • n_jobs : int — number of jobs to run in parallel

  • clip : bool — whether to clip the imputed values

  • use_y : bool — whether to use target values for imputation

  • imp_models : list — list of imputation models

  • mm_model : object — missing mechanism model (models the missing mask)

  • data_utils_info : dict — data utils information

  • seed : int — seed for randomization

  • model_type : str — type of the model, defaults to 'sklearn'

  • model_persistable : bool — whether the model is persistable, defaults to False

  • name : str — name of the imputer, defaults to 'missforest'

Methods

  • get_imp_model_params

  • set_imp_model_params

  • initialize

  • fit

  • impute

  • get_fit_res

  • save_model

  • load_model

  • get_clip_thresholds

  • set_clip_thresholds

  • get_visit_indices
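The original MissForest algorithm visits incomplete columns in order of increasing number of missing values. get_visit_indices is not documented here, so the sketch below only assumes that ascending rule; the function name is hypothetical.

```python
import numpy as np


def visit_indices_ascending(missing_mask: np.ndarray) -> list:
    """Columns with any missing value, ordered from fewest to most missing entries."""
    n_missing = missing_mask.sum(axis=0)
    order = np.argsort(n_missing, kind="stable")   # stable: ties keep column order
    return [int(j) for j in order if n_missing[j] > 0]
```

Fully observed columns are dropped from the visit list because they never need a model.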

NN Based Imputer

fedimpute.execution_environment.imputation.base.base_imputer.BaseNNImputer

class BaseNNImputer()

Abstract base class for NN-based imputers used in the federated imputation environment.

Methods

  • get_imp_model_params — Return model parameters

  • set_imp_model_params — Set model parameters

  • initialize — Initialize the imputer (statistics, imputation models, etc.)

  • configure_model — Fetch the model and the training data loader

  • configure_optimizer — Configure optimizer for training

  • impute — Impute missing values using an imputation model

  • save_model — Save the imputer model

  • load_model — Load the imputer model

fedimpute.execution_environment.imputation.base.base_imputer.BaseNNImputer.initialize

method BaseNNImputer.initialize(X: np.array, missing_mask: np.array, data_utils: dict, params: dict, seed: int) -> None

Initialize the imputer (statistics, imputation models, etc.).

Parameters

  • X : np.array — data with initial imputed values

  • missing_mask : np.array — missing mask of data

  • data_utils : dict — data utils dictionary - contains information about data

  • params : dict — params for initialization

  • seed : int — seed for randomization

fedimpute.execution_environment.imputation.base.base_imputer.BaseNNImputer.configure_model

method BaseNNImputer.configure_model(params: dict, X: np.ndarray, y: np.ndarray, missing_mask: np.ndarray) -> Tuple[torch.nn.Module, torch.utils.data.DataLoader]

Fetch the model and the training data loader.

Parameters

  • params : dict — parameters for training

  • X : np.ndarray — imputed data

  • y : np.ndarray — target

  • missing_mask : np.ndarray — missing mask

Returns

  • Tuple[torch.nn.Module, torch.utils.data.DataLoader] — the model and its training data loader (model, train_dataloader)

fedimpute.execution_environment.imputation.base.base_imputer.BaseNNImputer.impute

method BaseNNImputer.impute(X: np.array, y: np.array, missing_mask: np.array, params: dict) -> np.ndarray

Impute missing values using an imputation model

Parameters

  • X : np.array — numpy array of features

  • y : np.array — numpy array of target

  • missing_mask : np.array — missing mask

  • params : dict — parameters for imputation

Returns

  • np.ndarray — imputed data as a numpy array with the same shape as X

GAIN

fedimpute.execution_environment.imputation.imputers.gain_imputer.GAINImputer

class GAINImputer(h_dim: int = 20, n_layers: int = 2, activation: str = 'relu', initializer: str = 'kaiming', loss_alpha: float = 10, hint_rate: float = 0.9, clip: bool = True, batch_size: int = 256, learning_rate: float = 0.001, weight_decay: float = 0.0001, scheduler: str = 'step', optimizer: str = 'sgd')

Bases : BaseNNImputer, JMImputerMixin

GAIN imputer class for imputing missing values in data using Generative Adversarial Imputation Networks.

Attributes

  • h_dim : int — dimension of hidden layers

  • n_layers : int — number of layers

  • activation : str — activation function

  • initializer : str — initializer for weights

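  • loss_alpha : float — weight (alpha) of the reconstruction loss on observed entries in the generator loss

  • hint_rate : float — probability that an entry of the missing mask is revealed to the discriminator through the hint matrix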

  • clip : bool — whether to clip the imputed values

  • batch_size : int — batch size for training

  • learning_rate : float — learning rate for the optimizer

  • weight_decay : float — weight decay for the optimizer

  • scheduler : str — scheduler for optimizer

  • optimizer : str — optimizer for training

  • scheduler_params : dict — scheduler parameters

Methods

  • get_imp_model_params

  • set_imp_model_params

  • initialize

  • configure_model

  • configure_optimizer

  • impute

  • save_model — Save the imputer model

  • load_model — Load the imputer model

  • get_clip_thresholds

  • set_clip_thresholds
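The hint_rate and loss_alpha attributes correspond to two pieces of GAIN training. Following the GAIN reference implementation, the generator input keeps observed values and fills missing slots with small uniform noise, the hint matrix reveals each mask entry to the discriminator with probability hint_rate, and the generator loss adds alpha times the reconstruction MSE on observed entries to an adversarial term on missing entries. This NumPy sketch shows only those array computations (no networks); it assumes data scaled to [0, 1] and a mask M that is 1 where a value is observed, and whether GAINImputer matches the reference code exactly is an assumption. The function names are hypothetical.

```python
import numpy as np


def gain_batch_inputs(X, M, hint_rate=0.9, rng=None):
    """Build the GAIN generator input and hint matrix for one batch.

    M is 1 where X is observed and 0 where it is missing.
    """
    rng = np.random.default_rng() if rng is None else rng
    Z = rng.uniform(0.0, 0.01, size=X.shape)            # small noise for missing slots
    X_tilde = M * np.nan_to_num(X) + (1 - M) * Z        # keep observed values, noise elsewhere
    B = (rng.uniform(size=X.shape) < hint_rate).astype(float)
    H = M * B                                           # hint reveals a random subset of the mask
    return X_tilde, H


def gain_generator_loss(D_prob, M, X, G_sample, alpha=10.0, eps=1e-8):
    """Adversarial term on missing entries plus alpha-weighted MSE on observed entries."""
    adv = -np.mean((1 - M) * np.log(D_prob + eps))
    mse = np.sum((M * np.nan_to_num(X) - M * G_sample) ** 2) / np.sum(M)
    return adv + alpha * mse
```

With hint_rate close to 1 the discriminator sees almost the whole mask, so a larger value makes its task easier and pushes the generator harder.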

MIWAE

fedimpute.execution_environment.imputation.imputers.miwae_imputer.MIWAEImputer

class MIWAEImputer(name: str = 'miwae', latent_size: int = 5, n_hidden: int = 16, n_hidden_layers: int = 2, out_dist='studentt', K: int = 20, L: int = 100, activation='tanh', initializer='xavier', clip: bool = True, batch_size: int = 256, learning_rate: float = 0.001, weight_decay: float = 0.0001, scheduler: str = 'step', optimizer: str = 'sgd')

Bases : BaseNNImputer, JMImputerMixin

MIWAE imputer class for imputing missing values in data using the Missing data Importance-Weighted Autoencoder (MIWAE).

Attributes

  • name : str — name of the imputer

  • clip : bool — whether to clip the imputed values

  • latent_size : int — size of the latent space

  • n_hidden : int — number of hidden units

  • n_hidden_layers : int — number of hidden layers

  • out_dist : str — output distribution

  • K : int — number of importance samples used during training

  • L : int — number of importance samples drawn at imputation time

  • activation : str — activation function

  • initializer : str — initializer for weights

  • batch_size : int — batch size for training

  • learning_rate : float — learning rate for the optimizer

  • weight_decay : float — weight decay for the optimizer

  • scheduler : str — scheduler for optimizer

  • optimizer : str — optimizer for training

Methods

  • get_imp_model_params

  • set_imp_model_params

  • initialize

  • configure_model

  • configure_optimizer

  • impute

  • save_model — Save the imputer model

  • load_model — Load the imputer model

  • get_clip_thresholds

  • set_clip_thresholds

  • fit
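K and L enter MIWAE through the importance weights log w = log p(x_o | z) + log p(z) - log q(z | x_o). Training maximizes the bound log((1/K) sum_k w_k) over K samples, and imputation takes a self-normalized weighted average of L decoder draws for the missing entries. The NumPy sketch below assumes the per-sample log-densities and decoder draws are already produced by the encoder and decoder; the function names are hypothetical and this is not MIWAEImputer's code.

```python
import numpy as np


def logsumexp(a, axis=0):
    """Numerically stable log(sum(exp(a))) along one axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(a_max, axis=axis) + np.log(np.sum(np.exp(a - a_max), axis=axis))


def miwae_bound(log_p_xo_given_z, log_p_z, log_q_z):
    """Per-row importance-weighted bound from K samples; inputs have shape (K, n)."""
    log_w = log_p_xo_given_z + log_p_z - log_q_z
    K = log_w.shape[0]
    return logsumexp(log_w, axis=0) - np.log(K)


def miwae_impute(log_w, x_samples):
    """Self-normalized importance-weighted mean of L decoder draws.

    log_w: (L, n) log importance weights; x_samples: (L, n, d) decoder draws.
    """
    w = np.exp(log_w - logsumexp(log_w, axis=0))       # normalize over the L samples
    return np.einsum("ln,lnd->nd", w, x_samples)
```

A larger L reduces the variance of the weighted average at the cost of more decoder evaluations per row.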

NOTMIWAE

fedimpute.execution_environment.imputation.imputers.notmiwae_imputer.NotMIWAEImputer

class NotMIWAEImputer(latent_size: int = 5, n_hidden: int = 16, n_hidden_layers: int = 2, out_dist='studentt', K: int = 20, L: int = 100, activation='tanh', initializer='xavier', mask_net_type: str = 'linear', clip: bool = True, batch_size: int = 256, learning_rate: float = 0.001, weight_decay: float = 0.0001, scheduler: str = 'step', optimizer: str = 'sgd')

Bases : BaseNNImputer, JMImputerMixin

not-MIWAE imputer class for imputing missing values in data that may be missing not at random (MNAR), extending MIWAE with a model of the missing mask.

Attributes

  • name : str — name of the imputer

  • clip : bool — whether to clip the imputed values

  • latent_size : int — size of the latent space

  • n_hidden : int — number of hidden units

  • n_hidden_layers : int — number of hidden layers

  • out_dist : str — output distribution

  • K : int — number of importance samples used during training

  • L : int — number of importance samples drawn at imputation time

  • activation : str — activation function

  • initializer : str — initializer for weights

  • mask_net_type : str — type of the network that models the missing mask, defaults to 'linear'

  • batch_size : int — batch size for training

  • learning_rate : float — learning rate for the optimizer

  • weight_decay : float — weight decay for the optimizer

  • scheduler : str — scheduler for optimizer

  • optimizer : str — optimizer for training

Methods

  • get_imp_model_params

  • set_imp_model_params

  • initialize

  • configure_model

  • configure_optimizer

  • impute

  • save_model — Save the imputer model

  • load_model — Load the imputer model

  • get_clip_thresholds

  • set_clip_thresholds

  • fit
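not-MIWAE extends the MIWAE importance weights with a term log p(s | x) for the missing mask s, evaluated on the data with its missing slots filled by a decoder sample. The sketch below assumes that mask_net_type='linear' means independent Bernoulli mask entries whose logits are a linear function of x, and that s is 1 for observed entries; both are assumptions, and the function names are hypothetical.

```python
import numpy as np


def log_sigmoid(t):
    """Numerically stable log(sigmoid(t))."""
    return -np.logaddexp(0.0, -t)


def mask_log_likelihood(x_full, s, W, b):
    """log p(s | x) for independent Bernoulli mask entries with linear logits.

    x_full: (..., d) data with missing slots filled by a decoder sample.
    s: (..., d) mask, 1 = observed, 0 = missing (assumed convention).
    """
    logits = x_full @ W + b
    # log p(s=1 | x) = log_sigmoid(logits); log p(s=0 | x) = log_sigmoid(-logits)
    return np.sum(s * log_sigmoid(logits) + (1 - s) * log_sigmoid(-logits), axis=-1)


def not_miwae_log_weights(log_p_xo_given_z, log_p_s_given_x, log_p_z, log_q_z):
    """not-MIWAE log importance weights: the MIWAE weights plus the mask term."""
    return log_p_xo_given_z + log_p_s_given_x + log_p_z - log_q_z
```

Because the mask term depends on the imputed values, samples that make the observed missingness pattern more likely receive larger weights, which is how the model accounts for MNAR data.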

GNR

fedimpute.execution_environment.imputation.imputers.gnr_imputer.GNRImputer

class GNRImputer(latent_size: int = 5, n_hidden: int = 16, n_hidden_layers: int = 2, K: int = 20, L: int = 100, activation='tanh', initializer='xavier', loss_coef=10, mr_loss_coef: bool = True, clip: bool = True, batch_size: int = 256, learning_rate: float = 0.001, weight_decay: float = 0.0001, scheduler: str = 'step', optimizer: str = 'sgd')

Bases : BaseNNImputer, JMImputerMixin

GNR imputer class for imputing missing values in data using a neural-network-based model.

Attributes

  • name : str — name of the imputer

  • clip : bool — whether to clip the imputed values

  • latent_size : int — size of the latent space

  • n_hidden : int — number of hidden units

  • n_hidden_layers : int — number of hidden layers

  • out_dist : str — output distribution

  • K : int — number of samples

  • L : int — number of MCMC samples

  • activation : str — activation function

  • initializer : str — initializer for weights

  • batch_size : int — batch size for training

  • learning_rate : float — learning rate for the optimizer

  • weight_decay : float — weight decay for the optimizer

  • scheduler : str — scheduler for optimizer

  • optimizer : str — optimizer for training

Methods

  • get_imp_model_params

  • set_imp_model_params

  • initialize

  • configure_model

  • configure_optimizer

  • impute

  • save_model — Save the imputer model

  • load_model — Load the imputer model

  • get_clip_thresholds

  • set_clip_thresholds

  • fit