Imputation Models
Non-NN Based Imputer
fedimpute.execution_environment.imputation.base.base_imputer.BaseMLImputer
class BaseMLImputer(name: str, model_persistable: bool)
Abstract class for the non-NN based imputer to be used in the federated imputation environment
Methods
-
get_imp_model_params — Return model parameters
-
set_imp_model_params — Set model parameters
-
initialize — Initialize the imputer (statistics, imputation models, etc.)
-
fit — Fit imputer to train local imputation models
-
impute — Impute missing values using an imputation model
-
get_fit_res
-
save_model — Save the imputer model
-
load_model — Load the imputer model
fedimpute.execution_environment.imputation.base.base_imputer.BaseMLImputer.initialize
method BaseMLImputer.initialize(X: np.array, missing_mask: np.array, data_utils: dict, params: dict, seed: int) → None
Initialize the imputer (statistics, imputation models, etc.)
Parameters
-
X : np.array — data with initial imputed values
-
missing_mask : np.array — missing mask of data
-
data_utils : dict — data utils dictionary - contains information about data
-
params : dict — params for initialization
-
seed : int — seed for randomization
fedimpute.execution_environment.imputation.base.base_imputer.BaseMLImputer.fit
method BaseMLImputer.fit(X: np.array, y: np.array, missing_mask: np.array, params: dict) → dict
Fit imputer to train local imputation models
Parameters
-
X : np.array — float numpy array of features
-
y : np.array — numpy array of target
-
missing_mask : np.array — missing mask
-
params : dict — parameters for local training
fedimpute.execution_environment.imputation.base.base_imputer.BaseMLImputer.impute
method BaseMLImputer.impute(X: np.array, y: np.array, missing_mask: np.array, params: dict) → np.ndarray
Impute missing values using an imputation model
Parameters
-
X : np.array — numpy array of features
-
y : np.array — numpy array of target
-
missing_mask : np.array — missing mask
-
params : dict — parameters for imputation
Returns
-
np.ndarray — imputed data - numpy array - same dimension as X
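The call sequence described above (initialize, fit, impute, plus parameter getters and setters) can be illustrated with a small standalone class. This is only a sketch under stated assumptions: `ToyMeanImputer` is a hypothetical name, it does not subclass the real `BaseMLImputer`, the argument-free `get_imp_model_params` and single-argument `set_imp_model_params` signatures are guesses, and the mask is assumed to be `True` where a value is missing.

```python
import numpy as np


class ToyMeanImputer:
    """Hypothetical stand-in that mirrors the documented method names."""

    def __init__(self):
        self.mean_params = None

    def initialize(self, X, missing_mask, data_utils, params, seed):
        # Start from zeros so the object is usable before fit is called.
        self.mean_params = np.zeros(X.shape[1])

    def get_imp_model_params(self):
        # Assumed signature: return a dict of parameters to share.
        return {"mean": self.mean_params.copy()}

    def set_imp_model_params(self, updated):
        # Assumed signature: overwrite local parameters with received ones.
        self.mean_params = np.asarray(updated["mean"], dtype=float)

    def fit(self, X, y, missing_mask, params):
        # Column means over observed entries only (mask is True where missing).
        observed = np.where(missing_mask, np.nan, X)
        self.mean_params = np.nanmean(observed, axis=0)
        return {"sample_size": X.shape[0]}

    def impute(self, X, y, missing_mask, params):
        # Fill masked cells with the stored column mean; keep observed cells.
        return np.where(missing_mask, self.mean_params, X)


X = np.array([[1.0, 0.0], [3.0, 4.0], [0.0, 8.0]])
mask = np.array([[False, True], [False, False], [True, False]])

imp = ToyMeanImputer()
imp.initialize(X, mask, data_utils={}, params={}, seed=0)
fit_res = imp.fit(X, y=None, missing_mask=mask, params={})
X_imp = imp.impute(X, y=None, missing_mask=mask, params={})
```

The returned array has the same shape as `X`, and only the masked cells differ from the input.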
Mean
fedimpute.execution_environment.imputation.imputers.simple_imputer.SimpleImputer
class SimpleImputer(strategy: str = 'mean')
Bases : BaseMLImputer
Simple imputer class for imputing missing values in data using simple strategies such as mean or median.
Attributes
-
strategy : str — strategy for imputation - mean, median etc.
-
mean_params : np.array — mean parameters for imputation
-
model_type : str — type of the model - numpy or sklearn
-
model_persistable : bool — whether model is persistable or not
-
name : str — name of the imputer
Raises
-
ValueError
Methods
-
get_imp_model_params
-
set_imp_model_params
-
initialize
-
fit
-
impute
-
get_fit_res
-
save_model — Save the imputer model
-
load_model — Load the imputer model
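As a rough illustration of how mean parameters could be combined across clients in a federated setting, the sketch below averages per-client column means weighted by the number of observed values. The aggregation rule, the helper names `local_mean_params` and `aggregate_means`, and the mask convention (`True` = missing) are assumptions for illustration; the source does not state how the library aggregates.

```python
import numpy as np


def local_mean_params(X, missing_mask):
    """Per-column mean and observed count for one client (mask True = missing)."""
    observed = ~missing_mask
    counts = observed.sum(axis=0)
    sums = np.where(observed, X, 0.0).sum(axis=0)
    means = sums / np.maximum(counts, 1)  # guard against empty columns
    return means, counts


def aggregate_means(client_means, client_counts):
    """Average client means, weighted by observed counts per column."""
    means = np.stack(client_means)
    counts = np.stack(client_counts).astype(float)
    total = np.maximum(counts.sum(axis=0), 1.0)
    return (means * counts).sum(axis=0) / total


# Two hypothetical clients, each holding one feature column.
X_a = np.array([[1.0], [3.0], [0.0]])
mask_a = np.array([[False], [False], [True]])
X_b = np.array([[10.0]])
mask_b = np.array([[False]])

m_a, c_a = local_mean_params(X_a, mask_a)
m_b, c_b = local_mean_params(X_b, mask_b)
global_mean = aggregate_means([m_a, m_b], [c_a, c_b])

# Client A imputes its missing cell with the aggregated mean.
X_a_imputed = np.where(mask_a, global_mean, X_a)
```

Weighting by observed counts makes the result equal to the mean over all observed values pooled across clients, without sharing the raw rows.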
EM
fedimpute.execution_environment.imputation.imputers.em_imputer.EMImputer
class EMImputer(clip: bool = True, use_y: bool = False)
Bases : BaseMLImputer, ICEImputerMixin
EM imputer class for imputing missing values in data using Expectation Maximization algorithm.
Attributes
-
clip : bool — whether to clip the imputed values
-
use_y : bool — whether to use target variable in imputation
-
min_values : np.array — minimum values for clipping
-
max_values : np.array — maximum values for clipping
-
data_utils_info : dict — information about data
-
seed : int — seed for randomization
-
name : str — name of the imputer, defaults to 'em'
-
model_type : str — type of the imputer, simple or nn (neural network based or not), defaults to 'simple'
-
mu : np.array — mean of the data
-
sigma : np.array — covariance matrix of the data
-
miss : np.array — missing values indices
-
obs : np.array — observed values indices
-
model_persistable : bool — whether model is persistable or not
Methods
-
get_imp_model_params
-
set_imp_model_params
-
initialize — Initialize the imputer (statistics, imputation models, etc.)
-
fit — Fit the imputer on the data.
-
impute — Impute the missing values in the data.
-
get_fit_res
-
save_model — Save the imputer model
-
load_model — Load the imputer model
-
get_clip_thresholds
-
set_clip_thresholds
-
get_visit_indices
-
_em — Perform the EM step for imputing missing values.
-
_converged — Checks if the EM loop has converged.
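For orientation, the EM idea behind the `mu` and `sigma` attributes can be sketched for a multivariate normal model: the E-step replaces each missing block with its conditional mean given the observed block, and the M-step re-estimates the mean and covariance. The function `em_impute` below is a hypothetical, textbook-style sketch, not the library's `_em` or `_converged` code; the mask is assumed to be `True` where a value is missing.

```python
import numpy as np


def em_impute(X, missing_mask, max_iter=100, tol=1e-6):
    """EM imputation under a multivariate normal model (mask True = missing)."""
    X = np.where(missing_mask, np.nan, X).astype(float)
    n, d = X.shape
    mu = np.nanmean(X, axis=0)
    X_hat = np.where(missing_mask, mu, X)  # initial mean fill
    sigma = np.cov(X_hat, rowvar=False) + 1e-6 * np.eye(d)

    for _ in range(max_iter):
        extra_cov = np.zeros((d, d))  # conditional covariance of missing parts
        for i in range(n):
            m = missing_mask[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                # Nothing observed in this row: fall back to the current mean.
                X_hat[i] = mu
                extra_cov += sigma
                continue
            s_oo = sigma[np.ix_(o, o)]
            s_mo = sigma[np.ix_(m, o)]
            coef = s_mo @ np.linalg.pinv(s_oo)
            # E-step: conditional mean of the missing block given observed block.
            X_hat[i, m] = mu[m] + coef @ (X_hat[i, o] - mu[o])
            extra_cov[np.ix_(m, m)] += sigma[np.ix_(m, m)] - coef @ s_mo.T
        # M-step: update mean and covariance from the completed data.
        mu_new = X_hat.mean(axis=0)
        centered = X_hat - mu_new
        sigma_new = (centered.T @ centered + extra_cov) / n
        converged = (
            np.linalg.norm(mu_new - mu) < tol
            and np.linalg.norm(sigma_new - sigma) < tol
        )
        mu, sigma = mu_new, sigma_new
        if converged:
            break
    return X_hat, mu, sigma


# Second column is exactly twice the first; the last value is missing.
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0], [5.0, 0.0]])
mask = np.zeros_like(X, dtype=bool)
mask[4, 1] = True
X_imp, mu, sigma = em_impute(X, mask)
```

On this toy data the imputed cell moves from the column mean toward the value implied by the linear relationship between the two columns.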
ICE
fedimpute.execution_environment.imputation.imputers.linear_ice_imputer.LinearICEImputer
class LinearICEImputer(estimator_num: str = 'ridge_cv', estimator_cat: str = 'logistic', mm_model: str = 'logistic', mm_model_params=None, clip: bool = True, use_y: bool = False)
Bases : BaseMLImputer, ICEImputerMixin
Linear ICE imputer class for imputing missing values in data using linear models.
Attributes
-
estimator_num : str — estimator for numerical columns
-
estimator_cat : str — estimator for categorical columns
-
mm_model — missing mechanism model
-
mm_model_params : dict — missing mechanism model parameters
-
clip : bool — whether to clip the imputed values
-
use_y : bool — whether to use target variable in imputation
-
imp_models : list — list of imputation models
-
data_utils_info : dict — information about data
-
seed : int — seed for randomization
-
model_type : str — type of the imputer, indicating whether it is neural network (nn) based or not; defaults to 'sklearn'
-
model_persistable : bool — whether model is persistable or not, defaults to False
-
name : str — name of the imputer, defaults to 'linear_ice'
Methods
-
get_imp_model_params
-
set_imp_model_params
-
initialize — Initialize the imputer (statistics, imputation models, etc.)
-
fit — Fit imputer to train local imputation models
-
impute — Impute missing values using an imputation model
-
get_fit_res
-
save_model
-
load_model
-
get_clip_thresholds
-
set_clip_thresholds
-
get_visit_indices
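The chained-equations idea behind a linear ICE imputer can be sketched with plain NumPy: each column with missing values is regressed on the remaining columns, and its missing cells are replaced by the predictions, repeated for a few rounds. Everything below is an assumption-laden illustration: `ridge_fit` and `ice_impute` are hypothetical helpers, a closed-form ridge stands in for the 'ridge_cv' estimator, only numerical columns are handled, and the visiting order (fewest missing values first) is a guess at what `get_visit_indices` might provide.

```python
import numpy as np


def ridge_fit(A, b, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    A_mean, b_mean = A.mean(axis=0), b.mean()
    Ac = A - A_mean
    gram = Ac.T @ Ac + alpha * np.eye(A.shape[1])
    coef = np.linalg.solve(gram, Ac.T @ (b - b_mean))
    return coef, b_mean - A_mean @ coef


def ice_impute(X, missing_mask, n_rounds=10, alpha=1e-3):
    """Chained-equation imputation for numerical columns (mask True = missing)."""
    X_hat = X.astype(float).copy()
    # Initial fill: mean of the observed values in each column.
    for j in range(X.shape[1]):
        obs = ~missing_mask[:, j]
        X_hat[~obs, j] = X_hat[obs, j].mean() if obs.any() else 0.0
    # Assumed visiting order: columns with fewer missing values first.
    order = np.argsort(missing_mask.sum(axis=0), kind="stable")
    for _ in range(n_rounds):
        for j in order:
            miss = missing_mask[:, j]
            if not miss.any() or miss.all():
                continue
            others = np.delete(X_hat, j, axis=1)
            coef, intercept = ridge_fit(others[~miss], X_hat[~miss, j], alpha)
            X_hat[miss, j] = others[miss] @ coef + intercept
    return X_hat


# Second column follows 3 * x + 1; its last value is missing.
x1 = np.arange(1.0, 7.0)
X = np.column_stack([x1, 3.0 * x1 + 1.0])
mask = np.zeros_like(X, dtype=bool)
mask[5, 1] = True
X[5, 1] = 0.0  # placeholder for the missing cell
X_imp = ice_impute(X, mask)
```

With a small `alpha` the prediction for the missing cell is close to the value implied by the linear relationship; a larger penalty shrinks it toward the column mean.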
MissForest
fedimpute.execution_environment.imputation.imputers.missforest_imputer.MissForestImputer
class MissForestImputer(n_estimators: int = 200, bootstrap: bool = True, n_jobs: int = 2, clip: bool = True, use_y: bool = False)
Bases : BaseMLImputer, ICEImputerMixin
MissForest imputer class for the federated imputation environment
Attributes
-
n_estimators : int — number of trees in the forest
-
bootstrap : bool — whether bootstrap samples are used when building trees
-
n_jobs : int — number of jobs to run in parallel
-
clip : bool — whether to clip the imputed values
-
use_y : bool — whether to use target values for imputation
-
imp_models : list — list of imputation models
-
mm_model : object — model for missing mask imputation
-
data_utils_info : dict — data utils information
-
seed : int — seed for randomization
-
model_type : str — type of the model, defaults to 'sklearn'
-
model_persistable : bool — whether the model is persistable, defaults to False
-
name : str — name of the imputer, defaults to 'missforest'
Methods
-
get_imp_model_params
-
set_imp_model_params
-
initialize
-
fit
-
impute
-
get_fit_res
-
save_model
-
load_model
-
get_clip_thresholds
-
set_clip_thresholds
-
get_visit_indices
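The `clip` option and the clip-threshold accessors suggest that imputed values are bounded to a per-column range. A minimal sketch of that idea follows, assuming the thresholds are the observed minimum and maximum of each column; `clip_thresholds` and `clip_imputed` are hypothetical helpers, and how the library actually derives its thresholds is not stated in the source.

```python
import numpy as np


def clip_thresholds(X, missing_mask):
    """Per-column min and max over observed entries (mask True = missing)."""
    observed = np.where(missing_mask, np.nan, X)
    return np.nanmin(observed, axis=0), np.nanmax(observed, axis=0)


def clip_imputed(X_imputed, missing_mask, min_values, max_values):
    """Clip only the imputed cells; observed cells are returned unchanged."""
    clipped = np.clip(X_imputed, min_values, max_values)
    return np.where(missing_mask, clipped, X_imputed)


X = np.array([[0.0, 10.0], [1.0, 20.0], [0.0, 0.0]])
mask = np.array([[False, False], [False, False], [True, True]])
min_values, max_values = clip_thresholds(X, mask)

# Pretend a model produced out-of-range values for the missing row.
X_model = np.array([[0.0, 10.0], [1.0, 20.0], [-5.0, 35.0]])
X_clipped = clip_imputed(X_model, mask, min_values, max_values)
```

Clipping keeps tree or regression predictions inside the range seen in the data, which matters most for bounded features such as normalized values.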
NN Based Imputer
fedimpute.execution_environment.imputation.base.base_imputer.BaseNNImputer
class BaseNNImputer()
Abstract class for the NN based imputer to be used in the federated imputation environment
Methods
-
get_imp_model_params — Return model parameters
-
set_imp_model_params — Set model parameters
-
initialize — Initialize the imputer (statistics, imputation models, etc.)
-
configure_model — Fetch model for training
-
configure_optimizer — Configure optimizer for training
-
impute — Impute missing values using an imputation model
-
save_model — Save the imputer model
-
load_model — Load the imputer model
fedimpute.execution_environment.imputation.base.base_imputer.BaseNNImputer.initialize
method BaseNNImputer.initialize(X: np.array, missing_mask: np.array, data_utils: dict, params: dict, seed: int) → None
Initialize the imputer (statistics, imputation models, etc.)
Parameters
-
X : np.array — data with initial imputed values
-
missing_mask : np.array — missing mask of data
-
data_utils : dict — data utils dictionary - contains information about data
-
params : dict — params for initialization
-
seed : int — seed for randomization
fedimpute.execution_environment.imputation.base.base_imputer.BaseNNImputer.configure_model
method BaseNNImputer.configure_model(params: dict, X: np.ndarray, y: np.ndarray, missing_mask: np.ndarray) → Tuple[torch.nn.Module, torch.utils.data.DataLoader]
Fetch model for training
Parameters
-
params : dict — parameters for training
-
X : np.ndarray — imputed data
-
y : np.ndarray — target
-
missing_mask : np.ndarray — missing mask
Returns
-
Tuple[torch.nn.Module, torch.utils.data.DataLoader] — model, train_dataloader
fedimpute.execution_environment.imputation.base.base_imputer.BaseNNImputer.impute
method BaseNNImputer.impute(X: np.array, y: np.array, missing_mask: np.array, params: dict) → np.ndarray
Impute missing values using an imputation model
Parameters
-
X : np.array — numpy array of features
-
y : np.array — numpy array of target
-
missing_mask : np.array — missing mask
-
params : dict — parameters for imputation
Returns
-
np.ndarray — imputed data - numpy array - same dimension as X
GAIN
fedimpute.execution_environment.imputation.imputers.gain_imputer.GAINImputer
class GAINImputer(h_dim: int = 20, n_layers: int = 2, activation: str = 'relu', initializer: str = 'kaiming', loss_alpha: float = 10, hint_rate: float = 0.9, clip: bool = True, batch_size: int = 256, learning_rate: int = 0.001, weight_decay: int = 0.0001, scheduler: str = 'step', optimizer: str = 'sgd')
Bases : BaseNNImputer, JMImputerMixin
GAIN imputer class for imputing missing values in data using Generative Adversarial Imputation Networks.
Attributes
-
h_dim : int — dimension of hidden layers
-
n_layers : int — number of layers
-
activation : str — activation function
-
initializer : str — initializer for weights
-
loss_alpha : float — alpha parameter for loss
-
hint_rate : float — hint rate for loss
-
clip : bool — whether to clip the imputed values
-
batch_size : int — batch size for training
-
learning_rate : float — learning rate for optimizer
-
weight_decay : float — weight decay for optimizer
-
scheduler : str — scheduler for optimizer
-
optimizer : str — optimizer for training
-
scheduler_params : dict — scheduler parameters
Methods
-
get_imp_model_params
-
set_imp_model_params
-
initialize
-
configure_model
-
configure_optimizer
-
impute
-
save_model — Save the imputer model
-
load_model — Load the imputer model
-
get_clip_thresholds
-
set_clip_thresholds
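To make `hint_rate` more concrete, the sketch below builds the generator input and the hint matrix in the way the commonly used GAIN reference code does: missing cells are filled with small uniform noise, and the hint reveals each observed-mask entry with probability `hint_rate`. This is a NumPy illustration only; `gain_inputs` is a hypothetical helper, the noise range is an assumption, and the library's own PyTorch implementation may differ. Note that GAIN uses the convention 1 = observed, the opposite of a mask that is `True` where values are missing.

```python
import numpy as np


def gain_inputs(X, missing_mask, hint_rate=0.9, seed=0):
    """Build generator input, observed-mask and hint (mask True = missing)."""
    rng = np.random.default_rng(seed)
    M = (~missing_mask).astype(float)  # GAIN convention: 1 = observed
    Z = rng.uniform(0.0, 0.01, size=X.shape)  # noise for missing cells
    X_tilde = np.where(missing_mask, Z, X)  # observed values kept as-is
    B = (rng.uniform(size=X.shape) < hint_rate).astype(float)
    H = B * M  # hint: each mask entry is revealed with probability hint_rate
    return X_tilde, M, H


X = np.array([[0.2, 0.0], [0.5, 0.9], [0.0, 0.4]])
mask = np.array([[False, True], [False, False], [True, False]])
X_tilde, M, H = gain_inputs(X, mask, hint_rate=0.9, seed=0)
```

The generator sees `X_tilde` and `M`, while the discriminator sees the generator output together with `H`; a higher `hint_rate` gives the discriminator more information about which cells were truly observed.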
MIWAE
fedimpute.execution_environment.imputation.imputers.miwae_imputer.MIWAEImputer
class MIWAEImputer(name: str = 'miwae', latent_size: int = 5, n_hidden: int = 16, n_hidden_layers: int = 2, out_dist='studentt', K: int = 20, L: int = 100, activation='tanh', initializer='xavier', clip: bool = True, batch_size: int = 256, learning_rate: int = 0.001, weight_decay: int = 0.0001, scheduler: str = 'step', optimizer: str = 'sgd')
Bases : BaseNNImputer, JMImputerMixin
MIWAE imputer class for imputing missing values in data using the missing data importance-weighted autoencoder (MIWAE).
Attributes
-
name : str — name of the imputer
-
clip : bool — whether to clip the imputed values
-
latent_size : int — size of the latent space
-
n_hidden : int — number of hidden units
-
n_hidden_layers : int — number of hidden layers
-
out_dist : str — output distribution
-
K : int — number of samples
-
L : int — number of MCMC samples
-
activation : str — activation function
-
initializer : str — initializer for weights
-
batch_size : int — batch size for training
-
learning_rate : float — learning rate for optimizer
-
weight_decay : float — weight decay for optimizer
-
scheduler : str — scheduler for optimizer
-
optimizer : str — optimizer for training
Methods
-
get_imp_model_params
-
set_imp_model_params
-
initialize
-
configure_model
-
configure_optimizer
-
impute
-
save_model — Save the imputer model
-
load_model — Load the imputer model
-
get_clip_thresholds
-
set_clip_thresholds
-
fit
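In the MIWAE approach, a single imputation is typically formed as a self-normalized importance-weighted average of several decoder samples. The sketch below shows only that final weighting step in NumPy, taking the log importance weights and decoder outputs as given arrays. `importance_weighted_imputation` is a hypothetical helper, the array shapes are assumptions, and the library's PyTorch code may organize this differently.

```python
import numpy as np


def importance_weighted_imputation(X, missing_mask, log_w, samples):
    """Combine decoder samples with self-normalized importance weights.

    log_w:   (n_samples, n_rows) log importance weights
    samples: (n_samples, n_rows, n_features) decoder outputs
    """
    log_w = log_w - log_w.max(axis=0, keepdims=True)  # stabilize the exp
    w = np.exp(log_w)
    w = w / w.sum(axis=0, keepdims=True)  # weights sum to 1 per row
    x_hat = np.einsum("ln,lnd->nd", w, samples)
    # Only missing cells are replaced (mask True = missing).
    return np.where(missing_mask, x_hat, X)


X = np.array([[1.0, 0.0]])
mask = np.array([[False, True]])
log_w = np.array([[0.0], [np.log(3.0)]])  # normalized weights: 0.25 and 0.75
samples = np.array([[[0.0, 4.0]], [[0.0, 8.0]]])
X_imp = importance_weighted_imputation(X, mask, log_w, samples)
```

Subtracting the per-row maximum before exponentiating avoids overflow when log weights are large, and does not change the normalized weights.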
NOTMIWAE
fedimpute.execution_environment.imputation.imputers.notmiwae_imputer.NotMIWAEImputer
class NotMIWAEImputer(latent_size: int = 5, n_hidden: int = 16, n_hidden_layers: int = 2, out_dist='studentt', K: int = 20, L: int = 100, activation='tanh', initializer='xavier', mask_net_type: str = 'linear', clip: bool = True, batch_size: int = 256, learning_rate: int = 0.001, weight_decay: int = 0.0001, scheduler: str = 'step', optimizer: str = 'sgd')
Bases : BaseNNImputer, JMImputerMixin
not-MIWAE imputer class for imputing missing values in data using not-MIWAE, a MIWAE variant that also models the missingness mechanism for data missing not at random.
Attributes
-
name : str — name of the imputer
-
clip : bool — whether to clip the imputed values
-
latent_size : int — size of the latent space
-
n_hidden : int — number of hidden units
-
n_hidden_layers : int — number of hidden layers
-
out_dist : str — output distribution
-
K : int — number of samples
-
L : int — number of MCMC samples
-
activation : str — activation function
-
initializer : str — initializer for weights
-
batch_size : int — batch size for training
-
learning_rate : float — learning rate for optimizer
-
weight_decay : float — weight decay for optimizer
-
scheduler : str — scheduler for optimizer
-
optimizer : str — optimizer for training
Methods
-
get_imp_model_params
-
set_imp_model_params
-
initialize
-
configure_model
-
configure_optimizer
-
impute
-
save_model — Save the imputer model
-
load_model — Load the imputer model
-
get_clip_thresholds
-
set_clip_thresholds
-
fit
GNR
fedimpute.execution_environment.imputation.imputers.gnr_imputer.GNRImputer
class GNRImputer(latent_size: int = 5, n_hidden: int = 16, n_hidden_layers: int = 2, K: int = 20, L: int = 100, activation='tanh', initializer='xavier', loss_coef=10, mr_loss_coef: bool = True, clip: bool = True, batch_size: int = 256, learning_rate: int = 0.001, weight_decay: int = 0.0001, scheduler: str = 'step', optimizer: str = 'sgd')
Bases : BaseNNImputer, JMImputerMixin
GNR imputer class for imputing missing values in data using a neural-network-based generative model.
Attributes
-
name : str — name of the imputer
-
clip : bool — whether to clip the imputed values
-
latent_size : int — size of the latent space
-
n_hidden : int — number of hidden units
-
n_hidden_layers : int — number of hidden layers
-
out_dist : str — output distribution
-
K : int — number of samples
-
L : int — number of MCMC samples
-
activation : str — activation function
-
initializer : str — initializer for weights
-
batch_size : int — batch size for training
-
learning_rate : float — learning rate for optimizer
-
weight_decay : float — weight decay for optimizer
-
scheduler : str — scheduler for optimizer
-
optimizer : str — optimizer for training
Methods
-
get_imp_model_params
-
set_imp_model_params
-
initialize
-
configure_model
-
configure_optimizer
-
impute
-
save_model — Save the imputer model
-
load_model — Load the imputer model
-
get_clip_thresholds
-
set_clip_thresholds
-
fit