Executing Distributed Imputation Algorithms

The FedImputeEnv class is the execution_environment module's main class. It is used to configure the federated imputation environment and execute federated imputation algorithms.

Overview and Basic Usage

First, initialize the FedImputeEnv class and configure the environment with the configuration method: which imputer, which federated strategy, and which fitting mode to use. Then use the setup_from_simulator method to set up the environment with the simulated data from the simulator class (see the Scenario Simulation section). Finally, use the run_fed_imputation method to execute the federated imputation algorithm.

from fedimpute.execution_environment import FedImputeEnv

env = FedImputeEnv(debug_mode=False)
env.configuration(imputer = 'mice', fed_strategy='fedmice')
env.setup_from_scenario_builder(scenario_builder = scenario_builder, verbose=1)
env.show_env_info()
env.run_fed_imputation()

Note: if you use the CUDA version of torch, first set the following environment variable so that CUDA operations behave deterministically:

# bash (linux)
export CUBLAS_WORKSPACE_CONFIG=:4096:8
# powershell (windows)
$Env:CUBLAS_WORKSPACE_CONFIG = ":4096:8"
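
The same variable can also be set from inside a Python script. The sketch below assumes this happens at the very top of the script, before torch is imported, since the setting needs to be in place before CUDA is initialized.

```python
import os

# Set the variable before torch (and therefore CUDA) is initialized.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

# import torch  # import torch only after the variable is set

print(os.environ["CUBLAS_WORKSPACE_CONFIG"])
```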

Environment Configuration

The env.configuration() method is used to configure the environment. It takes the following arguments:

Options:

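  • imputer (str) - name of imputation algorithm to use. Options: mean, em, mice, missforest, miwae, gain, notmiwae, gnr (the imputer codes listed under Supported Federated Imputation Algorithms)
  • fed_strategy (str) - name of federated strategy to use. Options: local, fedmean, fedem, fedmice, fedtree, fedavg, fedprox, scaffold, fedadam, fedadagrad, fedyogi, fedavg_ft (the strategy codes listed under Supported Federated Imputation Algorithms)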
  • fit_mode (str) - name of fitting mode to use - federated imputation, local-only imputation or centralized imputation. Options: fed, local, central
  • save_dir_path (str) - path to persist clients and server training process information (imputation models, imputed data etc.) for future use.

Other Params:

  • imputer_params (Union[None, dict]) = None - parameters for imputer
  • fed_strategy_params (Union[None, dict]) = None - parameters for federated strategy
  • workflow_params (Union[None, dict]) = None - parameters for workflow. The Workflow class contains the logic of the federated imputation workflow and is associated with each Imputer class. The built-in workflows are: ice - for ICE based imputation, em - for EM imputation, jm - for joint modeling based imputation such as VAE or GAN based imputation.
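
As an illustration of how the arguments above fit together, the following sketch collects them in a plain dictionary that could then be passed by keyword. The helper make_config is hypothetical and not part of fedimpute; it only uses the argument names and fit_mode values listed above, and it drops unset arguments so that the library's own defaults would apply.

```python
def make_config(imputer, fed_strategy, fit_mode=None, save_dir_path=None,
                imputer_params=None, fed_strategy_params=None, workflow_params=None):
    """Hypothetical helper: gather the documented configuration arguments in a dict."""
    # Only the three fit modes named in the documentation are accepted here.
    if fit_mode is not None and fit_mode not in ("fed", "local", "central"):
        raise ValueError(f"unknown fit_mode: {fit_mode!r}")
    candidates = {
        "imputer": imputer,
        "fed_strategy": fed_strategy,
        "fit_mode": fit_mode,
        "save_dir_path": save_dir_path,
        "imputer_params": imputer_params,
        "fed_strategy_params": fed_strategy_params,
        "workflow_params": workflow_params,
    }
    # Leave out anything that was not set explicitly.
    return {key: value for key, value in candidates.items() if value is not None}


config = make_config(imputer="mice", fed_strategy="fedmice", fit_mode="fed")
print(config)

# With a FedImputeEnv instance this would presumably be used as:
# env.configuration(**config)
```

Keeping the configuration in a dictionary makes it easy to log or store alongside the results of a run.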

Supported Federated Imputation Algorithms

Federated Imputation Algorithms:

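| Method     | Type   | Fed Strategy       | Imputer (code) | Workflow | Reference                    |
|------------|--------|--------------------|----------------|----------|------------------------------|
| Mean       | Non-NN | local, fedmean     | mean           | MEAN     | -                            |
| EM         | Non-NN | local, fedem       | em             | EM       | EM, FedEM                    |
| MICE       | Non-NN | local, fedmice     | mice           | ICE      | FedICE                       |
| MissForest | Non-NN | local, fedtree     | missforest     | ICE      | MissForest, Fed Randomforest |
| MIWAE      | NN     | local, fedavg, ... | miwae          | JM       | MIWAE                        |
| GAIN       | NN     | local, fedavg, ... | gain           | JM       | GAIN                         |
| Not-MIWAE  | NN     | local, fedavg, ... | notmiwae       | JM       | Not-MIWAE                    |
| GNR        | NN     | local, fedavg, ... | gnr            | JM       | GNR                          |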

Federated Strategies:

| Method     | Type            | Fed_strategy (code) | Reference  |
|------------|-----------------|---------------------|------------|
| Local      | non-federated   | local               | -          |
| FedMean    | traditional     | fedmean             | -          |
| FedEM      | traditional     | fedem               | FedEM      |
| FedMICE    | traditional     | fedmice             | FedMICE    |
| FedTree    | traditional     | fedtree             | FedTree    |
| FedAvg     | global FL       | fedavg              | FedAvg     |
| FedProx    | global FL       | fedprox             | FedProx    |
| Scaffold   | global FL       | scaffold            | Scaffold   |
| FedAdam    | global FL       | fedadam             | FedAdam    |
| FedAdagrad | global FL       | fedadagrad          | FedAdaGrad |
| FedYogi    | global FL       | fedyogi             | FedYogi    |
| FedAvg-FT  | personalized FL | fedavg_ft           | FedAvg-FT  |
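
Because each imputer only works with certain strategies, it can help to check a pairing before configuring the environment. The sketch below transcribes the imputer/strategy pairs from the algorithms table; the function check_pair is hypothetical and not part of fedimpute. For the NN imputers the table only spells out local and fedavg (the rest is elided), so other strategies are reported as undetermined (None) rather than guessed.

```python
# Pairs transcribed from the "Federated Imputation Algorithms" table.
NON_NN_STRATEGIES = {
    "mean": {"local", "fedmean"},
    "em": {"local", "fedem"},
    "mice": {"local", "fedmice"},
    "missforest": {"local", "fedtree"},
}
NN_IMPUTERS = {"miwae", "gain", "notmiwae", "gnr"}
NN_LISTED_STRATEGIES = {"local", "fedavg"}  # the table elides the remaining ones


def check_pair(imputer, fed_strategy):
    """Return True if the pair is listed, False if it is not, None if the table is silent."""
    if imputer in NON_NN_STRATEGIES:
        return fed_strategy in NON_NN_STRATEGIES[imputer]
    if imputer in NN_IMPUTERS:
        return True if fed_strategy in NN_LISTED_STRATEGIES else None
    raise KeyError(f"unknown imputer code: {imputer!r}")


print(check_pair("mice", "fedmice"))
print(check_pair("mice", "fedavg"))
print(check_pair("gain", "fedprox"))
```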

Environment Setup

After configuring the environment, we need to initialize it, that is, create the Client and Server objects with the simulated data from the simulation module. Currently, the FedImputeEnv class supports two ways to set up the environment. The first way is to set up the environment directly from the simulator class by using the env.setup_from_simulator(simulator) method.

env.setup_from_simulator(simulator, verbose=1)

The second way is to set up the environment with the env.setup_from_data() method. It is intended for scenarios where users have their own data that was not produced by the simulator class. Example:

import numpy as np

clients_train_data = [np.random.rand(100, 10) for _ in range(10)]
clients_train_data_ms = [np.random.rand(100, 10) for _ in range(10)]
clients_test_data = [np.random.rand(100, 10) for _ in range(10)]
global_test = np.random.rand(100, 10)
data_config = {
    'target': 9,
    'task_type': 'regression',
    'clf_type': None,
    'num_cols': 9,
}
clients_seeds = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

env.setup_from_data(
    clients_train_data, clients_test_data, clients_train_data_ms, clients_seeds, global_test, data_config, verbose=1
)
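
When starting from a single real dataset rather than random arrays, the per-client lists first have to be built. The following is a minimal sketch of one way to do that with NumPy. It rests on assumptions that are not confirmed by this page: that clients_train_data_ms holds the training arrays with missing entries encoded as NaN, and that the target is the last column and should stay complete. The helper split_for_clients and its parameters are hypothetical.

```python
import numpy as np


def split_for_clients(data, n_clients=10, test_frac=0.2, missing_rate=0.2, seed=0):
    """Hypothetical helper: split one array into per-client train/test parts."""
    rng = np.random.default_rng(seed)
    shuffled = data[rng.permutation(len(data))]

    # Hold out a shared global test set first.
    n_global = int(len(shuffled) * test_frac)
    global_test, rest = shuffled[:n_global], shuffled[n_global:]

    train, train_ms, test = [], [], []
    for part in np.array_split(rest, n_clients):
        n_test = int(len(part) * test_frac)
        test.append(part[:n_test])
        complete = part[n_test:]
        train.append(complete)

        # Assumption: missing entries are encoded as NaN; the last column is kept complete.
        masked = complete.copy()
        mask = rng.random(masked.shape) < missing_rate
        mask[:, -1] = False
        masked[mask] = np.nan
        train_ms.append(masked)
    return train, test, train_ms, global_test


data = np.random.default_rng(0).random((1000, 10))
clients_train_data, clients_test_data, clients_train_data_ms, global_test = split_for_clients(data)
print(len(clients_train_data), clients_train_data[0].shape, global_test.shape)
```

The returned lists have the same layout as the placeholder arrays in the example above, so they could be passed to env.setup_from_data() in the same positions.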

Execute Federated Imputation

After setting up the environment, we can execute the federated imputation algorithm with the run_fed_imputation() method. Two types of simulated execution are currently supported: (1) Sequential mode (run_type="sequential"): there is no parallelism; the whole imputation process runs for the clients one after another in a for loop. (2) Parallel mode (run_type="parallel"): separate processes are simulated for the clients and the server, and the workflow manages the communication between them to approximate a real-world FL environment.

env.run_fed_imputation(run_type='sequential')
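
To make the difference between the two run types concrete, here is a toy illustration that is independent of fedimpute. Each client computes per-column sums and counts of its observed values, and a server pools them into global column means. All names are hypothetical, and a thread pool stands in for the separate client processes of the parallel mode; this is not how the library implements it.

```python
from concurrent.futures import ThreadPoolExecutor


def local_update(client_rows):
    """Client step: per-column sum and count of observed (non-None) values."""
    n_cols = len(client_rows[0])
    sums, counts = [0.0] * n_cols, [0] * n_cols
    for row in client_rows:
        for j, value in enumerate(row):
            if value is not None:
                sums[j] += value
                counts[j] += 1
    return sums, counts


def aggregate(updates):
    """Server step: pool the client statistics into global column means."""
    n_cols = len(updates[0][0])
    totals = [sum(u[0][j] for u in updates) for j in range(n_cols)]
    counts = [sum(u[1][j] for u in updates) for j in range(n_cols)]
    return [total / count for total, count in zip(totals, counts)]


def run(clients, run_type="sequential"):
    if run_type == "sequential":
        # One client after another in a plain loop.
        updates = [local_update(rows) for rows in clients]
    elif run_type == "parallel":
        # Clients are handled concurrently; threads stand in for processes here.
        with ThreadPoolExecutor() as pool:
            updates = list(pool.map(local_update, clients))
    else:
        raise ValueError(f"unknown run_type: {run_type!r}")
    return aggregate(updates)


clients = [
    [[1.0, None], [3.0, 4.0]],
    [[None, 2.0], [5.0, 6.0]],
]
seq_means = run(clients, "sequential")
par_means = run(clients, "parallel")
print(seq_means, par_means)  # both are [3.0, 4.0]
```

Both modes produce the same aggregate; only the scheduling of the client work differs, which is why the sequential mode is usually the easier one to debug.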

Monitoring Imputation Process

We provide a TensorBoard utility so that users can monitor the imputation progress in real time. To use it, run the following command in the terminal:

tensorboard --logdir .logs

We also provide another API, env.tracker.visualize_imputation_process(), which shows a line chart of the imputation process measured by imputation quality or loss. Unlike the TensorBoard utility, it can only be run after imputation has finished.

Develop New Federated Imputation Methods

The FedImputeEnv class supports registering new imputers, strategies, and workflows, so users can develop their own imputation methods, federated strategies, and workflows. These three components have to work together closely to form a federated imputation algorithm.

To develop a new workflow, implement a class that inherits from the Workflow class in fedimpute.execution_environment.workflow. The implementation has to define how the clients and the server interact with each other.

To develop a new federated strategy, implement the strategy for both the client and the server.

To develop a new imputer, inherit from one of the two abstract classes in fedimpute.execution_environment.imputation.base, BaseMLImputer for traditional methods or BaseNNImputer for generative model based methods, and implement all of its abstract methods (interfaces). Because each imputer is associated with a federated strategy and a workflow, the new imputer also has to be made compatible with existing or new federated strategies and workflows.

Users can then call env.register.register_imputer, env.register.register_strategy, and env.register.register_workflow to register the newly developed classes. After registration, they can be used in FedImputeEnv in the same way as the built-in methods. A detailed example is provided in the tutorials.
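
The general pattern behind this, an abstract base class plus a name-based registry, can be sketched in plain Python as follows. Everything in this block is hypothetical: SketchImputer is a stand-in and does not reproduce the abstract methods of BaseMLImputer or BaseNNImputer, and the local register_imputer function is not the signature of env.register.register_imputer. The actual interfaces should be taken from the tutorials.

```python
from abc import ABC, abstractmethod


class SketchImputer(ABC):
    """Hypothetical stand-in for an imputer base class (not fedimpute's interface)."""

    @abstractmethod
    def fit(self, rows): ...

    @abstractmethod
    def impute(self, rows): ...

    @abstractmethod
    def get_params(self): ...

    @abstractmethod
    def set_params(self, params): ...


class ColumnMeanImputer(SketchImputer):
    """Fills missing values (None) with the column means seen during fit."""

    def __init__(self):
        self.means = []

    def fit(self, rows):
        self.means = []
        for j in range(len(rows[0])):
            observed = [row[j] for row in rows if row[j] is not None]
            self.means.append(sum(observed) / len(observed))
        return self

    def impute(self, rows):
        return [[self.means[j] if v is None else v for j, v in enumerate(row)]
                for row in rows]

    def get_params(self):
        # Parameters a federated strategy could exchange with a server.
        return {"means": list(self.means)}

    def set_params(self, params):
        self.means = list(params["means"])


IMPUTERS = {}  # hypothetical registry: name -> class


def register_imputer(name, cls):
    if not issubclass(cls, SketchImputer):
        raise TypeError("imputer classes must derive from SketchImputer")
    IMPUTERS[name] = cls


register_imputer("column_mean", ColumnMeanImputer)
imputer = IMPUTERS["column_mean"]().fit([[1.0, None], [3.0, 4.0]])
filled = imputer.impute([[None, None]])
print(filled)  # [[2.0, 4.0]]
```

Exposing the model state through get_params and set_params is one way to let a strategy aggregate client models without knowing the imputer's internals.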

Miscellaneous

  • verbose (int) - Verbosity level. 0: no output, 1: minimal output, 2: detailed output
  • seed (int) - Seed for reproducibility
  • logging (bool) - Whether to log the training process