Executing Distributed Imputation Algorithms
The FedImputeEnv class is the main class of the execution_environment module.
It is used to configure the federated imputation environment and execute federated imputation algorithms.
Overview and Basic Usage
Users first initialize the FedImputeEnv class and configure the environment with the configuration method: which imputer, which federated strategy, and which fitting mode to use.
Then, use the setup_from_simulator method to set up the environment with the simulated data from the simulator class (see the Scenario Simulation section).
Finally, use the run_fed_imputation method to execute the federated imputation algorithms.
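from fedimpute.execution_environment import FedImputeEnv
# Create the environment
env = FedImputeEnv(debug_mode=False)
# Choose the imputer and the federated strategy
env.configuration(imputer='mice', fed_strategy='fedmice')
# Set up clients and server; scenario_builder must have been created beforehand
env.setup_from_scenario_builder(scenario_builder=scenario_builder, verbose=1)
env.show_env_info()
# Execute federated imputation
env.run_fed_imputation()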
Note that if you use the CUDA version of torch, set the following environment variable first to get deterministic CUDA behavior:
# bash (linux)
export CUBLAS_WORKSPACE_CONFIG=:4096:8
# powershell (windows)
$Env:CUBLAS_WORKSPACE_CONFIG = ":4096:8"
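The variable can also be set from within Python, as long as this happens before any CUDA work starts. A minimal sketch, assuming the script sets it at the very top, before `torch` is imported; `setdefault` keeps a value that was already exported in the shell:

```python
import os

# Set the cuBLAS workspace configuration before torch (or any other CUDA
# library) is imported. setdefault does not overwrite a value that was
# already exported in the shell.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

# import torch  # import torch only after the variable is in place
print(os.environ["CUBLAS_WORKSPACE_CONFIG"])
```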
Environment Configuration
The env.configuration() method is used to configure the environment. It takes the following arguments:
Options:
- imputer (str) - name of the imputation algorithm to use. Options: fed_mean, fed_em, fed_ice, fed_missforest, gain, miwae
- fed_strategy (str) - name of the federated strategy to use. Options: fedavg, fedprox, scaffold, fedavg_ft
- fit_mode (str) - name of the fitting mode to use: federated imputation, local-only imputation, or centralized imputation. Options: fed, local, central
- save_dir_path (str) - path for persisting information about the training process of the clients and the server (imputation models, imputed data, etc.) for future use.
Other Params:
- imputer_params (Union[None, dict]) = None - parameters for imputer
- fed_strategy_params (Union[None, dict]) = None - parameters for federated strategy
- workflow_params (Union[None, dict]) = None - parameters for the workflow. The Workflow class contains the logic of the federated imputation workflow. It is associated with each Imputer class. The built-in workflows are:
  - ice - for ICE-based imputation
  - em - for EM imputation
  - jm - for joint-modeling-based imputation, such as VAE- or GAN-based imputation
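The arguments listed above can be collected in one place before they are handed to `env.configuration()`. The sketch below is only an illustration: `build_configuration_kwargs` is a hypothetical helper that is not part of fedimpute, and it assumes that `env.configuration()` accepts the documented arguments as keywords. It checks `fit_mode` against the three documented values and omits optional entries that were not given.

```python
from typing import Any, Dict, Optional

# Fitting modes listed in this section.
FIT_MODES = ("fed", "local", "central")


def build_configuration_kwargs(
    imputer: str,
    fed_strategy: str,
    fit_mode: str,
    save_dir_path: Optional[str] = None,
    imputer_params: Optional[Dict[str, Any]] = None,
    fed_strategy_params: Optional[Dict[str, Any]] = None,
    workflow_params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper (not part of fedimpute): gather the documented
    arguments in one dict and reject unknown fitting modes early."""
    if fit_mode not in FIT_MODES:
        raise ValueError(f"fit_mode must be one of {FIT_MODES}, got {fit_mode!r}")
    kwargs = {
        "imputer": imputer,
        "fed_strategy": fed_strategy,
        "fit_mode": fit_mode,
        "save_dir_path": save_dir_path,
        "imputer_params": imputer_params,
        "fed_strategy_params": fed_strategy_params,
        "workflow_params": workflow_params,
    }
    # Drop optional entries that were not given.
    return {key: value for key, value in kwargs.items() if value is not None}


kwargs = build_configuration_kwargs("mice", "fedmice", fit_mode="fed")
print(kwargs)
# With an environment at hand (assumed usage): env.configuration(**kwargs)
```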
Supported Federated Imputation Algorithms
Federated Imputation Algorithms:
Method | Type | Fed Strategy | Imputer (code) | Workflow | Reference |
---|---|---|---|---|---|
Mean | Non-NN | local, fedmean | mean | MEAN | - |
EM | Non-NN | local, fedem | em | EM | EM, FedEM |
MICE | Non-NN | local, fedmice | mice | ICE | FedICE |
MissForest | Non-NN | local, fedtree | missforest | ICE | MissForest, Fed Randomforest |
MIWAE | NN | local, fedavg, ... | miwae | JM | MIWAE |
GAIN | NN | local, fedavg, ... | gain | JM | GAIN |
Not-MIWAE | NN | local, fedavg, ... | notmiwae | JM | Not-MIWAE |
GNR | NN | local, fedavg, ... | gnr | JM | GNR |
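The imputer table above can be encoded as a small lookup to check whether an imputer and a strategy appear together. This is a sketch built only from the table, not a fedimpute API; `IMPUTERS` and `check_pair` are hypothetical names. For the NN imputers the table lists "local, fedavg, ...", so only the two named strategies are recorded, and any other strategy is reported as undetermined rather than guessed.

```python
# Rows of the imputer table, keyed by imputer code.
# open_ended=True marks rows whose strategy list ends with "..." in the table.
IMPUTERS = {
    "mean": {"workflow": "MEAN", "strategies": ("local", "fedmean"), "open_ended": False},
    "em": {"workflow": "EM", "strategies": ("local", "fedem"), "open_ended": False},
    "mice": {"workflow": "ICE", "strategies": ("local", "fedmice"), "open_ended": False},
    "missforest": {"workflow": "ICE", "strategies": ("local", "fedtree"), "open_ended": False},
    "miwae": {"workflow": "JM", "strategies": ("local", "fedavg"), "open_ended": True},
    "gain": {"workflow": "JM", "strategies": ("local", "fedavg"), "open_ended": True},
    "notmiwae": {"workflow": "JM", "strategies": ("local", "fedavg"), "open_ended": True},
    "gnr": {"workflow": "JM", "strategies": ("local", "fedavg"), "open_ended": True},
}


def check_pair(imputer: str, fed_strategy: str) -> str:
    """Return 'listed', 'not listed', or 'undetermined' for a pair.

    Raises KeyError when the imputer code is not in the table.
    """
    row = IMPUTERS[imputer]
    if fed_strategy in row["strategies"]:
        return "listed"
    # The table does not spell out the full list for this imputer.
    if row["open_ended"]:
        return "undetermined"
    return "not listed"


print(check_pair("mice", "fedmice"))   # pair taken from the MICE row
print(check_pair("mice", "fedavg"))    # fedavg is not in the MICE row
print(check_pair("gain", "fedprox"))   # the GAIN row ends with "..."
```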
Federated Strategies:
Method | Type | Fed Strategy (code) | Reference |
---|---|---|---|
Local | non-federated | local | - |
FedMean | traditional | fedmean | - |
FedEM | traditional | fedem | FedEM |
FedMICE | traditional | fedmice | FedMICE |
FedTree | traditional | fedtree | FedTree |
FedAvg | global FL | fedavg | FedAvg |
FedProx | global FL | fedprox | FedProx |
Scaffold | global FL | scaffold | Scaffold |
FedAdam | global FL | fedadam | FedAdam |
FedAdagrad | global FL | fedadagrad | FedAdaGrad |
FedYogi | global FL | fedyogi | FedYogi |
FedAvg-FT | personalized FL | fedavg_ft | FedAvg-FT |
Environment Setup
After configuring the environment, we need to initialize it, that is, create the Client and Server objects with the simulated data from the simulation module.
Currently, the FedImputeEnv class supports two ways to set up the environment. The first way is to set up the environment directly from the simulator class by using the env.setup_from_simulator(simulator) method.
env.setup_from_simulator(simulator, verbose=1)
The second way is to set up the environment with the env.setup_from_data() method. It can be used when users have their own data that was not simulated by the simulator class. Example:
import numpy as np

# Placeholder data: 10 clients, each with 100 rows and 10 columns
clients_train_data = [np.random.rand(100, 10) for _ in range(10)]
clients_train_data_ms = [np.random.rand(100, 10) for _ in range(10)]
clients_test_data = [np.random.rand(100, 10) for _ in range(10)]
global_test = np.random.rand(100, 10)

# Description of the dataset
data_config = {
    'target': 9,
    'task_type': 'regression',
    'clf_type': None,
    'num_cols': 9,
}

# One seed per client
clients_seeds = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# env is a FedImputeEnv that has already been configured
env.setup_from_data(
    clients_train_data, clients_test_data, clients_train_data_ms,
    clients_seeds, global_test, data_config, verbose=1
)
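The example above fills every array with random numbers, so none of them actually contains missing entries. The sketch below shows one way to derive a masked copy of each client's training data. It rests on assumptions that the text does not confirm: that clients_train_data_ms holds the training data with missing values, and that missing entries are marked with NaN. `mask_completely_at_random` is a hypothetical helper, not part of fedimpute, and the target column (index 9, as in data_config) is left complete by choice.

```python
import numpy as np


def mask_completely_at_random(data, missing_rate, seed, protected_cols=()):
    """Return a float copy of data with entries set to NaN at random.

    Every entry outside protected_cols is dropped independently with
    probability missing_rate. The input array is not modified.
    """
    rng = np.random.default_rng(seed)
    masked = data.astype(float, copy=True)
    mask = rng.random(data.shape) < missing_rate
    # Keep protected columns (for example the target) fully observed.
    mask[:, list(protected_cols)] = False
    masked[mask] = np.nan
    return masked


# Complete placeholder data, drawn from a generator separate from the masks.
data_rng = np.random.default_rng(42)
clients_seeds = list(range(10))
clients_train_data = [data_rng.random((100, 10)) for _ in clients_seeds]

# Assumption: NaN marks a missing value; column 9 is the target.
clients_train_data_ms = [
    mask_completely_at_random(x, missing_rate=0.2, seed=s, protected_cols=(9,))
    for x, s in zip(clients_train_data, clients_seeds)
]

print(np.isnan(clients_train_data_ms[0]).mean())  # share of missing entries
```

The mask generator is seeded per client, which mirrors the one-seed-per-client list in the example; whether fedimpute uses clients_seeds this way is not stated in the text.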
Execute Federated Imputation
After setting up the environment, we can execute the federated imputation algorithms using the run_fed_imputation() method. Currently, we support two types of simulation execution:
(1) Sequential mode (run_type="sequential"): there is no parallelism; the whole imputation process for the clients runs sequentially in a for loop.
(2) Parallel mode (run_type="parallel"): it simulates separate processes for the clients and the server and uses the workflow to manage the communication between them, in order to approximate a real-world FL environment.
env.run_fed_imputation(run_type='sequential')
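The difference between the two run types can be illustrated with a toy round that is independent of fedimpute. This is a sketch, not the library's implementation: `local_fit`, `aggregate`, and `run_round` are hypothetical names, the client step is a simple column mean, and threads stand in for the separate client processes described above.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def local_fit(client_data):
    """Toy client step: column means over the observed (non-NaN) entries."""
    return np.nanmean(client_data, axis=0)


def aggregate(local_means):
    """Toy server step: unweighted average of the client results."""
    return np.mean(local_means, axis=0)


def run_round(clients, run_type="sequential"):
    """Run one toy round with the given run type."""
    if run_type == "sequential":
        # Clients are processed one after another.
        local_means = [local_fit(c) for c in clients]
    elif run_type == "parallel":
        # Clients are processed concurrently; threads are a stand-in here.
        with ThreadPoolExecutor(max_workers=len(clients)) as pool:
            local_means = list(pool.map(local_fit, clients))
    else:
        raise ValueError(f"unknown run_type: {run_type!r}")
    return aggregate(local_means)


rng = np.random.default_rng(0)
clients = [rng.random((100, 10)) for _ in range(4)]
for c in clients:
    c[rng.random(c.shape) < 0.2] = np.nan  # toy missing values

sequential_result = run_round(clients, "sequential")
parallel_result = run_round(clients, "parallel")
print(np.allclose(sequential_result, parallel_result))
```

Both run types produce the same aggregate in this toy; only the scheduling of the client work differs. A misspelled run type raises a ValueError in the sketch.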
Monitoring Imputation Process
We provide a TensorBoard utility so that users can monitor the imputation progress in real time. To use TensorBoard, run the following command in the terminal:
tensorboard --logdir .logs
We also provide another API, env.tracker.visualize_imputation_process(), which shows a line chart of the imputation process measured by imputation quality or loss. Unlike the TensorBoard utility, it can only be run after imputation has finished.
Develop New Federated Imputation Methods
The FedImputeEnv class supports registering new imputers, strategies, and workflows. Users can develop their own imputation methods, federated strategies, and workflows. These three components have to work together closely to form a federated imputation algorithm.
To develop a new workflow, users need to implement it by inheriting from the Workflow class in fedimpute.execution_environment.workflow. When implementing a workflow, users need to decide how the clients and the server interact with each other.
To develop a new federated strategy, users need to implement the strategy for both the client and the server.
To develop a new imputer, users need to inherit from one of the two abstract classes BaseMLImputer and BaseNNImputer in fedimpute.execution_environment.imputation.base (one for traditional methods, the other for methods based on generative models) and implement all of its abstract methods (interfaces). Because each imputer is associated with a federated strategy and a workflow, users also need to make sure the new imputer is compatible with existing or new federated strategies and workflows.
Users can then call env.register.register_imputer, env.register.register_strategy, and env.register.register_workflow to register the newly developed classes. After registration, they can be used in FedImputeEnv in the same way as the built-in methods. We provide a detailed example in the tutorials.
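The inherit-then-register pattern described above can be sketched without fedimpute. Everything in the block is hypothetical: `ToyBaseImputer` stands in for an abstract base class, `ToyRegistry` for a registry, and the method names `fit` and `impute` are assumptions. The real abstract methods of BaseMLImputer and BaseNNImputer and the signatures of the env.register methods are not given in this section, so refer to the tutorials for the actual interfaces.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type

import numpy as np


class ToyBaseImputer(ABC):
    """Hypothetical stand-in for an abstract imputer base class."""

    @abstractmethod
    def fit(self, X: np.ndarray) -> None:
        """Learn imputation parameters from data with NaN entries."""

    @abstractmethod
    def impute(self, X: np.ndarray) -> np.ndarray:
        """Return a copy of X with the NaN entries filled in."""


class ColumnMedianImputer(ToyBaseImputer):
    """Toy imputer: fill each column with its observed median."""

    def __init__(self) -> None:
        self.medians_ = None

    def fit(self, X: np.ndarray) -> None:
        self.medians_ = np.nanmedian(X, axis=0)

    def impute(self, X: np.ndarray) -> np.ndarray:
        if self.medians_ is None:
            raise RuntimeError("call fit() before impute()")
        filled = X.copy()
        rows, cols = np.where(np.isnan(filled))
        filled[rows, cols] = self.medians_[cols]
        return filled


class ToyRegistry:
    """Hypothetical registry that maps names to imputer classes."""

    def __init__(self) -> None:
        self._imputers: Dict[str, Type[ToyBaseImputer]] = {}

    def register_imputer(self, name: str, cls: Type[ToyBaseImputer]) -> None:
        if not issubclass(cls, ToyBaseImputer):
            raise TypeError("imputer must inherit from ToyBaseImputer")
        if name in self._imputers:
            raise ValueError(f"imputer {name!r} is already registered")
        self._imputers[name] = cls

    def create_imputer(self, name: str) -> ToyBaseImputer:
        return self._imputers[name]()


registry = ToyRegistry()
registry.register_imputer("column_median", ColumnMedianImputer)

imputer = registry.create_imputer("column_median")
X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 8.0]])
imputer.fit(X)
X_imputed = imputer.impute(X)
print(X_imputed)
```

Once registered, the toy imputer is looked up by name, which mirrors how a registered method would be selected by its name in the configuration step.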
Miscellaneous
- verbose (int) - Verbosity level. 0: no output, 1: minimal output, 2: detailed output
- seed (int) - Seed for reproducibility
- logging (bool) - Whether to log the training process