Simulator

fedimpute.scenario.scenario_builder.ScenarioBuilder

class ScenarioBuilder(debug_mode: bool = False)

Class for simulating or constructing missing-data scenarios in a federated learning environment.

Attributes

  • data : np.ndarray — data to be used for simulation

  • data_config : dict — data configuration dictionary

  • clients_train_data : List[np.ndarray] — list of per-client training data

  • clients_test_data : List[np.ndarray] — list of per-client test data

  • clients_train_data_ms : List[np.ndarray] — list of per-client training data with missing values

  • global_test : np.ndarray — global test data

  • client_seeds : List[int] — list of seeds for clients

  • stats : dict — simulation statistics

  • debug_mode : bool — whether to enable debug mode

Methods

  • create_simulated_scenario — Simulate missing data scenario

  • create_simulated_scenario_lite — Simulate missing data scenario

  • create_real_scenario — Create a real scenario from a list of pandas DataFrames

  • save

  • load

  • export_data

  • summarize_scenario

  • show_missing_data_details

  • visualize_missing_pattern

  • visualize_missing_distribution

  • visualize_data_heterogeneity

Build Simulated Scenario Function

fedimpute.scenario.scenario_builder.ScenarioBuilder.create_simulated_scenario

method ScenarioBuilder.create_simulated_scenario(data: Union[np.array, pd.DataFrame], data_config: dict, num_clients: int, dp_strategy: str = 'iid-even', dp_split_cols: Union[str, int] = 'target', dp_min_samples: Union[float, int] = 50, dp_max_samples: Union[float, int] = 2000, dp_sample_iid_direct: bool = False, dp_local_test_size: float = 0.1, dp_global_test_size: float = 0.1, dp_local_backup_size: float = 0.05, dp_reg_bins: int = 50, ms_scenario: str = None, ms_cols: Union[str, List[int]] = 'all', obs_cols: Union[str, List[int]] = 'random', ms_mech_type: str = 'mcar', ms_global_mechanism: bool = False, ms_mr_dist_clients: str = 'randu', ms_mf_dist_clients: str = 'identity', ms_mm_dist_clients: str = 'random', ms_missing_features: str = 'all', ms_mr_lower: float = 0.3, ms_mr_upper: float = 0.7, ms_mm_funcs_bank: str = 'lr', ms_mm_strictness: bool = True, ms_mm_obs: bool = False, ms_mm_feature_option: str = 'allk=0.2', ms_mm_beta_option: str = None, seed: int = 100330201, verbose: int = 0) -> Dict[str, List[np.ndarray]]

Simulate missing data scenario

Parameters

  • data : Union[np.array, pd.DataFrame] — data to be used for simulation

  • data_config : dict — data configuration dictionary

  • num_clients : int — number of clients

  • dp_strategy : str — data partition strategy, default: 'iid-even'; options: iid-even, iid-dir, niid-dir, niid-path

  • dp_split_cols : Union[str, int, List[int]] — split columns option, default: 'target'; options: target, feature

  • dp_min_samples : Union[float, int] — minimum samples for clients, default: 50

  • dp_max_samples : Union[float, int] — maximum samples for clients, default: 2000

  • dp_sample_iid_direct : bool — sample iid data directly, default: False

  • dp_local_test_size : float — local test size ratio, default: 0.1

  • dp_global_test_size : float — global test size ratio, default: 0.1

  • dp_local_backup_size : float — local backup size ratio, default: 0.05

  • dp_reg_bins : int — regression bins, default: 50

  • ms_mech_type : str — missing mechanism type, default: 'mcar'; options: mcar, mar_sigmoid, mnar_sigmoid, mar_quantile, mnar_quantile

  • ms_cols : Union[str, List[int]] — columns in which to introduce missing values, default: 'all'; options: all, all-num, random

  • obs_cols : Union[str, List[int]] — fully observed columns for MAR, default: 'random'; options: random, rest

  • ms_global_mechanism : bool — global missing mechanism, default: False

  • ms_mr_dist_clients : str — distribution of missing ratios across clients, default: 'randu' (as given in the method signature); options: 'fixed', 'uniform', 'uniform_int', 'gaussian', 'gaussian_int'

  • ms_mf_dist_clients : str — distribution of missing features across clients, default: 'identity'; options: 'identity', 'random', 'random2'

  • ms_mm_dist_clients : str — distribution of missing mechanism functions across clients, default: 'random'; options: 'identity', 'random', 'random2'

  • ms_missing_features : str — missing features strategy, default: 'all'; options: 'all', 'all-num'

  • ms_mr_lower : float — minimum missing ratio for each feature, default: 0.3

  • ms_mr_upper : float — maximum missing ratio for each feature, default: 0.7

  • ms_mm_funcs_bank : str — bank of missing mechanism functions, default: 'lr'; options: None, 'lr', 'mt', 'all'

  • ms_mm_strictness : bool — whether missing values are added probabilistically or deterministically, default: True

  • ms_mm_obs : bool — whether missing values are added based on observed data, default: False

  • ms_mm_feature_option : str — which features the missing mechanism is associated with, default: 'allk=0.2'; options: 'self', 'all', 'allk=0.1'

  • ms_mm_beta_option : str — mechanism beta coefficient option, default: None; options for MNAR: self, sphere, randu; options for MAR: fixed, randu, randn

  • seed : int — random seed, default: 100330201

  • verbose : int — verbosity level of the simulation process, default: 0
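To make the 'mcar' mechanism and the ms_mr_lower / ms_mr_upper bounds more concrete, here is a minimal pure-Python sketch of MCAR masking. It is an illustration only, not FedImpute's implementation: the function name mcar_mask_clients, the nested-list data layout, and the choice to draw one uniform missing ratio per feature per client are all assumptions made for this example.

```python
import random
from typing import List, Optional

Row = List[Optional[float]]


def mcar_mask_clients(
    clients: List[List[List[float]]],
    mr_lower: float = 0.3,
    mr_upper: float = 0.7,
    seed: int = 100330201,
) -> List[List[Row]]:
    """Blank out cells completely at random (hypothetical helper).

    Assumption: each client draws one missing ratio per feature,
    uniformly between mr_lower and mr_upper.
    """
    rng = random.Random(seed)
    masked = []
    for rows in clients:
        n_cols = len(rows[0]) if rows else 0
        # One missing ratio per feature (column) for this client.
        ratios = [rng.uniform(mr_lower, mr_upper) for _ in range(n_cols)]
        # MCAR: whether a cell is dropped ignores every data value.
        masked.append(
            [
                [None if rng.random() < ratios[j] else v for j, v in enumerate(row)]
                for row in rows
            ]
        )
    return masked


# Three toy clients, each with 50 rows and 4 features.
clients = [[[float(i + j) for j in range(4)] for i in range(50)] for _ in range(3)]
masked = mcar_mask_clients(clients)
```

Because the random generator is seeded, calling the helper again with the same inputs reproduces the same mask, which mirrors the role of the seed parameter above.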

Returns

  • Dict[str, List[np.ndarray]] — dictionary of clients training data, test data, training data with missing values, global test data

Raises

  • ValueError

  • NotImplementedError
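Since the method takes many string-valued options, it can help to check them before calling it. The sketch below collects a few keyword arguments and validates them against the option values listed in the parameter documentation. The helper build_scenario_kwargs and its checks are hypothetical; the conditions under which the library itself raises ValueError or NotImplementedError are not documented here.

```python
from typing import Any, Dict

# Option values copied from the parameter list of create_simulated_scenario.
DP_STRATEGIES = {"iid-even", "iid-dir", "niid-dir", "niid-path"}
MS_MECH_TYPES = {
    "mcar", "mar_sigmoid", "mnar_sigmoid", "mar_quantile", "mnar_quantile",
}


def build_scenario_kwargs(
    num_clients: int,
    dp_strategy: str = "iid-even",
    ms_mech_type: str = "mcar",
    ms_mr_lower: float = 0.3,
    ms_mr_upper: float = 0.7,
    seed: int = 100330201,
) -> Dict[str, Any]:
    """Validate a subset of options and return them as keyword arguments.

    Hypothetical helper: these checks are this example's own assumptions.
    """
    if num_clients < 1:
        raise ValueError("num_clients must be at least 1")
    if dp_strategy not in DP_STRATEGIES:
        raise ValueError(f"unknown dp_strategy: {dp_strategy!r}")
    if ms_mech_type not in MS_MECH_TYPES:
        raise ValueError(f"unknown ms_mech_type: {ms_mech_type!r}")
    if not 0.0 <= ms_mr_lower <= ms_mr_upper <= 1.0:
        raise ValueError("expected 0 <= ms_mr_lower <= ms_mr_upper <= 1")
    return {
        "num_clients": num_clients,
        "dp_strategy": dp_strategy,
        "ms_mech_type": ms_mech_type,
        "ms_mr_lower": ms_mr_lower,
        "ms_mr_upper": ms_mr_upper,
        "seed": seed,
    }


kwargs = build_scenario_kwargs(
    num_clients=4, dp_strategy="niid-dir", ms_mech_type="mar_sigmoid"
)

# Intended use, assuming fedimpute is installed and `data` and `data_config`
# have already been prepared (their contents are not specified here):
#   from fedimpute.scenario.scenario_builder import ScenarioBuilder
#   result = ScenarioBuilder().create_simulated_scenario(data, data_config, **kwargs)
```

Parameters left out of the dictionary fall back to the defaults shown in the method signature.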

Build from Real Federated Scenario Function

fedimpute.scenario.scenario_builder.ScenarioBuilder.create_real_scenario

method ScenarioBuilder.create_real_scenario(datas: List[pd.DataFrame], data_config: Dict, seed: int = 100330201, verbose: int = 0)

Create a real scenario from a list of pandas DataFrames

Parameters

  • datas : List[pd.DataFrame] — list of pandas DataFrames

  • data_config : Dict — data configuration dictionary

  • seed : int — random seed, default: 100330201

  • verbose : int — verbosity level, default: 0

Raises

  • ValueError
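When client tables come from different sources, a quick schema check before building the scenario can surface mismatches early. The sketch below compares column names across clients. The helper check_client_columns is hypothetical, and whether create_real_scenario performs a similar check itself, or raises ValueError for this reason, is an assumption rather than documented behavior.

```python
from typing import List, Sequence


def check_client_columns(client_columns: Sequence[Sequence[str]]) -> List[str]:
    """Return the shared column list, or raise if any client differs.

    Hypothetical pre-check; column order is treated as significant.
    """
    if not client_columns:
        raise ValueError("at least one client table is required")
    reference = list(client_columns[0])
    for idx, cols in enumerate(client_columns[1:], start=1):
        if list(cols) != reference:
            raise ValueError(
                f"client {idx} columns {list(cols)} differ from "
                f"client 0 columns {reference}"
            )
    return reference


columns = check_client_columns(
    [["age", "bmi", "target"], ["age", "bmi", "target"]]
)
```

With pandas DataFrames, the input could be built as `[list(df.columns) for df in datas]` before passing `datas` to create_real_scenario.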