Simulator

class ScenarioBuilder(debug_mode: bool = False)

ScenarioBuilder class for simulating or constructing missing data scenarios in a federated learning environment

Attributes

  • data : np.ndarray data to be used for simulation

  • data_config : dict data configuration dictionary

  • clients_train_data : List[np.ndarray] list of clients training data

  • clients_test_data : List[np.ndarray] list of clients test data

  • clients_train_data_ms : List[np.ndarray] list of clients training data with missing values

  • global_test : np.ndarray global test data

  • client_seeds : List[int] list of seeds for clients

  • stats : dict simulation statistics

  • debug_mode : bool whether to enable debug mode

Methods

  • create_simulated_scenario Simulate missing data scenario

  • create_simulated_scenario_lite Simulate missing data scenario

  • create_real_scenario Create a real scenario from a list of pandas DataFrames

  • save

  • load

  • export_data

  • summarize_scenario

  • show_missing_data_details

  • visualize_missing_pattern

  • visualize_missing_distribution

  • visualize_data_heterogeneity

Build Simulated Scenario Function

method ScenarioBuilder.create_simulated_scenario(data: Union[np.array, pd.DataFrame], data_config: dict, num_clients: int, dp_strategy: str = 'iid-even', dp_split_cols: Union[str, int] = 'target', dp_min_samples: Union[float, int] = 50, dp_max_samples: Union[float, int] = 2000, dp_sample_iid_direct: bool = False, dp_local_test_size: float = 0.1, dp_global_test_size: float = 0.1, dp_local_backup_size: float = 0.05, dp_reg_bins: int = 50, ms_scenario: str = None, ms_cols: Union[str, List[int]] = 'all', obs_cols: Union[str, List[int]] = 'random', ms_mech_type: str = 'mcar', ms_global_mechanism: bool = False, ms_mr_dist_clients: str = 'randu', ms_mr_clients: Any = (0.3, 0.7), ms_mf_dist_clients: str = 'identity', ms_mm_dist_clients: str = 'random', ms_missing_features: str = 'all', ms_mr_lower: float = 0.1, ms_mr_upper: float = 0.9, ms_mm_funcs_bank: str = 'lr', ms_mm_strictness: bool = True, ms_mm_obs: bool = False, ms_mm_feature_option: str = 'allk=0.2', ms_mm_beta_option: str = None, seed: int = 100330201, verbose: int = 0) -> Dict[str, List[np.ndarray]]

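Simulate a missing data scenario: the data is partitioned across clients (dp_* parameters) and missing values are then added to the clients' training data (ms_* parameters)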

Parameters

  • data : Union[np.array, pd.DataFrame] data to be used for simulation

  • data_config : dict data configuration dictionary

  • num_clients : int number of clients

  • dp_strategy : str data partition strategy, default: 'iid-even'

    • iid-even, iid-dir, niid-dir, niid-path

  • dp_split_cols : Union[str, int, List[int]] split columns option, default: 'target'

    • target, feature

  • dp_min_samples : Union[float, int] minimum samples for clients, default: 50

  • dp_max_samples : Union[float, int] maximum samples for clients, default: 2000

  • dp_sample_iid_direct : bool sample iid data directly, default: False

  • dp_local_test_size : float local test size ratio, default: 0.1

  • dp_global_test_size : float global test size ratio, default: 0.1

  • dp_local_backup_size : float local backup size ratio, default: 0.05

  • dp_reg_bins : int regression bins, default: 50

  • ms_mech_type : str missing mechanism type, default: 'mcar'

    • mcar, mar_sigmoid, mnar_sigmoid, mar_quantile, mnar_quantile

  • ms_cols : Union[str, List[int]] missing columns, default: 'all' - 'all', 'all-num', 'random'

  • obs_cols : Union[str, List[int]] fully observed columns for MAR, default: 'random' - 'random', 'rest'

  • ms_global_mechanism : bool global missing mechanism, default: False

  • ms_mr_dist_clients : str missing ratio distribution across clients, default: 'randu' (as given in the signature) - listed options: 'random', 'random-int', 'normal', 'normal-int'

  • ms_mr_clients : Any client-level missing ratio settings, default: (0.3, 0.7)

  • ms_mf_dist_clients : str missing features distribution, default: 'identity' - 'identity', 'random', 'random2'

  • ms_mm_dist_clients : str missing mechanism functions distribution, default: 'random' - 'identity', 'random', 'random2'

  • ms_missing_features : str missing features strategy, default: 'all' - 'all', 'all-num'

  • ms_mr_lower : float missing ratio lower clipping bound, default: 0.1

  • ms_mr_upper : float missing ratio upper clipping bound, default: 0.9

  • ms_mm_funcs_bank : str bank of missing mechanism functions, default: 'lr' - None, 'lr', 'mt', 'all'

  • ms_mm_strictness : bool whether missing values are added deterministically or probabilistically, default: True

  • ms_mm_obs : bool whether missing values are added based on observed data, default: False

  • ms_mm_feature_option : str which features the missing mechanism is associated with, default: 'allk=0.2' - 'self', 'all', 'allk=0.1'

  • ms_mm_beta_option : str mechanism beta coefficient option, default: None - for MNAR: 'self', 'sphere', 'randu'; for MAR: 'fixed', 'randu', 'randn'

  • seed : int random seed, default: 100330201

  • verbose : int verbosity level of the simulation process, default: 0
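To make the partition (dp_*) and missingness (ms_*) parameters more concrete, here is a minimal NumPy sketch of what an 'iid-even' split followed by MCAR masking could look like. It is not the library's implementation: the helpers `split_iid_even` and `add_mcar` are hypothetical, and it assumes that each client's missing ratio is drawn uniformly from the `ms_mr_clients` range and then clipped to `[ms_mr_lower, ms_mr_upper]`.

```python
import numpy as np


def split_iid_even(data, num_clients, rng):
    """Shuffle rows and split them into near-equal client shards (hypothetical helper)."""
    idx = rng.permutation(data.shape[0])
    return [data[part] for part in np.array_split(idx, num_clients)]


def add_mcar(X, missing_ratio, rng):
    """Return a float copy of X where each entry is set to NaN
    independently with probability missing_ratio (MCAR)."""
    X_ms = X.astype(float, copy=True)
    mask = rng.random(X.shape) < missing_ratio
    X_ms[mask] = np.nan
    return X_ms


rng = np.random.default_rng(100330201)
data = rng.normal(size=(1000, 5))  # toy data: 1000 rows, 5 numeric features

# Partition step: four clients with equally sized shards
clients = split_iid_even(data, num_clients=4, rng=rng)

# Assumed reading of ms_mr_clients=(0.3, 0.7), ms_mr_lower=0.1, ms_mr_upper=0.9
ratios = np.clip(rng.uniform(0.3, 0.7, size=len(clients)), 0.1, 0.9)

# Missingness step: one MCAR mask per client
clients_ms = [add_mcar(X, r, rng) for X, r in zip(clients, ratios)]

for r, X_ms in zip(ratios, clients_ms):
    print(f"target ratio {r:.2f}, observed ratio {np.isnan(X_ms).mean():.2f}")
```

The observed ratio of each client only approximates its target, because the mask is sampled entry by entry.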

Returns

  • Dict[str, List[np.ndarray]] dictionary containing the clients' training data, test data, training data with missing values, and the global test data

Raises

  • ValueError

  • NotImplementedError
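A call with an explicit MCAR setup might be wrapped as follows. This is a sketch under assumptions: `simulate_mcar_scenario` is a hypothetical helper, `builder` is expected to be a ScenarioBuilder instance, and `data_config` has to follow the library's own format, which this section does not describe. Only parameter names that appear in the signature above are used.

```python
from typing import Any, Dict


def simulate_mcar_scenario(builder: Any, data: Any, data_config: Dict[str, Any],
                           num_clients: int = 4) -> Any:
    """Call create_simulated_scenario with an IID partition and MCAR missingness.

    Hypothetical convenience wrapper; `builder` should be a ScenarioBuilder.
    """
    try:
        return builder.create_simulated_scenario(
            data, data_config, num_clients,
            dp_strategy='iid-even',     # even IID partition across clients
            ms_mech_type='mcar',        # missing completely at random
            ms_mr_clients=(0.3, 0.7),   # client-level missing ratio settings
            seed=100330201,
            verbose=1,
        )
    except (ValueError, NotImplementedError) as err:
        # Both exception types are listed under "Raises"; add context and re-raise
        raise RuntimeError(f"scenario simulation failed: {err}") from err
```

Passing the builder in as an argument keeps the helper independent of the import path, which is not stated here.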

Build from Real Federated Scenario Function

method ScenarioBuilder.create_real_scenario(datas: List[pd.DataFrame], data_config: Dict, seed: int = 100330201, verbose: int = 0)

Create a real scenario from a list of pandas DataFrames

Parameters

  • datas : List[pd.DataFrame] list of pandas DataFrames

  • data_config : Dict data configuration dictionary

  • seed : int random seed, default: 100330201

  • verbose : int verbosity level of the simulation process, default: 0

Raises

  • ValueError
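Before handing naturally partitioned client tables to create_real_scenario, it can help to check that they line up. The sketch below assumes, without confirmation from this section, that every client DataFrame should have the same columns in the same order; `check_client_tables` and `build_real_scenario` are hypothetical helpers, and `builder` is expected to be a ScenarioBuilder instance.

```python
from typing import Any, Dict, List, Sequence


def check_client_tables(datas: Sequence[Any]) -> List[str]:
    """Verify that every client table exposes the same columns in the same order.

    Each item is expected to be a pandas DataFrame (anything with `.columns` works).
    """
    if not datas:
        raise ValueError("need at least one client table")
    reference = list(datas[0].columns)
    for i, df in enumerate(datas[1:], start=1):
        if list(df.columns) != reference:
            raise ValueError(f"client {i} columns differ from client 0")
    return reference


def build_real_scenario(builder: Any, datas: Sequence[Any],
                        data_config: Dict[str, Any], seed: int = 100330201) -> Any:
    """Validate the client tables, then delegate to create_real_scenario."""
    check_client_tables(datas)
    return builder.create_real_scenario(list(datas), data_config, seed=seed, verbose=0)
```

Failing early with a message that names the offending client is usually easier to debug than an error raised deep inside the builder.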