Simulator
fedimpute.scenario.scenario_builder.ScenarioBuilder
class ScenarioBuilder(debug_mode: bool = False)
ScenarioBuilder class for simulating or constructing missing data scenarios in a federated learning environment
Attributes
- data : np.ndarray — data to be used for simulation
- data_config : dict — data configuration dictionary
- clients_train_data : List[np.ndarray] — list of clients' training data
- clients_test_data : List[np.ndarray] — list of clients' test data
- clients_train_data_ms : List[np.ndarray] — list of clients' training data with missing values
- global_test : np.ndarray — global test data
- client_seeds : List[int] — list of seeds for clients
- stats : dict — simulation statistics
- debug_mode : bool — whether to enable debug mode
Methods
- create_simulated_scenario — Simulate a missing data scenario
- create_simulated_scenario_lite — Simulate a missing data scenario
- create_real_scenario — Create a real scenario from a list of pandas DataFrames
- save
- load
- export_data
- summarize_scenario
- show_missing_data_details
- visualize_missing_pattern
- visualize_missing_distribution
- visualize_data_heterogeneity
Build Simulated Scenario Function
fedimpute.scenario.scenario_builder.ScenarioBuilder.create_simulated_scenario
method ScenarioBuilder.create_simulated_scenario(data: Union[np.array, pd.DataFrame], data_config: dict, num_clients: int, dp_strategy: str = 'iid-even', dp_split_cols: Union[str, int] = 'target', dp_min_samples: Union[float, int] = 50, dp_max_samples: Union[float, int] = 2000, dp_sample_iid_direct: bool = False, dp_local_test_size: float = 0.1, dp_global_test_size: float = 0.1, dp_local_backup_size: float = 0.05, dp_reg_bins: int = 50, ms_scenario: str = None, ms_cols: Union[str, List[int]] = 'all', obs_cols: Union[str, List[int]] = 'random', ms_mech_type: str = 'mcar', ms_global_mechanism: bool = False, ms_mr_dist_clients: str = 'randu', ms_mf_dist_clients: str = 'identity', ms_mm_dist_clients: str = 'random', ms_missing_features: str = 'all', ms_mr_lower: float = 0.3, ms_mr_upper: float = 0.7, ms_mm_funcs_bank: str = 'lr', ms_mm_strictness: bool = True, ms_mm_obs: bool = False, ms_mm_feature_option: str = 'allk=0.2', ms_mm_beta_option: str = None, seed: int = 100330201, verbose: int = 0) → Dict[str, List[np.ndarray]]
Simulate missing data scenario
Parameters
- data : Union[np.array, pd.DataFrame] — data to be used for simulation
- data_config : dict — data configuration dictionary
- num_clients : int — number of clients
- dp_strategy : str — data partition strategy, default: 'iid-even'. Options: 'iid-even', 'iid-dir', 'niid-dir', 'niid-path'
- dp_split_cols : Union[str, int, List[int]] — split columns option, default: 'target'. Options: 'target', 'feature'
- dp_min_samples : Union[float, int] — minimum number of samples per client, default: 50
- dp_max_samples : Union[float, int] — maximum number of samples per client, default: 2000
- dp_sample_iid_direct : bool — whether to sample IID data directly, default: False
- dp_local_test_size : float — local test set size ratio, default: 0.1
- dp_global_test_size : float — global test set size ratio, default: 0.1
- dp_local_backup_size : float — local backup set size ratio, default: 0.05
- dp_reg_bins : int — number of regression bins, default: 50
- ms_mech_type : str — missing mechanism type, default: 'mcar'. Options: 'mcar', 'mar_sigmoid', 'mnar_sigmoid', 'mar_quantile', 'mnar_quantile'
- ms_cols : Union[str, List[int]] — columns to which missing values are added, default: 'all'. Options: 'all', 'all-num', 'random', or a list of column indices
- obs_cols : Union[str, List[int]] — fully observed columns used by MAR mechanisms, default: 'random'. Options: 'random', 'rest'
- ms_global_mechanism : bool — whether to apply a global missing mechanism, default: False
- ms_mr_dist_clients : str — distribution of missing ratios across clients, default: 'randu'. Options: 'fixed', 'uniform', 'uniform_int', 'gaussian', 'gaussian_int'
- ms_mf_dist_clients : str — distribution of missing features across clients, default: 'identity'. Options: 'identity', 'random', 'random2'
- ms_mm_dist_clients : str — distribution of missing mechanism functions across clients, default: 'random'. Options: 'identity', 'random', 'random2'
- ms_missing_features : str — missing features strategy, default: 'all'. Options: 'all', 'all-num'
- ms_mr_lower : float — minimum missing ratio for each feature, default: 0.3
- ms_mr_upper : float — maximum missing ratio for each feature, default: 0.7
- ms_mm_funcs_bank : str — bank of missing mechanism functions, default: 'lr'. Options: None, 'lr', 'mt', 'all'
- ms_mm_strictness : bool — controls whether missing values are added probabilistically or deterministically, default: True
- ms_mm_obs : bool — whether missing values are added based on observed data, default: False
- ms_mm_feature_option : str — features the missing mechanism depends on, default: 'allk=0.2'. Options: 'self', 'all', 'allk=<ratio>' (e.g. 'allk=0.1')
- ms_mm_beta_option : str — option for the mechanism's beta coefficients, default: None. Options for MNAR: 'self', 'sphere', 'randu'; for MAR: 'fixed', 'randu', 'randn'
- seed : int — random seed, default: 100330201
- verbose : int — verbosity level of the simulation process, default: 0
Returns
- Dict[str, List[np.ndarray]] — dictionary containing the clients' training data, the clients' test data, the clients' training data with missing values, and the global test data
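The clients' training data with missing values in the returned dictionary comes from the `ms_*` settings. For the simplest case, `ms_mech_type='mcar'` with bounds `ms_mr_lower`/`ms_mr_upper`, this can be illustrated with a small NumPy sketch. The sketch assumes that each feature of each client's training data receives its own missing ratio, drawn uniformly from [ms_mr_lower, ms_mr_upper], and that the MCAR mask ignores the data values. `add_mcar` is a hypothetical helper, not a FedImpute function, and this is not the library's implementation.

```python
import numpy as np


def add_mcar(X, mr_lower=0.3, mr_upper=0.7, seed=0):
    """Hypothetical helper: return a copy of X with MCAR missingness.

    Assumption: each column gets its own missing ratio drawn uniformly
    from [mr_lower, mr_upper].
    """
    rng = np.random.default_rng(seed)
    X_ms = X.astype(float)  # astype copies; float dtype can hold NaN
    n, d = X_ms.shape
    ratios = rng.uniform(mr_lower, mr_upper, size=d)
    for j, r in enumerate(ratios):
        # MCAR: masked rows are chosen without looking at any data values
        rows = rng.choice(n, size=int(round(r * n)), replace=False)
        X_ms[rows, j] = np.nan
    return X_ms, ratios


# Toy federated setup: three clients, 200 rows x 4 features each
rng = np.random.default_rng(0)
clients_train = [rng.normal(size=(200, 4)) for _ in range(3)]

clients_train_ms, client_ratios = [], []
for k, Xk in enumerate(clients_train):
    Xk_ms, r = add_mcar(Xk, seed=k)  # a different seed per client
    clients_train_ms.append(Xk_ms)
    client_ratios.append(r)
```

MAR and MNAR mechanisms such as 'mar_sigmoid' would replace the uniform row choice with probabilities that depend on observed or missing values. The sketch deliberately leaves that out.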
Raises
- ValueError
- NotImplementedError
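The `dp_strategy`, `dp_global_test_size` and `dp_local_test_size` parameters control how the data is divided among clients. Here is a rough sketch of two common partitioning ideas. The first is an even IID split. The second is a label-skewed split whose per-class client proportions are drawn from a Dirichlet distribution. That the 'niid-dir' option works this way is an assumption. The function names and the `alpha` parameter are hypothetical, and this is not FedImpute's code.

```python
import numpy as np


def iid_even_partition(indices, num_clients, seed=0):
    """Shuffle indices and split them into num_clients near-equal parts."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(indices), num_clients)


def niid_dirichlet_partition(indices, labels, num_clients, alpha=0.5, seed=0):
    """Label-skewed split (assumed reading of 'niid-dir'): each class is
    divided among clients by proportions drawn from Dir(alpha)."""
    rng = np.random.default_rng(seed)
    indices, labels = np.asarray(indices), np.asarray(labels)
    parts = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx_c = rng.permutation(indices[labels == c])
        props = rng.dirichlet(alpha * np.ones(num_clients))
        # Cumulative proportions give the cut points inside this class
        cuts = (np.cumsum(props)[:-1] * len(idx_c)).astype(int)
        for k, chunk in enumerate(np.split(idx_c, cuts)):
            parts[k].extend(chunk.tolist())
    return [np.array(sorted(p), dtype=int) for p in parts]


# Toy data: 1,000 rows, binary target
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] > 0).astype(int)

# Hold out a global test set first (analogous to dp_global_test_size=0.1)
perm = rng.permutation(len(X))
n_global = int(0.1 * len(X))
global_test_idx, pool_idx = perm[:n_global], perm[n_global:]

clients_iid = iid_even_partition(pool_idx, num_clients=4)
clients_niid = niid_dirichlet_partition(pool_idx, y[pool_idx], num_clients=4)

# Within each client, reserve a local test split (analogous to dp_local_test_size=0.1)
local_splits = []
for idx in clients_iid:
    n_test = int(round(0.1 * len(idx)))
    local_splits.append((idx[n_test:], idx[:n_test]))  # (train, test)
```

In the Dirichlet version, a smaller `alpha` produces more skewed label distributions per client. Client sizes are therefore uneven, which is where bounds like `dp_min_samples`/`dp_max_samples` would matter.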
Build from Real Federated Scenario Function
fedimpute.scenario.scenario_builder.ScenarioBuilder.create_real_scenario
method ScenarioBuilder.create_real_scenario(datas: List[pd.DataFrame], data_config: Dict, seed: int = 100330201, verbose: int = 0)
Create a real scenario from a list of pandas DataFrames
Parameters
- datas : List[pd.DataFrame] — list of pandas DataFrames, one per client
- data_config : Dict — data configuration dictionary
- seed : int — random seed, default: 100330201
- verbose : int — verbosity level of the scenario construction process, default: 0
Raises
- ValueError
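In a real scenario, the missing values are already present in each client's DataFrame in `datas`. Before passing the frames in, it can help to confirm that they look consistent and to see how much data is missing at each site. This sketch assumes the frames should share the same columns in the same order. `check_client_frames` and `missing_ratio_per_client` are hypothetical helpers, not FedImpute functions, and the two toy "sites" are illustrative data only.

```python
import numpy as np
import pandas as pd


def check_client_frames(datas):
    """Hypothetical pre-check: all client DataFrames share columns and order."""
    if not datas:
        raise ValueError("datas must contain at least one DataFrame")
    ref = list(datas[0].columns)
    for i, df in enumerate(datas[1:], start=1):
        if list(df.columns) != ref:
            raise ValueError(f"client {i} columns {list(df.columns)} differ from {ref}")


def missing_ratio_per_client(datas):
    """Fraction of missing cells in each client's DataFrame."""
    return [float(df.isna().to_numpy().mean()) for df in datas]


# Two toy sites with naturally occurring missing values
site_a = pd.DataFrame({"age": [50, np.nan, 61, 47],
                       "bmi": [22.1, 27.5, np.nan, 30.2],
                       "y": [0, 1, 1, 0]})
site_b = pd.DataFrame({"age": [39, 72, np.nan],
                       "bmi": [np.nan, np.nan, 24.0],
                       "y": [1, 0, 0]})
datas = [site_a, site_b]

check_client_frames(datas)  # raises ValueError if the columns disagree
ratios = missing_ratio_per_client(datas)  # roughly [0.167, 0.333]
```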