bhepop2.enrichment.bhepop2
Implementation of the Bhepop2 methodology.
Bhepop2 stands for Bayesian Heuristic to Enrich POPulation by EntroPy OPtimization. It is described, justified and discussed theoretically in:
Boyam Fabrice Yaméogo, Pierre-Olivier Vandanjon, Pierre Hankach, Pascal Gastineau. Methodology for Adding a Variable to a Synthetic Population from Aggregate Data: Example of the Income Variable. 2021. ⟨hal-03282111⟩. Paper in review.
Boyam Fabrice Yaméogo. Méthodologie de calibration d’un modèle multimodal des déplacements pour l’évaluation des externalités environnementales à partir de données ouvertes (open data) : le cas de l’aire urbaine de Nantes. PhD thesis, 2021.
[Figure: diagram representation of the algorithm]
Module Contents
Classes
Bhepop2Enrichment | Implementation of the Bhepop2 methodology as an enrichment class.
Functions
modality_feature(attribute, modality) | Create a function that checks if a sample belongs to the given attribute and modality.
- class bhepop2.enrichment.bhepop2.Bhepop2Enrichment(population: pandas.DataFrame, source: bhepop2.sources.marginal_distributions.MarginalDistributions, feature_name=None, seed=None)
Bases: bhepop2.enrichment.base.SyntheticPopulationEnrichment
Implementation of the Bhepop2 methodology as an enrichment class. See the bhepop2.enrichment.bhepop2 module documentation for details about the algorithm.
Expected source types: bhepop2.sources.marginal_distributions.MarginalDistributions
This class documentation uses the following notations:
\(M_{k}\) : crossed modality k (combination of attribute modalities)
\(F_{i}\) : feature class i
For quantitative features, corresponds to a numeric interval.
For qualitative features, corresponds to one of the feature values.
- property modalities
Dict mapping each attribute to its list of modalities.
- _evaluate_feature_on_population()
Assign feature values to the population individuals using the algorithm results.
- Returns:
enriched population DataFrame
- _draw_feature_value(probs)
Return a feature value using the given probabilities.
First draw a feature class index according to probs, then get a feature value for that class from the source distributions.
- Returns:
feature value to assign to individual
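The two-step draw can be sketched as follows. This is a minimal illustration, assuming quantitative feature classes are given as hypothetical `(low, high)` bounds and that a value is drawn uniformly within the selected class; the uniform choice, the `intervals` argument and the name `draw_feature_value` are assumptions, not the library's code.

```python
import random


def draw_feature_value(probs, intervals, rng):
    """Draw a feature value in two steps (illustrative sketch).

    probs: list of P(f in F_i | M_k), one entry per feature class.
    intervals: hypothetical list of (low, high) bounds, one per feature class.
    rng: a random.Random instance, so draws are reproducible with a seed.
    """
    if len(probs) != len(intervals):
        raise ValueError("one probability per feature class is required")
    # Step 1: draw the feature class index according to probs.
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Step 2 (assumption): draw uniformly between the class bounds.
    low, high = intervals[index]
    return rng.uniform(low, high)


rng = random.Random(42)
value = draw_feature_value([0.0, 1.0, 0.0], [(0, 10), (10, 20), (20, 30)], rng)
```

Passing a seeded `random.Random` instance keeps the draws reproducible, in the spirit of the class's `seed` parameter.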
- _get_feature_probs()
For each crossed modality, compute the probability of the feature belonging to each feature class.
The crossed modality probabilities are inverted using Bayes' theorem:
\[P(f \in F_{i} \mid M_{k}) = P(M_{k} \mid f \in F_{i}) \cdot \frac{P(f \in F_{i})}{P(M_{k})}\]
- Returns:
DataFrame containing the probabilities \(P(f \in F_{i} \mid M_{k})\)
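The inversion can be sketched with plain dictionaries. This sketch assumes the optimisation result gives \(P(M_{k} \mid f \in F_{i})\) for every feature class, and that \(P(M_{k})\) is obtained by the law of total probability, \(P(M_{k}) = \sum_{i} P(M_{k} \mid f \in F_{i}) P(f \in F_{i})\). The function name and data layout are hypothetical.

```python
def invert_with_bayes(p_mod_given_class, p_class):
    """Return {M_k: {F_i: P(f in F_i | M_k)}} (illustrative sketch).

    p_mod_given_class: {F_i: {M_k: P(M_k | f in F_i)}}
    p_class: {F_i: P(f in F_i)}
    """
    modalities = {k for dist in p_mod_given_class.values() for k in dist}
    result = {}
    for k in modalities:
        # Law of total probability: P(M_k) = sum_i P(M_k | F_i) * P(F_i).
        p_mod = sum(p_mod_given_class[i].get(k, 0.0) * p_class[i] for i in p_class)
        if p_mod == 0.0:
            continue  # P(F_i | M_k) is undefined when P(M_k) = 0
        # Bayes' theorem: P(F_i | M_k) = P(M_k | F_i) * P(F_i) / P(M_k).
        result[k] = {
            i: p_mod_given_class[i].get(k, 0.0) * p_class[i] / p_mod for i in p_class
        }
    return result


probs = invert_with_bayes(
    {"F1": {"a": 0.8, "b": 0.2}, "F2": {"a": 0.4, "b": 0.6}},
    {"F1": 0.5, "F2": 0.5},
)
```

Because \(P(M_{k})\) is the normalising sum, each crossed modality's probabilities sum to 1 over the feature classes, which makes a convenient sanity check.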
- _optimise()
Run the optimisation algorithm to find the probability distributions that maximise entropy.
When done, set the optim_result attribute.
- _run_optimization() → pandas.DataFrame
Run the optimization model for each feature class.
The resulting probabilities are the \(P(M_{k} \mid f \in F_{i})\).
- Returns:
DataFrame containing the result probabilities
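The entropy maximisation for one feature class \(F_{i}\) can be illustrated with iterative proportional fitting (IPF). This is a swapped-in technique, not necessarily the solver bhepop2 uses. Starting from the uniform distribution over the crossed modalities and repeatedly rescaling it to match each attribute's marginal targets converges to the maximum-entropy distribution satisfying those targets, provided they are consistent. The sketch assumes crossed modalities are tuples of attribute values and that the targets are the per-attribute probabilities \(P(Modality \mid f \in F_{i})\); all names are hypothetical.

```python
from collections import defaultdict


def max_entropy_ipf(crossed_modalities, targets, iterations=200, tol=1e-12):
    """Maximum-entropy distribution over crossed modalities via IPF (sketch).

    crossed_modalities: list of tuples, one value per attribute, e.g. ("A", "x").
    targets: one dict per attribute position, {modality: target probability}.
    Returns {crossed_modality: probability}, i.e. P(M_k | f in F_i).
    """
    # The uniform distribution is the maximum-entropy starting point.
    p = {c: 1.0 / len(crossed_modalities) for c in crossed_modalities}
    for _ in range(iterations):
        max_change = 0.0
        for pos, target in enumerate(targets):
            # Current marginal distribution of the attribute at position `pos`.
            marginal = defaultdict(float)
            for c, value in p.items():
                marginal[c[pos]] += value
            # Rescale each cell so this attribute's marginal matches its target.
            for c in p:
                current = marginal[c[pos]]
                new = p[c] * target.get(c[pos], 0.0) / current if current > 0 else 0.0
                max_change = max(max_change, abs(new - p[c]))
                p[c] = new
        if max_change < tol:
            break
    return p


crossed = [("A", "x"), ("A", "y"), ("B", "x"), ("B", "y")]
targets = [{"A": 0.3, "B": 0.7}, {"x": 0.6, "y": 0.4}]
p = max_entropy_ipf(crossed, targets)
```

On a full cartesian product such as this one, the maximum-entropy solution is simply the product of the marginals (here \(0.3 \times 0.6 = 0.18\) for ("A", "x")); the optimisation matters in reduced sample spaces where some crossed modalities never occur.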
- _compute_constraints()
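For each modality of each attribute and each feature class, compute the probability of that modality given that the feature lies in the class, by inverting the source probabilities with Bayes' theorem: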
\[P(Modality \mid f \in F_{i}) = P(f \in F_{i} \mid Modality) \cdot \frac{P(Modality)}{P(f \in F_{i})}\]
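For a single attribute and a single feature class, this step can be sketched as below. The sketch assumes the source provides \(P(f \in F_{i} \mid Modality)\) and \(P(Modality)\) for each modality of the attribute, and it recovers \(P(f \in F_{i})\) by total probability over those modalities; bhepop2 may instead take \(P(f \in F_{i})\) directly from the source's global distribution. The function name is hypothetical.

```python
def modality_constraints(p_class_given_mod, p_mod):
    """Return {modality: P(modality | f in F_i)} for one attribute (sketch).

    p_class_given_mod: {modality: P(f in F_i | modality)}
    p_mod: {modality: P(modality)}, summing to 1 over the attribute's modalities.
    """
    # Total probability over the attribute's modalities gives P(f in F_i).
    p_class = sum(p_class_given_mod[m] * p_mod[m] for m in p_mod)
    if p_class == 0.0:
        raise ValueError("feature class has zero probability")
    # Bayes' theorem, term by term as in the formula above.
    return {m: p_class_given_mod[m] * p_mod[m] / p_class for m in p_mod}


constraints = modality_constraints(
    {"owner": 0.2, "tenant": 0.5},
    {"owner": 0.6, "tenant": 0.4},
)
```

Here \(P(f \in F_{i}) = 0.2 \times 0.6 + 0.5 \times 0.4 = 0.32\), giving 0.375 for "owner" and 0.625 for "tenant". The values sum to 1 over the attribute's modalities, as expected of the per-attribute targets used in the optimisation.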
- _compute_crossed_modalities_matrix()
Compute the crossed modalities matrix for the modalities present in the population.
A reduced sample space is built from the crossed modalities present in the population. The functions describing each modality are then applied to the elements of this sample space.
For each modality m and sample c, M(m, c) is 1 if c has modality m, and 0 otherwise.
- Returns:
crossed_modalities_matrix describing crossed modalities
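The construction can be sketched in plain Python. The sketch assumes each individual is a mapping from attribute name to modality and represents the matrix as a list of rows; the function name and output layout are hypothetical and may differ from the library's internal representation.

```python
def crossed_modalities_matrix(population, attributes):
    """Build the 0/1 matrix M(m, c) over a reduced sample space (sketch).

    population: list of dicts, one per individual, {attribute: modality}.
    attributes: attribute names used to form the crossed modalities.
    Returns (modalities, samples, matrix), where matrix[r][s] is 1 when
    sample s has modality r and 0 otherwise.
    """
    # Reduced sample space: only the crossed modalities observed in the population.
    samples = sorted({tuple(ind[a] for a in attributes) for ind in population})
    # One row per (attribute, modality) pair present in the population.
    modalities = sorted({(a, ind[a]) for ind in population for a in attributes})
    # One indicator function per modality, applied to every sample.
    checks = [
        (lambda sample, pos=attributes.index(a), value=m: sample[pos] == value)
        for a, m in modalities
    ]
    matrix = [[int(check(s)) for s in samples] for check in checks]
    return modalities, samples, matrix


population = [
    {"age": "young", "tenure": "owner"},
    {"age": "young", "tenure": "tenant"},
    {"age": "old", "tenure": "owner"},
    {"age": "young", "tenure": "owner"},
]
modalities, samples, matrix = crossed_modalities_matrix(population, ["age", "tenure"])
```

Only three of the four possible crossed modalities occur, so ("old", "tenant") is absent from the sample space and gets no column.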
- bhepop2.enrichment.bhepop2.modality_feature(attribute, modality) → callable
Create a function that checks whether a sample has the given modality for the given attribute.
- Parameters:
attribute – attribute value
modality – modality value
- Returns:
feature checking function
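A closure is a natural way to build such a checking function. The sketch below is an illustrative stand-in named `modality_feature_sketch` (a hypothetical name), assuming a sample is a mapping from attribute name to modality; the library's actual sample representation may differ.

```python
from typing import Any, Callable, Mapping


def modality_feature_sketch(attribute: str, modality: Any) -> Callable[[Mapping[str, Any]], bool]:
    """Return a predicate telling whether a sample has `modality` for `attribute`.

    Assumption: a sample is a mapping {attribute name: modality value}.
    """

    def feature(sample: Mapping[str, Any]) -> bool:
        # True only when the sample's value for this attribute equals the modality.
        return sample[attribute] == modality

    return feature


is_owner = modality_feature_sketch("tenure", "owner")
is_owner({"age": "young", "tenure": "owner"})  # True
```

The returned function captures `attribute` and `modality`, so one predicate can be built per modality and applied to every element of the sample space.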