`bhepop2.enrichment.bhepop2`

Implementation of the Bhepop2 methodology.

Bhepop2 stands for Bayesian Heuristic to Enrich POPulation by EntroPy OPtimization. It is theoretically described, justified and discussed in:

Boyam Fabrice Yaméogo, Pierre-Olivier Vandanjon, Pierre Hankach, Pascal Gastineau. Methodology for Adding a Variable to a Synthetic Population from Aggregate Data: Example of the Income Variable. 2021. ⟨hal-03282111⟩. Paper in review.
Boyam Fabrice Yaméogo, Méthodologie de calibration d’un modèle multimodal des déplacements pour l’évaluation des externalités environnementales à partir de données ouvertes (open data) : le cas de l’aire urbaine de Nantes [Thèse], 2021

Diagram representation of the algorithm:

graph TD
    Pop[("Population with Attributes A,B")] --> Frequencies("Cross Modalities Frequencies F(A ∩ B)")
    Distribution("Distribution (Deciles) P(I ∈ [Id,Id+1] | A) P(I ∈ [Ik,Ik+1] | B)") --> Interpolation("Sorted deciles & Interpolation P(I | A) P(I | B)")
    Entropy("Cross Entropy Optimizations P( (A ∩ B) | I ) under the constraints P(I | A) P(I | B)")
    Frequencies --> Entropy
    Interpolation --> Entropy
    Entropy --> Bayesian("Bayesian rule P( I | (A ∩ B)) = P( (A ∩ B) | I ) * P(I)/P(A ∩ B)")
    Bayesian ---> Cleaning("Cleaning inconsistent to consistent proba")
    Cleaning -->|Sampling| Sampling[("Population with Attributes A,B, I")]

Module Contents

Classes

Bhepop2Enrichment

Implementation of the Bhepop2 methodology as an enrichment class.

class bhepop2.enrichment.bhepop2.Bhepop2Enrichment(population: pandas.DataFrame, source: bhepop2.sources.marginal_distributions.MarginalDistributions, feature_name=None, seed=None)

Bases: bhepop2.enrichment.base.SyntheticPopulationEnrichment

Implementation of the Bhepop2 methodology as an enrichment class. See bhepop2.enrichment.bhepop2 module documentation for details about the algorithm.

Expected source types: MarginalDistributions

This class documentation uses the following notations:

\(M_{k}\) : crossed modality k (combination of attribute modalities)
\(F_{i}\) : feature class i
- For quantitative features, corresponds to a numeric interval.
- For qualitative features, corresponds to one of the feature values.

property modalities: Dict containing list of modalities for each attribute.

_validate_and_process_inputs()

Validate the provided inputs and set the relevant fields.

Since Bhepop2 uses marginal distributions to enrich the population, we ensure that:

the selected attributes are present in the population
the population attributes take values in the modalities corresponding to this attribute
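The two checks above can be sketched as follows. This is a minimal illustration, not the library's code: the helper name `check_population` and the shape of its `modalities` argument (a dict mapping each attribute name to its list of allowed values, as the `modalities` property suggests) are assumptions.

```python
import pandas as pd


def check_population(population: pd.DataFrame, modalities: dict) -> None:
    """Hypothetical helper: raise ValueError if the population does not match the modalities."""
    for attribute, allowed in modalities.items():
        # Check 1: the attribute must be a column of the population.
        if attribute not in population.columns:
            raise ValueError(f"Attribute '{attribute}' is missing from the population")
        # Check 2: every observed value must be a known modality.
        unknown = set(population[attribute].unique()) - set(allowed)
        if unknown:
            raise ValueError(
                f"Attribute '{attribute}' has values outside its modalities: {sorted(unknown)}"
            )


# Toy population (made-up data) that passes both checks.
population = pd.DataFrame({"age": ["young", "old"], "sex": ["F", "M"]})
check_population(population, {"age": ["young", "old"], "sex": ["F", "M"]})
```

Failing early on either condition avoids running the optimisation on crossed modalities for which the source provides no distribution.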

_evaluate_feature_on_population()

Assign feature values to the population individuals using the algorithm results.

Returns: enriched population DataFrame

_draw_feature_value(probs)

Return a feature value using the given probabilities.

First draw the feature index. Then get a feature value from the distributions.

Returns: feature value to assign to individual
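For a quantitative feature, the two-step draw might look like the sketch below. The function name `draw_feature_value`, the `bounds` array of class boundaries and the uniform draw inside the selected interval are all assumptions made for illustration; the library may pick the value within a class differently.

```python
import numpy as np


def draw_feature_value(probs, bounds, rng):
    """Hypothetical two-step draw for a quantitative feature.

    probs  : probabilities of the feature classes F_i (must sum to 1)
    bounds : class boundaries, so that F_i = [bounds[i], bounds[i + 1])
    rng    : a numpy random Generator
    """
    # Step 1: draw the index of the feature class.
    index = rng.choice(len(probs), p=probs)
    # Step 2: draw a value inside that class (the uniform draw is an assumption).
    return rng.uniform(bounds[index], bounds[index + 1])


# Made-up probabilities and boundaries for three income classes.
rng = np.random.default_rng(seed=42)
value = draw_feature_value([0.2, 0.5, 0.3], [0.0, 10_000.0, 20_000.0, 50_000.0], rng)
```

Passing a seeded generator keeps the draw reproducible, which matches the `seed` argument exposed by the class constructor.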

_get_feature_probs()

For each crossed modality, compute the probability of belonging to a feature interval.

Invert the crossed modality probabilities using Bayes.

Compute

\[P(f \in F_{i} \mid M_{k}) = P(M_{k} \mid f \in F_{i}) \cdot \frac{P(f \in F_{i})}{P(M_{k})}\]

Returns: DataFrame
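The inversion can be illustrated on a toy case with NumPy. All numbers below are made up, and \(P(M_{k})\) is derived here from the other two inputs so that the result is exactly consistent; in practice it would come from the population's crossed-modality frequencies.

```python
import numpy as np

# Toy inputs (made-up numbers): 2 crossed modalities M_k, 3 feature classes F_i.
# p_m_given_f[k, i] = P(M_k | f in F_i); each column sums to 1.
p_m_given_f = np.array([[0.6, 0.5, 0.2],
                        [0.4, 0.5, 0.8]])
p_f = np.array([0.3, 0.4, 0.3])  # P(f in F_i)
p_m = p_m_given_f @ p_f          # P(M_k), here by the law of total probability

# Bayes' rule applied to every (k, i) pair at once:
# p_f_given_m[k, i] = P(f in F_i | M_k)
p_f_given_m = p_m_given_f * p_f / p_m[:, None]
```

Each row of `p_f_given_m` is the distribution over feature classes for one crossed modality, so it sums to 1 in this consistent toy setting.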

_optimise()

Run the optimisation algorithm to find the probability distributions that maximise entropy.

When done, set the optim_result attribute.

_run_optimization() → pandas.DataFrame

Run optimization model on each feature value.

The resulting probabilities are the \(P(M_{k} \mid f \in F_{i})\).

Returns: DataFrame containing the result probabilities
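The library's optimiser is not reproduced here. To illustrate the kind of problem solved for a single feature class, the sketch below swaps in iterative proportional fitting (IPF) on two attributes A and B: starting from the population's crossed-modality frequencies, it rescales them until their marginals match the per-attribute constraints. For marginal constraints of this kind, IPF converges to the distribution closest to the starting frequencies in Kullback-Leibler divergence. The function name and all numbers are hypothetical.

```python
import numpy as np


def fit_crossed_probabilities(prior, target_a, target_b, n_iter=200):
    """Hypothetical helper: IPF on a two-attribute table.

    prior    : prior[a, b], frequency of crossed modality (a, b) in the population
    target_a : target_a[a] = P(A = a | f in F_i)
    target_b : target_b[b] = P(B = b | f in F_i)
    Returns q[a, b], an estimate of P(M_k | f in F_i).
    """
    q = prior / prior.sum()
    for _ in range(n_iter):
        q = q * (target_a / q.sum(axis=1))[:, None]  # match the A marginal
        q = q * (target_b / q.sum(axis=0))[None, :]  # match the B marginal
    return q


# Made-up crossed frequencies and constraints for one feature class.
prior = np.array([[0.3, 0.2],
                  [0.1, 0.4]])
q = fit_crossed_probabilities(prior, np.array([0.6, 0.4]), np.array([0.5, 0.5]))
```

Repeating this fit for every feature class would give one column of \(P(M_{k} \mid f \in F_{i})\) per class.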

_compute_constraints(): For each modality of each attribute, compute the probability of belonging to each feature interval.

\[P(Modality \mid f \in F_{i}) = P(f \in F_{i} \mid Modality) \cdot \frac{P(Modality)}{P(f \in F_{i})}\]
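As a worked instance of this formula, the sketch below computes the constraints for a single attribute. The numbers are made up, and \(P(f \in F_{i})\) is derived from the other inputs for the sake of a self-consistent example.

```python
import numpy as np

# Toy inputs (made-up numbers): one attribute with 2 modalities, 3 feature classes.
# p_f_given_mod[m, i] = P(f in F_i | modality m); each row sums to 1.
p_f_given_mod = np.array([[0.5, 0.3, 0.2],
                          [0.1, 0.3, 0.6]])
p_mod = np.array([0.25, 0.75])  # P(modality), e.g. its frequency in the population
p_f = p_mod @ p_f_given_mod     # P(f in F_i), here by the law of total probability

# constraints[m, i] = P(modality m | f in F_i)
constraints = p_f_given_mod * p_mod[:, None] / p_f
```

Each column of `constraints` sums to 1, since every individual in a given feature class has exactly one modality of the attribute.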

_compute_crossed_modalities_matrix()

Compute crossed modalities matrix for the present modalities.

A reduced sample space is built from the crossed modalities present in the population. Indicator functions describing each modality are then applied to the elements of this sample space.

For each modality m and sample c, M(m, c) is 1 if c has modality m, 0 otherwise.

Returns: crossed_modalities_matrix describing crossed modalities
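A minimal sketch of such an indicator matrix, using pandas on a made-up population. The row labels (`"age=young"` and so on) and the row-per-modality, column-per-sample layout are illustrative assumptions, not the library's actual output format.

```python
import pandas as pd

# Toy population (made-up data) with two attributes.
population = pd.DataFrame({
    "age": ["young", "young", "old", "old"],
    "sex": ["F", "M", "F", "F"],
})
attributes = ["age", "sex"]

# Reduced sample space: only the crossed modalities present in the population.
samples = population[attributes].drop_duplicates().reset_index(drop=True)

# One indicator row per (attribute, modality) pair, one column per sample c:
# M(m, c) is 1 if sample c has modality m, 0 otherwise.
rows = {
    f"{attribute}={modality}": (samples[attribute] == modality).astype(int)
    for attribute in attributes
    for modality in sorted(population[attribute].unique())
}
matrix = pd.DataFrame(rows).T
```

Here the crossed modality (old, M) never occurs, so it gets no column: the sample space has three elements instead of four.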
