Implementation of the Bhepop2 methodology.

Bhepop2 stands for Bayesian Heuristic to Enrich POPulation by EntroPy OPtimization. It is theoretically described, justified and discussed in

Diagram representation of the algorithm:

graph TD
    Pop[("Population <br /> with Attributes <br /> A,B")] --> Frequencies("Cross Modalities Frequencies <br /> F(A #8745; B)")
    Distribution("Distribution (Deciles) <br /> P(I #8712; [Id,Id+1] | A) <br /> P(I #8712; [Ik,Ik+1] | B)") --> Interpolation("Sorted deciles & Interpolation <br /> P(I | A) <br /> P(I | B)")
    Entropy("Cross Entropy Optimizations <br /> P( (A #8745; B) | I ) <br /> under the constraints <br /> P(I | A) <br /> P(I | B)")
    Frequencies --> Entropy
    Interpolation --> Entropy
    Entropy --> Bayesian("Bayesian rule <br /> P( I | (A #8745; B)) = P( (A #8745; B) | I ) * P(I)/P(A #8745; B)")
    Bayesian ---> Cleaning("Cleaning <br />inconsistent to <br /> consistent proba ")
    Cleaning -->|Sampling|Sampling[("Population <br /> with Attributes <br /> A,B, I")]

Module Contents



Implementation of the Bhepop2 methodology as an enrichment class.

class bhepop2.enrichment.bhepop2.Bhepop2Enrichment(population: pandas.DataFrame, source: bhepop2.sources.marginal_distributions.MarginalDistributions, feature_name=None, seed=None)

Bases: bhepop2.enrichment.base.SyntheticPopulationEnrichment

Implementation of the Bhepop2 methodology as an enrichment class. See bhepop2.enrichment.bhepop2 module documentation for details about the algorithm.

Expected source types:

This class documentation uses the following notations:

  • \(M_{k}\) : crossed modality k (combination of attribute modalities)

  • \(F_{i}\) : feature class i

    • For quantitative features, corresponds to a numeric interval.

    • For qualitative features, corresponds to one of the feature values.

property modalities

Dict containing list of modalities for each attribute.


Validate the provided inputs and set the relevant fields.

Since Bhepop2 uses marginal distributions to enrich the population, we ensure that:

  • the selected attributes are present in the population

  • the population attributes take values in the modalities corresponding to this attribute
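The two checks above can be sketched as follows. This is a minimal, standard-library illustration and not the library's code: the function name `find_input_errors`, the representation of the population as a list of dicts (the real class takes a pandas DataFrame), and the example attribute names are all assumptions made for the example.

```python
def find_input_errors(population, modalities):
    """Return a list of error messages; an empty list means the inputs are valid.

    population: list of dicts, one per individual (hypothetical stand-in
                for the pandas DataFrame used by the real class).
    modalities: dict mapping attribute name -> list of allowed modalities.
    """
    errors = []
    for attribute, allowed in modalities.items():
        # Check 1: the selected attribute must be present in the population.
        if any(attribute not in person for person in population):
            errors.append(f"attribute {attribute!r} is missing from the population")
            continue
        # Check 2: every observed value must be one of the attribute's modalities.
        unknown = {person[attribute] for person in population} - set(allowed)
        if unknown:
            errors.append(f"attribute {attribute!r} has unknown values: {sorted(unknown)}")
    return errors


# Hypothetical example data.
modalities = {"age": ["0_29", "30_59", "60_plus"], "ownership": ["owner", "tenant"]}
population = [
    {"age": "0_29", "ownership": "tenant"},
    {"age": "60_plus", "ownership": "owner"},
]
errors = find_input_errors(population, modalities)
```

Collecting messages rather than raising on the first failure is a choice made for this sketch only; it makes it easy to report every problem at once.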


Assign feature values to the population individuals using the algorithm results.


Returns: enriched population DataFrame


Return a feature value using the given probabilities.

First draw the feature index. Then get a feature value from the distributions.


Returns: feature value to assign to individual
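The two-step draw can be illustrated with a short sketch for a quantitative feature. The function name `draw_feature_value` and the example intervals are hypothetical, and drawing uniformly inside the selected interval is an assumption made for illustration; the library may pick the value differently.

```python
import random


def draw_feature_value(probabilities, intervals, rng):
    """Draw a feature value in two steps.

    probabilities: one weight per feature class F_i (for a given crossed modality).
    intervals: list of (lower, upper) bounds, one per feature class F_i.
    rng: random.Random instance, so that draws are reproducible.
    """
    # Step 1: draw the feature class index according to the probabilities.
    index = rng.choices(range(len(intervals)), weights=probabilities, k=1)[0]
    # Step 2: get a value from the selected class.
    # ASSUMPTION: uniform draw within the interval, for illustration only.
    lower, upper = intervals[index]
    return rng.uniform(lower, upper)


# Hypothetical example: three income intervals and their probabilities.
rng = random.Random(42)
intervals = [(0, 10_000), (10_000, 20_000), (20_000, 50_000)]
probabilities = [0.2, 0.5, 0.3]
value = draw_feature_value(probabilities, intervals, rng)
```

Passing an explicit random generator mirrors the `seed` argument of the class constructor: the same seed gives the same enriched population.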


For each crossed modality, compute the probability of belonging to a feature interval.

Invert the crossed modality probabilities using Bayes.


\[P(f \in F_{i} \mid M_{k}) = P(M_{k} \mid f \in F_{i}) \cdot \frac{P(f \in F_{i})}{P(M_{k})}\]
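As a small numeric illustration of this inversion, the sketch below applies the formula to two feature classes and two crossed modalities. The function name `invert_with_bayes`, the dict-based layout, and all numbers are hypothetical; the library works on DataFrames.

```python
def invert_with_bayes(p_modality_given_class, p_class, p_modality):
    """Apply P(F_i | M_k) = P(M_k | F_i) * P(F_i) / P(M_k).

    p_modality_given_class: dict (k, i) -> P(M_k | f in F_i)
    p_class: dict i -> P(f in F_i)
    p_modality: dict k -> P(M_k)
    """
    return {
        (k, i): p_mk_given_fi * p_class[i] / p_modality[k]
        for (k, i), p_mk_given_fi in p_modality_given_class.items()
    }


# Hypothetical inputs: two feature classes (0, 1), two crossed modalities ("a", "b").
p_class = {0: 0.5, 1: 0.5}
p_modality_given_class = {("a", 0): 0.8, ("b", 0): 0.2, ("a", 1): 0.4, ("b", 1): 0.6}

# P(M_k) obtained here by the law of total probability, so the inputs are consistent.
p_modality = {}
for (k, i), p in p_modality_given_class.items():
    p_modality[k] = p_modality.get(k, 0.0) + p * p_class[i]

p_class_given_modality = invert_with_bayes(p_modality_given_class, p_class, p_modality)
```

With consistent inputs, the resulting probabilities sum to 1 for each crossed modality. When \(P(M_{k})\) comes from observed population frequencies instead, the sums may deviate from 1, which is presumably why the diagram includes a cleaning step before sampling.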



Run the optimisation algorithm to find the probability distributions that maximise entropy.

When done, set the optim_result attribute.

_run_optimization() → pandas.DataFrame

Run optimization model on each feature value.

The resulting probabilities are the \(P(M_{k} \mid f \in F_{i})\).


Returns: DataFrame containing the result probabilities
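For a fixed feature class \(F_{i}\), one plausible way to write the problem solved here is the standard maximum entropy form below. It is a sketch under assumptions, not a statement of the library's exact objective: \(p_{k}\) stands for \(P(M_{k} \mid f \in F_{i})\), \(M(m, k)\) is the crossed modalities matrix entry for modality \(m\) and crossed modality \(k\), and the right-hand sides are the per-modality constraints \(P(m \mid f \in F_{i})\). The diagram refers to cross entropy, so the actual objective may include a prior distribution that is omitted here.

\[\max_{p} \; -\sum_{k} p_{k} \log p_{k} \quad \text{subject to} \quad \sum_{k} M(m, k) \, p_{k} = P(m \mid f \in F_{i}) \;\; \forall m, \qquad \sum_{k} p_{k} = 1, \qquad p_{k} \geq 0\]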


For each modality of each attribute, compute the probability of belonging to each feature interval.

\[P(Modality \mid f \in F_{i}) = P(f \in F_{i} \mid Modality) \cdot \frac{P(Modality)}{P(f \in F_{i})}\]

Compute crossed modalities matrix for the present modalities.

A reduced sample space is computed from the crossed modalities present in the population. The indicator function of each modality is then applied to the elements of this sample space.

For each modality m and sample c, M(m, c) is 1 if c has modality m, 0 otherwise.


Returns: crossed_modalities_matrix describing crossed modalities
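The construction of M can be sketched with the standard library alone. The function name `build_crossed_modalities_matrix`, the list-of-dicts population, and the example data are hypothetical; the library's own layout (for instance row and column ordering) may differ.

```python
def build_crossed_modalities_matrix(population, attributes, modalities):
    """Build the 0/1 matrix M(m, c) over the observed crossed modalities.

    Returns (rows, samples, matrix) where rows are (attribute, modality) pairs,
    samples are the crossed modalities present in the population, and
    matrix[r][s] is 1 if sample s has the modality of row r, 0 otherwise.
    """
    # Reduced sample space: only attribute combinations observed in the population.
    samples = sorted({tuple(person[a] for a in attributes) for person in population})
    # One row per modality of each attribute.
    rows = [(a, m) for a in attributes for m in modalities[a]]
    matrix = [
        [1 if sample[attributes.index(a)] == m else 0 for sample in samples]
        for (a, m) in rows
    ]
    return rows, samples, matrix


# Hypothetical example data.
attributes = ["age", "ownership"]
modalities = {"age": ["0_29", "30_59", "60_plus"], "ownership": ["owner", "tenant"]}
population = [
    {"age": "0_29", "ownership": "tenant"},
    {"age": "60_plus", "ownership": "owner"},
    {"age": "0_29", "ownership": "tenant"},
]
rows, samples, matrix = build_crossed_modalities_matrix(population, attributes, modalities)
```

Only two of the six possible crossed modalities occur in this population, so the matrix has two columns. Each column contains exactly one 1 per attribute, since a crossed modality fixes one modality for every attribute.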