bhepop2.enrichment.bhepop2

Implementation of the Bhepop2 methodology.

Bhepop2 stands for Bayesian Heuristic to Enrich POPulation by EntroPy OPtimization. It is theoretically described, justified and discussed in

Diagram representation of the algorithm:

graph TD
    Pop[("Population <br /> with Attributes <br /> A,B")] --> Frequencies("Cross Modalities Frequencies <br /> F(A ∩ B)")
    Distribution("Distribution (Deciles) <br /> P(I ∈ [Id,Id+1] | A) <br /> P(I ∈ [Ik,Ik+1] | B)") --> Interpolation("Sorted deciles & Interpolation <br /> P(I | A) <br /> P(I | B)")
    Entropy("Cross Entropy Optimizations <br /> P((A ∩ B) | I) <br /> under the constraints <br /> P(I | A) <br /> P(I | B)")
    Frequencies --> Entropy
    Interpolation --> Entropy
    Entropy --> Bayesian("Bayesian rule <br /> P(I | (A ∩ B)) = P((A ∩ B) | I) * P(I) / P(A ∩ B)")
    Bayesian ---> Cleaning("Cleaning <br /> inconsistent to <br /> consistent probabilities")
    Cleaning -->|Sampling| Sampling[("Population <br /> with Attributes <br /> A,B, I")]

Module Contents

Classes

Bhepop2Enrichment

Implementation of the Bhepop2 methodology as an enrichment class.

class bhepop2.enrichment.bhepop2.Bhepop2Enrichment(population: pandas.DataFrame, source: bhepop2.sources.marginal_distributions.MarginalDistributions, feature_name=None, seed=None)

Bases: bhepop2.enrichment.base.SyntheticPopulationEnrichment

Implementation of the Bhepop2 methodology as an enrichment class. See bhepop2.enrichment.bhepop2 module documentation for details about the algorithm.

Expected source types:

  • bhepop2.sources.marginal_distributions.MarginalDistributions

This class documentation uses the following notations:

  • \(M_{k}\) : crossed modality k (combination of attribute modalities)

  • \(F_{i}\) : feature class i

    • For quantitative features, corresponds to a numeric interval.

    • For qualitative features, corresponds to one of the feature values.

property modalities

Dict mapping each attribute to the list of its modalities.

_validate_and_process_inputs()

Validate the provided inputs and set the relevant fields.

Since Bhepop2 uses marginal distributions to enrich the population, we ensure that:

  • the selected attributes are present in the population

  • each attribute only takes values, in the population, among the modalities defined for that attribute
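The two checks above can be sketched with pandas as follows. This is a minimal illustration, not the library's code: the function name `validate_population` and its `modalities` argument (a dict shaped like the `modalities` property) are hypothetical.

```python
import pandas as pd


def validate_population(population: pd.DataFrame, modalities: dict) -> None:
    """Hypothetical check mirroring the two validation conditions.

    `modalities` maps each attribute name to the list of its allowed modalities.
    """
    # Condition 1: every selected attribute is a column of the population.
    missing = [attr for attr in modalities if attr not in population.columns]
    if missing:
        raise ValueError(f"Attributes missing from population: {missing}")

    # Condition 2: each attribute only takes values among its modalities.
    for attr, allowed in modalities.items():
        unknown = set(population[attr].unique()) - set(allowed)
        if unknown:
            raise ValueError(
                f"Attribute {attr!r} has unknown modalities: {sorted(unknown, key=str)}"
            )
```

Raising on the first failure keeps the sketch short; a real validator might instead collect every problem before reporting.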

_evaluate_feature_on_population()

Assign feature values to the population individuals using the algorithm results.

Returns:

enriched population DataFrame
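One way to picture this step: each individual looks up the probability row of its crossed modality and a value is drawn from it. The sketch below assumes a hypothetical `probs` table with the attribute columns plus one column per feature class, and a caller-supplied `draw` function; neither is confirmed as the library's internal layout.

```python
import pandas as pd


def evaluate_feature(population: pd.DataFrame, probs: pd.DataFrame, attributes: list,
                     draw, feature_name: str = "feature") -> pd.DataFrame:
    """Hypothetical sketch: assign each individual a value drawn from the
    probability row of its crossed modality.

    `probs` holds one row per crossed modality: the attribute columns plus one
    column per feature class containing P(f in F_i | M_k).
    `draw` maps a probability vector to a feature value.
    """
    class_cols = [c for c in probs.columns if c not in attributes]
    # Left join keeps the population row order; each key must match one row.
    merged = population.merge(probs, on=attributes, how="left", validate="many_to_one")
    values = [draw(row) for row in merged[class_cols].to_numpy()]
    # Work on a copy so the input population is left unchanged.
    enriched = population.copy()
    enriched[feature_name] = values
    return enriched
```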

_draw_feature_value(probs)

Return a feature value using the given probabilities.

First draw the index of the feature class, then get a feature value for that class from the distributions.

Returns:

feature value to assign to individual
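The two-step draw can be sketched for a quantitative feature as below. The assumptions are loud ones: class F_i is taken to be the interval `[bounds[i], bounds[i + 1]]`, and the value inside the class is drawn uniformly; the library may pick the value differently. For a qualitative feature, the second step would simply return the class value.

```python
import numpy as np


def draw_feature_value(probs, bounds, rng: np.random.Generator) -> float:
    """Hypothetical two-step draw for a quantitative feature.

    probs[i] is the probability of feature class F_i, assumed to be the
    interval [bounds[i], bounds[i + 1]].
    """
    probs = np.asarray(probs, dtype=float)
    # Step 1: draw the feature class index (probabilities renormalised defensively).
    index = rng.choice(len(probs), p=probs / probs.sum())
    # Step 2: draw a value inside the chosen interval, assumed uniform.
    return rng.uniform(bounds[index], bounds[index + 1])
```

For example, `draw_feature_value([0.2, 0.5, 0.3], [0.0, 10.0, 20.0, 30.0], np.random.default_rng(0))` returns a float between 0 and 30.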

_get_feature_probs()

For each crossed modality, compute the probability of belonging to a feature interval.

Invert the crossed modality probabilities using Bayes' rule, i.e. compute

\[P(f \in F_{i} \mid M_{k}) = P(M_{k} \mid f \in F_{i}) \cdot \frac{P(f \in F_{i})}{P(M_{k})}\]
Returns:

DataFrame
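On dense arrays, the formula above reduces to a broadcasted product and a row normalisation. In this sketch P(M_k) is obtained by summing the joint probabilities over the feature classes (law of total probability), which makes every row sum to 1; that choice is an assumption, and the library may estimate P(M_k) differently, for instance from population frequencies. The function name is hypothetical.

```python
import numpy as np


def invert_with_bayes(p_m_given_f, p_f):
    """Turn P(M_k | f in F_i) into P(f in F_i | M_k).

    p_m_given_f: shape (n_crossed, n_classes); column i is the distribution
        over crossed modalities given class F_i.
    p_f: shape (n_classes,), holding P(f in F_i).
    """
    p_m_given_f = np.asarray(p_m_given_f, dtype=float)
    p_f = np.asarray(p_f, dtype=float)
    # Joint P(M_k, f in F_i) = P(M_k | f in F_i) * P(f in F_i), broadcast per column.
    joint = p_m_given_f * p_f
    # Assumed P(M_k): sum of the joint over feature classes.
    p_m = joint.sum(axis=1, keepdims=True)
    # Bayes' rule: divide each row by P(M_k).
    return joint / p_m
```

A crossed modality with zero probability under every class would cause a division by zero here; such rows would need special handling.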

_optimise()

Run the optimisation algorithm to find the probability distributions that maximise entropy.

When done, set the optim_result attribute.

_run_optimization() → pandas.DataFrame

Run the optimization model for each feature class.

The resulting probabilities are the \(P(M_{k} \mid f \in F_{i})\).

Returns:

DataFrame containing the result probabilities
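The documentation does not name the solver. As a swapped-in technique, iterative proportional fitting (cyclic rescaling) finds the distribution closest in cross-entropy (KL divergence) to a prior when every constraint fixes the mass of one modality and each attribute's modalities partition the crossed modalities. The sketch below solves one feature class F_i this way, using the crossed modality frequencies as the prior, as the diagram's "Frequencies → Entropy" edge suggests. It is not necessarily how `_optimise` works, and all names are hypothetical.

```python
import numpy as np


def fit_max_entropy(prior, matrix, constraints, n_iter=500, tol=1e-10):
    """Hypothetical IPF sketch for a single feature class F_i.

    prior: shape (n_crossed,), frequencies of the crossed modalities M_k.
    matrix: shape (n_modalities, n_crossed), 1 if crossed modality k has
        modality m, else 0 (each attribute's rows partition the columns).
    constraints: shape (n_modalities,), target P(modality | f in F_i).
    Returns p with p[k] approximating P(M_k | f in F_i).
    """
    p = np.asarray(prior, dtype=float)
    p = p / p.sum()
    matrix = np.asarray(matrix, dtype=float)
    constraints = np.asarray(constraints, dtype=float)
    for _ in range(n_iter):
        for row, target in zip(matrix, constraints):
            current = row @ p
            if current > 0:
                # Rescale the crossed modalities having this modality so its mass hits the target.
                p = np.where(row == 1, p * target / current, p)
        # Stop once every marginal constraint is satisfied.
        if np.max(np.abs(matrix @ p - constraints)) < tol:
            break
    return p
```

With a uniform prior over two binary attributes and targets `[0.3, 0.7]` and `[0.6, 0.4]`, the result is the product distribution `[0.18, 0.12, 0.42, 0.28]`, as expected from maximum entropy with independent marginals.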

_compute_constraints()

For each modality of each attribute, compute the probability of belonging to each feature interval.

\[P(Modality \mid f \in F_{i}) = P(f \in F_{i} \mid Modality) \cdot \frac{P(Modality)}{P(f \in F_{i})}\]
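Following the diagram's "Sorted deciles & Interpolation" step, P(f ∈ F_i | Modality) can be sketched by linearly interpolating a modality's decile CDF at the class bounds, then applying the formula above. Assumptions: the 11 decile values include the minimum and maximum (0% and 100% quantiles), the class bounds are given (for instance the sorted union of all deciles), and P(f ∈ F_i) is supplied by the caller. Both function names are hypothetical.

```python
import numpy as np


def class_probs_from_deciles(deciles, bounds):
    """P(f in F_i | modality) for classes F_i = [bounds[i], bounds[i + 1]].

    deciles: increasing feature values, assumed to be the 0%, 10%, ..., 100%
    quantiles of the feature for this modality.
    """
    cdf_levels = np.linspace(0.0, 1.0, len(deciles))
    # Piecewise-linear CDF at each bound; np.interp clamps to 0 and 1 outside the deciles.
    cdf = np.interp(bounds, deciles, cdf_levels)
    return np.diff(cdf)


def modality_given_class(p_f_given_mod, p_mod, p_f):
    """Bayes' rule: P(modality | f in F_i) = P(f in F_i | modality) * P(modality) / P(f in F_i)."""
    return np.asarray(p_f_given_mod, dtype=float) * p_mod / np.asarray(p_f, dtype=float)
```

For deciles spread evenly over 0 to 100 and bounds `[0, 25, 50, 100]`, the class probabilities are `[0.25, 0.25, 0.5]`.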
_compute_crossed_modalities_matrix()

Compute the crossed modalities matrix for the modalities present in the population.

A reduced sample space is built from the crossed modalities present in the population. The indicator function of each modality is then applied to every element of this sample space.

For each modality m and sample c, M(m, c) is 1 if c has modality m, 0 otherwise.

Returns:

crossed_modalities_matrix describing crossed modalities
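The reduced sample space and the matrix M(m, c) can be sketched with pandas as below. The layout (rows indexed by `(attribute, modality)`, one column per distinct attribute combination found in the population) and the function name are assumptions for illustration, not the library's return format.

```python
import pandas as pd


def crossed_modalities_matrix(population: pd.DataFrame, modalities: dict):
    """Hypothetical sketch of the 0/1 matrix M(m, c).

    Returns the reduced sample space (distinct attribute combinations found
    in the population) and a DataFrame with one row per (attribute, modality)
    and one column per sample.
    """
    attributes = list(modalities)
    # Reduced sample space: only the crossed modalities actually present.
    samples = population[attributes].drop_duplicates().reset_index(drop=True)
    index, data = [], []
    for attr in attributes:
        for modality in modalities[attr]:
            index.append((attr, modality))
            # Indicator of modality m applied to each sample c: 1 if c has m, else 0.
            data.append((samples[attr] == modality).astype(int).to_numpy())
    matrix = pd.DataFrame(
        data, index=pd.MultiIndex.from_tuples(index, names=["attribute", "modality"])
    )
    return samples, matrix
```

Because each sample has exactly one modality per attribute, the rows belonging to one attribute always sum to 1 in every column.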