bhepop2.enrichment

This package contains classes used to enrich synthetic populations, using various methodologies.

Package Contents

Classes

Bhepop2Enrichment

Implementation of the Bhepop2 methodology as an enrichment class.

SimpleUniformEnrichment

This class implements a simple enrichment using a global distribution.

class bhepop2.enrichment.Bhepop2Enrichment(population: pandas.DataFrame, source: bhepop2.sources.marginal_distributions.MarginalDistributions, feature_name=None, seed=None)

Bases: bhepop2.enrichment.base.SyntheticPopulationEnrichment

Implementation of the Bhepop2 methodology as an enrichment class. See bhepop2.enrichment.bhepop2 module documentation for details about the algorithm.

Expected source types:


This class documentation uses the following notations:

  • \(M_{k}\) : crossed modality k (combination of attribute modalities)

  • \(F_{i}\) : feature class i

    • For quantitative features, corresponds to a numeric interval.

    • For qualitative features, corresponds to one of the feature values.

property modalities

Dict containing list of modalities for each attribute.

_validate_and_process_inputs()

Validate the provided inputs and set the relevant fields.

Since Bhepop2 uses marginal distributions to enrich the population, we ensure that:

  • the selected attributes are present in the population

  • the population attributes take values in the modalities corresponding to this attribute
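
The two checks above can be sketched as follows. This is a minimal illustration using plain Python containers, not the class's actual implementation: the function name `check_population` and the data layout (a list of dict rows, a dict of modalities per attribute) are assumptions made for the example.

```python
def check_population(rows, modalities):
    """Raise ValueError if an attribute is missing or takes an unknown value.

    rows: list of dicts, one per individual (hypothetical layout).
    modalities: dict mapping attribute name -> list of allowed modalities.
    """
    for attribute, allowed in modalities.items():
        for index, row in enumerate(rows):
            # Check 1: the selected attribute must be present in the population.
            if attribute not in row:
                raise ValueError(f"attribute {attribute!r} missing for individual {index}")
            # Check 2: its value must be one of the attribute's modalities.
            if row[attribute] not in allowed:
                raise ValueError(
                    f"value {row[attribute]!r} of {attribute!r} is not a known modality"
                )


modalities = {"age": ["0_29", "30_59", "60_plus"], "ownership": ["owner", "tenant"]}
population = [
    {"age": "0_29", "ownership": "tenant"},
    {"age": "60_plus", "ownership": "owner"},
]
check_population(population, modalities)  # passes silently
```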

_evaluate_feature_on_population()

Assign feature values to the population individuals using the algorithm results.

Returns:

enriched population DataFrame

_draw_feature_value(probs)

Return a feature value using the given probabilities.

First draw the feature index. Then get a feature value from the distributions.

Returns:

feature value to assign to individual
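
The two-step draw described above could look like the sketch below. It assumes a quantitative feature whose classes are numeric intervals and draws uniformly inside the selected interval. The uniform draw, the function name `draw_feature_value` and the `intervals` argument are assumptions for illustration, since the actual method reads the values from the source distributions.

```python
import random


def draw_feature_value(probs, intervals, rng):
    """Draw a feature class index from probs, then a value inside that class.

    probs: probabilities of each feature class for one crossed modality.
    intervals: (lower, upper) bounds of each feature class (assumed layout).
    rng: random.Random instance, so that a seed makes the draw reproducible.
    """
    # Step 1: draw the index of the feature class according to probs.
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Step 2: draw a value in the class; uniform sampling is an assumption here.
    lower, upper = intervals[index]
    return rng.uniform(lower, upper)


rng = random.Random(42)
intervals = [(0, 10000), (10000, 20000), (20000, 50000)]
# All the probability mass is on the second class, so the value falls inside it.
value = draw_feature_value([0.0, 1.0, 0.0], intervals, rng)
```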

_get_feature_probs()

For each crossed modality, compute the probability of belonging to a feature interval.

Invert the crossed modality probabilities using Bayes' theorem.

Compute

\[P(f \in F_{i} \mid M_{k}) = P(M_{k} \mid f \in F_{i}) \cdot \frac{P(f \in F_{i})}{P(M_{k})}\]

Returns:

DataFrame
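
As a worked illustration of the inversion above, the sketch below applies the formula to a toy case with two crossed modalities and two feature classes. The function name `invert_with_bayes` and the dict-based layout are hypothetical, and \(P(M_{k})\) is derived here from the law of total probability only to keep the toy numbers consistent.

```python
def invert_with_bayes(p_modality_given_feature, p_feature, p_modality):
    """Apply P(F_i | M_k) = P(M_k | F_i) * P(F_i) / P(M_k) for every (k, i)."""
    return {
        modality: [
            p_m_given_f * p_f / p_modality[modality]
            for p_m_given_f, p_f in zip(row, p_feature)
        ]
        for modality, row in p_modality_given_feature.items()
    }


# Two feature classes F_0 and F_1, each holding half of the population.
p_feature = [0.5, 0.5]
# P(M_k | f in F_i): one list per crossed modality, one entry per feature class.
p_modality_given_feature = {"M_0": [0.8, 0.4], "M_1": [0.2, 0.6]}
# P(M_k) follows from the law of total probability in this toy example.
p_modality = {
    modality: sum(p * p_f for p, p_f in zip(row, p_feature))
    for modality, row in p_modality_given_feature.items()
}
feature_probs = invert_with_bayes(p_modality_given_feature, p_feature, p_modality)
```

Each list in `feature_probs` sums to 1, as expected for a probability distribution over the feature classes of one crossed modality.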

_optimise()

Run the optimisation algorithm to find the probability distributions that maximise entropy.

When done, set the optim_result attribute.

_run_optimization() → pandas.DataFrame

Run optimization model on each feature value.

The resulting probabilities are the \(P(M_{k} \mid f \in F_{i})\).

Returns:

DataFrame containing the result probabilities

_compute_constraints()

For each modality of each attribute, compute the probability of observing that modality within each feature interval, using Bayes' theorem:

\[P(\text{Modality} \mid f \in F_{i}) = P(f \in F_{i} \mid \text{Modality}) \cdot \frac{P(\text{Modality})}{P(f \in F_{i})}\]

_compute_crossed_modalities_matrix()

Compute crossed modalities matrix for the present modalities.

A reduced sample space is built from the crossed modalities present in the population. The indicator functions describing each modality are then applied to the elements of this sample space.

For each modality m and sample c, M(m, c) is 1 if c has modality m, 0 otherwise.

Returns:

crossed_modalities_matrix describing crossed modalities
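
The construction above can be sketched in plain Python as follows. The function name, the list-of-dicts population and the returned tuple are assumptions made for the example, and the actual method may store the matrix differently.

```python
def crossed_modalities_matrix(rows, modalities):
    """Return (samples, labels, matrix) with matrix[m][c] = 1 if sample c has modality m."""
    attributes = list(modalities)
    # Reduced sample space: only the crossed modalities present in the population.
    samples = sorted({tuple(row[a] for a in attributes) for row in rows})
    # One matrix row per (attribute, modality) pair.
    labels = [(a, m) for a in attributes for m in modalities[a]]
    matrix = [
        [1 if sample[attributes.index(a)] == m else 0 for sample in samples]
        for a, m in labels
    ]
    return samples, labels, matrix


modalities = {"age": ["young", "old"], "ownership": ["owner", "tenant"]}
population = [
    {"age": "young", "ownership": "tenant"},
    {"age": "old", "ownership": "owner"},
    {"age": "young", "ownership": "tenant"},
]
samples, labels, matrix = crossed_modalities_matrix(population, modalities)
```

Of the four possible crossed modalities, only the two present in the population become columns of the matrix.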

class bhepop2.enrichment.SimpleUniformEnrichment(population: pandas.DataFrame, source, feature_name: str = None, seed=None)

Bases: bhepop2.enrichment.base.SyntheticPopulationEnrichment

This class implements a simple enrichment using a global distribution.

Expected source types:


The global distribution describes the feature values of the whole population, using deciles (see global_distribution).

To evaluate a feature value for an individual, we randomly choose one of the deciles, and then draw a random value between its two boundaries.

This method reproduces the global distribution of the feature values over the whole population, but it ignores individual attributes, so the assigned values are not correlated with them.
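
The draw described above can be sketched as follows, assuming the global distribution is given as the 11 boundaries D0 to D10 of the deciles. The function name `draw_from_deciles`, the boundary values and the list layout are hypothetical, not the format of the actual source class.

```python
import random


def draw_from_deciles(deciles, rng):
    """Pick one decile interval at random, then draw a value inside it."""
    # deciles holds the 11 boundaries D0..D10 of the global distribution (assumed layout).
    index = rng.randrange(len(deciles) - 1)
    # Uniform draw between the two boundaries of the chosen decile.
    return rng.uniform(deciles[index], deciles[index + 1])


rng = random.Random(0)
deciles = [0, 9000, 12000, 15000, 18000, 21000, 24000, 28000, 33000, 41000, 90000]
values = [draw_from_deciles(deciles, rng) for _ in range(1000)]
```

Because every decile holds one tenth of the population, choosing the interval uniformly reproduces the global distribution as the number of individuals grows.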

_evaluate_feature_on_population()

Evaluate a list of feature values for each individual.

Returns:

iterable with the same size and order as the population

_draw_feature_value()

_validate_and_process_inputs()

Validate and process the provided enrichment inputs.

Both the population and the enrichment source may need to be validated.

Raises:

ValueError if validation fails