`bhepop2.enrichment`

This package contains classes used to enrich synthetic populations, using various methodologies.

Package Contents

Classes

`Bhepop2Enrichment`	Implementation of the Bhepop2 methodology as an enrichment class.
`SimpleUniformEnrichment`	This class implements a simple enrichment using a global distribution.

class bhepop2.enrichment.Bhepop2Enrichment(population: pandas.DataFrame, source: bhepop2.sources.marginal_distributions.MarginalDistributions, feature_name=None, seed=None)

Bases: bhepop2.enrichment.base.SyntheticPopulationEnrichment

Implementation of the Bhepop2 methodology as an enrichment class. See bhepop2.enrichment.bhepop2 module documentation for details about the algorithm.

Expected source types: bhepop2.sources.marginal_distributions.MarginalDistributions

This class documentation uses the following notations:

\(M_{k}\) : crossed modality k (combination of attribute modalities)
\(F_{i}\) : feature class i
- For quantitative features, corresponds to a numeric interval.
- For qualitative features, corresponds to one of the feature values.

property modalities

Dict containing the list of modalities for each attribute.

_validate_and_process_inputs()

Validate the provided inputs and set the relevant fields.

Since Bhepop2 uses marginal distributions to enrich the population, we ensure that:

- the selected attributes are present in the population
- each population attribute only takes values among the modalities defined for that attribute
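
The two checks above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `validate_population`, the list-of-dicts population and the `modalities` mapping are hypothetical stand-ins for the pandas-based inputs.

```python
def validate_population(population, modalities):
    """Hypothetical helper illustrating the two validation checks.

    population: list of dicts, one per individual (stand-in for a DataFrame).
    modalities: dict mapping each selected attribute to its allowed values.
    """
    for attribute, allowed in modalities.items():
        for index, individual in enumerate(population):
            # Check 1: the selected attribute is present in the population.
            if attribute not in individual:
                raise ValueError(
                    f"attribute {attribute!r} missing for individual {index}"
                )
            # Check 2: the value belongs to the attribute's modalities.
            if individual[attribute] not in allowed:
                raise ValueError(
                    f"value {individual[attribute]!r} is not a modality "
                    f"of attribute {attribute!r}"
                )


# Illustrative data only.
example_modalities = {"age": ["0_29", "30_plus"], "sex": ["F", "M"]}
example_population = [
    {"age": "0_29", "sex": "F"},
    {"age": "30_plus", "sex": "M"},
]
validate_population(example_population, example_modalities)  # passes silently
```

Raising `ValueError` here mirrors the behaviour documented for `_validate_and_process_inputs()` in this package; the message wording is an assumption.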

_evaluate_feature_on_population()

Assign feature values to the population individuals using the algorithm results.

Returns: enriched population DataFrame

_draw_feature_value(probs)

Return a feature value using the given probabilities.

First draw the index of a feature class according to the given probabilities, then draw a feature value from the corresponding distribution.

Returns: feature value to assign to individual
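
The two-step draw can be sketched as follows for a quantitative feature. This is an assumption-laden illustration: `draw_feature_value`, its `intervals` and `rng` parameters, and the uniform draw inside the selected interval are hypothetical choices, since this page does not specify how the value is taken from the distributions.

```python
import random


def draw_feature_value(probs, intervals, rng):
    """Hypothetical sketch of a two-step draw.

    probs: probability of each feature class F_i (same order as intervals).
    intervals: list of (low, high) bounds, one per feature class.
    rng: a random.Random instance, so that the draw can be seeded.
    """
    # Step 1: draw the index of the feature class according to probs.
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Step 2: draw a value inside the selected interval
    # (uniform draw is an assumption made for this sketch).
    low, high = intervals[index]
    return rng.uniform(low, high)


rng = random.Random(0)
value = draw_feature_value([0.2, 0.5, 0.3], [(0, 10), (10, 20), (20, 30)], rng)
```

Passing an explicit `random.Random` instance keeps the sketch reproducible, in the same spirit as the `seed` argument of the enrichment classes.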

_get_feature_probs()

For each crossed modality, compute the probability of belonging to a feature interval.

Invert the crossed modality probabilities using Bayes' theorem.

Compute:

\[P(f \in F_{i} \mid M_{k}) = P(M_{k} \mid f \in F_{i}) \cdot \frac{P(f \in F_{i})}{P(M_{k})}\]

Returns: DataFrame
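
A minimal sketch of this inversion in plain Python, using nested lists instead of a DataFrame. The function name `invert_with_bayes` and the numbers are illustrative assumptions, not taken from the library.

```python
def invert_with_bayes(p_m_given_f, p_f, p_m):
    """Hypothetical sketch of the Bayes inversion.

    p_m_given_f[i][k] = P(M_k | f in F_i)
    p_f[i]            = P(f in F_i)
    p_m[k]            = P(M_k)
    Returns result[k][i] = P(f in F_i | M_k).
    """
    return [
        [p_m_given_f[i][k] * p_f[i] / p_m[k] for i in range(len(p_f))]
        for k in range(len(p_m))
    ]


# Two feature classes and two crossed modalities (made-up values).
p_f = [0.5, 0.5]
p_m_given_f = [[0.8, 0.2], [0.4, 0.6]]
# P(M_k) obtained from the law of total probability.
p_m = [
    sum(p_m_given_f[i][k] * p_f[i] for i in range(len(p_f)))
    for k in range(2)
]
feature_probs = invert_with_bayes(p_m_given_f, p_f, p_m)
```

Each row of `feature_probs` is a probability distribution over the feature classes for one crossed modality, so it sums to 1 when `p_m` is consistent with the other inputs.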

_optimise()

Run the optimisation algorithm to find the probability distributions that maximise entropy.

When done, set the optim_result attribute.

_run_optimization() → pandas.DataFrame

Run the optimization model for each feature class.

The resulting probabilities are the \(P(M_{k} \mid f \in F_{i})\).

Returns: DataFrame containing the result probabilities
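
To give an intuition of what a maximum entropy solution looks like for one feature class, the sketch below uses iterative proportional fitting (IPF). IPF is a stand-in technique chosen for brevity, not the solver used by the library; starting from a uniform distribution, it converges to the maximum entropy distribution over the crossed modalities that matches the per-attribute modality probabilities, provided the constraints are feasible. The function name `max_entropy_ipf` and all data are hypothetical.

```python
def max_entropy_ipf(samples, constraints, iterations=100):
    """Hypothetical IPF sketch for one feature class F_i.

    samples: crossed modalities, as tuples with one modality per attribute.
    constraints: one dict per attribute, mapping modality -> P(modality | F_i).
    Returns the probabilities P(M_k | f in F_i), in the order of samples.
    """
    # Start from the uniform distribution (the unconstrained entropy maximum).
    probs = [1.0 / len(samples)] * len(samples)
    for _ in range(iterations):
        for position, targets in enumerate(constraints):
            # Current marginal probability of each modality of this attribute.
            current = dict.fromkeys(targets, 0.0)
            for sample, prob in zip(samples, probs):
                current[sample[position]] += prob
            # Rescale so that this attribute's marginals match the targets.
            probs = [
                prob * targets[sample[position]] / current[sample[position]]
                if current[sample[position]] > 0 else 0.0
                for sample, prob in zip(samples, probs)
            ]
    return probs


samples = [("young", "F"), ("young", "M"), ("old", "F"), ("old", "M")]
constraints = [{"young": 0.3, "old": 0.7}, {"F": 0.6, "M": 0.4}]
crossed_probs = max_entropy_ipf(samples, constraints)
```

When every crossed modality is present, as in this example, the result is simply the product of the marginals. The constraints only become interesting on a reduced sample space where some combinations are absent.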

_compute_constraints()

For each modality of each attribute, compute the probability of that modality given each feature interval, using Bayes' theorem:

\[P(Modality \mid f \in F_{i}) = P(f \in F_{i} \mid Modality) \cdot \frac{P(Modality)}{P(f \in F_{i})}\]
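
As a worked illustration with made-up numbers (not taken from any real source), suppose \(P(f \in F_{i} \mid Modality) = 0.05\), \(P(Modality) = 0.3\) and \(P(f \in F_{i}) = 0.1\). Then:

\[P(Modality \mid f \in F_{i}) = 0.05 \cdot \frac{0.3}{0.1} = 0.15\]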

_compute_crossed_modalities_matrix()

Compute crossed modalities matrix for the present modalities.

A reduced sample space is built from the crossed modalities present in the population. The functions describing each modality are then applied to the elements of this sample space.

For each modality m and sample c, M(m, c) is 1 if c has modality m, 0 otherwise.

Returns: crossed_modalities_matrix describing crossed modalities
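
The construction of M can be sketched in plain Python as follows. The function name, the list-of-dicts population and the returned tuple are assumptions made for the illustration; the library works on pandas objects.

```python
def crossed_modalities_matrix(population, attributes, modalities):
    """Hypothetical sketch of the indicator matrix M(m, c).

    population: list of dicts, one per individual.
    attributes: ordered list of attribute names.
    modalities: dict mapping each attribute to its list of modalities.
    """
    # Reduced sample space: distinct crossed modalities actually present
    # in the population, kept in order of first appearance.
    samples = list(dict.fromkeys(
        tuple(individual[attribute] for attribute in attributes)
        for individual in population
    ))
    row_labels = []
    matrix = []
    for position, attribute in enumerate(attributes):
        for modality in modalities[attribute]:
            row_labels.append((attribute, modality))
            # M(m, c) is 1 if sample c has modality m, 0 otherwise.
            matrix.append(
                [1 if sample[position] == modality else 0 for sample in samples]
            )
    return row_labels, samples, matrix


population = [
    {"age": "young", "sex": "F"},
    {"age": "old", "sex": "M"},
    {"age": "young", "sex": "F"},
]
attributes = ["age", "sex"]
modalities = {"age": ["young", "old"], "sex": ["F", "M"]}
row_labels, samples, matrix = crossed_modalities_matrix(
    population, attributes, modalities
)
```

Each column of the matrix contains exactly one 1 per attribute, since a crossed modality has one modality for each attribute.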

class bhepop2.enrichment.SimpleUniformEnrichment(population: pandas.DataFrame, source, feature_name: str = None, seed=None)

Bases: bhepop2.enrichment.base.SyntheticPopulationEnrichment

This class implements a simple enrichment using a global distribution.

Expected source types:

The global distribution describes the feature values of the whole population, using deciles (see global_distribution).

To evaluate a feature value for an individual, we randomly choose one of the deciles, and then draw a random value between its two boundaries.

This method reproduces the global distribution of the feature values over the total population, but it does not take the attributes of each individual into account.
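
The draw described above can be sketched as follows. The function name `draw_from_deciles` and the layout of `decile_bounds` (eleven increasing values, from the minimum to the maximum) are assumptions made for this illustration, not the library's data format.

```python
import random


def draw_from_deciles(decile_bounds, rng):
    """Hypothetical sketch: pick a decile at random, then a value inside it.

    decile_bounds: increasing boundary values; with 11 values, the
    consecutive pairs delimit the 10 deciles (assumed layout).
    rng: a random.Random instance, so that the draw can be seeded.
    """
    if len(decile_bounds) < 2:
        raise ValueError("at least two boundary values are required")
    # Each decile holds the same share of the population,
    # so the decile index is drawn uniformly.
    index = rng.randrange(len(decile_bounds) - 1)
    # Then draw a value between the two boundaries of that decile.
    return rng.uniform(decile_bounds[index], decile_bounds[index + 1])


rng = random.Random(42)
bounds = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # made-up values
values = [draw_from_deciles(bounds, rng) for _ in range(1000)]
```

Because no individual attribute enters the draw, two individuals with very different profiles have the same distribution of possible values.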

_evaluate_feature_on_population()

Evaluate a list of feature values for each individual.

Returns: iterable with the same size and order as the population

_draw_feature_value()

_validate_and_process_inputs()

Validate and process the provided enrichment inputs.

Both the population and the enrichment source may need to be validated.

Raises: ValueError if validation fails
