bhepop2.enrichment
This package contains classes used to enrich synthetic populations, using various methodologies.
Submodules
Package Contents
Classes
- Bhepop2Enrichment: Implementation of the Bhepop2 methodology as an enrichment class.
- SimpleUniformEnrichment: This class implements a simple enrichment using a global distribution.
- class bhepop2.enrichment.Bhepop2Enrichment(population: pandas.DataFrame, source: bhepop2.sources.marginal_distributions.MarginalDistributions, feature_name=None, seed=None)
Bases:
bhepop2.enrichment.base.SyntheticPopulationEnrichment
Implementation of the Bhepop2 methodology as an enrichment class. See the bhepop2.enrichment.bhepop2 module documentation for details about the algorithm.
Expected source types:
This class documentation uses the following notations:
\(M_{k}\) : crossed modality k (combination of attribute modalities)
\(F_{i}\) : feature class i
For quantitative features, corresponds to a numeric interval.
For qualitative features, corresponds to one of the feature values.
- property modalities
Dict containing list of modalities for each attribute.
- _evaluate_feature_on_population()
Assign feature values to the population individuals using the algorithm results.
- Returns:
enriched population DataFrame
- _draw_feature_value(probs)
Return a feature value using the given probabilities.
First draw the feature index. Then get a feature value from the distributions.
- Returns:
feature value to assign to individual
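The two-step draw described above can be sketched as follows. This is only an illustration, not the library's implementation: the function name `draw_feature_value`, the `intervals` list, and the choice of a uniform draw inside the selected interval are assumptions made for the example.

```python
import numpy as np


def draw_feature_value(probs, intervals, rng):
    """Two-step draw: pick a feature class index, then a value inside it."""
    # Step 1: draw the index of the feature class using the given probabilities.
    index = rng.choice(len(probs), p=probs)
    # Step 2: draw a value from that class (assumption: uniform on the interval).
    low, high = intervals[index]
    return rng.uniform(low, high)


rng = np.random.default_rng(42)
# Hypothetical feature classes (numeric intervals) and their probabilities.
intervals = [(0, 10_000), (10_000, 20_000), (20_000, 50_000)]
probs = [0.2, 0.5, 0.3]
value = draw_feature_value(probs, intervals, rng)
```

For a qualitative feature, the second step would presumably return the feature value itself rather than draw from an interval.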
- _get_feature_probs()
For each crossed modality, compute the probability of belonging to a feature interval.
Invert the crossed modality probabilities using Bayes.
Compute
\[P(f \in F_{i} \mid M_{k}) = P(M_{k} \mid f \in F_{i}) \cdot \frac{P(f \in F_{i})}{P(M_{k})}\]
- Returns:
DataFrame
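As a rough numerical illustration of the Bayes inversion above, the sketch below uses plain NumPy arrays with made-up probabilities. The array names are hypothetical, and \(P(M_{k})\) is derived here from the law of total probability, which is an assumption of this example; the library may obtain it differently, for instance from the population itself.

```python
import numpy as np

# Hypothetical inputs: 2 feature classes F_i (rows), 3 crossed modalities M_k (columns).
p_m_given_f = np.array([
    [0.5, 0.3, 0.2],  # P(M_k | f in F_0)
    [0.1, 0.3, 0.6],  # P(M_k | f in F_1)
])
p_f = np.array([0.4, 0.6])  # P(f in F_i)

# Assumption: P(M_k) is obtained by the law of total probability.
p_m = p_f @ p_m_given_f

# Bayes: P(f in F_i | M_k) = P(M_k | f in F_i) * P(f in F_i) / P(M_k)
p_f_given_m = p_m_given_f * p_f[:, None] / p_m[None, :]
```

Each column of `p_f_given_m` sums to 1, since it is the distribution of feature classes within one crossed modality.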
- _optimise()
Run the optimisation algorithm to find the probability distributions that maximise entropy.
When done, set the optim_result attribute.
- _run_optimization() -> pandas.DataFrame
Run optimization model on each feature value.
The resulting probabilities are the \(P(M_{k} \mid f \in F_{i})\).
- Returns:
DataFrame containing the result probabilities
- _compute_constraints()
For each modality of each attribute, compute the probability of belonging to each feature interval.
\[P(Modality \mid f \in F_{i}) = P(f \in F_{i} \mid Modality) \cdot \frac{P(Modality)}{P(f \in F_{i})}\]
- _compute_crossed_modalities_matrix()
Compute crossed modalities matrix for the present modalities.
A reduced sample space is evaluated from the crossed modalities present in the population. Functions describing each modality are then applied to the elements of this sample space.
For each modality m and sample c, M(m, c) is 1 if c has modality m, 0 otherwise.
- Returns:
crossed_modalities_matrix describing crossed modalities
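A minimal sketch of how such an indicator matrix could be built with pandas is shown below. The population, attribute names, and row labels are invented for the example, and the actual layout used by the library may differ.

```python
import pandas as pd

# Hypothetical population with two attributes.
population = pd.DataFrame({
    "age": ["young", "old", "young", "old"],
    "sex": ["F", "F", "F", "M"],
})
attributes = ["age", "sex"]

# Reduced sample space: only the crossed modalities present in the population.
samples = population[attributes].drop_duplicates().reset_index(drop=True)

# One row per modality m, one column per sample c: M(m, c) = 1 if c has modality m.
rows = {}
for attribute in attributes:
    for modality in sorted(population[attribute].unique()):
        indicator = (samples[attribute] == modality).astype(int)
        rows[f"{attribute}={modality}"] = indicator.tolist()

matrix = pd.DataFrame(rows).T
```

Here the combination (young, M) never occurs in the population, so the sample space has three columns instead of four. Each column sums to the number of attributes, because a sample has exactly one modality per attribute.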
- class bhepop2.enrichment.SimpleUniformEnrichment(population: pandas.DataFrame, source, feature_name: str = None, seed=None)
Bases:
bhepop2.enrichment.base.SyntheticPopulationEnrichment
This class implements a simple enrichment using a global distribution.
Expected source types:
The global distribution describes the feature values of the whole population, using deciles (see global_distribution).
To evaluate a feature value for an individual, we randomly choose one of the deciles, and then draw a random value between its two boundaries.
This method reproduces the global distribution of feature values over the whole population, but it does not relate the drawn values to the attributes of each individual.
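The decile-based draw can be sketched as below. This is an illustration under stated assumptions rather than the class's actual code: the function name `draw_from_deciles` and the boundary values are hypothetical, and the distribution is assumed to be given as eleven boundaries (minimum, nine deciles, maximum).

```python
import numpy as np


def draw_from_deciles(boundaries, rng):
    """Pick one decile interval at random, then a uniform value inside it."""
    # Each of the 10 intervals holds 10% of the population, so pick one uniformly.
    index = rng.integers(len(boundaries) - 1)
    # Draw a value between the two boundaries of the chosen interval.
    return rng.uniform(boundaries[index], boundaries[index + 1])


rng = np.random.default_rng(0)
# Hypothetical boundaries: minimum, deciles D1..D9, maximum.
boundaries = [0, 8_000, 11_000, 14_000, 17_000, 20_000,
              23_000, 27_000, 32_000, 40_000, 80_000]
values = [draw_from_deciles(boundaries, rng) for _ in range(1000)]
```

Because every interval is equally likely, about a tenth of the drawn values fall in each decile interval, whatever the attributes of the individual.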
- _evaluate_feature_on_population()
Evaluate a list of feature values for each individual.
- Returns:
iterable with the same size and order as the population
- _draw_feature_value()