:py:mod:`bhepop2.sources.marginal_distributions`
================================================

.. py:module:: bhepop2.sources.marginal_distributions

.. autoapi-nested-parse::

   This module contains classes describing marginal distribution sources.

   In this scope, specific source distributions are known for population subsets.

   This allows a more precise feature value association than a global, population-wide distribution.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   bhepop2.sources.marginal_distributions.MarginalDistributions
   bhepop2.sources.marginal_distributions.QualitativeMarginalDistributions
   bhepop2.sources.marginal_distributions.QuantitativeMarginalDistributions


Attributes
~~~~~~~~~~

.. autoapisummary::

   bhepop2.sources.marginal_distributions.ALL_LABEL


.. py:data:: ALL_LABEL
   :value: 'all'


.. py:class:: MarginalDistributions(data, name=None, attribute_selection: list = None)

   Bases: :py:obj:`bhepop2.sources.base.EnrichmentSource`

   Abstract class describing a marginal distributions source.

   In this class, distributions are known for the subsets of individuals presenting a specific attribute.

   For instance, the Filosofi data source (INSEE) stores distributions of declared income
   in administrative areas, for the whole population and for population subsets, such as tenants or owners.

   In this scope, we use the following terms to describe such marginal distributions:

   - An **attribute** refers to a piece of information in the initial sample or in the aggregate data. For instance: age, profession, ownership, etc.

   - **Modalities** are the partition of one attribute. For instance, in Filosofi, the *ownership* attribute can take the values *Owner* and *Tenant*.

   - **Cross modalities** are the intersection of two or more modalities. For instance, *Owner* and *above 65 years old*.

   Population individuals are then part of a single cross modality, and can be
   matched with the distributions corresponding to their known attributes.

   .. py:method:: _validate_data()

      Validate the source data.

      Raise an error if the data is invalid.

      :raises: SourceValidationError


   .. py:method:: usable_with_population(population)

      Check that the population attributes are compatible with the source.

      Check that the source attributes are present in the population.

      Check that the population values of each attribute are in the source distributions.

      :param population: population DataFrame

      :raises: PopulationValidationError


   .. py:method:: _validate_data_type()
      :abstractmethod:


   .. py:method:: compute_feature_prob(attribute=ALL_LABEL, modality=ALL_LABEL)
      :abstractmethod:

      Return a DataFrame containing the probability of being in each feature state while in the given modality.

      The resulting DataFrame is of the following format: { "feature": [feature_values], "prob": [feature_probs] }

      This method accepts attributes and modalities from self.modalities, as well as the
      (ALL_LABEL, ALL_LABEL) pair, which returns the global distribution.

      :param attribute: attribute label
      :param modality: modality label

      :return: DataFrame["feature", "prob"]


   .. py:method:: get_modality_distribution(attribute, modality)

      Get the distribution corresponding to the given attribute and modality.

      This method accepts attributes and modalities from self.modalities, as well as the
      (ALL_LABEL, ALL_LABEL) pair, which returns the global distribution.

      :param attribute: attribute label
      :param modality: modality label

      :return:



.. py:class:: QualitativeMarginalDistributions(data, name=None, attribute_selection: list = None)

   Bases: :py:obj:`MarginalDistributions`

   Marginal distributions describing qualitative features.

   **Input data**: DataFrame with feature values as columns, and probabilities as column values, for each attribute/modality pair.
   An additional row containing a global distribution (for the whole population) must be present,
   with attribute and modality equal to :attr:`~bhepop2.sources.marginal_distributions.ALL_LABEL`.

   **Example**:

   .. list-table:: Table containing qualitative marginal distributions for attributes **ownership** and **age**
      :widths: 10 10 10 20 20
      :header-rows: 1

      * - Red
        - Green
        - Blue
        - attribute
        - modality
      * - 0.3
        - 0.3
        - 0.4
        - all
        - all
      * - 0.5
        - 0.2
        - 0.3
        - ownership
        - Owner
      * - 0.4
        - 0.4
        - 0.2
        - ownership
        - Tenant
      * - 0
        - 0.5
        - 0.5
        - age
        - 0_29
      * - ...
        - ...
        - ...
        - ...
        - ...
      * - 0.7
        - 0.1
        - 0.2
        - age
        - 75_or_more

   .. py:method:: _evaluate_feature_values()

      Evaluate the feature values from the distribution columns.

      :return: list of feature values


   .. py:method:: _validate_data_type()


   .. py:method:: compute_feature_prob(attribute=ALL_LABEL, modality=ALL_LABEL)

      Return a DataFrame containing the probability of being in each feature state while in the given modality.

      The resulting DataFrame is of the following format: { "feature": [feature_values], "prob": [feature_probs] }

      This method accepts attributes and modalities from self.modalities, as well as the
      (ALL_LABEL, ALL_LABEL) pair, which returns the global distribution.

      :param attribute: attribute label
      :param modality: modality label

      :return: DataFrame["feature", "prob"]


   .. py:method:: get_value_for_feature(feature_index, rng)

      Return a feature value for the given feature index.

      Generate a single value from the feature state corresponding to the given index.

      :param feature_index: index of the feature in self.feature_values
      :param rng: NumPy random Generator

      :return: feature value


   .. py:method:: compare_with_populations(populations, feature_name, **kwargs)

      Compare the source data with populations containing the described feature (enriched or original).

      This method returns an instance of a PopulationAnalysis subclass, which can be used
      to generate different kinds of comparisons between the populations and the source data.

      :param populations: dict of populations {population_name: population}
      :param feature_name: population column containing the feature values
      :param kwargs: additional arguments for the analysis instance

      :return: PopulationAnalysis subclass instance.



.. py:class:: QuantitativeMarginalDistributions(data, name=None, attribute_selection: list = None, abs_minimum: int = 0, relative_maximum: float = 1.5, delta_min: int = None)

   Bases: :py:obj:`MarginalDistributions`, :py:obj:`bhepop2.sources.base.QuantitativeAttributes`

   Marginal distributions describing quantitative features.

   **Input data**: DataFrame with decile labels as columns (D1 to D9), and decile values as column values, for each attribute/modality pair.
   An additional row containing a global distribution (for the whole population) must be present,
   with attribute and modality equal to :attr:`~bhepop2.sources.marginal_distributions.ALL_LABEL`.

   **Example**:

   .. list-table:: Table containing quantitative marginal distributions for attributes **ownership** and **age**
      :widths: 25 10 25 40 40
      :header-rows: 1

      * - D1
        - ...
        - D9
        - attribute
        - modality
      * - 18 852
        - ...
        - 46 522
        - all
        - all
      * - 16 542
        - ...
        - 50 060
        - ownership
        - Owner
      * - 8 764
        - ...
        - 29 860
        - ownership
        - Tenant
      * - 15 000
        - ...
        - 45 000
        - age
        - 0_29
      * - ...
        - ...
        - ...
        - ...
        - ...
      * - 20 000
        - ...
        - 65 000
        - age
        - 75_or_more

   .. py:method:: _evaluate_feature_values()

      Evaluate the feature values from the distribution values and class parameters.

      :return: list of feature values


   .. py:method:: _validate_data_type()


   .. py:method:: compute_feature_prob(attribute=ALL_LABEL, modality=ALL_LABEL)

      Return a DataFrame containing the probability of being in each feature state while in the given modality.

      The resulting DataFrame is of the following format: { "feature": [feature_values], "prob": [feature_probs] }

      This method accepts attributes and modalities from self.modalities, as well as the
      (ALL_LABEL, ALL_LABEL) pair, which returns the global distribution.

      :param attribute: attribute label
      :param modality: modality label

      :return: DataFrame["feature", "prob"]


   .. py:method:: get_value_for_feature(feature_index, rng)

      Return a value drawn from the interval corresponding to the feature index.

      The first interval is defined as [self._abs_minimum, self.feature_values[0]], and so on.

      The value is drawn uniformly from the interval.

      :param feature_index:
      :param rng:

      :return:


   .. py:method:: compare_with_populations(populations, feature_name, **kwargs)

      Compare the source data with populations containing the described feature (enriched or original).

      This method returns an instance of a PopulationAnalysis subclass, which can be used
      to generate different kinds of comparisons between the populations and the source data.

      :param populations: dict of populations {population_name: population}
      :param feature_name: population column containing the feature values
      :param kwargs: additional arguments for the analysis instance

      :return: PopulationAnalysis subclass instance.
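
The output format of ``compute_feature_prob`` and the uniform draw described for ``get_value_for_feature`` can be illustrated with a minimal plain-Python sketch. It does not call bhepop2: ``feature_prob`` and ``draw_in_interval`` are hypothetical helpers, plain dicts stand in for DataFrames, and ``random.Random`` stands in for the NumPy Generator. The interval layout (lower bound ``abs_minimum``, then consecutive feature values) is an assumed reading of "and so on" in the method description, and the interval bounds used at the end are made-up values.

```python
import random

ALL_LABEL = "all"  # same value as the module-level ALL_LABEL

# First rows of the qualitative example table (Red, Green, Blue are feature values).
QUALITATIVE_ROWS = [
    {"Red": 0.3, "Green": 0.3, "Blue": 0.4, "attribute": "all", "modality": "all"},
    {"Red": 0.5, "Green": 0.2, "Blue": 0.3, "attribute": "ownership", "modality": "Owner"},
    {"Red": 0.4, "Green": 0.4, "Blue": 0.2, "attribute": "ownership", "modality": "Tenant"},
]


def feature_prob(rows, attribute=ALL_LABEL, modality=ALL_LABEL):
    """Hypothetical helper: probabilities of each feature value for one modality.

    Returns a dict shaped like {"feature": [...], "prob": [...]}.
    """
    for row in rows:
        if row["attribute"] == attribute and row["modality"] == modality:
            # Every column other than attribute/modality is a feature value.
            features = [key for key in row if key not in ("attribute", "modality")]
            return {"feature": features, "prob": [row[key] for key in features]}
    raise ValueError(f"no distribution for ({attribute!r}, {modality!r})")


def draw_in_interval(feature_values, feature_index, abs_minimum, rng):
    """Hypothetical helper: uniform draw in the interval of the given index.

    Assumption: interval 0 is [abs_minimum, feature_values[0]] and
    interval i is [feature_values[i - 1], feature_values[i]].
    """
    bounds = [abs_minimum] + list(feature_values)
    low, high = bounds[feature_index], bounds[feature_index + 1]
    return rng.uniform(low, high)


# Distribution for owners, then the global distribution (default arguments).
owner = feature_prob(QUALITATIVE_ROWS, "ownership", "Owner")
everyone = feature_prob(QUALITATIVE_ROWS)

# Made-up interval bounds, not taken from the quantitative table.
rng = random.Random(0)
value = draw_in_interval([100, 200, 300], feature_index=0, abs_minimum=0, rng=rng)
```

Here ``owner`` holds the *Owner* row reshaped into the feature/prob layout, and ``value`` falls between 0 and 100, the first interval under the assumed layout.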