:py:mod:`bhepop2.sources.marginal_distributions`
================================================

.. py:module:: bhepop2.sources.marginal_distributions

.. autoapi-nested-parse::

   This module contains classes describing marginal distribution sources.

   Here, the source provides specific distributions for population subsets.
   This allows feature values to be assigned more precisely than with a single global, population-wide distribution.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   bhepop2.sources.marginal_distributions.MarginalDistributions
   bhepop2.sources.marginal_distributions.QualitativeMarginalDistributions
   bhepop2.sources.marginal_distributions.QuantitativeMarginalDistributions


Attributes
~~~~~~~~~~

.. autoapisummary::

   bhepop2.sources.marginal_distributions.ALL_LABEL


.. py:data:: ALL_LABEL
   :value: 'all'

   
.. py:class:: MarginalDistributions(data, name=None, attribute_selection: list = None)


   Bases: :py:obj:`bhepop2.sources.base.EnrichmentSource`

   Abstract class describing a marginal distributions source.

   In this class, distributions are known for subsets of the population,
   i.e. for individuals sharing a specific attribute value.
   For instance, the Filosofi data source (INSEE) stores distributions of
   declared income in administrative areas, for the whole population and for
   population subsets, such as tenants or owners.

   In this scope, we use the following terms to describe such marginal distributions:

   - An **attribute** refers to a piece of information in the initial sample or in the aggregate data.
     For instance: age, profession, ownership, etc.
   - **Modalities** are the values that partition one attribute.
     For instance, in Filosofi, the *ownership* attribute can take the values *Owner* and *Tenant*.
   - **Cross modalities** are the intersection of two or more modalities from different attributes.
     For instance, *Owner* and *above 65 years old*.


   Each individual in the population thus belongs to a single cross modality
   and can be matched with the distributions corresponding to its known attributes.
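
   As a minimal sketch of this vocabulary (plain pandas, not the bhepop2 API),
   the hypothetical population below has two attributes, and each individual
   falls into exactly one cross modality. The column names and values are
   illustrative assumptions.

   .. code-block:: python

      import pandas as pd

      # Hypothetical population sample: two attributes, one column each.
      population = pd.DataFrame(
          {
              "ownership": ["Owner", "Tenant", "Owner", "Tenant"],
              "age": ["0_29", "0_29", "75_or_more", "75_or_more"],
          }
      )

      # Modalities: the distinct values taken by each attribute.
      modalities = {col: sorted(population[col].unique()) for col in population.columns}

      # Cross modality of each individual: one modality per attribute.
      cross_modalities = list(population.itertuples(index=False, name=None))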

   .. py:method:: _validate_data()

      Validate the source data.

      Raise a SourceValidationError if the data is invalid.

      :raises: SourceValidationError


   .. py:method:: usable_with_population(population)

      Check that the population attributes are compatible with the source.

      Check that the source attributes are present in the population.
      Check that the population values of each attribute are in the source distributions.

      :param population: population DataFrame
      :raises: PopulationValidationError
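
      The two checks above could be sketched as follows. This is not the
      library's implementation: ``check_population`` is a hypothetical helper,
      the modalities are assumed to be a dict mapping each attribute to its
      list of modalities, and a plain ``ValueError`` stands in for
      ``PopulationValidationError``.

      .. code-block:: python

         import pandas as pd

         def check_population(population: pd.DataFrame, modalities: dict) -> None:
             """Hypothetical stand-in for the two checks described above."""
             # 1. Every source attribute must be a column of the population.
             missing = [attr for attr in modalities if attr not in population.columns]
             if missing:
                 raise ValueError(f"Missing attributes in population: {missing}")
             # 2. Every population value must be a known modality of its attribute.
             for attr, known in modalities.items():
                 unknown = set(population[attr].unique()) - set(known)
                 if unknown:
                     raise ValueError(f"Unknown modalities for {attr!r}: {sorted(unknown)}")

         # Assumed structure: attribute -> list of modalities.
         modalities = {"ownership": ["Owner", "Tenant"]}
         # Passes silently: the attribute exists and all values are known.
         check_population(pd.DataFrame({"ownership": ["Owner", "Tenant"]}), modalities)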


   .. py:method:: _validate_data_type()
      :abstractmethod:


   .. py:method:: compute_feature_prob(attribute=ALL_LABEL, modality=ALL_LABEL)
      :abstractmethod:

      Return a DataFrame containing the probability of each feature state for the given modality.

      The resulting DataFrame has the following format:
      ``{ "feature": [feature_values], "prob": [feature_probs] }``

      This method accepts attributes and modalities from ``self.modalities``, as well as
      the ``(ALL_LABEL, ALL_LABEL)`` pair, which returns the global distribution.

      :param attribute: attribute label
      :param modality: modality label

      :return: DataFrame["feature", "prob"]


   .. py:method:: get_modality_distribution(attribute, modality)

      Get the distribution corresponding to the given attribute and modality.

      This method accepts attributes and modalities from ``self.modalities``, as well as
      the ``(ALL_LABEL, ALL_LABEL)`` pair, which returns the global distribution.

      :param attribute: attribute label
      :param modality: modality label
      :return: distribution corresponding to the given attribute and modality


.. py:class:: QualitativeMarginalDistributions(data, name=None, attribute_selection: list = None)


   Bases: :py:obj:`MarginalDistributions`

   Marginal distributions describing qualitative features.

   **Input data**:

   DataFrame with feature values as columns and probabilities as
   cell values, with one row per attribute/modality pair. An additional row containing
   the global distribution (for the whole population) must be present, with
   attribute and modality both equal to :attr:`~bhepop2.sources.marginal_distributions.ALL_LABEL`.

   **Example**:

   .. list-table:: Table containing qualitative marginal distributions for attributes **ownership** and **age**
       :widths: 10 10 10 20 20
       :header-rows: 1

       * - Red
         - Green
         - Blue
         - attribute
         - modality
       * - 0.3
         - 0.3
         - 0.4
         - all
         - all
       * - 0.5
         - 0.2
         - 0.3
         - ownership
         - Owner
       * - 0.4
         - 0.4
         - 0.2
         - ownership
         - Tenant
       * - 0
         - 0.5
         - 0.5
         - age
         - 0_29
       * - ...
         - ...
         - ...
         - ...
         - ...
       * - 0.7
         - 0.1
         - 0.2
         - age
         - 75_or_more
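
   The visible rows of the table above can be written as a pandas DataFrame,
   as sketched below (the elided rows are left out). The row-sum check is
   only a sanity check one might run on such data; it is an assumption of
   this sketch, not a documented requirement of the class.

   .. code-block:: python

      import pandas as pd

      # Rows copied from the example table; "all"/"all" is the global distribution.
      distributions = pd.DataFrame(
          {
              "Red": [0.3, 0.5, 0.4, 0.0, 0.7],
              "Green": [0.3, 0.2, 0.4, 0.5, 0.1],
              "Blue": [0.4, 0.3, 0.2, 0.5, 0.2],
              "attribute": ["all", "ownership", "ownership", "age", "age"],
              "modality": ["all", "Owner", "Tenant", "0_29", "75_or_more"],
          }
      )

      # Each row is a probability distribution over the feature values.
      feature_columns = ["Red", "Green", "Blue"]
      row_sums = distributions[feature_columns].sum(axis=1)
      rows_are_normalised = bool(((row_sums - 1.0).abs() < 1e-9).all())

   A frame of this shape is the kind of object expected as the ``data``
   argument of the constructor.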


   .. py:method:: _evaluate_feature_values()

      Evaluate the feature values from the distributions columns.

      :return: list of feature values


   .. py:method:: _validate_data_type()


   .. py:method:: compute_feature_prob(attribute=ALL_LABEL, modality=ALL_LABEL)

      Return a DataFrame containing the probability of each feature state for the given modality.

      The resulting DataFrame has the following format:
      ``{ "feature": [feature_values], "prob": [feature_probs] }``

      This method accepts attributes and modalities from ``self.modalities``, as well as
      the ``(ALL_LABEL, ALL_LABEL)`` pair, which returns the global distribution.

      :param attribute: attribute label
      :param modality: modality label

      :return: DataFrame["feature", "prob"]
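
      To illustrate the output format, the sketch below reshapes one row of a
      qualitative distributions table into ``feature`` and ``prob`` columns.
      ``feature_prob`` is a hypothetical helper written for this example and
      does not reproduce the method's actual implementation.

      .. code-block:: python

         import pandas as pd

         # Two rows taken from the example table of this class.
         distributions = pd.DataFrame(
             {
                 "Red": [0.3, 0.5],
                 "Green": [0.3, 0.2],
                 "Blue": [0.4, 0.3],
                 "attribute": ["all", "ownership"],
                 "modality": ["all", "Owner"],
             }
         )

         def feature_prob(data, attribute="all", modality="all"):
             """Hypothetical helper: reshape one distribution row to feature/prob."""
             mask = (data["attribute"] == attribute) & (data["modality"] == modality)
             # Keep only the feature columns of the first matching row.
             row = data.loc[mask].drop(columns=["attribute", "modality"]).iloc[0]
             return pd.DataFrame({"feature": row.index.to_list(), "prob": row.to_list()})

         owner_probs = feature_prob(distributions, "ownership", "Owner")
         global_probs = feature_prob(distributions)  # the ("all", "all") pair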


   .. py:method:: get_value_for_feature(feature_index, rng)

      Return a feature value for the given feature index.

      Generate a single value from the feature state
      corresponding to the given index.

      :param feature_index: index of the feature in self.feature_values
      :param rng: Numpy random Generator

      :return: feature value


   .. py:method:: compare_with_populations(populations, feature_name, **kwargs)

      Compare the source data with populations containing the described feature (enriched or original).

      This method returns an instance of a PopulationAnalysis subclass, which
      can be used to generate different kinds of comparisons between the
      populations and the source data.

      :param populations: dict of populations {population_name: population}
      :param feature_name: population column containing the feature values
      :param kwargs: additional arguments for the analysis instance

      :return: PopulationAnalysis subclass instance.


.. py:class:: QuantitativeMarginalDistributions(data, name=None, attribute_selection: list = None, abs_minimum: int = 0, relative_maximum: float = 1.5, delta_min: int = None)


   Bases: :py:obj:`MarginalDistributions`, :py:obj:`bhepop2.sources.base.QuantitativeAttributes`

   Marginal distributions describing quantitative features.

   **Input data**:

   DataFrame with decile labels as columns (D1 to D9) and the corresponding
   decile values as cell values, with one row per attribute/modality pair. An additional row containing
   the global distribution (for the whole population) must be present, with
   attribute and modality both equal to :attr:`~bhepop2.sources.marginal_distributions.ALL_LABEL`.

   **Example**:

   .. list-table:: Table containing quantitative marginal distributions for attributes **ownership** and **age**
       :widths: 25 10 25 40 40
       :header-rows: 1

       * - D1
         - ...
         - D9
         - attribute
         - modality
       * - 18 852
         - ...
         - 46 522
         - all
         - all
       * - 16 542
         - ...
         - 50 060
         - ownership
         - Owner
       * - 8 764
         - ...
         - 29 860
         - ownership
         - Tenant
       * - 15 000
         - ...
         - 45 000
         - age
         - 0_29
       * - ...
         - ...
         - ...
         - ...
         - ...
       * - 20 000
         - ...
         - 65 000
         - age
         - 75_or_more


   .. py:method:: _evaluate_feature_values()

      Evaluate the feature values from the distribution values and class parameters.

      :return: list of feature values


   .. py:method:: _validate_data_type()


   .. py:method:: compute_feature_prob(attribute=ALL_LABEL, modality=ALL_LABEL)

      Return a DataFrame containing the probability of each feature state for the given modality.

      The resulting DataFrame has the following format:
      ``{ "feature": [feature_values], "prob": [feature_probs] }``

      This method accepts attributes and modalities from ``self.modalities``, as well as
      the ``(ALL_LABEL, ALL_LABEL)`` pair, which returns the global distribution.

      :param attribute: attribute label
      :param modality: modality label

      :return: DataFrame["feature", "prob"]


   .. py:method:: get_value_for_feature(feature_index, rng)

      Return a value drawn from the interval corresponding to the feature index.

      The first interval is defined as [self._abs_minimum, self.feature_values[0]],
      and so on. The value is drawn uniformly from the interval.

      :param feature_index: index of the feature in self.feature_values
      :param rng: Numpy random Generator
      :return: feature value
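
      A minimal sketch of such a draw is shown below. It assumes that interval
      ``i`` runs from the previous feature value (or the absolute minimum for
      the first interval) up to ``feature_values[i]``. ``draw_value`` and the
      bound values are hypothetical and only illustrate the uniform rule.

      .. code-block:: python

         import numpy as np

         abs_minimum = 0
         # Hypothetical upper bounds of successive intervals (illustrative values).
         feature_values = [10000, 20000, 35000]

         def draw_value(feature_index, rng):
             """Hypothetical sketch: uniform draw inside the interval at feature_index."""
             # The first interval starts at the absolute minimum,
             # the others at the previous feature value.
             lower = abs_minimum if feature_index == 0 else feature_values[feature_index - 1]
             upper = feature_values[feature_index]
             return rng.uniform(lower, upper)

         rng = np.random.default_rng(seed=42)
         value = draw_value(1, rng)  # lies between 10000 and 20000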


   .. py:method:: compare_with_populations(populations, feature_name, **kwargs)

      Compare the source data with populations containing the described feature (enriched or original).

      This method returns an instance of a PopulationAnalysis subclass, which
      can be used to generate different kinds of comparisons between the
      populations and the source data.

      :param populations: dict of populations {population_name: population}
      :param feature_name: population column containing the feature values
      :param kwargs: additional arguments for the analysis instance

      :return: PopulationAnalysis subclass instance.