:py:mod:`bhepop2.analysis`
==========================

.. py:module:: bhepop2.analysis

.. autoapi-nested-parse::

   This module provides tools to analyse populations.

   Most of the time, a population is analysed by comparing
   it with reference data.

   For enriched populations, comparison with the enrichment source data
   can be a good way to assess the quality of the enrichment.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   bhepop2.analysis.PopulationAnalysis
   bhepop2.analysis.QuantitativeAnalysis
   bhepop2.analysis.QualitativeAnalysis


.. py:class:: PopulationAnalysis(populations: dict, modalities: dict, feature_column: str, distributions: pandas.DataFrame, distributions_name: str = DEFAULT_SOURCE_NAME, plot_title_format: str = DEFAULT_PLOT_TITLE_FORMAT, output_folder: str = None)


   DISCLAIMER: This class only works with MarginalDistributions data.

   The PopulationAnalysis class and its subclasses were implemented before the
   refactoring of the enrichment classes, which led to the composition
   of SyntheticPopulationEnrichment with the more generic EnrichmentSource.
   Therefore, this class expects distributions as in MarginalDistributions.data
   rather than generic enrichment source data.

   ---------

   Analysis class for synthetic populations.

   Synthetic populations must be identical except for their feature columns.

   The values of the feature columns and their distributions are compared between populations
   and with the reference distribution.

   Analysis is performed on the given modalities, which must be a subset of the modalities used for enrichment
   (and thus available in the population(s) and distributions).

   The following analyses are available:
       - Graphs comparing the distributions in the population(s) to the original distributions (one per modality)
       - A table describing the error of the population(s) in comparison to the distributions (one line per modality), ordered by number of individuals in the modality


   .. py:property:: analysis_table


   .. py:attribute:: CLASS_COLUMN

      
   .. py:attribute:: VALUE_COLUMN
      :value: 'value'

      
   .. py:attribute:: DEFAULT_PLOT_TITLE_FORMAT
      :value: 'Modality {modality} from attribute {attribute}'

      
   .. py:method:: set_output_folder(output_folder)

      Set a new output folder for this analysis instance.

      :param output_folder: valid output folder path


   .. py:method:: assert_output_folder()

      Check that the output folder is set.

      :raises AssertionError: if the output folder is not set


   .. py:method:: _evaluate_analysis_table()

      Create a table used for comparing populations/distributions.

      The resulting DataFrame contains the following columns:
          - attribute: attribute name
          - modality: modality name
          - self.PROPORTION_COLUMN: value describing the proportion taken for the corresponding
          - one column with the observed_name value
          - one column for each population name
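
      As a rough illustration of this layout, the sketch below merges rows from
      several named sources into one table with one value column per source.
      Plain dictionaries stand in for DataFrames. The ``build_analysis_table``
      helper, the ``decile`` class column, the source names and the values are
      all assumptions made for this example, and the proportion column is left out.

      .. code-block:: python

         def build_analysis_table(tables_by_name, key_columns=("attribute", "modality", "decile")):
             """Hypothetical helper: merge per-source rows into one row per key."""
             merged = {}
             for name, rows in tables_by_name.items():
                 for row in rows:
                     key = tuple(row[column] for column in key_columns)
                     # Create the row on first sight, then add this source's column.
                     merged.setdefault(key, dict(zip(key_columns, key)))[name] = row["value"]
             return list(merged.values())


         # Made-up values, for illustration only.
         reference = [{"attribute": "age", "modality": "0_29", "decile": "D1", "value": 9500.0}]
         enriched = [{"attribute": "age", "modality": "0_29", "decile": "D1", "value": 9800.0}]
         table = build_analysis_table({"reference": reference, "enriched": enriched})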

      :return: analysis DataFrame


   .. py:method:: _format_distributions_for_analysis()
      :abstractmethod:

      Format the distributions table as an analysis table.

      :return: distributions as an analysis table


   .. py:method:: _compute_distributions_by_attribute(population: pandas.DataFrame) -> pandas.DataFrame

      Compute the feature values distribution for each modality.

      Generate an analysis table for this population.
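
      The per-modality loop can be sketched as follows, assuming that
      ``modalities`` maps each attribute to a list of its modalities and that
      the distribution of each subset is delegated to a callable standing in
      for ``_compute_distribution``. Both points are assumptions, and
      ``distributions_by_attribute`` and ``count_individuals`` are hypothetical
      names. Lists of dictionaries stand in for DataFrames.

      .. code-block:: python

         def distributions_by_attribute(population, modalities, compute_distribution):
             """Hypothetical helper: tag each distribution row with its attribute and modality."""
             table = []
             for attribute, attribute_modalities in modalities.items():
                 for modality in attribute_modalities:
                     # Keep only the individuals that hold this modality.
                     subset = [ind for ind in population if ind[attribute] == modality]
                     for row in compute_distribution(subset):
                         table.append({"attribute": attribute, "modality": modality, **row})
             return table


         def count_individuals(subset):
             """Toy stand-in for a distribution: a single row with the subset size."""
             return [{"value": len(subset)}]


         population = [{"age": "0_29"}, {"age": "0_29"}, {"age": "30_59"}]
         table = distributions_by_attribute(
             population, {"age": ["0_29", "30_59"]}, count_individuals
         )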

      :param population: population DataFrame

      :return: analysis table


   .. py:method:: _compute_distribution(population: pandas.DataFrame)
      :abstractmethod:

      Get distribution of the feature values in the population.

      :param population: population DataFrame

      :return: analysis table of the population


   .. py:method:: generate_analysis_plots()

      Generate plots comparing the population(s) to the original distributions (one per modality).

      Plots are exported to PNG images in the output folder.


   .. py:method:: plot_analysis_compare(attribute: str, modality: str)
      :abstractmethod:

      Generate a plot comparing the populations and the distributions, for the given attribute and modality.

      :param attribute: attribute value
      :param modality: attribute modality

      :return: Plotly Figure


   .. py:method:: generate_analysis_error_table(export_csv: bool = True)
      :abstractmethod:

      Generate a table describing how analysed populations deviate from the original distributions.

      :param export_csv: whether a CSV export should be performed
      :return: error table DataFrame


   .. py:method:: get_plot_title(**kwargs) -> str

      Get the plot title for the given keys.

      This relies on the `plot_title_format` attribute, which can be
      set externally.
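
      A minimal sketch of the formatting step, assuming the title is built with
      ``str.format`` on the format string (the actual implementation is not
      shown here). ``build_plot_title`` is a hypothetical standalone helper,
      not part of bhepop2, and the attribute and modality values are made up.

      .. code-block:: python

         # Default format string, copied from DEFAULT_PLOT_TITLE_FORMAT.
         DEFAULT_PLOT_TITLE_FORMAT = "Modality {modality} from attribute {attribute}"


         def build_plot_title(plot_title_format=DEFAULT_PLOT_TITLE_FORMAT, **kwargs):
             """Hypothetical helper: fill the format string with the given keys."""
             return plot_title_format.format(**kwargs)


         title = build_plot_title(attribute="age", modality="0_29")
         custom = build_plot_title("{attribute} / {modality}", attribute="age", modality="0_29")

      With this approach, a custom format string must only use keys that are
      passed as keyword arguments, otherwise ``str.format`` raises ``KeyError``.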

      :param kwargs: keys provided to the plot_title_format string

      :return: plot title


.. py:class:: QuantitativeAnalysis(populations: dict, modalities: dict, feature_column: str, distributions: pandas.DataFrame, distributions_name: str = DEFAULT_SOURCE_NAME, plot_title_format: str = DEFAULT_PLOT_TITLE_FORMAT, output_folder: str = None)


   Bases: :py:obj:`PopulationAnalysis`

   DISCLAIMER: This class only works with MarginalDistributions data.

   The PopulationAnalysis class and its subclasses were implemented before the
   refactoring of the enrichment classes, which led to the composition
   of SyntheticPopulationEnrichment with the more generic EnrichmentSource.
   Therefore, this class expects distributions as in MarginalDistributions.data
   rather than generic enrichment source data.

   ---------

   Analysis class for synthetic populations.

   Synthetic populations must be identical except for their feature columns.

   The values of the feature columns and their distributions are compared between populations
   and with the reference distribution.

   Analysis is performed on the given modalities, which must be a subset of the modalities used for enrichment
   (and thus available in the population(s) and distributions).

   The following analyses are available:
       - Graphs comparing the distributions in the population(s) to the original distributions (one per modality)
       - A table describing the error of the population(s) in comparison to the distributions (one line per modality), ordered by number of individuals in the modality


   .. py:attribute:: CLASS_COLUMN
      :value: 'decile'

      
   .. py:method:: plot_analysis_compare(attribute: str, modality: str)

      Generate a comparison plot between the reference data and the simulation.

      :param attribute: attribute value
      :param modality: attribute modality

      :return: Plotly figure


   .. py:method:: generate_analysis_error_table(export_csv=True)

      Generate a table describing how analysed populations deviate from the original distributions.
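
      The error metric is not specified here. As one possible reading, the
      sketch below computes a mean absolute relative error over decile values
      for each modality and sorts the rows by number of individuals. The
      metric, the descending order, the column names and the ``error_table``
      helper are all assumptions, and the input values are made up.

      .. code-block:: python

         def error_table(reference, simulated, counts):
             """Hypothetical helper: mean absolute relative error per modality.

             reference and simulated map a modality to its list of decile values
             (reference values are assumed to be non-zero); counts maps a modality
             to its number of individuals.
             """
             rows = []
             for modality, ref_values in reference.items():
                 sim_values = simulated[modality]
                 errors = [abs(sim - ref) / ref for ref, sim in zip(ref_values, sim_values)]
                 rows.append(
                     {
                         "modality": modality,
                         "individuals": counts[modality],
                         "mean_relative_error": sum(errors) / len(errors),
                     }
                 )
             # Largest modalities first (the descending order is an assumption).
             return sorted(rows, key=lambda row: row["individuals"], reverse=True)


         table = error_table(
             reference={"0_29": [100.0, 200.0], "30_59": [100.0, 400.0]},
             simulated={"0_29": [110.0, 180.0], "30_59": [100.0, 300.0]},
             counts={"0_29": 50, "30_59": 120},
         )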

      :param export_csv: whether a CSV export should be performed

      :return: error table DataFrame


   .. py:method:: _format_distributions_for_analysis()

      Format the distributions table as an analysis table.

      :return: distributions as an analysis table


   .. py:method:: _compute_distribution(population: pandas.DataFrame) -> pandas.DataFrame

      Compute decile distribution of the feature values.
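
      A rough illustration of a decile computation, using the standard library
      rather than pandas. Only the ``decile`` and ``value`` column names come
      from ``CLASS_COLUMN`` and ``VALUE_COLUMN``. The ``D1`` to ``D9`` labels,
      the row layout, the interpolation method and the ``decile_rows`` helper
      are assumptions made for this sketch.

      .. code-block:: python

         import statistics


         def decile_rows(feature_values):
             """Hypothetical helper: one row per decile cut point (D1 to D9)."""
             # With n=10, statistics.quantiles returns the 9 cut points that
             # split the data into 10 groups of equal probability.
             cut_points = statistics.quantiles(feature_values, n=10)
             return [
                 {"decile": f"D{i}", "value": value}
                 for i, value in enumerate(cut_points, start=1)
             ]


         rows = decile_rows(range(1, 100))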

      :param population: analysed population

      :return: dataframe of deciles


.. py:class:: QualitativeAnalysis(populations: dict, modalities: dict, feature_column: str, distributions: pandas.DataFrame, distributions_name: str = DEFAULT_SOURCE_NAME, plot_title_format: str = DEFAULT_PLOT_TITLE_FORMAT, output_folder: str = None)


   Bases: :py:obj:`PopulationAnalysis`

   DISCLAIMER: This class only works with MarginalDistributions data.

   The PopulationAnalysis class and its subclasses were implemented before the
   refactoring of the enrichment classes, which led to the composition
   of SyntheticPopulationEnrichment with the more generic EnrichmentSource.
   Therefore, this class expects distributions as in MarginalDistributions.data
   rather than generic enrichment source data.

   ---------

   Analysis class for synthetic populations.

   Synthetic populations must be identical except for their feature columns.

   The values of the feature columns and their distributions are compared between populations
   and with the reference distribution.

   Analysis is performed on the given modalities, which must be a subset of the modalities used for enrichment
   (and thus available in the population(s) and distributions).

   The following analyses are available:
       - Graphs comparing the distributions in the population(s) to the original distributions (one per modality)
       - A table describing the error of the population(s) in comparison to the distributions (one line per modality), ordered by number of individuals in the modality


   .. py:attribute:: CLASS_COLUMN
      :value: 'feature'

      
   .. py:method:: plot_analysis_compare(attribute: str, modality: str)

      Generate a comparison plot between the reference data and the simulation.

      :param attribute: attribute value
      :param modality: attribute modality

      :return: Plotly figure


   .. py:method:: _format_distributions_for_analysis()

      Format the distributions table as an analysis table.

      :return: distributions as an analysis table


   .. py:method:: _compute_distribution(population: pandas.DataFrame) -> pandas.DataFrame

      Get distribution of the feature values in the population.
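
      For a qualitative feature, such a distribution can be sketched as the
      share of individuals holding each feature value. The ``feature`` and
      ``value`` column names come from ``CLASS_COLUMN`` and ``VALUE_COLUMN``.
      Storing proportions as values, the row layout and the ``feature_shares``
      helper are assumptions made for this example.

      .. code-block:: python

         from collections import Counter


         def feature_shares(feature_values):
             """Hypothetical helper: proportion of individuals per feature value."""
             counts = Counter(feature_values)
             total = sum(counts.values())
             return [
                 {"feature": feature, "value": count / total}
                 for feature, count in sorted(counts.items())
             ]


         rows = feature_shares(["car", "bike", "car", "walk"])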

      :param population: population DataFrame

      :return: analysis table of the population