How to contribute

Below you will find general guidance on how to prepare your piece of code to be integrated into the resspect environment.

Add a new data set

The main challenge of adding a new data set is to build the infrastructure necessary to handle the new data.

The function below shows the basic structure required to deal with one light curve:

>>> import pandas as pd

>>> def load_one_lightcurve(path_to_data, *args):
>>>     """Load one light curve at a time.
>>>
>>>     Parameters
>>>     ----------
>>>     path_to_data: str
>>>         Complete path to data file.
>>>     ...
>>>         ...
>>>
>>>     Returns
>>>     -------
>>>     dict
>>>         Light curve information (see structure below).
>>>     """
>>>
>>>     ####################
>>>     # Do something #####
>>>     ####################
>>>
>>>     # structure of light curve
>>>     lc = {}
>>>     lc['dataset_name'] = XXXX               # name of the data set
>>>     lc['filters'] = [X, Y, Z]               # list of filters
>>>     lc['id'] = XXX                          # identification number
>>>     lc['redshift'] = X                      # redshift (optional, important for building canonical)
>>>     lc['sample'] = XXXXX                    # train, test or queryable (none is mandatory)
>>>     lc['sntype'] = X                        # Ia or non-Ia
>>>     lc['photometry'] = pd.DataFrame()       # min keys: MJD, filter, FLUX, FLUXERR
>>>                                             # bonus: MAG, MAGERR, SNR
>>>     return lc

Feel free to also provide other keywords which might be important for handling your data. Given a function like this, we should be able to incorporate it into the pipeline.

Please refer to the resspect.fit_lightcurves module for a closer look at this part of the code.

Add a new feature extraction method

Currently resspect only deals with Bazin features. The snippet below shows an example of friendly code for a new feature extraction method.

>>> def new_feature_extraction_method(time, flux, *args):
>>>     """Extract features from light curve.
>>>
>>>     Parameters
>>>     ----------
>>>     time: 1D - np.array
>>>         Time of observation.
>>>     flux: 1D - np.array of floats
>>>         Measured flux.
>>>     ...
>>>         ...
>>>
>>>     Returns
>>>     -------
>>>     set of features
>>>     """
>>>
>>>     ################################
>>>     ###   Do something    ##########
>>>     ################################
>>>
>>>     return features

You can check the current feature extraction tools for the Bazin parametrization at resspect.bazin module.
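To make the template concrete, the sketch below extracts three toy summary statistics from a single light curve. These features (amplitude, epoch of maximum, mean flux) are illustrative only and are not the Bazin parametrization used by resspect.

```python
import numpy as np


def new_feature_extraction_method(time, flux):
    """Toy feature extractor: summary statistics of one light curve.

    The three features below are illustrative placeholders, not the
    Bazin parametrization implemented in resspect.
    """
    time = np.asarray(time, dtype=float)
    flux = np.asarray(flux, dtype=float)

    amplitude = flux.max() - flux.min()   # dynamic range of the light curve
    t_max = time[np.argmax(flux)]         # epoch of maximum flux
    mean_flux = flux.mean()               # average brightness

    return np.array([amplitude, t_max, mean_flux])
```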

Add a new classifier

A new classifier should be wrapped in a function such as:

>>> def new_classifier(train_features, train_labels, test_features, *args):
>>>     """New classifier.
>>>
>>>     Parameters
>>>     ----------
>>>     train_features: np.array
>>>         Training sample features.
>>>     train_labels: np.array
>>>         Training sample classes.
>>>     test_features: np.array
>>>         Test sample features.
>>>     ...
>>>         ...
>>>
>>>     Returns
>>>     -------
>>>     predictions: np.array
>>>         Predicted classes - 1 class per object.
>>>     probabilities: np.array
>>>         Classification probability for all objects, [pIa, pnon-Ia].
>>>     """
>>>
>>>     #######################################
>>>     #######  Do something     #############
>>>     #######################################
>>>
>>>     return predictions, probabilities

The only classifier implemented at this point is a Random Forest; it can be found in the resspect.classifiers module.
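As an illustration of the interface, a wrapper around scikit-learn's Random Forest might look like the sketch below. This is not the resspect implementation (see resspect.classifiers for that); the function name and the idea of forwarding keyword arguments are assumptions for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def random_forest_classifier(train_features, train_labels, test_features, **kwargs):
    """Sketch of a classifier wrapper following the interface above.

    Extra keyword arguments are forwarded to scikit-learn's
    RandomForestClassifier (e.g. n_estimators, random_state).
    """
    clf = RandomForestClassifier(**kwargs)
    clf.fit(train_features, train_labels)             # train on the labelled sample
    predictions = clf.predict(test_features)          # one class per object
    probabilities = clf.predict_proba(test_features)  # one probability per class per object

    return predictions, probabilities
```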

Important

Remember that, in order to be effective in the active learning framework, a classifier should not be heavy on computational resources and must be sensitive to small changes in the training sample. Otherwise, the evolution of the model between learning loops will be difficult to tackle.

Add a new query strategy

A query strategy is a protocol which evaluates the current state of the machine learning model and makes an informed decision about which objects should be included in the training sample.

This is very general, and the function can receive as input any information regarding the physical properties of the test and/or target samples and current classification results.

A minimum structure for such function would be:

>>> def new_query_strategy(class_prob, test_ids, queryable_ids, batch, *args):
>>>     """New query strategy.
>>>
>>>     Parameters
>>>     ----------
>>>     class_prob: np.array
>>>         Classification probability. One value per class per object.
>>>     test_ids: np.array
>>>         Set of ids for objects in the test sample.
>>>     queryable_ids: np.array
>>>         Set of ids for objects available for querying.
>>>     batch: int
>>>         Number of objects to be chosen in each batch query.
>>>     ...
>>>         ...
>>>
>>>     Returns
>>>     -------
>>>     query_indx: list
>>>         List of indexes identifying the objects from the test sample
>>>         to be queried in decreasing order of importance.
>>>     """
>>>
>>>     ############################################
>>>     #####   Do something              ##########
>>>     ############################################
>>>
>>>     return query_indx                   # list of indexes of size batch

The current available strategies are Passive Learning (or Random Sampling) and Uncertainty Sampling. Both can be scrutinized in the resspect.query_strategies module.
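As an illustration, an uncertainty sampling strategy following the interface above might look like the sketch below. It assumes the first column of class_prob holds the Ia probability, per the classifier convention above; for the real implementation, see resspect.query_strategies.

```python
import numpy as np


def uncertainty_sampling(class_prob, test_ids, queryable_ids, batch):
    """Sketch of uncertainty sampling following the interface above.

    Ranks test objects by how close their Ia probability (assumed to be
    the first column of class_prob) is to 0.5, and returns the indexes
    of the `batch` most uncertain objects that are available for querying.
    """
    # distance of each object's Ia probability from a 50/50 split
    uncertainty = np.abs(class_prob[:, 0] - 0.5)

    # indexes sorted from most to least uncertain
    order = np.argsort(uncertainty)

    # keep only objects actually available for querying
    queryable = set(queryable_ids)
    query_indx = [int(i) for i in order if test_ids[i] in queryable]

    return query_indx[:batch]
```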

Add a new diagnostic metric

Beyond the criteria for choosing an object to be queried, one could also test different metrics to evaluate the performance of the classifier at each learning loop.

A new diagnostic metric can then be provided in the form:

>>> def new_metric(label_pred: list, label_true: list, ia_flag, *args):
>>>     """Calculate new metric.
>>>
>>>     Parameters
>>>     ----------
>>>     label_pred: list
>>>         Predicted labels.
>>>     label_true: list
>>>         True labels.
>>>     ia_flag: number, symbol
>>>         Flag used to identify Ia objects.
>>>     ...
>>>         ...
>>>
>>>     Returns
>>>     -------
>>>     a number or set of numbers
>>>         Tells us how good the fit was.
>>>     """
>>>
>>>     ###########################################
>>>     #####  Do something !    ##################
>>>     ###########################################
>>>
>>>     return metric_value                 # a number or set of numbers

The currently implemented diagnostic metrics are those used in the SNPCC (Kessler et al., 2009) and can be found at the resspect.metrics module.
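As a concrete illustration, the efficiency (loosely, the fraction of true Ia objects recovered by the classifier) could be sketched as below. The function name and the ia_flag default are illustrative; see resspect.metrics for the actual implementations.

```python
def efficiency(label_pred, label_true, ia_flag='Ia'):
    """Sketch of a diagnostic metric following the interface above.

    Returns the fraction of true Ia objects that were correctly
    classified as Ia. Details (name, flag default) are illustrative.
    """
    # total number of true Ia objects in the sample
    total_ia = sum(1 for true in label_true if true == ia_flag)
    if total_ia == 0:
        return 0.0

    # true Ia objects also predicted as Ia
    found_ia = sum(1 for pred, true in zip(label_pred, label_true)
                   if true == ia_flag and pred == ia_flag)

    return found_ia / total_ia
```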