# Parameter estimation for *PEtab/SBML* models using *parPE*


## Introduction

This document describes how to set up and use parPE to estimate parameters for
a model in the [PEtab](https://github.com/petab-dev/PEtab) format. This is
currently the most streamlined use case for parPE.

[PEtab](https://github.com/petab-dev/PEtab) is a convention to specify systems
biology parameter estimation problems in a machine-readable way. It is based
on [SBML](http://sbml.org/) models and a set of tab-separated values files.
Further information and a detailed format description are provided in the PEtab
repository.

For testing, parPE ships with an example in
[../examples/parpeamici/steadystate/](../examples/parpeamici/steadystate/).
The
[Benchmark Problems for Dynamic Modeling of Intracellular Processes](https://github.com/Benchmarking-Initiative/Benchmark-Models-PEtab)
collection provides a variety of further example problems.


## Workflow overview

A rough overview of the workflow for setting up and using parPE to optimize
a PEtab-based problem is provided in the following figure:

![PEtab/parPE optimization workflow overview](gfx/parpe_workflow_1.png)

The principal steps are:

1. Building parPE as described in the [documentation](../README.md)

2. Generating model C++ code and a Python module (using `amici_import_petab`
   from [AMICI](https://github.com/AMICI-dev/AMICI/))

3. Setting up a project for building parameter estimation (and other)
   executables for the given model
   (using [../misc/setup_amici_model.sh](../misc/setup_amici_model.sh))

4. Generating an HDF5 input file for parPE with data and options for parameter
   estimation based on the PEtab problem definition (using
   `parpe_petab_to_hdf5` from the parPE Python package)

5. Running the desired optimization and further analysis

Any of these steps can be adapted to your specific needs. This document will
present the simplest use case.

NOTE: This workflow is to be simplified and converted to a configurable
[Snakemake-based](https://snakemake.readthedocs.io/en/stable/) workflow. A
scaffold is provided in [../snakemake/](../snakemake/).
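Until then, steps 2 to 5 can be chained by hand once parPE is built. The following is only a minimal sketch and not part of parPE: the function name `run_petab_workflow` is hypothetical, it assumes that the shell variables from the Notation section below are exported, and it uses exactly the commands explained in the following sections.

```shell
# Hypothetical helper (not shipped with parPE): runs workflow steps 2-5.
# Assumes step 1 (building parPE) is done and the variables from the
# "Notation" section are exported. The body runs in a subshell, so
# `set -eu` (stop on errors / unset variables) does not leak into your shell.
run_petab_workflow() (
  set -eu
  # Step 2: generate AMICI model C++ code and Python package
  amici_import_petab -v \
    -o "${AMICI_MODEL_DIR}" \
    -n "${MODEL_NAME}" \
    -y "${PETAB_YAML_FILE}"
  # Step 3: set up and build the parameter estimation project
  "${PARPE_SOURCE_ROOT}/misc/setup_amici_model.sh" \
    "${AMICI_MODEL_DIR}" "${PARPE_MODEL_DIR}"
  # Step 4: create the HDF5 input file from the PEtab problem
  parpe_petab_to_hdf5 \
    -n "${MODEL_NAME}" \
    -y "${PETAB_YAML_FILE}" \
    -d "${AMICI_MODEL_DIR}" \
    -o "${H5_PE_INPUT}"
  # Step 5: run the optimization; output directory is the optional first argument
  "${PARPE_MODEL_DIR}/build/estimate${MODEL_NAME}" \
    -o "${1:-test_output_dir/}" "${H5_PE_INPUT}"
)
```

Calling `run_petab_workflow my_output_dir/` would then stop at the first failing command or unset variable. Note that a second call fails at step 3, because `${PARPE_MODEL_DIR}` must not exist yet.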


## Notation

To avoid relying on a specific model, we use generic artifact names throughout
this document. They are written as shell variables, e.g. `${SOME_ARTIFACT}`,
so that you can copy the commands as-is after setting the respective shell
variables to the values for your problem.

We will refer to the following artifacts:

- `${PETAB_YAML_FILE}`: The PEtab YAML file (references all PEtab files
  belonging to the given parameter estimation problem)

- `${AMICI_MODEL_DIR}`: Output directory to be created where AMICI model code
  will be written to

- `${MODEL_NAME}`: Any name for the model

- `${PARPE_SOURCE_ROOT}`: Path to the parPE repository root directory

- `${PARPE_MODEL_DIR}`: Project directory for generating model-specific
  parameter estimation executables. It will be created and must not exist yet.

- `${ESTIMATE}`: Generated parameter estimation executable, see below

- `${H5_PE_INPUT}`: Generated HDF5 input file for the parameter estimation
  executable, see below
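For illustration, the variables could be set as follows. All values are hypothetical placeholders (the model name `my_model`, a `petab/` subdirectory, a parPE checkout in `${HOME}/parPE`); only the definition of `ESTIMATE` follows the build layout described in the section on building the executable.

```shell
# Hypothetical example values: adapt every name and path to your setup.
export MODEL_NAME=my_model
export PETAB_YAML_FILE="${PWD}/petab/${MODEL_NAME}.yaml"
export AMICI_MODEL_DIR="${PWD}/amici_${MODEL_NAME}"
export PARPE_SOURCE_ROOT="${HOME}/parPE"
# Must not exist yet; setup_amici_model.sh creates it
export PARPE_MODEL_DIR="${PWD}/parpe_${MODEL_NAME}"
export H5_PE_INPUT="${PWD}/${MODEL_NAME}_input.h5"
# Derived from PARPE_MODEL_DIR and MODEL_NAME (see the build section)
export ESTIMATE="${PARPE_MODEL_DIR}/build/estimate${MODEL_NAME}"
```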


## Building parPE

Build parPE as described in the [documentation](../README.md).


## Model processing

Although, in general, any kind of model can be used with parPE, we only
describe the simplest case of using [AMICI](https://github.com/AMICI-dev/AMICI/)
models.
We assume that a set of PEtab files with the problem definition already exists.
(PEtab is not strictly required for using parPE, but without it, setting up a
problem requires significant additional effort.)

We use `amici_import_petab` to generate the AMICI C++ model files and a
Python package for the respective model. After installing AMICI, this
script should automatically be in your `$PATH`.

*NOTE*: Use the AMICI version shipped with parPE (`deps/AMICI`). Do not try to
mix different versions of AMICI-generated models and AMICI base files. This
will likely lead to crashes and/or undefined behaviour.
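Both points can be checked before importing a model. The following is a minimal sketch with a hypothetical helper `check_amici`; it assumes that `python3` is the interpreter AMICI was installed into and that the AMICI sources shipped with parPE contain a `version.txt` file in their root directory, so verify both assumptions for your setup.

```shell
# Hypothetical pre-flight check (not part of parPE or AMICI).
# Assumes: python3 is the interpreter AMICI is installed into, and
# <parPE root>/deps/AMICI/version.txt holds the shipped AMICI version.
check_amici() {
  parpe_root=$1
  # amici_import_petab is installed together with the AMICI Python package
  if ! command -v amici_import_petab >/dev/null 2>&1; then
    echo "error: amici_import_petab not found in \$PATH" >&2
    return 1
  fi
  installed=$(python3 -c 'import amici; print(amici.__version__)') || return 1
  # Strip whitespace/newlines so the strings compare cleanly
  shipped=$(tr -d '[:space:]' < "${parpe_root}/deps/AMICI/version.txt") || return 1
  if [ "${installed}" != "${shipped}" ]; then
    echo "error: installed AMICI ${installed} differs from shipped AMICI ${shipped}" >&2
    return 1
  fi
  echo "AMICI ${installed} matches ${parpe_root}/deps/AMICI"
}
```

Usage: `check_amici "${PARPE_SOURCE_ROOT}"` returns a non-zero status if the script is missing or the versions differ.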

Run:

```shell
amici_import_petab -v \
  -o ${AMICI_MODEL_DIR} \
  -n ${MODEL_NAME} \
  -y ${PETAB_YAML_FILE}
```

This will create `${AMICI_MODEL_DIR}` containing the model C++ files and a
Python package for the model.

Run `amici_import_petab -h` for further command line options.


## Building parameter estimation executable

Next we need to create a new project to build the executables for parameter
estimation. The `misc/setup_amici_model.sh` script will do that, using the
C++ code generated by AMICI. It will adapt some templates for `main.cpp` files
and will build the targets using CMake:

```shell
${PARPE_SOURCE_ROOT}/misc/setup_amici_model.sh ${AMICI_MODEL_DIR} ${PARPE_MODEL_DIR}
```

Afterwards, among other files, there should be an executable
`${PARPE_MODEL_DIR}/build/estimate${MODEL_NAME}`, which is used for running
the parameter optimization after the next step.

To simplify notation:

```shell
export ESTIMATE=${PARPE_MODEL_DIR}/build/estimate${MODEL_NAME}
```
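Before continuing, it can be worth checking that the build succeeded. The hypothetical helper below (not part of parPE) verifies that `${ESTIMATE}` is executable and lists all executable files placed directly in the build directory, which should also reveal any other executables the project built.

```shell
# Hypothetical helper: check the estimation executable and list all
# executables directly inside the given build directory.
list_parpe_executables() {
  build_dir=$1
  estimate=$2
  if [ ! -x "${estimate}" ]; then
    echo "error: ${estimate} does not exist or is not executable" >&2
    return 1
  fi
  # Regular files at the top level of build_dir with the user-executable bit set
  find "${build_dir}" -maxdepth 1 -type f -perm -u+x -print | sort
}
```

Usage: `list_parpe_executables "${PARPE_MODEL_DIR}/build" "${ESTIMATE}"`.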

## Generating an HDF5 input file for parPE parameter optimization

The default workflow requires training data and optimization options to be
provided in an HDF5 file. Based on the PEtab problem definition, we can simply
create this using the `parpe_petab_to_hdf5` script from the parPE Python
package:

```shell
parpe_petab_to_hdf5 \
    -n ${MODEL_NAME} \
    -y ${PETAB_YAML_FILE} \
    -d ${AMICI_MODEL_DIR} \
    -o ${H5_PE_INPUT}
```

This should create `${H5_PE_INPUT}`. The file format is described in
[hdf5.md](hdf5.md). This file contains some default settings, which can be
adapted using [HDFView](https://www.hdfgroup.org/downloads/hdfview/), your
programming language of choice, or from the command line using
[../misc/optimizationOptions.py](../misc/optimizationOptions.py) (`-h` for
usage information).
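For a quick look at the structure of the generated file without a GUI, the `h5ls` tool from the HDF5 command-line utilities can be used. This is a minimal sketch, assuming these tools are installed; `inspect_h5_input` is a hypothetical helper name.

```shell
# Hypothetical helper: list all groups and datasets of the parPE input file.
# Assumes the HDF5 command-line tools (h5ls) are installed.
inspect_h5_input() {
  h5_file=$1
  # -s: file exists and is non-empty
  if [ ! -s "${h5_file}" ]; then
    echo "error: ${h5_file} is missing or empty" >&2
    return 1
  fi
  if ! command -v h5ls >/dev/null 2>&1; then
    echo "error: h5ls not found; install the HDF5 command-line tools" >&2
    return 1
  fi
  # -r: recurse into all groups
  h5ls -r "${h5_file}"
}
```

Usage: `inspect_h5_input "${H5_PE_INPUT}"`.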

To inspect the default settings, run:

```shell
${PARPE_SOURCE_ROOT}/misc/optimizationOptions.py ${H5_PE_INPUT}
```

## Running parameter optimization and further analyses

For running parameter estimation with default settings on a single node, run:

```shell
${ESTIMATE} -o test_output_dir/ ${H5_PE_INPUT}
```

Note that, depending on your model and data, this may take a long time.
The results will be written to HDF5 files in `test_output_dir/`. The output
format is described in [hdf5.md](hdf5.md).
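Because a run can take a long time, you may prefer to start it in the background and keep its console output in a log file. The sketch below does this with `nohup`; the helper name `start_estimation` and the log file location (next to the output directory, e.g. `test_output_dir.log`) are hypothetical choices.

```shell
# Hypothetical helper: start the estimation in the background, detached from
# the terminal, with stdout and stderr written to "<output dir>.log".
start_estimation() {
  out_dir=$1
  log_file="${out_dir%/}.log"
  if [ ! -x "${ESTIMATE:-}" ] || [ ! -f "${H5_PE_INPUT:-}" ]; then
    echo "error: set ESTIMATE and H5_PE_INPUT to existing files first" >&2
    return 1
  fi
  nohup "${ESTIMATE}" -o "${out_dir}" "${H5_PE_INPUT}" > "${log_file}" 2>&1 &
  echo "started ${ESTIMATE} (PID $!), logging to ${log_file}"
}
```

Usage: `start_estimation test_output_dir/`, then follow the progress with `tail -f test_output_dir.log`.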

Usage of the generated executable is described in more depth, for example, in
the Jupyter notebooks in
[../examples/parpeamici/steadystate/](../examples/parpeamici/steadystate/).
These notebooks also demonstrate the use of other executables created earlier
and show examples for data analysis using the parPE Python package.