# HDF5 data file structure

## Parameter estimation input files

### Overview

Default input files for parameter estimation are expected to follow this
structure:

```text
+ /amiciOptions/
+ /optimizationOptions/
+ /fixedParameters/
+ /measurements/
+ /parameters/
```

### /amiciOptions/

```text
+ /amiciOptions/
  \
  - (Attributes specifying AMICI settings, see deps/AMICI/src/hdf5.cpp)
  - sens_ind
      List of model parameter indices w.r.t. which to compute sensitivities
  - ts
      Unique list of timepoints for which there is data.
      Determines the default timepoints set to the AMICI model.
```

### /fixedParameters/

```text
+ /fixedParameters/
  \
  - conditionNames [string: nConditionVectors]
      Arbitrary labels for the columns in `/fixedParameters/k`.
  - k [double: nk * nConditionVectors]
      Fixed parameter vectors for simulation or preequilibration.
      See `/fixedParameters/simulationConditions`
  - parameterNames
      IDs of AMICI model fixed parameters
  - simulationConditions
      List of (fixed_par_preequilibration, fixed_par_simulation) tuples,
      referring to columns in `k`. The number of rows determines the number
      of model simulations per cost function evaluation.
      For no preequilibration, the respective values should be -1.
```

### /measurements/

```text
+ /measurements/
  \
  - observableNames [string: ny]
      Observable IDs as provided by the AMICI model

  Measurement-related data, each group contains datasets named
  0..nSimulationConditions-1

  - t/ [double n_t]
      Vector with timepoints for the respective rows in `y/` and `ysigma/`.
      Ascending order, may be non-unique to specify replicates.
  - y/
  - ysigma/ [double n_t x n_y]
      Measured values and standard deviation for each observable and
      timepoint. If sigma is given by a model parameter, the entry must be NaN
```

### /parameters/

```text
+ /parameters/
  \
  - lowerBound
  - upperBound [double n_opt_par]
      Bounds for optimization parameters.
      Same length and order as modelParameterNames. Given in scale `pscale`.
  - modelParameterNames [string n_opt_par]
      AMICI model parameter IDs
  - nominalValues [double n_opt_par]
      Some specified parameter vector for testing
  - optimizationSimulationMapping [int np * n_simulation_conditions]
      Mapping of optimization parameters to model parameters for a given
      condition. -1 for no mapping, in which case the respective value from
      `parameterOverrides` will be used.
  - parameterOverrides [int np * n_simulation_conditions]
      Constant condition-specific parameter overrides
      (see `optimizationSimulationMapping`)
  - parameterNames [string n_opt_par]
      Optimization parameter names
  - pscaleOptimization [int n_opt_par]
      Parameter scaling for optimization parameters
  - pscaleSimulation [int n_simulation_conditions * n_sim_par]
      Parameter scale for each of `np` model parameters for the respective
      condition (amici::ParameterScaling)
```

### Optimization options

```text
+ /optimizationOptions/
  \
  - Attributes:
    - maxIter [int]
    - hierarchicalOptimization [int]
    - numStarts [int]
    - optimizer [int]
  - randomStarts [double np_opt x n_starts]

  Groups with attributes for optimizer-specific settings:
  - ceres/
  - fmincon/
  - ipopt/
  - toms611/

  See the respective source files in parpeoptimization/src/ for details.
```

### Hierarchical optimization

These datasets are only required for use with hierarchical optimization.
Any dataset that would be empty can be omitted.

```text
/offsetParametersMapToObservables
/offsetParameterIndices
/scalingParametersMapToObservables
/scalingParameterIndices
/sigmaParametersMapToObservables
/sigmaParameterIndices
```

#### (offset|scaling|sigma)ParameterIndices

1D integer array of indices to `/parameters/parameterNames`, indicating which
parameters should be considered for hierarchical optimization. The indices
must be in ascending order.

#### (offset|scaling|sigma)ParametersMapToObservables

Mapping of measurements to analytically computed parameters. Integer matrix
with three columns:

| parameterIdx | conditionIdx | observableIdx |
|--------------|--------------|---------------|
| ...          | ...          | ...           |

- `parameterIdx` is an index to `/(offset|scaling|sigma)ParameterIndices`
- `conditionIdx` is an index to a row in `/fixedParameters/simulationConditions`
- `observableIdx` is an index to `/measurements/observableNames`

Rows may occur in arbitrary order.

Different parameters for different timepoints are currently not supported.
A workaround would be to introduce an additional observable.

## Parameter estimation output files

This section describes the output format produced by the parameter estimation
executables generated from `../templates/main.cpp`.

```text
+ /inputData/
  - Just a copy of the input data for parameter estimation, see above
+ /multistarts/
  - Contains groups for each optimizer run, named 0..numStarts-1
- /totalTimeInSec
  - The total CPU time for parameter estimation
```

### Results per optimizer run

```text
+ /multistarts/$n/
  + iteration/
    - Contains one group named by 0-based iteration index, containing:
      - costFunCallIndex: [int 1 x nCostFunEval]
          Ascending number for cost function calls across all iterations
      - costFunCost: [double 1 x nCostFunEval]
          Cost function values for all cost function calls during this
          iteration
      - costFunGradient: [double nOptimizationParameters x nCostFunEval]
          Cost function gradient values for all cost function calls during
          this iteration. NaN if no gradient was requested.
      - costFunParameters: [double nOptimizationParameters x nCostFunEval]
          Cost function parameters for all cost function calls during this
          iteration.
      - costFunWallTimeInSec: [double 1 x nCostFunEval]
          Wall time for cost function evaluation on master in seconds.
  - jobId [int 1 x nIterations*nConditions]
      Running number. Internal simulation job ID. For debugging and mapping
      data from worker result files.
  - simulationLabel [string 1 x nIterations*nConditions]
      Human-readable label corresponding to jobId, consisting of
      optimization index 'o', iteration index 'i', and condition index 'c'
  - simulationLogLikelihood [double 1 x nIterations*nConditions]
      Log-likelihood for all simulation jobs
  - simulationLogLikelihoodGradient [double nOptimizationParameters x nIterations*nConditions]
      Log-likelihood gradient for all simulation jobs
  - simulationObservables [double nObservables*nTimepoints x nIterations*nConditions]
      Model outputs for all simulations
  - simulationParameters [double nOptimizationParameters x nIterations*nConditions]
      Parameters for all simulations
  - simulationStates [double nObservables*nTimepoints x nIterations*nConditions]
      Model states for all simulations
  - simulationStatus [int 1 x nIterations*nConditions]
      AMICI status for all simulations
  - simulationWallTimeInSec [double 1 x nIterations*nConditions]
      Wall time in seconds for all simulations
  - wallSec [double]
      Wall time for full optimization on master.
```

## Simulation result files

This section describes the output format produced by the simulator executables
generated from `../templates/main_simulate.cpp` with `--at-optimum`.

```text
+ /multistarts/$n/
  With $n := 0..numStarts-1, containing simulation results for the respective
  optimizer run.
  - llh [double nConditions]
      Log-likelihood for each condition
  - parameters [double nOptimizationParameters]
      Optimization problem parameters used to obtain the simulations
  + t/$m/ [double nTimepoints]
      Timepoints for condition $m
  + yMes/$m/ [double nTimepoints x nObservables]
      Training data for condition $m
  + ySim/$m/ [double nTimepoints x nObservables]
      Model outputs for condition $m
```
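
As a rough illustration of how the simulation result layout above might be consumed, the following sketch sums the per-condition `llh` values for each optimizer run and picks the run with the highest total. It assumes that `llh` is stored as a one-dimensional dataset, and it works on any mapping-like object, such as an open `h5py.File`. The function names `total_llh_per_start` and `best_start` and the file name in the usage comment are hypothetical and not part of parPE.

```python
def total_llh_per_start(root):
    """Sum the per-condition log-likelihoods for each optimizer run.

    `root` is any mapping-like object following the simulation result
    layout, for example an open h5py.File.
    Assumption: each `llh` entry is one-dimensional.
    """
    totals = {}
    for start_name, group in root["multistarts"].items():
        # Group names are the optimizer run indices "0", "1", ...
        totals[int(start_name)] = sum(float(v) for v in group["llh"])
    return totals


def best_start(totals):
    """Return the run index with the highest total log-likelihood."""
    return max(totals, key=totals.get)


# Hypothetical usage with h5py (the file name is an assumption):
#
#   import h5py
#   with h5py.File("simulation_results.h5", "r") as f:
#       totals = total_llh_per_start(f)
#       print(best_start(totals), totals)
```

Keeping the functions independent of `h5py` itself means they can be exercised with plain nested dictionaries, which is convenient for checking the logic without a result file at hand.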