
Simpact Cyan - 0.21.0
This document is the reference documentation for the Simpact Cyan
program, the C++ version of the Simpact
family of programs.
The program is most easily controlled through either the Python or
R bindings, using the pysimpactcyan
module and RSimpactCyan
package respectively; the way to use them is described below.
The Python bindings are included in the Simpact Cyan package; the R
bindings need to be installed separately from within R.
Apart from this document, there exists some additional documentation as well:
- Documentation of the program code itself can be found here: code documentation.
- A tutorial for using the R bindings: Age-mixing tutorial Bruges 2015
- R versions of the iPython notebook examples: available on GitHub
The development source code can be found on GitHub. The Simpact Cyan installers, as well as source code packages for the program, can be found in the ‘programs’ directory. If you’re using MS-Windows, you’ll need to install the Visual Studio 2015 redistributable package as well (use the x86 version) to be able to use the installed version.
In case you’re interested in running simulations from R, you’ll need to have a working Python version installed as well. For MS-Windows this is typically not installed by default; when installing it, it’s best to use the default directory (e.g. C:\Python27 or C:\Python34).
MaxART documentation
The configuration for running MaxART specific simulations is very similar to the core Simpact Cyan features. The things that are different are described in the MaxART documentation.
Documentation contents
Introduction
Like other members of the Simpact family, Simpact Cyan is an agent-based model (ABM) to study the way an infection spreads and can be influenced; it is currently focused on HIV. The program models each individual in a population of a specified initial size. Things that can happen are represented by events, which can have a certain risk of being executed at a certain moment. Time advances by iteratively finding the next event that will take place, updating the time to the corresponding value and executing event-specific code. This is implemented using the Modified Next Reaction Method [Anderson].
Note that event times and time-related properties that can be configured are expressed in units of years, unless stated otherwise.
Modified Next Reaction Method (mNRM) & hazards
In the mNRM algorithm, there is a core distinction between internal event times and real-world event times. The internal event times determine when an event will go off according to some notion of an internal clock. Let’s call the internal time interval until a specific event fires \(\Delta T\). Such internal time intervals are chosen using a simple method, typically as random numbers picked from a unit exponential distribution:
\[ {\rm prob}(\Delta T) = \exp(-\Delta T) \]
Events in the simulation will not just use such internal times, they need to be executed at a certain real-world time. Calling \(\Delta t\) the real-world time interval that corresponds to the internal time interval \(\Delta T\), this mapping is done using the notion of a hazard (called propensity function in the mNRM article) \(h\):
\[ \Delta T = \int_{t_{\rm prev}}^{t_{\rm prev}+\Delta t} h(X(s), s)\, ds \]
It is this hazard that can depend on the state \(X\) of the simulation, and perhaps also explicitly on time. The state of the simulation, in our case, can be thought of as the population: who has relationships with whom, who is infected, how many people there are, etc. This state \(X(t)\) does not continuously depend on time: the state only changes when events get fired, which is when their internal time interval passes. Note that the formula above is for a single event, and while \(\Delta T\) itself is not affected by other events, the mapping onto \(\Delta t\) certainly can be: other events can change the state, and the hazard depends on this state.
The figure below illustrates the main idea: internal time intervals are chosen from a certain distribution, and they get mapped onto real-world time intervals through hazards. Because hazards can depend both on the state (so the time until an event fires can be influenced by other events that manipulate this state), and can have an explicit time dependency, this mapping can be quite complex:

The hazard can cause complex behaviour, but of course this is not necessarily the case. If one uses a constant hazard, this simply causes a scaling between the internal time \(\Delta T\) and the real-world time \(\Delta t\):
\[ \Delta T = h\, \Delta t \]
This also illustrates that the higher the hazard, the earlier the event will fire, i.e. the real-world time interval will be smaller.
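As a small illustration (a Python sketch, not part of Simpact Cyan itself), the constant-hazard case can be written as:

```python
import math
import random

def sample_internal_time(rng):
    # Internal time interval Delta T, picked from a unit exponential distribution
    return -math.log(1.0 - rng.random())

def real_world_interval(delta_T, hazard):
    # For a constant hazard h, Delta T = h * Delta t, so Delta t = Delta T / h
    return delta_T / hazard

rng = random.Random(12345)
dT = sample_internal_time(rng)
# The higher the hazard, the smaller the real-world interval until the event fires
assert real_world_interval(dT, 2.0) < real_world_interval(dT, 1.0)
```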
As an example, let’s consider formation events. At a certain time in the simulation, many formation events will be scheduled, one event for each man/woman pair that can possibly form a relationship. The internal time interval for each of these events will simply be picked from the simple exponential distribution that was mentioned above. The mapping to a real-world time at which the event will fire, is calculated using the hazard-based method, and this hazard depends on many things (the state): how many relationships does the man have at a certain time, how many relationships does the woman have, what is the preferred age difference etc. One can also imagine that there can be an explicit time dependency in the hazard: perhaps the hazard of forming a relationship increases if the time since the relationship became possible goes up.
Using an exponential distribution to generate an internal time interval is how the method is described in the [Anderson] article. It is of course not absolutely necessary to do this, and other ways to generate an internal time are used as well. The simplest example is if one wants to have an event that fires at a specific time. In that case, \(\Delta T\) can simply be set to the actual real-world time until the event needs to fire, and the hazard can be set to \(h=1\), so that internal and real-world time intervals match. Among others, this is done in the HIV seeding event which, when triggered, starts the epidemic by marking a certain number of people as infected.
Population based simulation
Each time an event is triggered, the state of the simulation is allowed to change. Because the hazard of any event can depend on this state, in the most general version of the mNRM algorithm one would recalculate the real-world fire times of all remaining events each time a particular event gets triggered. This ensures that the possibly changed state is taken into account. Recalculating all event fire times all the time is of course very inefficient: although the state may have changed somewhat, this change may not be relevant for many of the event hazards in use. As a result, the calculated real-world fire times would be mostly the same as before.
In the Simpact model, the state can be thought of as the population that is being simulated, where the population consists of persons. Each person is linked to a list of events that involve him or her, and if an event is relevant to more than one person it will be present in the lists of more than one person. For example, a mortality event would be present in the list of only one person, while a relationship formation event is about two distinct people and would therefore be present in two such lists. The figure below illustrates this idea:

When an event fires, it is assumed that only the properties of a very limited set of people
have changed, and that one only needs to recalculate the fire times of the events in those
people’s lists. For example, if Event 2
from the figure above fires, then the real-world
fire times for the events in the lists of Person A
and Person B
will be automatically
recalculated. Apart from affecting the people in whose lists an event appears, an event can
also indicate that other people are affected. As an example, a birth event will only
appear in the list of the woman who’s pregnant. However, when triggered this event indicates
that the father is also an affected person (in case the number of children someone has is used
in a hazard). In general, this number of other affected people will be very small compared to
the size of the population, causing only a fraction of the event fire times to be recalculated.
This allows this population-based algorithm to run much faster than the very basic algorithm
that always recalculates all event times.
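The bookkeeping described above can be sketched as follows; this is hypothetical Python for illustration, not the actual C++ implementation (the event and person names mirror the figure’s example):

```python
from collections import defaultdict

event_lists = defaultdict(set)   # person -> events in his or her list
participants = {}                # event -> persons in whose lists it appears

def schedule(event, persons):
    participants[event] = set(persons)
    for p in persons:
        event_lists[p].add(event)

schedule("mortality_A", ["A"])        # involves only person A
schedule("formation_AB", ["A", "B"])  # involves persons A and B
schedule("mortality_C", ["C"])        # involves only person C

def events_to_recalculate(fired_event, extra_affected=()):
    # Only events in the lists of the affected people need new fire times
    affected = participants[fired_event] | set(extra_affected)
    return set().union(*(event_lists[p] for p in affected))

# Firing the formation event touches A and B, but leaves person C's events alone
assert events_to_recalculate("formation_AB") == {"mortality_A", "formation_AB"}
```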
Besides these types of events, there are also ‘global’ events. These events do not refer to a particular person and will modify the state in a very general way. In general, when such a global event is triggered, this causes all other event fire times to be recalculated.
‘Time limited’ hazards
In the mNRM algorithm, time advances in steps, from one event fire time to the next. In general, these event fire times are calculated by mapping a generated internal time interval \(\Delta T\) onto a real-world time interval \(\Delta t\) using the formula
\[ \Delta T = \int_{t_{\rm prev}}^{t_{\rm prev}+\Delta t} h(X(s), s)\, ds \]
where \(h\) is the hazard that can have an explicit time dependency and a dependency on the simulation state. While the simulation state can change over time, it can only change at discrete points, when other events change the state.
The form of the hazard determines how fast this mapping between internal times and real-world times can be calculated. To keep the simulation as fast as possible, hazards for which the integral has an analytic solution are certainly most interesting. Furthermore, because the mapping between internal and real-world times needs to be done in each direction, the resulting equation for \(\Delta T\) needs to be invertible as well.
The hazards that we use in the Simpact events are often of the form
\[ h = \exp(A + Bt) \]
This is a time dependent hazard where \(A\) and \(B\) are determined by other values in the simulation state. The nice feature of such a hazard is that it is always positive, as a hazard should be (otherwise the mapping could result in simulation time going backwards). Unfortunately, this form also has a flaw: consider the example where \(A = 0\), \(B = -1\) and \(t_{\rm prev} = 0\) for conciseness. The mapping between times then becomes
\[ \Delta T = \int_0^{\Delta t} \exp(-t)\, dt = 1 - \exp(-\Delta t) \]
When we need to map a specific \(\Delta t\) onto an internal \(\Delta T\), this expression can be used to do so very efficiently. When we need the reverse, rewriting this equation gives:
\[ \Delta t = -\ln(1 - \Delta T) \]
From this it is clear that the inversion is only possible if \(\Delta T\) is smaller than one, which may not be the case since \(\Delta T\) is picked from an exponential probability distribution in general. The core problem is that the integral in our expression is bounded, implying an upper limit on \(\Delta T\), while on the other hand \(\Delta T\) needs to be able to take any positive value, since it is picked from an exponential distribution which does not have an upper limit.
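The boundedness is easy to check numerically; a short Python sketch, assuming the example hazard \(h(t) = \exp(-t)\) from above:

```python
import math

def internal_from_real(dt):
    # Delta T = integral from 0 to dt of exp(-s) ds = 1 - exp(-dt)
    return 1.0 - math.exp(-dt)

def real_from_internal(dT):
    # Inverse mapping Delta t = -ln(1 - Delta T): only defined for Delta T < 1
    if dT >= 1.0:
        raise ValueError("Delta T >= 1 cannot be mapped onto a real-world time")
    return -math.log(1.0 - dT)

assert internal_from_real(30.0) < 1.0  # the integral stays below 1, however large dt gets
assert abs(real_from_internal(internal_from_real(2.5)) - 2.5) < 1e-9
```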
To work around this, we use a slightly different hazard, one that becomes constant after a certain time \(t_{\rm max}\), as is illustrated in the figure below. This has the effect that the integral will no longer have an upper bound, and the mapping from \(\Delta T\) to \(\Delta t\) will always be possible.

We are calculating a different hazard than before of course, so you may wonder whether this is really a good idea. In this respect, it is important to note that we’re simulating individuals that will not live forever, but have a limited lifespan. So if we set \(t_{\rm max}\) to the time at which the relevant person would be 200 years old (for example), we can be very sure that our choice for \(t_{\rm max}\) will not affect the simulation. It only helps to keep the calculations feasible.
Above, the basic problem and solution are illustrated using a simple time dependent exponential hazard, but it should be clear that the problem occurs for other hazards as well: one only needs a hazard for which the integral above is bounded, and since choosing \(\Delta T\) from an exponential probability distribution can yield any value, problems will occur. The solution in the general case is the same: set the hazard to a constant value after a \(t_{\rm max}\) value which exceeds the lifetime of a person. The detailed calculations for this procedure can be found in this document: hazard_tmax.pdf.
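Continuing the numerical sketch with the same example hazard: making it constant after \(t_{\rm max}\) removes the bound on the integral, so the inverse mapping exists for every \(\Delta T\) (the value 200 here is an arbitrary stand-in for “longer than any lifetime”):

```python
import math

T_MAX = 200.0  # assumed to exceed any simulated person's remaining lifetime

def internal_from_real_tmax(dt):
    # Hazard exp(-t) up to t_max, held constant at exp(-t_max) afterwards,
    # so the integral grows without bound instead of saturating at 1
    if dt <= T_MAX:
        return 1.0 - math.exp(-dt)
    return (1.0 - math.exp(-T_MAX)) + math.exp(-T_MAX) * (dt - T_MAX)

def real_from_internal_tmax(dT):
    # With the constant tail, the inverse mapping is defined for every dT > 0
    boundary = 1.0 - math.exp(-T_MAX)
    if dT <= boundary:
        return -math.log(1.0 - dT)
    return T_MAX + (dT - boundary) / math.exp(-T_MAX)

# An internal time larger than 1 (previously impossible) now maps to a finite time
assert real_from_internal_tmax(3.0) > T_MAX
assert abs(internal_from_real_tmax(real_from_internal_tmax(0.5)) - 0.5) < 1e-9
```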
Configuration and running of a simulation
The basic Simpact Cyan program is a standalone command-line program. To be able to
set up a simulation, you’d need to prepare a configuration file and specify this
file as a command-line argument. Preparing the configuration file manually is
time-consuming work, as all event properties necessary in a simulation need to
be set.
To make it easier to prepare and run simulations, there’s a pysimpactcyan module that you can use to control Simpact Cyan from Python, or alternatively there’s an RSimpactCyan library that you can install in R that provides a similar interface. The Python module is included when you install the Simpact Cyan binaries; the R library must be installed separately from an R session.
You can also use a combined approach: first run a simulation or simply prepare a configuration file from R or Python, and subsequently use this configuration to start one or more simulations. This can be very helpful to first prepare a base configuration file in an easy way, and then to launch one or more simulations on a computer cluster for example. For this particular case, it can be very helpful to override e.g. a prefix on the output files as explained below.
In this section, we briefly look into starting the Simpact Cyan program on the command line, followed by explanations of how the Python interface or R interface works. Some insights into the configuration file are given next, since that is the actual input to the simulation. Typically, wherever a particular probability distribution can be specified in the configuration file, any of the supported distributions can be chosen. At the end of this section we describe which distributions are supported and what their parameters are.
Running from command line
The Simpact Cyan executable that you’re likely to need is called simpact-cyan-release. There is a version with debugging information as well: it performs exactly the same calculations, but has more checks enabled. As a command line argument you can specify whether the very basic mNRM (in which all event times are recalculated after triggering an event) is to be used, or the more advanced version (in which the program recalculates far fewer event fire times). While simpact-cyan-release with the advanced mNRM algorithm is the fastest, it may be a good idea to verify from time to time that the simple algorithm, as well as the debug versions, yields the same results (when using the same random number generator seed).
The program needs three additional arguments, the first of which is the path to the configuration file that specifies what should be simulated. The configuration file is just a text file containing a list of key/value pairs, a part of which could look like this:
...
population.nummen = 200
population.numwomen = 200
population.simtime = 40
...
You can also define variables and use environment variables of your system; later we’ll look into this config file in more detail. All configuration options and their possible values are described in the section with the simulation details. For the configuration file itself, all options that are needed in the simulation must be set; no default values are assumed by the command line program. When using the R or Python interface however, the helper system does know about default values, thereby severely limiting the number of options that must be specified. It will combine the options you specify with the defaults to produce a full configuration file that the command line program can use.
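As an illustration of the file format, reading such a key/value file could look like the following Python sketch (the real program additionally supports variables and environment-variable substitution, which this sketch ignores):

```python
def read_config(text):
    # Parse 'key = value' lines, skipping blank lines and '#' comments
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip()
    return cfg

sample = """
# number of people in the initial population
population.nummen = 200
population.numwomen = 200
population.simtime = 40
"""
cfg = read_config(sample)
assert cfg["population.simtime"] == "40"
```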
The second argument that the simpact-cyan-release program needs is either 0 or 1, and specifies whether the single-core version (corresponding to 0) or the version using multiple cores (corresponding to 1) should be used. For the parallel version, i.e. the version using multiple cores, OpenMP is used as the underlying technology. By default the program will try to use all processor cores your system has, but this can be adjusted by setting the OMP_NUM_THREADS environment variable.
In general, it is a good idea to specify 0 for this option, selecting the single-core version. The parallel version currently only offers a modest speedup, and only for very large population sizes. Especially if you need to do several runs of a simulation, starting several single-core versions at once will use your computer’s power more efficiently than starting several parallel versions in a sequential way.
With the third and final argument you can specify which mNRM algorithm to use: if you specify ‘simple’, the basic mNRM is used in which all event fire times will be recalculated after an event was triggered. Since this is a slow algorithm, you’ll probably want to specify ‘opt’ here, to use the more advanced algorithm. In this case, the procedure explained above is used, where each person stores a list of relevant events.
So, assuming we’ve created a configuration file called myconfig.txt that resides in the current directory, we could run the corresponding simulation with the following command:
simpact-cyan-release myconfig.txt 0 opt
This will produce some output on screen, such as which version of the program is
being used and which random number generator seed was set. Since the random number
generator seed is in there, it may be a good idea to save this to a file in case
you’d like to reproduce the exact simulation later. To save it to a file called myoutput.txt, you can run
simpact-cyan-release myconfig.txt 0 opt 2>myoutput.txt
Note that the output is not redirected using simply >, but using 2>. This is because the information that you see on screen is actually sent to stderr instead of stdout.
When running the Simpact Cyan program, the default behaviour is to initialize the random number generator with a (more or less) random seed value. For reproducibility it may be necessary to enforce a specific seed. To do so, set the environment variable MNRM_DEBUG_SEED to the value you want to use, and verify in the output of the program that the specified seed is in fact the one being used:
for an MS-Windows system:
set MNRM_DEBUG_SEED=12345
simpact-cyan-release myconfig.txt 0 opt
Note that the value of MNRM_DEBUG_SEED is still set afterwards, which is important when running additional simulations. To clear it, either exit the current command prompt, or execute
set MNRM_DEBUG_SEED=
(nothing may be specified after the = sign, not even a space)
for a Linux or OS X system:
export MNRM_DEBUG_SEED=12345
simpact-cyan-release myconfig.txt 0 opt
Note that the value of MNRM_DEBUG_SEED is still set afterwards, which is important when running additional simulations. To clear it, either exit the current terminal window, or execute
unset MNRM_DEBUG_SEED
On one of these operating systems, it is also possible to specify everything on one line:
MNRM_DEBUG_SEED=12345 simpact-cyan-release myconfig.txt 0 opt
In this case, the value of MNRM_DEBUG_SEED will be visible to the program, but will no longer be set once the program finishes. It will therefore not affect other programs that are started later.
Running from within R
Getting started
The R interface to Simpact Cyan will underneath still execute one of the Simpact Cyan programs, e.g. simpact-cyan-release, so the program relevant to your operating system must be installed first. Note that if you’re using MS-Windows, you’ll also need to install the Visual Studio 2015 redistributable package (use the x86 version).
The R module actually contains Python code, so to be able to use it you’ll need a working Python installation. On Linux or OS X this is usually already available, but if you’re using MS-Windows you may need to install it separately. In this case, it is best to install it in the default directory, e.g. C:\Python27 or C:\Python34, so that the R package will be able to locate it easily.
Before being able to use RSimpactCyan, the package which contains the R interface to the Simpact Cyan program, you need to make sure that other libraries are available. The most straightforward way is to run
source("https://raw.githubusercontent.com/j0r1/RSimpactCyanBootstrap/master/initsimpact.R")
which runs a script from RSimpactCyanBootstrap that downloads the required packages. If you prefer not to run a script this way, you can also add the following lines, either temporarily in your current R session or more permanently in your .Rprofile file; they add the package repository containing the Simpact Cyan library:
local({
  x <- options()$repos
  if (!is.element("CRAN", x)) {
    x["CRAN"] <- "@CRAN@"
  }
  x["SimpactCyan"] <- "http://research.edm.uhasselt.be/jori"
  options(repos = x)
})
Then, you simply have to run
install.packages("RSimpactCyan")
and the packages on which RSimpactCyan depends will be downloaded and installed automatically.
Without modifying the list of repositories, you can also install the dependencies manually first, followed by the RSimpactCyan library:
install.packages("RJSONIO")
install.packages("findpython")
install.packages("rPithon", repos="http://research.edm.uhasselt.be/jori")
install.packages("RSimpactCyan", repos="http://research.edm.uhasselt.be/jori")
Finally, you can load the library with the command:
library("RSimpactCyan")
Running a simulation
To configure a simulation, you need to specify the options for which you want to use a value other than the default. This is done using a list, for example
cfg <- list()
cfg["population.nummen"] <- 200
cfg["population.numwomen"] <- 200
cfg["population.simtime"] <- 40
All values that are entered this way are converted to character strings when creating a configuration file for the simulation. This means that instead of a numeric value, you could also use a string that corresponds to the same number, for example
cfg["population.nummen"] <- "200"
Together with the defaults for other options, these settings will be combined into a configuration file that the real Simpact Cyan program can understand. Taking a look at the full configuration file will show you what other values are in use; to see this configuration file, run
simpact.showconfig(cfg)
Lines that start with a # sign are ignored when the configuration file is read. They may contain comments about certain options, or show which options are currently not in use. In case you want to run a simulation using all defaults, you can either use an empty list or specify NULL.
Once you’ve got the configuration you’d like to use, you can start the simulation from within R with the simpact.run command. Two parameters must be specified: the first is the settings to use (the cfg list in our example) and the second is a directory where generated files and results can be stored. The R module will attempt to create this directory if it does not exist yet. To use the directory /tmp/simpacttest, the command would become
res <- simpact.run(cfg, "/tmp/simpacttest")
The other parameters are:
agedist: With this parameter, you can specify the age distribution that should be used when generating an initial population. The default is the age distribution of South Africa from 2003. In R, you can specify an alternative age distribution in two ways.
The first way is to specify the age distribution as an R data frame or list which contains columns named Age, Percent.Male and Percent.Female. The Age column should be increasing, and the other columns specify the probability of selecting each gender between the corresponding age and the next. Before the first specified age, this probability is zero, and the last mentioned age should have zeroes as the corresponding probabilities. The term probability here is not strictly correct: it can be any positive number, since the resulting distribution will be normed. As an example,
ad <- list(Age=c(0,50,100), Percent.Male=c(1,2,0), Percent.Female=c(2,1,0))
will correspond to an age distribution which limits the age to 100 for everyone. Furthermore, there will be twice as many men over 50 as under 50, while for the women it’s the other way around.
The other way an age distribution can be specified is as a CSV file with (at least) three columns. The header of this CSV file will not be taken into account; instead, the first column is assumed to hold the Age values, the second is interpreted as the Percent.Male column and the third as Percent.Female.
intervention: With this simulation intervention setting, it is possible to change configuration options that are being used at specific times during the simulation. More information about how this can be used can be found in the explanation of the simulation intervention event.
release, slowalg, parallel: These flags specify which precise version of the simulation program will be used, and whether the single-core or multi-core version is used. The release parameter is TRUE by default, yielding the fastest version of the selected algorithm. If set to FALSE, many extra checks are performed, all of which should pass if the algorithm works as expected.
By default, slowalg is FALSE, which selects the population-based procedure described above. In case this is set to TRUE, the very basic mNRM algorithm is used, where all event fire times are recalculated after each event is executed. If all works as expected, the two algorithms should produce the same results for the same seed (although very small differences are possible due to limited numeric precision). The basic algorithm is very slow; keep this in mind if you use it.
The parallel parameter is FALSE by default, selecting the version of the algorithm that only uses a single processor core. To use the parallel version, i.e. to use several processor cores at the same time, this can be set to TRUE. The parallel version currently only offers a modest speedup, and only for very large population sizes. Especially if you need to do several runs of a simulation, starting several single-core versions at once will use your computer’s power more efficiently than starting several parallel versions in a sequential way.
seed: By default, a more or less random seed value will be used to initialize the random number generator that’s being used in the simulation. In case you’d like to use a specific value for the seed, for example to reproduce results found earlier, you can set it here.
dryrun: If this is set to TRUE, the necessary configuration files will be generated, but the actual simulation is not performed. This can come in handy to prepare a simulation’s settings on your local machine and run one or more actual simulations on another machine, e.g. on a computer cluster. In case you’d like to perform several runs with the same configuration file, overriding the output prefix can be very helpful, as is described in the section on the configuration file. If you’d like to perform a run that has been prepared this way from within R, you can use the simpact.run.direct function.
identifierFormat: Files that are created by the simulation will all start with the same identifier. The identifierFormat parameter specifies what this identifier should be. Special properties start with a percent (%) sign; other characters are just copied. An overview of these special properties:
- %T: will expand to the simulation type, e.g. simpact-cyan
- %y: the current year
- %m: the current month (number)
- %d: the current day of the month
- %H: the current hour
- %M: the current minute
- %S: the current second
- %p: the process ID of the process starting the simulation
- %r: a random character
The default identifier format %T-%y-%m-%d-%H-%M-%S_%p_%r%r%r%r%r%r%r%r- would lead to an identifier like simpact-cyan-2015-01-15-08-28-10_2425_q85z7m1G-.
dataFiles: If specified, this should be a list where each named entry contains a matrix. These matrices will be written to CSV files, which can be referred to in the configuration entries. For example, we could do something like this:
myMatrix <- matrix(c(1,2,3,4),2,2)
data <- list()
data[["csvMatrix"]] <- myMatrix
To refer to this CSV file in the configuration settings, we can use the data: prefix, e.g.
cfg <- list()
cfg["person.geo.dist2d.type"] <- "discrete"
cfg["person.geo.dist2d.discrete.densfile"] <- "data:csvMatrix"
The simulation would then be started as follows:
simpact.run(cfg, "/tmp/simpacttest", dataFiles=data)
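The identifierFormat expansion described above could be mimicked with a Python sketch like the following; this is a hypothetical re-implementation for illustration only, not the code the Simpact Cyan helpers actually use:

```python
import datetime
import os
import random
import string

def expand_identifier(fmt, simtype="simpact-cyan"):
    # Hypothetical expansion of the '%' properties listed for identifierFormat
    now = datetime.datetime.now()
    subst = {
        "T": simtype,
        "y": "%d" % now.year, "m": "%02d" % now.month, "d": "%02d" % now.day,
        "H": "%02d" % now.hour, "M": "%02d" % now.minute, "S": "%02d" % now.second,
        "p": str(os.getpid()),
    }
    out, i = "", 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            c = fmt[i + 1]
            # '%r' becomes a random character; other known codes are substituted
            out += random.choice(string.ascii_letters + string.digits) if c == "r" else subst.get(c, c)
            i += 2
        else:
            out += fmt[i]
            i += 1
    return out

ident = expand_identifier("%T-%y-%m-%d-%H-%M-%S_%p_%r%r%r%r%r%r%r%r-")
assert ident.startswith("simpact-cyan-") and ident.endswith("-")
```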
The return value of the simpact.run function contains the paths to generated files and output files, or, in case the dryrun option was used, to the files that will be written. The output files that are produced are described in the corresponding section.
Other functions
Apart from simpact.showconfig and simpact.run, some other functions exist in the RSimpactCyan library as well:
simpact.available: This function returns a boolean value that indicates if the RSimpactCyan library is able to find and use the Simpact Cyan simulation program.
simpact.getconfig: This takes a list with config values as input, similar to simpact.showconfig, merges it with default settings, and returns the extended configuration list. If the second parameter (show) is set to TRUE, the full configuration file will be shown on-screen as well.
simpact.run.direct: This function allows you to start a simulation based on a previously created configuration file (e.g. one created using the dryrun setting of simpact.run). This config file must be set as the first argument, and is always required. Other arguments are optional:
- outputFile: If set to NULL, the output of the Simpact Cyan simulation (containing information about the version of the program and the random number generator seed) will just appear on screen. If a filename is specified here, the output will be written to that file as well.
- release, slowalg, parallel, seed: Same meaning as in the simpact.run function.
- destDir: By default, the simulation will be run in the directory that contains the config file. This is important if the config file itself specifies files without an absolute path name, since the directory of the config file will then be used as a starting point. If you don’t want this behaviour and need to select another directory, this parameter can be used to set it.
simpact.set.datadir: The RSimpactCyan library will try to figure out where the Simpact Cyan data files are located. If you want to specify another location, this function can be used to do so.
simpact.set.simulation: The Simpact Cyan package is actually meant to support alternative simulations as well. To use such an alternative simulation, this function can be used. For example, if maxart is specified here, maxart-release would be executed as the underlying program instead of e.g. simpact-cyan-release.
Running from within Python
Getting started
The pysimpactcyan module to control Simpact Cyan from within Python is automatically available once you’ve installed the program. Note that if you’re using MS-Windows, you’ll also need to install the Visual Studio 2015 redistributable package (use the x86 version).
To load the Simpact Cyan module in a Python script or interactive session, just execute
import pysimpactcyan
This allows you to create an instance of the PySimpactCyan class that’s defined in this module; let’s call it simpact:
simpact = pysimpactcyan.PySimpactCyan()
Running a simulation
To configure a simulation, you need to specify the options for which you want to use a value other than the default. This is done using a dictionary, for example
cfg = { }
cfg["population.nummen"] = 200
cfg["population.numwomen"] = 200
cfg["population.simtime"] = 40
All values that are entered this way are converted to character strings when creating a configuration file for the simulation. This means that instead of a numeric value, you could also use a string that corresponds to the same number, for example
cfg["population.nummen"] = "200"
Together with the defaults for other options, these settings will be combined into a configuration file that the real Simpact Cyan program can understand. Taking a look at the full configuration file will show you what other values are in use; to see this configuration file, run
simpact.showConfiguration(cfg)
Lines that start with a # sign are ignored when the configuration file is read. They may contain comments about certain options, or show which options are currently not in use. In case you’d want to run a simulation using all defaults, you can either use an empty dictionary or specify None.
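Conceptually, the merge of user settings with defaults can be sketched as follows. This is a simplified illustration only (the default values and the show_configuration helper are made up for this sketch, not taken from pysimpactcyan):

```python
# Simplified sketch: merge a user settings dictionary over defaults and
# render "key = value" lines, as described above. Defaults are invented.
defaults = {
    "population.nummen": "100",      # assumed values, for illustration only
    "population.numwomen": "100",
    "population.simtime": "15",
}

def show_configuration(cfg):
    merged = dict(defaults)
    if cfg:  # passing None or {} keeps all defaults
        # every value is converted to a character string, as described above
        merged.update({k: str(v) for k, v in cfg.items()})
    return "\n".join("%s = %s" % (k, v) for k, v in sorted(merged.items()))

print(show_configuration({"population.simtime": 40}))
```

The real showConfiguration of course knows the full set of Simpact Cyan defaults; the point here is only the merge-then-stringify behaviour.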
If you’ve got the configuration you’d like to use, you can start the simulation from within Python using the run method of the Simpact Cyan object you’re using. Two parameters must be specified: the first is the settings to use (the cfg dictionary in our example) and the second is a directory where generated files and results can be stored. The Python module will attempt to create this directory if it does not exist yet. To use the directory /tmp/simpacttest, the command would become
res = simpact.run(cfg, "/tmp/simpacttest")
The other parameters are:
agedist
: With this parameter, you can specify the age distribution that should be used when generating an initial population. The default is the age distribution of South Africa from 2003. In Python, you can specify an alternative age distribution in two ways.
The first way is to specify the age distribution as a dictionary which contains lists of numbers named Age, Percent.Male and Percent.Female. The Age list should be increasing, and the other lists specify the probability of selecting each gender between the corresponding age and the next. Before the first specified age, this probability is zero, and the last mentioned age should have zeroes as the corresponding probabilities. The term probability here is not strictly correct: it can be any positive number, since the resulting distribution will be normed. As an example,
ad = { "Age": [0, 50, 100], "Percent.Male": [1, 2, 0], "Percent.Female": [2, 1, 0] }
will correspond to an age distribution which limits the age to 100 for everyone. Furthermore, there will be twice as many men over 50 as under 50, while for the women it’s the other way around.
The other way an age distribution can be specified is as a CSV file with (at least) three columns. The header of this CSV file will not be taken into account; instead, the first column is assumed to hold the Age column, the second is interpreted as the Percent.Male column and the third as Percent.Female.
parallel, opt, release: These flags specify which precise version of the simulation program will be used, and whether the single-core or multi-core version is used.
The release parameter is True by default, yielding the fastest version of the selected algorithm. If set to False, many extra checks are performed, all of which should pass if the algorithm works as expected.
By default, opt is True, which selects the population-based procedure described above. In case this is set to False, the very basic mNRM algorithm is used, where all event fire times are recalculated after each event is executed. If all works as expected, the two algorithms should produce the same results for the same seed (although very small differences are possible due to limited numeric precision). The basic algorithm is very slow; keep this in mind if you use it.
The parallel parameter is False by default, selecting the version of the algorithm that only uses a single processor core. To use the parallel version, i.e. to use several processor cores at the same time, this can be set to True. The parallel version currently only offers a modest speedup, and only for very large population sizes. Especially if you need to do several runs of a simulation, starting several single-core versions at once will use your computer’s power more efficiently than starting several parallel versions sequentially.
seed
: By default, a more or less random seed value will be used to initialize the random number generator that’s being used in the simulation. In case you’d like to use a specific value for the seed, for example to reproduce results found earlier, you can set it here.
interventionConfig
: With this simulation intervention setting it is possible to change configuration options that are being used at specific times during the simulation. More information about how this can be used can be found in the explanation of the simulation intervention event.
dryRun
: If this is set to True, the necessary configuration files will be generated, but the actual simulation is not performed. This can come in handy to prepare a simulation’s settings on your local machine and run one or more actual simulations on another machine, e.g. on a computer cluster. In case you’d like to perform several runs with the same configuration file, overriding the output prefix can be very helpful, as is described in the section on the configuration file. If you’d like to perform a run that has been prepared this way from within Python, you can use the runDirect method of the PySimpactCyan class.
identifierFormat
: Files that are created by the simulation will all start with the same identifier. The identifierFormat parameter specifies what this identifier should be. Special properties start with a percent (%) sign; other things are copied literally. An overview of these special properties:
%T: will expand to the simulation type, e.g. simpact-cyan
%y: the current year
%m: the current month (number)
%d: the current day of the month
%H: the current hour
%M: the current minute
%S: the current second
%p: the process ID of the process starting the simulation
%r: a random character
The default identifier format %T-%y-%m-%d-%H-%M-%S_%p_%r%r%r%r%r%r%r%r- would lead to an identifier like simpact-cyan-2015-01-15-08-28-10_2425_q85z7m1G-.
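The expansion of these specifiers can be sketched as follows. The expand_identifier helper below is hypothetical, written only to illustrate the format described above; the real pysimpactcyan code may differ in details:

```python
import datetime
import os
import random
import string

# Hypothetical sketch of identifierFormat expansion (illustration only).
def expand_identifier(fmt, simtype="simpact-cyan", now=None, rng=random):
    now = now or datetime.datetime.now()
    out, i = [], 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            c = fmt[i + 1]
            repl = {
                "T": simtype,
                "y": "%04d" % now.year, "m": "%02d" % now.month,
                "d": "%02d" % now.day, "H": "%02d" % now.hour,
                "M": "%02d" % now.minute, "S": "%02d" % now.second,
                "p": str(os.getpid()),
                "r": rng.choice(string.ascii_letters + string.digits),
            }.get(c, "%" + c)  # unknown specifiers are kept literally
            out.append(repl)
            i += 2
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)

t = datetime.datetime(2015, 1, 15, 8, 28, 10)
print(expand_identifier("%T-%y-%m-%d-%H-%M-%S_", now=t))
# -> simpact-cyan-2015-01-15-08-28-10_
```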
dataFiles
: If specified, this should be a dictionary where each named entry contains an array of arrays or a Pandas DataFrame. These matrices will be written to CSV files, which can be referred to in the configuration entries. For example, we could do something like this:
myMatrix = [ [1, 3], [2, 4] ]
data = { }
data["csvMatrix"] = myMatrix
To refer to this CSV file in the configuration settings, we can use the data: prefix, e.g.
cfg = { }
cfg["person.geo.dist2d.type"] = "discrete"
cfg["person.geo.dist2d.discrete.densfile"] = "data:csvMatrix"
The simulation would then be started as follows:
simpact.run(cfg, "/tmp/simpacttest", dataFiles=data)
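Behind the scenes, a mechanism like this has to write each named matrix to a CSV file and rewrite the data: references. The resolve_data_files helper below is a hypothetical sketch of that idea (names and the sim- prefix are invented), not the actual pysimpactcyan implementation:

```python
import csv
import os
import tempfile

# Hypothetical sketch: write each named matrix to a CSV file and replace
# "data:" references in the configuration with the generated file paths.
def resolve_data_files(cfg, data, destdir, prefix="sim-"):
    paths = {}
    for name, matrix in data.items():
        path = os.path.join(destdir, prefix + name + ".csv")
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(matrix)
        paths[name] = path
    return {k: (paths[v[5:]] if isinstance(v, str) and v.startswith("data:") else v)
            for k, v in cfg.items()}

data = {"csvMatrix": [[1, 3], [2, 4]]}
cfg = {"person.geo.dist2d.type": "discrete",
       "person.geo.dist2d.discrete.densfile": "data:csvMatrix"}
dest = tempfile.mkdtemp()
resolved = resolve_data_files(cfg, data, dest)
print(resolved["person.geo.dist2d.discrete.densfile"])  # ends with sim-csvMatrix.csv
```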
The return value of the run method contains the paths to generated files and output files or, in case the dryRun option was used, of the files that will be written to. The output files that are produced are described in the corresponding section.
Other functions¶
Apart from the PySimpactCyan methods showConfiguration and run, some other methods exist in this Python class as well:
getConfiguration
This takes a dictionary with config values as input, similar to showConfiguration, merges it with default settings, and returns the extended configuration dictionary. If the second parameter (show) is set to True, the full configuration file will be shown on-screen as well.
runDirect
This function allows you to start a simulation based on a previously created configuration file (e.g. one created using the dryRun setting of run). This config file must be set as the first argument, and is always required. Other arguments are optional:
outputFile: If set to None, the output of the Simpact Cyan simulation (containing information about the version of the program and the random number generator seed) will just appear on screen. If a filename is specified here, the output will be written to that file as well.
release, opt, parallel, seed: Same meaning as in the run method.
destDir: By default, the simulation will be run in the directory that contains the config file. This is important if the config file itself specifies files without an absolute path name, since the directory of the config file will then be used as a starting point. If you don’t want this behaviour and need to select another directory, this parameter can be used to set it.
setSimpactDataDirectory
The pysimpactcyan module will try to figure out where the Simpact Cyan data files are located. If you want to specify another location, this function can be used to do so.
setSimpactDirectory
In case you want to specify that the Simpact Cyan executables are located in a specific directory, you can use this function.
setSimulationPrefix
The Simpact Cyan package is actually meant to support alternative simulations as well. To use such an alternative simulation, this function can be used. For example, if maxart is specified here, then instead of running e.g. simpact-cyan-release as the underlying program, maxart-release would be executed.
Configuration file and variables¶
The actual program that executes the simulation reads its settings from a certain configuration file. This is also the case when running from R or Python, where the R or Python interface prepares the configuration file and executes the simulation program. While this approach makes it much easier to configure and run simulations, some knowledge of the way the configuration file works can be helpful.
The basics¶
In essence, the configuration file is just a text file containing key/value pairs. For example, the line
population.simtime = 100
assigns the value 100 to the simulation setting population.simtime, indicating that the simulation should run for 100 years. Lines that start with a hash sign (#) are not processed; they can be used for comments. In the config file itself, mathematical operations are not possible, but if you’re using R or Python, you can perform the operation there and only let the result appear in the config file. For example, if you’d do
library("RSimpactCyan")
cfg <- list()
cfg["population.simtime"] = 100/4
simpact.showconfig(cfg)
in an R session, you’d find
population.simtime = 25
in the configuration file. We could force ‘100/4’ to appear in the configuration file by changing the line to
cfg["population.simtime"] = "100/4"
(so we added quotes), but when trying to run the simulation this would lead to the following error:
FATAL ERROR:
Can't interpret value for key 'population.simtime' as a floating point number
Config file variables and environment variables¶
Keys that start with a dollar sign ($) are treated in a special way: they define a variable that can be used further on in the configuration file. To use a variable’s contents in the value part of a config line, the variable’s name should be placed between ${ and }. For example, we could first have set
$SIMTIME = 100
thereby assigning 100 to the variable with name SIMTIME. This could then later be used as follows:
population.simtime = ${SIMTIME}
You don’t even need to define the variable inside the configuration file: if you define an environment variable, you can use its contents in the same way as before. For example, if the HOME environment variable has been set to /Users/JohnDoe/, then the lines
periodiclogging.interval = 2
periodiclogging.outfile.logperiodic = ${HOME}periodiclog.csv
would enable the periodic logging event and write its output every other year to /Users/JohnDoe/periodiclog.csv.
One very important thing to remember is that if an environment variable with the same name as a config file variable exists, the environment variable will always take precedence over config file variables. While this might seem a bit odd, it actually allows you to more easily use config files prepared on one system, on another system. Furthermore, it allows you to use a single config file multiple times, which can be very handy if you need to perform many runs using the same settings (but different output files).
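The lookup rule just described can be sketched in a few lines. The substitute helper is hypothetical, written only to illustrate the precedence of environment variables over config file variables:

```python
import os
import re

# Hypothetical sketch of ${VAR} substitution: environment variables take
# precedence over variables defined in the config file itself.
def substitute(value, config_vars):
    def lookup(match):
        name = match.group(1)
        if name in os.environ:          # the environment always wins
            return os.environ[name]
        return config_vars[name]
    return re.sub(r"\$\{([^}]+)\}", lookup, value)

config_vars = {"SIMTIME": "100"}
print(substitute("population.simtime = ${SIMTIME}", config_vars))
# -> population.simtime = 100
os.environ["SIMTIME"] = "25"            # an environment variable overrides it
print(substitute("population.simtime = ${SIMTIME}", config_vars))
# -> population.simtime = 25
```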
Using environment variables¶
When you let the R or Python interface prepare a configuration file, this file will start by defining two config file variables, for example:
$SIMPACT_OUTPUT_PREFIX = simpact-cyan-2015-05-27-08-28-13_27090_8Al7O6mD-
$SIMPACT_DATA_DIR = /usr/local/share/simpact-cyan/
The first variable is used in the config file when specifying which files to write output to. As an example, you’d also find the line
logsystem.outfile.logevents = ${SIMPACT_OUTPUT_PREFIX}eventlog.csv
in that file, so the full output file name would be
simpact-cyan-2015-05-27-08-28-13_27090_8Al7O6mD-eventlog.csv
The second variable specifies the location that R or Python thinks the Simpact Cyan data files are stored in, and is used in the line that specifies which age distribution to use when initializing the population:
population.agedistfile = ${SIMPACT_DATA_DIR}sa_2003.csv
In this case, the file /usr/local/share/simpact-cyan/sa_2003.csv
would be
used to read the initial age distribution from.
Because those config variables are defined inside the configuration file, such
a file can be used on its own. If you’d first prepared the config file using the
‘dryrun’ setting, you could still use the created config file to start the
simulation, either directly on the command line, using simpact.run.direct
from R,
or using the PySimpactCyan
method runDirect
in Python.
If you’re running from the command line, it’s very easy to reuse the same configuration file for multiple runs. Normally if you’d try this, you’d see an error message like
FATAL ERROR:
Unable to open event log file:
Specified log file simpact-cyan-2015-05-27-08-28-13_27090_8Al7O6mD-eventlog.csv already exists
To make sure that you don’t lose data from simulations you’ve already performed, the simulation will not start if it needs to overwrite an output file, which is what causes this message.
However, because we can easily override the value of
SIMPACT_OUTPUT_PREFIX
from the command line by using an environment variable
with the same name, it becomes possible to reuse the configuration file multiple
times. For example, assuming that our config file is called myconfig.txt, the simple bash script
for i in 1 2 3 4 5 ; do
SIMPACT_OUTPUT_PREFIX=newprefix_${i}- simpact-cyan-release myconfig.txt 0 opt
done
would produce output files like newprefix_1-eventlog.csv and newprefix_5-eventlog.csv.
In a similar way, setting an environment variable called SIMPACT_DATA_DIR
can be helpful when preparing simulations on one system and running them on
another. For example, you could prepare the simulations on your laptop, using the
‘dryrun’ option to prevent the simulation from actually running, and execute
them on e.g. a computer cluster where you set the SIMPACT_DATA_DIR
environment
variable to make sure that the necessary data files can still be found.
Supported probability distributions and their parameters¶
If a configuration option ends with .dist.type or .dist2d.type, for example option birth.pregnancyduration.dist.type of the birth event, you can specify a number of probability distributions there. By choosing a specific type of probability distribution, you also activate a number of other options to configure the parameters of this probability distribution.
For example, if birth.pregnancyduration.dist.type is set to normal, then the parameters of the one dimensional normal distribution need to be configured. For example, we could set birth.pregnancyduration.dist.normal.mu to 0.7342 and birth.pregnancyduration.dist.normal.sigma to 0.0191, and we’d get a birth event that on average takes place after 0.7342 years (i.e. 268 days), with a standard deviation of roughly one week (0.0191 years).
Below you can find an overview of the currently supported one and two dimensional distributions and their parameters.
One dimensional distributions¶
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
some.option.dist.type (‘fixed’): With such an option, you specify which specific distribution to choose. Allowed values are beta, discrete.csv.onecol, discrete.csv.twocol, discrete.inline, exponential, fixed, gamma, lognormal, normal and uniform, and the corresponding parameters are given in the subsections below. Unless otherwise specified, the default here is a fixed distribution, which is not really a distribution but just causes a fixed value to be used.
beta¶
If this distribution is chosen, the (scaled) beta distribution with the following probability density is used:
\(\textrm{prob}(x) = \frac{1}{B(a,b)} \frac{(x-x_{\rm min})^{a-1} (x_{\rm max}-x)^{b-1}}{(x_{\rm max}-x_{\rm min})^{a+b-1}}\)
This corresponds to a beta distribution that, instead of being non-zero between 0 and 1, is now scaled and translated to be defined between \(x_{\rm min}\) and \(x_{\rm max}\).
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
some.option.dist.beta.a (no default): Corresponds to the value of \(a\) in the formula for the probability density above.
some.option.dist.beta.b (no default): Corresponds to the value of \(b\) in the formula for the probability density above.
some.option.dist.beta.max (no default): Corresponds to the value of \(x_{\rm max}\) in the formula for the probability density above.
some.option.dist.beta.min (no default): Corresponds to the value of \(x_{\rm min}\) in the formula for the probability density above.
discrete.csv.onecol¶
This distribution allows you to draw random numbers based on the values in a single column (ycolumn) of a CSV file (file). The distribution will return values between xmin and xmax, with probabilities according to the entries in the column of the CSV file. If floor is set to no, then any value that lies within a bin is possible; if yes, then only the start values of each bin can be returned.
For example, suppose we have a CSV file that corresponds to this table:
| Prob |
|------|
| 10   |
| 30   |
If we specify xmin to be 0 and xmax to be 2, then there’s a 25% chance that the generated random number will lie between 0 and 1, and a 75% chance that it will lie between 1 and 2. If we set floor to yes, then 25% of the generated random numbers will be 0, while 75% will be 1.
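The bin layout in this example can be verified with a short sketch. The bin_layout helper is hypothetical, illustrating only how the column values normalize into bin probabilities over equally sized bins between xmin and xmax:

```python
# Sketch: normalize a single CSV column into bin probabilities over
# equally sized bins between xmin and xmax (illustration only).
def bin_layout(column, xmin, xmax):
    total = float(sum(column))
    width = (xmax - xmin) / len(column)
    return [((xmin + i * width, xmin + (i + 1) * width), v / total)
            for i, v in enumerate(column)]

for (lo, hi), p in bin_layout([10, 30], xmin=0, xmax=2):
    print("P(%g <= x < %g) = %g" % (lo, hi, p))
# -> P(0 <= x < 1) = 0.25
#    P(1 <= x < 2) = 0.75
```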
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
some.option.dist.discrete.csv.onecol.file (no default): This is the filename of the CSV file containing the column to use for the probability density.
some.option.dist.discrete.csv.onecol.floor (‘no’): By default, any value within a bin is allowed. If set to yes, then only the bin start values can be generated.
some.option.dist.discrete.csv.onecol.xmin (0): The minimum value that’s possibly generated by the distribution. Maps to the start of the CSV column.
some.option.dist.discrete.csv.onecol.xmax (1): The maximum value that’s possibly generated by the distribution. Maps to the end of the CSV column.
some.option.dist.discrete.csv.onecol.ycolumn (1): The number of the column to use from the CSV file.
discrete.csv.twocol¶
This distribution allows you to generate random numbers based on the values in two columns (xcolumn and ycolumn) of a CSV file (file). The xcolumn of the CSV file specifies the start of each bin, while the ycolumn corresponds to the probability of generating a random number in that bin. To be able to determine when the last bin ends, the last entry in the ycolumn must be zero. If floor is set to no, then any value that lies within a bin is possible; if yes, then only the start values of each bin can be returned.
For example, suppose we have a CSV file that corresponds to the table below, and we’ve set xcolumn to 1 and ycolumn to 2. In this case, 25% of the generated random numbers will have a value between 0 and 3, while 75% of the random numbers will lie between 3 and 4. If floor is set to yes, then 25% of the random numbers will be 0, and 75% will be 3.
| X | Prob |
|---|------|
| 0 | 10   |
| 3 | 30   |
| 4 | 0    |
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
some.option.dist.discrete.csv.twocol.file (no default): This is the filename of the CSV file containing the columns to use for the probability density.
some.option.dist.discrete.csv.twocol.floor (‘no’): By default, any value within a bin is allowed. If set to yes, then only the bin start values can be generated.
some.option.dist.discrete.csv.twocol.xcolumn (1): The number of the column to use from the CSV file that contains the bin start values.
some.option.dist.discrete.csv.twocol.ycolumn (2): The number of the column to use from the CSV file that contains the probabilities.
discrete.inline¶
This distribution allows you to generate random numbers based on the values in two lists (xvalues and yvalues). The xvalues list specifies the start of each bin, while the yvalues entries correspond to the probability of generating a random number in that bin. To be able to determine when the last bin ends, the last entry in yvalues must be zero. If floor is set to no, then any value that lies within a bin is possible; if yes, then only the start values of each bin can be returned.
For example, suppose we have the following settings:
xvalues = 0,3,4
yvalues = 10,30,0
In this case, 25% of the generated random numbers will have a value between 0 and 3, while 75% of the random numbers will lie between 3 and 4. If floor is set to yes, then 25% of the random numbers will be 0, and 75% will be 3.
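The sampling procedure for such an inline distribution can be sketched as follows. The sample helper is hypothetical (an illustration of the behaviour described above, not the simulation's actual code): pick a bin with probability proportional to its y value, then either draw uniformly within the bin or, with floor, return the bin's start value:

```python
import random

# Hypothetical sketch of sampling from a discrete.inline distribution.
def sample(xvalues, yvalues, floor=False, rng=random):
    weights = yvalues[:-1]            # the last y value must be zero
    r = rng.uniform(0, sum(weights))
    for i, w in enumerate(weights):   # pick a bin proportionally to its weight
        if r < w or i == len(weights) - 1:
            break
        r -= w
    if floor:
        return xvalues[i]             # only bin start values with floor=yes
    return rng.uniform(xvalues[i], xvalues[i + 1])

rng = random.Random(1)
draws = [sample([0, 3, 4], [10, 30, 0], floor=True, rng=rng) for _ in range(10000)]
print(draws.count(0) / len(draws))    # close to 0.25, as described above
```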
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
some.option.dist.discrete.inline.floor (‘no’): By default, any value within a bin is allowed. If set to yes, then only the bin start values can be generated.
some.option.dist.discrete.inline.xvalues (no default): A list of increasing values corresponding to the bin start values.
some.option.dist.discrete.inline.yvalues (no default): A list of the same length as xvalues, specifying the probability of generating a random number in the corresponding bin. The last value must be zero.
exponential¶
If the exponential distribution is selected, the probability density for a negative value is zero, while the probability density for positive values is given by:
\(\textrm{prob}(x) = \lambda \exp(-\lambda x)\)
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
some.option.dist.exponential.lambda (no default): This specifies the value of \(\lambda\) in the expression of the probability density above.
fixed¶
This does not really correspond to a distribution; instead, a predefined value is always used.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
some.option.dist.fixed.value (0): When a number is chosen from this ‘distribution’, this value is always returned.
gamma¶
In this case, the gamma distribution will be used to choose random numbers. The probability density is
\(\textrm{prob}(x) = \frac{x^{a-1} \exp(-x/b)}{b^a \Gamma(a)}\)
for positive numbers, and zero for negative ones.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
some.option.dist.gamma.a (no default): This corresponds to the value of \(a\) in the expression of the probability distribution.
some.option.dist.gamma.b (no default): This corresponds to the value of \(b\) in the expression of the probability distribution.
lognormal¶
If this log-normal distribution is chosen, the probability density for negative numbers is zero, while for positive numbers it is:
\(\textrm{prob}(x) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln x - \zeta)^2}{2\sigma^2}\right)\)
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
some.option.dist.lognormal.sigma (no default): This corresponds to the value of \(\sigma\) in the formula for the probability distribution.
some.option.dist.lognormal.zeta (no default): This corresponds to the value of \(\zeta\) in the formula for the probability distribution.
normal¶
The base probability distribution used when the normal distribution is selected is the following:
\(\textrm{prob}(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\)
It is possible to specify a minimum and maximum value as well, which causes the probability density to be zero outside of these bounds, and somewhat higher in between. A very straightforward rejection sampling method is used for this at the moment, so it is best not to use this to sample from a narrow interval (or in general an interval with a low probability) since this can require many iterations.
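The rejection sampling approach just described can be sketched in a few lines (a hypothetical illustration, not the simulation's own code): draw from the untruncated normal and retry until the sample falls within the bounds. The warning above about narrow intervals is visible here as the number of retries growing with the rejected probability mass:

```python
import random

# Sketch of rejection sampling for a truncated normal distribution:
# keep drawing from the untruncated normal until a draw lands in [lo, hi].
def truncated_normal(mu, sigma, lo=float("-inf"), hi=float("inf"), rng=random):
    while True:
        x = rng.gauss(mu, sigma)
        if lo <= x <= hi:
            return x

rng = random.Random(1)
samples = [truncated_normal(0.0, 1.0, lo=-1.0, hi=2.0, rng=rng) for _ in range(1000)]
print(min(samples) >= -1.0 and max(samples) <= 2.0)  # True
```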
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
some.option.dist.normal.max (+infinity): This can be used to specify the maximum value beyond which the probability density is zero. By default no truncation is used, and this maximum is set to positive infinity.
some.option.dist.normal.min (-infinity): This can be used to specify the minimum value below which the probability density is zero. By default no truncation is used, and this minimum is set to negative infinity.
some.option.dist.normal.mu (no default): This corresponds to the value of \(\mu\) in the expression of the probability density above.
some.option.dist.normal.sigma (no default): This corresponds to the value of \(\sigma\) in the expression of the probability density above.
uniform¶
When this probability density is selected, each number has an equal probability density between a certain minimum and maximum value. Outside of these bounds, the probability density is zero.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
some.option.dist.uniform.min (0): This specifies the start of the interval with the same, constant probability density.
some.option.dist.uniform.max (1): This specifies the end of the interval with the same, constant probability density.
Two dimensional distributions¶
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
some.option.dist2d.type (‘fixed’): With such an option, you specify which two dimensional distribution to choose. Allowed values are binormal, binormalsymm, discrete, fixed and uniform; the corresponding parameters are given in the subsections below.
binormal¶
This corresponds to the two dimensional multivariate normal distribution, which has the following probability density:
\(\textrm{prob}(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y} + \frac{(y-\mu_y)^2}{\sigma_y^2}\right]\right)\)
If desired, this probability density can be truncated to specific bounds by setting the minx, maxx, miny and maxy parameters. These default to negative and positive infinity, causing truncation to be disabled. To enforce these bounds a straightforward rejection sampling method is used, so to avoid a large number of iterations to find a valid random number, it is best not to restrict the acceptable region to one with a low probability.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
some.option.dist2d.binormal.meanx (0): Corresponds to \(\mu_x\) in the expression for the probability density above.
some.option.dist2d.binormal.meany (0): Corresponds to \(\mu_y\) in the expression for the probability density above.
some.option.dist2d.binormal.rho (0): Corresponds to \(\rho\) in the expression for the probability density above.
some.option.dist2d.binormal.sigmax (1): Corresponds to \(\sigma_x\) in the expression for the probability density above.
some.option.dist2d.binormal.sigmay (1): Corresponds to \(\sigma_y\) in the expression for the probability density above.
some.option.dist2d.binormal.minx (-infinity): This can be used to truncate the probability distribution in the x-direction.
some.option.dist2d.binormal.maxx (+infinity): This can be used to truncate the probability distribution in the x-direction.
some.option.dist2d.binormal.miny (-infinity): This can be used to truncate the probability distribution in the y-direction.
some.option.dist2d.binormal.maxy (+infinity): This can be used to truncate the probability distribution in the y-direction.
binormalsymm¶
This is similar to the binormal distribution above, but using the same parameters for the x-direction as for the y-direction. This means it is also a two dimensional multivariate normal distribution, but with a less general probability density:
\(\textrm{prob}(x,y) = \frac{1}{2\pi\sigma^2\sqrt{1-\rho^2}} \exp\left(-\frac{(x-\mu)^2 - 2\rho(x-\mu)(y-\mu) + (y-\mu)^2}{2\sigma^2(1-\rho^2)}\right)\)
If desired, this probability density can be truncated to specific bounds by setting the min and max parameters. These default to negative and positive infinity, causing truncation to be disabled. To enforce these bounds a straightforward rejection sampling method is used, so to avoid a large number of iterations to find a valid random number, it is best not to restrict the acceptable region to one with a low probability.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
some.option.dist2d.binormalsymm.mean (0): This corresponds to the value of \(\mu\) in the formula for the probability density.
some.option.dist2d.binormalsymm.rho (0): This corresponds to the value of \(\rho\) in the formula for the probability density.
some.option.dist2d.binormalsymm.sigma (1): This corresponds to the value of \(\sigma\) in the formula for the probability density.
some.option.dist2d.binormalsymm.max (+infinity): This can be used to truncate the probability density, to set a maximum in both the x- and y-directions.
some.option.dist2d.binormalsymm.min (-infinity): This can be used to truncate the probability density, to set a minimum in both the x- and y-directions.
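One standard way to sample such a symmetric correlated pair is to combine two independent standard normal draws; the sketch below is a hypothetical illustration of that technique (not the simulation's own sampler), using the mean, sigma and rho parameters described above:

```python
import math
import random

# Sketch: sample the symmetric bivariate normal by mixing two independent
# standard normal draws so that x and y get correlation rho.
def binormalsymm_sample(mean, sigma, rho, rng=random):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    x = mean + sigma * z1
    y = mean + sigma * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return x, y

rng = random.Random(1)
pairs = [binormalsymm_sample(0.0, 1.0, 0.8, rng=rng) for _ in range(20000)]
xs, ys = zip(*pairs)
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
cov = sum((x - mx) * (y - my) for x, y in pairs) / len(pairs)
print(round(cov, 1))  # empirical covariance close to rho * sigma^2 = 0.8
```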
discrete¶
With the discrete
distribution, you can use a TIFF image file
or a CSV file to specify a probability distribution. This can come in handy if you’d like to
use population density data for example. The TIFF file format is very general,
and can support several sample representations and color channels. Since the
specified file will be used for a probability distribution, only one value per
pixel is allowed (as opposed to separate values for red, green and blue for
example), and a 32-bit or 64-bit floating point representation should be used.
Negative values are set to zero, while positive values will be normalized and
used for the probability distribution.
Instead of reading the probabilities for each pixel/bin from a TIFF file,
they can also be read from a CSV file. Again, negative values will be clipped
to zero, while positive values will be used for the probability distribution.
If desired, a ‘mask file’ can be specified as well (using maskfile). Such a file should also be a TIFF or CSV file, with the same number of entries in each dimension. If the value of a certain pixel in the mask file is zero or negative, the value read from the real input file (densfile) will be set to zero; otherwise, the value from the real input file is left unmodified. This can be used to easily restrict the original file to a certain region.
Suppose we have a 320x240 image that we’d like to use to base a probability
density on. In the TIFF file format, as with many other image formats, the
upper-left pixel is the (0, 0) pixel, while the bottom-right pixel will be
(319, 239). Usually, we’d like the y-axis to point up however, which is why the default value of the flipy parameter is set to yes. To illustrate, suppose we use the up32.tiff file, which corresponds to the following image.
If we use this as a discrete probability density, a histogram of the samples
should show the text ‘UP’.

In the discretedistribution.ipynb example, we
abuse the setting of the population density to sample from this distribution:
each person will have a location that is sampled from this discrete distribution.
In case the flipy
parameter is yes
(the default), we obtain the following
histogram, which is probably what we’d expect.

On the other hand, if we explicitly set the flipy parameter to no, the mismatch between the y-axes becomes apparent:

The same reasoning applies for the CSV file: because the row number increases
each time a line is read from the CSV file, the y-axis of the file points
down. If the flipy
parameter is set to yes
(the default), this axis is
inverted. So for example, suppose you have the following CSV file
"C1","C2","C3"
1,2,3
4,5,6
which corresponds to the following table:
C1 | C2 | C3 |
---|---|---|
1 | 2 | 3 |
4 | 5 | 6 |
In this case, with flipy set to yes, this would correspond to the following 2D distribution:

The image file itself just specifies the shape of the probability distribution. The actual size and position in the x-y plane can be set using the width, height, xoffset and yoffset parameters.
There’s also a floor parameter, which defaults to no. In that case, a constant probability is assumed within each pixel/bin, making it possible for the 2D probability distribution to generate any value pair within the bin. If, however, the value of the parameter is set to yes, only the bin corner values can be returned, nothing in between. This is useful if you’d like a distribution to generate only a fixed set of values. The following notebook illustrates this: discretedistfloor.ipynb
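As an illustration of the floor behaviour, the following standalone Python sketch (not Simpact Cyan code; the grid of bin weights is made up) samples from a small 2D grid either anywhere within a bin or only at bin corners:

```python
import random

random.seed(1)

# Made-up 2x2 grid of relative probabilities (values[ybin][xbin]).
values = [[1.0, 3.0],
          [2.0, 4.0]]
width, height = 1.0, 1.0      # total size of the region in the x-y plane
xoffset, yoffset = 0.0, 0.0   # position of the lower-left corner

ny, nx = len(values), len(values[0])
binw, binh = width / nx, height / ny

def sample(floor=False):
    # Pick a bin with probability proportional to its weight.
    flat = [(x, y) for y in range(ny) for x in range(nx)]
    weights = [values[y][x] for (x, y) in flat]
    xbin, ybin = random.choices(flat, weights=weights)[0]
    if floor:
        # Only the bin corner can be returned, nothing in between.
        return (xoffset + xbin * binw, yoffset + ybin * binh)
    # Any point inside the bin is possible.
    return (xoffset + (xbin + random.random()) * binw,
            yoffset + (ybin + random.random()) * binh)

print(sample(floor=True))   # always a multiple of the bin size
print(sample(floor=False))  # an arbitrary point inside some bin
```

With floor enabled, only the four corner coordinates (multiples of the bin size) can appear, which is how a distribution restricted to a fixed set of values can be built.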
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
- some.option.dist2d.discrete.densfile (no default): This should be used to specify the TIFF or CSV file that contains the discrete probability density to be used.
- some.option.dist2d.discrete.maskfile (no default): As explained above, a TIFF or CSV mask file can be used to restrict the previous file to a certain region. Set this to an empty string in case you don’t need it.
- some.option.dist2d.discrete.flipy (‘yes’): By default, the image will be flipped in the y-direction. This has to do with the y-axis in images being different from what we’d expect (see the explanation above).
- some.option.dist2d.discrete.floor (‘no’): By default, any value inside a bin/pixel is possible. If set to ‘yes’, the 2D distribution will only generate values that correspond to the corner of a bin.
- some.option.dist2d.discrete.width (1): The TIFF or CSV file itself just specifies the shape of the distribution. With this parameter you can set the actual width (scale in the x-direction) in the x-y plane.
- some.option.dist2d.discrete.height (1): The TIFF or CSV file itself just specifies the shape of the distribution. With this parameter you can set the actual height (scale in the y-direction) in the x-y plane.
- some.option.dist2d.discrete.xoffset (0): The TIFF or CSV file itself just specifies the shape of the distribution. With this parameter you can set the x-offset in the x-y plane.
- some.option.dist2d.discrete.yoffset (0): The TIFF or CSV file itself just specifies the shape of the distribution. With this parameter you can set the y-offset in the x-y plane.
fixed¶
This does not really correspond to a distribution; instead, a predefined (x, y) value is always used.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
- some.option.dist2d.fixed.xvalue (0): The x-value of the (x, y) coordinate that is always returned.
- some.option.dist2d.fixed.yvalue (0): The y-value of the (x, y) coordinate that is always returned.
uniform¶
By specifying this probability density, a point will be chosen from a rectangular region in the x-y plane. All points within this region have an equal probability density, while points outside the region have a probability density of zero.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
- some.option.dist2d.uniform.xmin (0): This specifies the start of the region along the x-axis.
- some.option.dist2d.uniform.xmax (1): This specifies the end of the region along the x-axis.
- some.option.dist2d.uniform.ymin (0): This specifies the start of the region along the y-axis.
- some.option.dist2d.uniform.ymax (1): This specifies the end of the region along the y-axis.
Output¶
By default, several log files are used, but each of them can be disabled by assigning an empty string to the corresponding configuration property. If you’re using the R or Python interface, the full paths of these log files will be stored in the object returned by simpact.run or the PySimpactCyan method run.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
- logsystem.outfile.logevents (‘${SIMPACT_OUTPUT_PREFIX}eventlog.csv’): Here, all events that take place are logged. See the section about the configuration file for additional information regarding the SIMPACT_OUTPUT_PREFIX variable.
- logsystem.outfile.logpersons (‘${SIMPACT_OUTPUT_PREFIX}personlog.csv’): In this file, information about every person in the simulation is stored. See the section about the configuration file for additional information regarding the SIMPACT_OUTPUT_PREFIX variable.
- logsystem.outfile.logrelations (‘${SIMPACT_OUTPUT_PREFIX}relationlog.csv’): Here, all relationships are logged. See the section about the configuration file for additional information regarding the SIMPACT_OUTPUT_PREFIX variable.
- logsystem.outfile.logtreatments (‘${SIMPACT_OUTPUT_PREFIX}treatmentlog.csv’): This file records information regarding treatments. See the section about the configuration file for additional information regarding the SIMPACT_OUTPUT_PREFIX variable.
- logsystem.outfile.logviralloadhiv (‘${SIMPACT_OUTPUT_PREFIX}hivviralloadlog.csv’): In this file, the HIV viral load changes of different individuals can be seen. See the section about the configuration file for additional information regarding the SIMPACT_OUTPUT_PREFIX variable.
- logsystem.outfile.logsettings (‘${SIMPACT_OUTPUT_PREFIX}settingslog.csv’): This file records the settings that were used at the start of the program and after each simulation intervention.
- logsystem.outfile.loglocation (‘${SIMPACT_OUTPUT_PREFIX}locationlog.csv’): This file records the geographical locations that were assigned to a person. In case a non-trivial geographical distribution is used and relocations are enabled, this allows persons to be tracked throughout the simulation.
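The ${SIMPACT_OUTPUT_PREFIX} substitution can be illustrated with a small standalone Python sketch (the prefix value used here is just a hypothetical example; the actual value comes from the configuration):

```python
# The SIMPACT_OUTPUT_PREFIX variable is substituted into the default log
# file names; the prefix below is a made-up example value.
prefix = "/tmp/simpact-output-"

defaults = {
    "logsystem.outfile.logevents":   "${SIMPACT_OUTPUT_PREFIX}eventlog.csv",
    "logsystem.outfile.logpersons":  "${SIMPACT_OUTPUT_PREFIX}personlog.csv",
    "logsystem.outfile.logsettings": "${SIMPACT_OUTPUT_PREFIX}settingslog.csv",
}

# Expand the variable to obtain the actual log file paths.
resolved = {name: path.replace("${SIMPACT_OUTPUT_PREFIX}", prefix)
            for name, path in defaults.items()}

print(resolved["logsystem.outfile.logevents"])  # /tmp/simpact-output-eventlog.csv
```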
Event log¶
The event log is a CSV-like file, in which each line contains at least ten entries:
- The simulation time at which the event took place
- A short description of the event
- The name of the first person involved in the event, or (none)
- The person ID of the first person involved, or -1
- The gender (0 for a man, 1 for a woman) of the first person involved in the event, or -1
- The age of the first person involved in the event, or -1
- The name of the second person involved in the event, or (none)
- The person ID of the second person involved, or -1
- The gender (0 for a man, 1 for a woman) of the second person involved in the event, or -1
- The age of the second person involved in the event, or -1
On a specific line, more entries may be present. In that case, the number of extra entries will be a multiple of two, with the first entry of a pair being a short description and the second the actual value.
Some event descriptions are within parentheses, like (childborn) or (relationshipended). These aren’t actual events themselves, but a kind of pseudo-event: they are log entries for certain actions that are triggered by a real mNRM-event. For example, a birth event will trigger the (childborn) pseudo-event, so that there is a log entry for the new person that is introduced into the population. The (relationshipended) pseudo-event is triggered both by the dissolution of a relationship and by the death of a person, either from AIDS related causes or due to a ‘normal’ mortality event.
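A line with this layout can be split into the ten fixed entries plus the optional description/value pairs; a standalone Python sketch using a made-up log line (the event name and values are invented for illustration):

```python
# Toy event-log line in the format described above: ten fixed entries,
# optionally followed by description/value pairs. All values are made up.
line = "2.5,transmission,man_10,10,0,24.3,woman_7,7,1,21.9,originSPVL,4.2"

fields = line.split(",")
time, desc = float(fields[0]), fields[1]
person1 = {"name": fields[2], "id": int(fields[3]),
           "gender": int(fields[4]), "age": float(fields[5])}
person2 = {"name": fields[6], "id": int(fields[7]),
           "gender": int(fields[8]), "age": float(fields[9])}

# Any remaining entries come in (description, value) pairs.
extra = dict(zip(fields[10::2], fields[11::2]))

print(desc, extra)  # transmission {'originSPVL': '4.2'}
```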
Person log¶
The person log file is a CSV file with an entry for each person in the simulation, both for persons who are deceased and for persons who are still alive when the simulation finishes. At the moment, the following columns are defined:
- ID: The ID of the person that this line is about.
- Gender: The gender (0 for a man, 1 for a woman) of the person that this line is about.
- TOB: The time of birth of this person.
- TOD: The time of death of this person, or inf (infinity) if the person is still alive when the simulation ends.
- IDF: The ID of the father of this person, or -1 if the person is part of the initial population created at the start of the simulation.
- IDM: The ID of the mother of this person, or -1 if the person is part of the initial population created at the start of the simulation.
- TODebut: The simulation time at which the person became sexually active. If this is zero, it means that the person was already old enough at the start of the simulation; otherwise it’s the time at which the debut event for this person took place (or inf if debut did not take place yet).
- FormEag: The value of the formation eagerness parameter for this person for forming heterosexual relationships, which can be used in the formation event.
- FormEagMSM: The value of the formation eagerness parameter for this person for MSM relationships, which can be used in the MSM formation event.
- InfectTime: The time at which this person became HIV-infected, or inf if the person was not infected. This will be the time at which either an HIV seeding event or a transmission event took place.
- InfectOrigID: The ID of the person that infected the current person, or -1 if the current person was not infected or was infected by a seeding event.
- InfectType: This will be -1 if the person was not infected, 0 if the person got infected due to a seeding event and 1 if due to a transmission event.
- log10SPVL: If infected, this contains the logarithm (base 10) of the set-point viral load of this person that was first chosen (so _not_ affected by treatment). If not infected, this will be -inf.
- TreatTime: The time at which this person last received treatment, or inf if no treatment was received.
- XCoord: Each person is assigned a geographic location, of which this is the x-coordinate. In case relocations are possible, the value in this log file will be the last one in the simulation. For more detailed information, the location log file can be used.
- YCoord: Each person is assigned a geographic location, of which this is the y-coordinate. In case relocations are possible, the value in this log file will be the last one in the simulation. For more detailed information, the location log file can be used.
- AIDSDeath: Indicates what the cause of death for this person was. This is -1 in case the person is still alive at the end of the simulation, 0 if the person died from non-AIDS related causes, and +1 in case the person’s death was caused by AIDS.
- HSV2InfectTime: The time at which this person became HSV2 infected, or inf in case the person is not infected.
- HSV2InfectOriginID: The ID of the person that’s the origin of the HSV2 infection, or -1 if there is none (no infection, or infection by seeding).
- CD4atInfection: As explained in the CD4 count related documentation, the CD4 values at the start of infection and at the time of death are currently chosen from a distribution. This column will contain the first of these values when the person is infected, or -1 otherwise.
- CD4atDeath: Similar to the previous column, but will contain the CD4 value at the time of (AIDS related) death.
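Since the person log is a plain CSV file, it can be analyzed with standard tools; for example, a standalone sketch counting AIDS deaths with Python’s csv module (the log excerpt below is a made-up toy with only a few of the actual columns):

```python
import csv
import io

# Toy excerpt of a personlog.csv file (only a few of the actual columns;
# all values are invented for illustration).
personlog = """\
"ID","Gender","TOB","TOD","AIDSDeath"
1,0,-20.5,12.3,1
2,1,-35.0,inf,-1
3,0,-18.2,9.7,0
"""

aids_deaths = 0
for row in csv.DictReader(io.StringIO(personlog)):
    if int(row["AIDSDeath"]) == 1:  # +1 means the death was caused by AIDS
        aids_deaths += 1

print(aids_deaths)  # 1
```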
Relationship log¶
In the relationship log, information about all dissolved relationships is logged, as well as information about relationships that still exist when the simulation ends. The file is a CSV file, currently with six columns:
- ID1: The ID of the first person in the relationship.
- ID2: The ID of the second person in the relationship.
- FormTime: The time at which the relationship between these two people was formed.
- DisTime: The time at which the relationship between these two people dissolved, or inf (infinity) if the relationship still existed when the simulation ended.
- AgeGap: The age difference between the man and the woman in the relationship. A positive value means that the man is older than the woman.
- MSM: If 1, the relationship is an MSM relationship; if 0, it’s a heterosexual relationship.
Treatment log¶
This CSV file contains information about all antiretroviral treatments that took place during the simulation, both for treatments that were still ongoing when the simulation ended and for treatments that ended during the simulation (due to the person dropping out or dying). The file currently has six columns:
- ID: The ID of the person that received treatment.
- Gender: The gender (0 for a man, 1 for a woman) of the person that got treated.
- TStart: The time at which the treatment started.
- TEnd: The time at which the treatment ended (by dropping out or because the person died). In case treatment was still going on when the simulation ended, this is inf (infinity).
- DiedNow: If the treatment got stopped because the person died, this flag will be 1. Otherwise it will be 0.
- CD4atARTstart: The value of the CD4 count of this person right before the treatment started.
HIV Viral load log¶
This CSV log file describes the changes in the HIV viral load of different individuals. As described in the section about HIV viral load related options, each person has a set-point viral load, which is the observed viral load in the chronic stage (see also the section about the general flow of the simulation). In the acute stage and AIDS stages, the observed viral load is derived from this set-point value.
The file currently has five columns:
- Time: the time at which the viral load for a person was changed.
- ID: the ID of the person for whom the viral load was changed.
- Desc: a description of the cause of the change, which can be one of the following:
  - Infection by seeding: a person became infected due to the HIV seeding event.
  - Infection by transmission: a person became infected by transmission of the virus.
  - Chronic stage: a person entered the chronic stage.
  - AIDS stage: an AIDS stage event got triggered, advancing the person from the chronic stage to the AIDS stage.
  - Final AIDS stage: an AIDS stage event got triggered, advancing the person from the AIDS stage to the final AIDS stage.
  - Started ART: the viral load was lowered because ART was started during a monitoring event.
  - Dropped out of ART: the viral load was increased because the person dropped out of treatment.
- Log10SPVL: the set-point viral load of the person, on a logarithmic scale. This is the base value from which the actual, observed viral load is calculated.
- Log10VL: the observed viral load of the person, on a logarithmic scale.
Settings log¶
The settings log file contains the settings used throughout the simulation. The names of the columns describe the configuration options being logged, and there will be a row with values of these configuration options each time the settings got changed. The time at which they were applied is also recorded in the log file in the first column. The first row in the log file will describe the names of the columns and at least one other row will be present, describing the settings used when the simulation was initialized. In case simulation intervention events are used, additional rows will be present.
This means that the structure of the settings log will look like the one below in case a simulation intervention event was used to change a parameter of the ‘agegap’ formation hazard after ten years in the simulation:
"t","aidsstage.final", ..., "formation.hazard.agegap.baseline", ...
0 , 0.5 , ..., 0.1 , ...
10 , 0.5 , ..., 0.2 , ...
Location log¶
The person log file records the geographical location of a person, but this is only the last known location. By default, the location of a person is set to (0, 0), but if a non-trivial geographical distribution is used instead, relocation events may be of interest. In this case however, a person can have multiple locations throughout the simulation, and a single entry in the person log file will no longer suffice.
For this reason, each time a person is assigned a 2D location, an entry is written to a location log file. The columns in this file are:
- Time: the time at which the person was assigned the specified location.
- ID: the identifier of the person this location applies to.
- XCoord: the x-coordinate of the location of the person.
- YCoord: the y-coordinate of the location of the person.
Simulation details¶
General flow of a simulation¶
As one might expect, a population consists of persons which can either be male or female. Persons can be introduced into the simulation in two ways:
- During the initialization of the simulation, in which case persons with certain ages (drawn from a distribution) are added to the simulation.
- When the simulation is running, and the birth of a new person occurs.
Once born, a person will become sexually active when a debut event is triggered. If the person is introduced into the population at the start of the simulation, and the age exceeds the debut age, this event is no longer scheduled. Every person always has a ‘normal’ mortality event scheduled, which corresponds to a cause of death other than AIDS.
To get the HIV epidemic started, an HIV seeding event can be scheduled. When this event is triggered, a number of people in the existing population will be marked as being HIV-infected. An infected person will go through a number of infection stages. Until a chronic stage event is triggered the person is in the acute HIV infection stage; afterwards he will be in the chronic stage. A specific amount of time before dying of AIDS, an AIDS stage event is triggered, marking the transition of the chronic HIV stage to the actual AIDS stage. Even closer to the AIDS related death, another AIDS stage event is triggered, after which the person is in the ‘final AIDS stage’, and will be too ill to e.g. form sexual relationships. When the person dies of AIDS, the AIDS mortality event is fired. Note that it is always possible that the person dies from other causes; in that case the ‘normal’ mortality event will get triggered sooner.
If two persons of opposite gender are sexually active, a relationship can be formed. If this is the case, a formation event will be triggered. When a relationship between two people exists, it is possible that conception takes place, in which case a conception event will be triggered. If this happens, a while later a birth event will be fired, and a new person will be introduced into the population. In case one of the partners in the relationship is HIV infected, transmission of the virus may occur. If so, a transmission event will fire, and the newly infected person will go through the different infection stages as described earlier. Of course, it is also possible that the relationship will cease to exist, in which case a dissolution event will be fired. Note that in the version at the time of writing, there is no mother-to-child-transmission (MTCT).
Starting treatment and dropping out of treatment is managed by another sequence of events. When a person gets infected, either by HIV seeding or by transmission, first a diagnosis event is scheduled. If this is triggered, the person is considered to feel bad enough to go to a doctor and get diagnosed as being infected with HIV. If this happens, an HIV monitoring event is scheduled to monitor the progression of the HIV infection. If the person is both eligible and willing to receive antiretroviral therapy, treatment is started; if not, a new monitoring event will be scheduled. In case treatment is started, no more monitoring events will be scheduled, but the person will have a chance to drop out of treatment, in which case a dropout event is triggered. When a person drops out of treatment, a new diagnosis event will be scheduled. The rationale is that when a person drops out, he may do so because he’s feeling better thanks to the treatment. After dropping out, the condition will worsen again, causing him to go to a doctor, get re-diagnosed and start treatment again.
Initialization of the simulation¶
During the initialization of the simulated population, the following steps will take place:
Create the initial population:
- A number of men (population.nummen) and women (population.numwomen) are added to the population, whose ages are drawn from an age distribution file (population.agedistfile). Depending on the debut age, people may be marked as being ‘sexually active’.
- The initial population size will be remembered for use in e.g. the formation hazard. During the simulation, this size can be synchronized using another event.
Schedule the initial events:
- For each person, a ‘normal’ mortality event will be scheduled, and if needed, a debut event will be scheduled.
- Get the HIV epidemic started at some point, by scheduling an HIV seeding event.
- If specified, schedule the next simulation intervention. This is a general way of changing simulation settings during the simulation.
- Schedule a periodic logging event if requested. This will log some statistics about the simulation at regular intervals.
- In case the population size is expected to vary much, one can request an event to synchronize the remembered population size for use in other events.
- For pairs of sexually active persons, depending on the ‘eyecap’ settings (population.eyecap.fraction), schedule formation events.
Once the simulation is started, it will run either until the number of years specified in population.simtime has passed, or until the number of events specified in population.maxevents has been executed.
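When using the Python bindings, such options are passed as a plain dictionary; a minimal sketch (the output directory path is a hypothetical example, and the run call is commented out so the snippet stands on its own):

```python
# Configuration overriding some of the initialization defaults.
cfg = {
    "population.nummen": 200,
    "population.numwomen": 200,
    "population.simtime": 40,           # simulate at most 40 years
    "population.eyecap.fraction": 0.2,  # each person 'sees' 20% of the others
}

# With the pysimpactcyan module installed, the simulation could be started as:
#   import pysimpactcyan
#   simpact = pysimpactcyan.PySimpactCyan()
#   results = simpact.run(cfg, "/tmp/simpacttest")  # returns log file paths etc.

print(sorted(cfg))
```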
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
- population.nummen (100): The initial number of men when starting the simulation.
- population.numwomen (100): The initial number of women when starting the simulation.
- population.simtime (15): The maximum time that will be simulated, specified in years.
- population.maxevents (-1): If greater than zero, the simulation will stop when this number of events has been executed. This is not used if negative.
- population.agedistfile (“sa_2003.csv” in the Simpact Cyan data directory): This is a CSV file with three columns, named ‘Age’, ‘Percent Male’ and ‘Percent Female’. The age values are specified in years and should be increasing; the specified percentages are deemed valid until the next age. The default is the age distribution in South Africa from 2003. Note that when using the R or Python method to start simulations, you need to specify the age distribution as a parameter to the run function if you want to use any distribution other than the default. See the R section or Python section for more information.
- population.eyecap.fraction (1): This parameter allows you to specify with how many persons of the opposite sex (who are sexually active), specified as a fraction, someone can possibly have relationships. If set to the default value of one, every man can possibly engage in a relationship with every woman (and vice versa), causing \(O(N^2)\) formation events to be scheduled. For larger population sizes this large number of events will really slow things down, and because in that case it is not even realistic that everyone can form a relationship with everyone else, a lower value of this ‘eyecap fraction’ (for which ‘blinders’ or ‘blinkers’ would be a better name) will cause a person to be interested in fewer people. When each person is assigned the trivial location (0, 0), the people for such a limited set of interests are simply chosen at random. If a non-trivial 2D distribution is used (see the section about the geographical location of a person), the set of interests will preferably be chosen closer to the location of the person in question. To do this, instead of a really accurate ordering of everyone based on their distance (which would become very slow for large populations), an approximate coarse grid is used (see below). In case you want to disable relationship formation altogether, you can set this value to zero.
- population.coarsemap.subdivx (20): As described above, in case the ‘eyecap’ setting is used, each person will have a set of interests assigned to them. For reasons of speed, a coarse grid is used to get an approximation of the ordering by distance. To do so, a 2D grid is made that covers the region of the persons’ locations, and each person is assigned to a corresponding grid cell. To get an approximate ordering of other people with respect to a certain location, the grid cells themselves are ordered and people are selected based on this ordering to create the set of ‘interests’. The value of this parameter describes the number of grid cells in the x-direction.
- population.coarsemap.subdivy (20): Similar to the previous setting, the value of this parameter describes the number of grid cells in the y-direction.
- population.msm (‘no’): If no (the default), only heterosexual relationships will be possible. If set to yes, MSM relationships will be possible as well.
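The coarse-grid idea can be sketched as follows (a simplified standalone illustration, not the actual Simpact Cyan implementation; locations and grid sizes are made up):

```python
import math
import random

random.seed(2)

subdivx, subdivy = 4, 4   # cf. population.coarsemap.subdivx / subdivy
width = height = 1.0      # region covered by the persons' locations

# Assign made-up random locations and map each person to a grid cell.
people = [(random.random() * width, random.random() * height) for _ in range(50)]
cell_of = lambda x, y: (min(int(x / width * subdivx), subdivx - 1),
                        min(int(y / height * subdivy), subdivy - 1))

cells = {}
for i, (x, y) in enumerate(people):
    cells.setdefault(cell_of(x, y), []).append(i)

def nearby(x, y, count):
    """Approximate the 'count' nearest people by ordering the grid cells,
    rather than every individual person, by distance to (x, y)."""
    home = cell_of(x, y)
    order = sorted(cells, key=lambda c: math.hypot(c[0] - home[0], c[1] - home[1]))
    result = []
    for c in order:
        result.extend(cells[c])
        if len(result) >= count:
            break
    return result[:count]

print(len(nearby(0.5, 0.5, 10)))  # 10
```

Ordering a handful of cells instead of the whole population is what keeps this approach fast for large population sizes, at the cost of only an approximate distance ordering.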
Per person options¶
As explained before, a population is central to the Simpact Cyan simulation and such a population consists of persons, each of which can be a man or a woman. During the simulation, these persons have many properties: gender, age, the number of relationships, which partners, etc. Several properties of persons can be defined using configuration options, which are discussed in this section.
Various other settings¶
Here, we’ll discuss a few per-person settings that do not fall into the categories above. The first one is called person.art.accept.threshold.dist.type and is related to how willing a person is to start treatment when offered. When a person is introduced into the population, a number is picked from the specified distribution. This number is fixed for this person and will no longer change during the simulation. Then, each time the person is offered treatment, a new random number between 0 and 1 is chosen uniformly. If this number is below the threshold value that was determined earlier, treatment will be accepted; otherwise it is rejected. By default, the person.art.accept.threshold.dist.type setting always sets the threshold at 0.5, causing each person to have a 50% chance of accepting treatment when offered.
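The default acceptance mechanism can be sketched in a few lines of standalone Python (an illustration of the described logic, not Simpact Cyan code):

```python
import random

random.seed(3)

# By default, person.art.accept.threshold.dist.type is a 'fixed' distribution
# with value 0.5, so every person gets the same acceptance threshold.
threshold = 0.5   # drawn once per person, fixed for the rest of the simulation

def accepts_treatment():
    # Each time treatment is offered, a fresh uniform number in [0, 1) is
    # compared against the person's fixed threshold.
    return random.random() < threshold

offers = [accepts_treatment() for _ in range(10000)]
print(round(sum(offers) / len(offers), 2))  # close to 0.5
```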
When a person is added to the population, a location is chosen for this person from the two dimensional distribution that is specified in person.geo.dist2d.type. In the default Simpact Cyan simulation, this location is not yet used in any hazard, and the default location is set to (0, 0) for everyone. Because the location is written to the person log file, it can be (ab)used to test two dimensional distributions, like we did in the example for the discrete two dimensional distribution.
By default, the survival time for a person after becoming HIV infected, is given by a simple relation based on the set-point viral load. Because an exact mapping from viral load to survival time is not that realistic, you can add some randomness to this relation using the distribution in person.survtime.logoffset.dist.type. When a person becomes infected, a random number is drawn from this distribution and will correspond to an offset in the survival time, as explained in the AIDS mortality event. The following IPython notebook illustrates the effect: survivaltimernd.ipynb.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
- person.art.accept.threshold.dist.type (‘fixed’ with value 0.5): This specifies the one dimensional distribution that is used to choose the ART acceptance threshold for each person, as explained earlier.
- person.geo.dist2d.type (‘fixed’ with value (0, 0)): This two dimensional distribution is used to assign a geographic location to each person. In the main Simpact Cyan simulation, this is currently not used in any hazard.
- person.survtime.logoffset.dist.type (‘fixed’ with value 0): This one dimensional distribution can be used to add some randomness to the survival time until dying of AIDS related causes after becoming HIV infected.
Events¶
The simulation advances by figuring out which event should take place next, followed by executing code for that event. At the start, many initial events are typically scheduled, some set up to fire at a specific simulation time, some based on a hazard which may change during the simulation. During the simulation, new events will get scheduled, and some already scheduled events will be discarded (for example, in case someone dies, no other events involving this person will need to get executed anymore).
Below you can find an overview of the events that are currently used in the simulation. The relevant configuration options are mentioned as well.
AIDS mortality event¶
When a person gets infected with HIV, an HIV-based time of death is determined. This time of death is determined as the time of infection plus the survival time, which is given by the following formula (based on [Arnaout et al.]):

\[ t_{\rm survival} = \frac{C}{V_{\rm sp}^{-k}} 10^x \]

In this formula, \(C\) and \(k\) are parameters which can be configured using the settings mortality.aids.survtime.C and mortality.aids.survtime.k respectively. The \(x\) parameter is determined per person, allowing some randomness in the formula: it determines an offset on a logarithmic scale. By default, this value is zero however, causing a very strict relationship between \(V_{\rm sp}\) and \(t_{\rm survival}\). The value of \(V_{\rm sp}\) is the set-point viral load, first determined at the time of infection and in general different per person. The value of the set-point viral load can change when treatment is involved: when a person is receiving treatment, the viral load will go down, causing him to live longer. When a person drops out of treatment, the viral load goes up again and the expected lifespan shrinks.
To illustrate how this is taken into account, consider a person that has an initial viral load that causes a survival time of 10 years. Suppose that after 1 year, treatment is started and that using the formula above the survival time would become 50 years. When treatment got started, 10% of the survival time had already passed and we take this into account. So after starting treatment, the AIDS related mortality would be scheduled after 45 years. If the person drops out of treatment 10 years later, 20% of the remaining survival time has passed, which translates to 2 years in terms of the original viral load. This means that still 7 years will remain until the AIDS based mortality event is fired. Note that using this approach, one will never encounter the situation where the time of death has already passed when increasing the viral load.
You can find an IPython notebook that illustrates this example here: aidsmortalityexample.ipynb
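The rescaling in this example boils down to tracking the fraction of the survival time that has already been consumed; a standalone Python sketch of that arithmetic (an illustration, not the actual implementation):

```python
def update_survival(consumed_fraction, elapsed, old_total, new_total):
    """Account for 'elapsed' years spent at survival time 'old_total' and
    return the updated consumed fraction plus the years remaining at the
    new survival time 'new_total'."""
    consumed_fraction += elapsed / old_total
    return consumed_fraction, new_total * (1.0 - consumed_fraction)

# Initial survival time of 10 years; after 1 year, treatment lowers the
# viral load so that the survival time becomes 50 years.
f, remaining = update_survival(0.0, 1.0, 10.0, 50.0)
print(round(remaining, 6))  # 45.0 -> 10% consumed, 90% of 50 years left

# 10 years later the person drops out; the survival time reverts to 10 years.
f, remaining = update_survival(f, 10.0, 50.0, 10.0)
print(round(remaining, 6))  # 7.0 -> 30% consumed in total, 70% of 10 years left
```

Because the consumed fraction only ever grows towards one, the remaining time can shrink but never become negative, which is why the rescheduled time of death can never lie in the past.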
An AIDS based mortality event will be scheduled to fire at the specified time, which may still change as explained above. When it fires, the person is considered to have died from AIDS. Note that this does not influence the ‘normal’ mortality event, which can still get triggered sooner to allow for another cause of death.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
- mortality.aids.survtime.C (1325.0): The value of \(C\) in the formula for \(t_{\rm survival}\).
- mortality.aids.survtime.k (-0.49): The value of \(k\) in the formula for \(t_{\rm survival}\).
AIDS stage event¶
When a person gets infected with HIV, he will first be in the acute phase of infection, then in the chronic stage, and after a while in the AIDS stage. The AIDS stage is actually split in two separate phases: an AIDS stage, and a final AIDS stage. In this last period, the person is deemed to be too ill to e.g. form sexual relationships.
The first AIDS stage event gets scheduled when the chronic stage event fires, and is set to get triggered a specific amount of time (aidsstage.start) before the AIDS related death takes place. When this event fires, another one is scheduled to mark the transition to the final AIDS stage, also set to take place a specific amount of time (aidsstage.final) before the AIDS based death. Because the time of the AIDS related death can still change when treatment is involved, these fire times can also still change.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
- aidsstage.start (1.25): The time before the AIDS related death at which a person will advance to the AIDS stage of infection. Defaults to 15 months.
- aidsstage.final (0.5): The time before the AIDS related death at which a person will advance to the final AIDS stage of infection. Defaults to 6 months.
Birth event¶
After a conception event is triggered, a new birth event will be scheduled, so that the woman in the relationship will give birth to a new person a specific time (based on birth.pregnancyduration.dist.type) later. The gender is determined by the birth.boygirlratio configuration setting.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
birth.boygirlratio
(1.0/2.01):
The probability that the newly born person will be a boy.
birth.pregnancyduration.dist.type
(defaults to ‘fixed’ with a value of 268/365):
With this parameter you can specify the distribution to be used when determining how long the pregnancy should be, before firing the birth event. By default, the fixed value of 268/365 is used, but other distributions and related parameters can be used as well.
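As an illustration of what this event decides, here is a hedged sketch using the default settings (the function name and structure are hypothetical, not the program's API):

```python
import random

def sample_birth(rng, boygirlratio=1.0 / 2.01, pregnancy_duration=268 / 365):
    # gender is picked using birth.boygirlratio; with the default 'fixed'
    # distribution, the pregnancy duration is simply the constant 268/365
    gender = "man" if rng.random() < boygirlratio else "woman"
    return gender, pregnancy_duration
```

Over many births, roughly a fraction 1/2.01 of the children will be boys.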
Check stopping criterion event¶
This event allows you to terminate a simulation if a certain population size
(checkstop.max.popsize
) or real-world elapsed time (checkstop.max.runtime
)
is exceeded. To enable this, the checkstop.interval
parameter must be
set to a positive value. If so, at regularly spaced times (simulation time)
this check will be performed.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
checkstop.interval
(-1):
To enable this event, set this to a positive value. If enabled, the event will fire at simulation times that are multiples of this interval, at which time the checks on the population size and/or running time are performed.
checkstop.max.runtime
(inf):
When the event fires, the elapsed real-world time since the start of the simulation program is compared to this value. If it exceeds it, the simulation is aborted.
checkstop.max.popsize
(inf):
When the event fires, the population size is compared to this value. If it exceeds it, the simulation is aborted.
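For example, a configuration fragment enabling this event could look as follows; the option names are taken from the list above, while the values and the assumed unit (seconds) for the runtime limit are purely illustrative:

```python
# Hypothetical flat option-name/value configuration fragment: perform the
# check every half year of simulation time, abort when the population
# exceeds 50000 persons or the program has run longer than one hour
config = {
    "checkstop.interval": 0.5,
    "checkstop.max.popsize": 50000,
    "checkstop.max.runtime": 3600,
}
```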
Chronic AIDS stage event¶
When a person becomes HIV infected, he starts out in the acute stage of the disease.
This ‘chronic stage’ event is then scheduled to mark the transition from the acute
stage to the chronic stage, which will
fire a specific amount of time (chronicstage.acutestagetime
) later.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
chronicstage.acutestagetime
(0.25):
This is the duration of the acute HIV stage, before transitioning to the chronic stage. The default is three months.
Conception event¶
When a formation event has fired (so a man and a woman are in a sexual relationship), a conception event will be scheduled unless the woman is already pregnant. This is a hazard-based event, and its hazard at time \(t\) is defined as:
\[{\rm hazard} = \exp\left(\alpha_{\rm base} + \alpha_{\rm age,man}(t-t_{\rm birth,man}) + \alpha_{\rm age,woman}(t-t_{\rm birth,woman}) + \alpha_{\rm wsf}{\rm WSF} + \beta(t-t_{\rm ref})\right)\]
which is a time-dependent hazard of type \(\exp(A + Bt)\).
By default, only the \(\alpha_{\rm base}\) value is used (conception.alpha_base
), resulting in a constant
hazard, but other factors can be used as well: the age of the man and woman in the
relationship can be taken into account using conception.alpha_ageman
and
conception.alpha_agewoman
, the weekly sex frequency (WSF) using conception.alpha_wsf
and the
‘age’ of the relationship using conception.beta
(\(t_{\rm ref}\) is set to the time
the relationship started).
The value of \({\rm WSF}\) itself is currently chosen from the distribution specified
in conception.wsf.dist.type
, at the time the event gets scheduled.
When a conception event fires, so when actual conception takes place, a birth event will be scheduled.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
conception.alpha_base
(-3):
The value of \(\alpha_{\rm base}\) in the formula for the hazard.
conception.alpha_ageman
(0):
The value of \(\alpha_{\rm age,man}\) in the formula for the hazard, to be able to take the age of the man in the relationship into account.
conception.alpha_agewoman
(0):
The value of \(\alpha_{\rm age,woman}\) in the formula for the hazard, to be able to take the age of the woman in the relationship into account.
conception.alpha_wsf
(0):
The value of \(\alpha_{\rm wsf}\) in the formula for the hazard. This way you can take a value for the weekly sex frequency (WSF) into account.
conception.beta
(0):
The value of \(\beta\) in the hazard formula, allowing you to influence the hazard based on the ‘age’ of the relationship.
conception.t_max
(200):
As explained in the section about ‘time limited’ hazards, an exponential function needs some kind of threshold value (after which it stays constant) to be able to perform the necessary calculations. This configuration value is a measure of this threshold.
conception.wsf.dist.type
(‘fixed’, with value 0):
When the conception event is scheduled, a value for the weekly sex frequency (WSF) to use in the hazard formula is picked from a distribution. This configuration option specifies which distribution you would like to use, and depending on the value other parameters for the distribution can be configured as well.
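As a sketch of how these options combine, assuming the \(\alpha\) terms and the \(\beta(t-t_{\rm ref})\) term simply add up inside an exponential as the parameter descriptions indicate (the function itself is hypothetical):

```python
import math

def conception_hazard(t, t_ref, age_man, age_woman, wsf,
                      alpha_base=-3.0, alpha_ageman=0.0, alpha_agewoman=0.0,
                      alpha_wsf=0.0, beta=0.0):
    # with the documented defaults only alpha_base contributes,
    # giving the constant hazard exp(-3)
    return math.exp(alpha_base
                    + alpha_ageman * age_man
                    + alpha_agewoman * age_woman
                    + alpha_wsf * wsf
                    + beta * (t - t_ref))
```

With the defaults this evaluates to \(\exp(-3)\) regardless of ages, WSF, or relationship age.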
Debut event¶
Persons who are not yet sexually active will have a debut event scheduled, which
will fire when a person has reached a specified age (debut.debutage
). When
this event fires, the person becomes sexually active and relationship formation events
will get scheduled. The number of formation events that gets scheduled can
be controlled using the ‘eyecap’ setting.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
debut.debutage
(15):
The age a person must have to become sexually active. This determines when the debut event for a particular person will get executed.
Diagnosis event¶
When a person gets infected with HIV, either by transmission of the virus or by seeding the population to get the epidemic started, a diagnosis event will get scheduled. When fired, the person is deemed to feel bad enough to go to a doctor and get diagnosed as being HIV-infected. Upon diagnosis, a monitoring event will be scheduled very shortly afterwards, to monitor the progression of the disease and to offer treatment if eligible.
This event is hazard-based, and the hazard is of the following form:
\[{\rm hazard} = \exp\left({\rm baseline} + {\rm agefactor}\,(t-t_{\rm birth}) + {\rm genderfactor}\,G + {\rm diagpartnersfactor}\,P + {\rm isdiagnosedfactor}\,D + \beta(t-t_{\rm infected}) + {\rm HSV2factor}\,{\rm HSV2}\right)\]
Note that this is again a time dependent exponential hazard of the form \(\exp(A + Bt)\).
In the formula, \(G\) is a value related to the gender of the person, 0 for a man and 1 for a woman. The number \(P\) represents the number of partners of the person that are both HIV infected and diagnosed. The value of \(D\) is an indication of whether the person was diagnosed previously: its value is 0 if this is the initial diagnosis event, or 1 if it’s a re-diagnosis (after dropping out of treatment). The value of \(HSV2\) is an indication of whether the person is infected with HSV2: its value is 0 if the person is not infected with HSV2 and 1 if the person is infected with HSV2.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
diagnosis.baseline
(0):
Controls the corresponding \({\rm baseline}\) value in the expression for the hazard.
diagnosis.agefactor
(0):
Controls the corresponding \({\rm agefactor}\) value in the expression for the hazard. This allows one to let the age of a person influence the hazard.
diagnosis.genderfactor
(0):
Controls the \({\rm genderfactor}\) parameter in the hazard. This allows you to have a different hazard depending on the gender of the person.
diagnosis.diagpartnersfactor
(0):
Corresponds to the value of \({\rm diagpartnersfactor}\) in the expression for the hazard. The idea is to allow the number of partners that have already been diagnosed to have an effect on a person’s diagnosis time: if a person is not feeling well and knows that some of the partners are infected with HIV, this can be an incentive to go to the doctor sooner.
diagnosis.isdiagnosedfactor
(0):
Using this \({\rm isdiagnosedfactor}\) value in the hazard, it is possible to have a different hazard if the person was diagnosed before. After dropping out of treatment, for example because a person is feeling better and no longer feels the need for treatment, a diagnosis event will be scheduled again. It is reasonable to think that a person may go to the doctor again sooner when he already knows about the HIV infection.
diagnosis.beta
(0):
Corresponds to the \(\beta\) factor in the hazard expression, allowing one to take the time since infection into account.
diagnosis.HSV2factor
(0):
Using the \({\rm HSV2factor}\), it is possible to have a different hazard when the person is infected with HSV2.
diagnosis.t_max
(200):
As explained in the section about ‘time limited’ hazards, an exponential function needs some kind of threshold value (after which it stays constant) to be able to perform the necessary calculations. This configuration value is a measure of this threshold.
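Analogously to the other hazards, the diagnosis hazard can be sketched as a weighted sum of the factors listed above inside an exponential (a hedged illustration; the function is not the program's API):

```python
import math

def diagnosis_hazard(t, t_infected, age, G, P, D, HSV2,
                     baseline=0.0, agefactor=0.0, genderfactor=0.0,
                     diagpartnersfactor=0.0, isdiagnosedfactor=0.0,
                     beta=0.0, HSV2factor=0.0):
    # G: 0 for a man, 1 for a woman; P: number of diagnosed HIV-positive
    # partners; D: 1 for a re-diagnosis, 0 otherwise; HSV2: HSV2 status
    return math.exp(baseline
                    + agefactor * age
                    + genderfactor * G
                    + diagpartnersfactor * P
                    + isdiagnosedfactor * D
                    + beta * (t - t_infected)
                    + HSV2factor * HSV2)
```

With all factors at their default of 0, the hazard is the constant \(\exp(0) = 1\).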
Dissolution event¶
As soon as a relationship is formed a dissolution event gets scheduled to allow for the possibility that the relationship terminates. The hazard for this event is the following:
\[{\rm hazard} = \exp\left(\alpha_0 + \alpha_1 P_{\rm man} + \alpha_2 P_{\rm woman} + \alpha_3 |P_{\rm man} - P_{\rm woman}| + \alpha_4 \frac{(t-t_{\rm birth,man})+(t-t_{\rm birth,woman})}{2} + \alpha_5 \left|(t-t_{\rm birth,man})-(t-t_{\rm birth,woman})-D_{\rm pref}\right| + \beta(t-t_{\rm ref})\right)\]
Note that this is again a time dependent exponential hazard of the form \(\exp(A + Bt)\).
In this expression, \(P_{\rm man}\) and \(P_{\rm woman}\) are the number of partners the man and woman in the relationship have. The value \(D_{\rm pref}\) represents the preferred age difference between a man and a woman. The value of \(t_{\rm ref}\) is the time at which the relationship was formed.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
dissolution.alpha_0
(0.1):
The value of \(\alpha_0\) in the expression for the hazard, allowing one to establish a baseline value.
dissolution.alpha_1
(0):
The value of \(\alpha_1\) in the hazard formula, corresponding to a weight for the number of relationships the man in the relationship has.
dissolution.alpha_2
(0):
The value of \(\alpha_2\) in the hazard formula, corresponding to a weight for the number of relationships the woman in the relationship has.
dissolution.alpha_3
(0):
The value of \(\alpha_3\) in the hazard expression, by which the influence of the difference in number of partners can be specified.
dissolution.alpha_4
(0):
The value of \(\alpha_4\) in the expression for the hazard, a weight for the average age of the partners.
dissolution.alpha_5
(0):
The factor \(\alpha_5\) controls the relative importance of how much the age gap between man and woman differs from the preferred age difference \(D_{\rm pref}\).
dissolution.Dp
(0):
This configures the preferred age difference \(D_{\rm pref}\) in the hazard expression. Note that to take this into account, \(\alpha_5\) should also be set to a non-zero value.
dissolution.beta
(0):
As can be seen in the expression for the hazard, using this value the ‘age’ of the relationship can be taken into account.
dissolution.t_max
(200):
As explained in the section about ‘time limited’ hazards, an exponential function needs some kind of threshold value (after which it stays constant) to be able to perform the necessary calculations. This configuration value is a measure of this threshold.
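A hedged sketch of how these weights enter the dissolution hazard (assuming the terms add up inside an exponential, as the parameter descriptions indicate; the function is illustrative only):

```python
import math

def dissolution_hazard(t, t_ref, age_man, age_woman, P_man, P_woman,
                       alpha_0=0.1, alpha_1=0.0, alpha_2=0.0, alpha_3=0.0,
                       alpha_4=0.0, alpha_5=0.0, Dp=0.0, beta=0.0):
    # with the documented defaults only alpha_0 contributes: exp(0.1)
    return math.exp(alpha_0
                    + alpha_1 * P_man
                    + alpha_2 * P_woman
                    + alpha_3 * abs(P_man - P_woman)
                    + alpha_4 * (age_man + age_woman) / 2.0
                    + alpha_5 * abs(age_man - age_woman - Dp)
                    + beta * (t - t_ref))
```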
MSM Dissolution event¶
As soon as an MSM relationship is formed an MSM dissolution event gets scheduled to allow for the possibility that the relationship terminates. The hazard for this event is the following:
\[{\rm hazard} = \exp\left(\alpha_0 + \alpha_{12}(P_{\rm man1}+P_{\rm man2}) + \alpha_3 |P_{\rm man1}-P_{\rm man2}| + \alpha_4 \frac{(t-t_{\rm birth,man1})+(t-t_{\rm birth,man2})}{2} + \alpha_5 \left|t_{\rm birth,man1}-t_{\rm birth,man2}\right| + \beta(t-t_{\rm ref})\right)\]
Note that this is again a time dependent exponential hazard of the form \(\exp(A + Bt)\).
In this expression, \(P_{\rm man1}\) and \(P_{\rm man2}\) are the number of partners the men in the relationship have. The value of \(t_{\rm ref}\) is the time at which the relationship was formed.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
dissolutionmsm.alpha_0
(0.1):
The value of \(\alpha_0\) in the expression for the hazard, allowing one to establish a baseline value.
dissolutionmsm.alpha_12
(0):
The value of \(\alpha_{12}\) in the hazard formula, corresponding to a weight for the number of relationships the men in the relationship have.
dissolutionmsm.alpha_3
(0):
The value of \(\alpha_3\) in the hazard expression, by which the influence of the difference in number of partners can be specified.
dissolutionmsm.alpha_4
(0):
The value of \(\alpha_4\) in the expression for the hazard, a weight for the average age of the partners.
dissolutionmsm.alpha_5
(0):
The factor \(\alpha_5\) controls the relative importance of the age gap between the men in the relationship.
dissolutionmsm.beta
(0):
As can be seen in the expression for the hazard, using this value the ‘age’ of the relationship can be taken into account.
dissolutionmsm.t_max
(200):
As explained in the section about ‘time limited’ hazards, an exponential function needs some kind of threshold value (after which it stays constant) to be able to perform the necessary calculations. This configuration value is a measure of this threshold.
ART treatment dropout event¶
When a monitoring event gets triggered and the person is both eligible and willing to receive treatment, treatment is started causing the set-point viral load of the person to be lowered. When treatment starts, a dropout event is scheduled as well, to allow a person to drop out of treatment.
Currently, the dropout event is not hazard-based; instead, a random number is picked from a one-dimensional probability distribution as specified in dropout.interval.dist.type and related configuration options.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
dropout.interval.dist.type
(‘uniform’ by default, with boundaries of 3 months and 10 years):
Using this configuration option you can specify the probability distribution to use when obtaining the time after which a person will drop out of treatment. By default, this is a uniform distribution with equal non-zero probability between 3 months and 10 years, and zero otherwise. Other distributions can be specified as well, as explained previously.
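A minimal sketch of this mechanism with the default uniform distribution (the function is illustrative, not the program's API):

```python
import random

def sample_dropout_time(rng, treatment_start,
                        min_interval=0.25, max_interval=10.0):
    # default: the dropout interval is uniform between 3 months
    # (0.25 years) and 10 years after the start of treatment
    return treatment_start + rng.uniform(min_interval, max_interval)
```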
Formation event¶
Depending on the ‘eyecap’ setting, for a number of man/woman pairs, formation events will be scheduled. When such an event fires, a relationship between these two persons will be formed. Apart from scheduling a dissolution event, a conception event will get scheduled if the woman in the relationship is not yet pregnant, and in case just one of the partners is infected with HIV, a transmission event will be scheduled as well.
The formation event is hazard based, and there are currently three hazard types
that can be used by configuring formation.hazard.type
. The first hazard type
is called the ‘simple’ hazard, and is nearly equal to the hazard of the
dissolution event.
The second hazard type, called ‘agegap’, is more advanced. Not only can the
preferred age gap differ from one person to the next, but there’s also an
age dependent component in this preferred age gap. The third hazard, ‘agegapry’
allows for the weight of the agegap terms to be age dependent. To make this possible,
the time dependency in the age gap part of the hazard is only approximate: times
will refer to a reference year (hence the ‘ry’ in the hazard name) which can be
set using the ‘synchronize reference year’ event.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
formation.hazard.type
(agegap):
This parameter specifies which formation hazard will be used. Allowed values are simple, agegap and agegapry.
The simple formation hazard¶
The hazard for the simple
formation event is shown further on and is nearly identical
to the one of the dissolution event. Apart from a few extra terms
in the expression for the hazard, the most important difference
is the factor \(F\) in front.
This factor \(F\) is a normalization factor which takes the population size (or more accurately, the size when viewed through the ‘eyecaps’) into account. This is necessary because the number of formation events that get scheduled is proportional to the population size (for a fixed ‘eyecap’ fraction). If no normalization factor were used, a larger population would automatically mean that more relationships are formed. By roughly dividing the hazard by the population size, a larger number of available possible relationships in a larger population no longer automatically results in more relationships being formed.
To be very accurate, each increase or decrease in the population size (by a birth or death of a person) should be taken into account, and to do so all formation event fire times would need to be recalculated according to the changed (because \(F\) changed) hazard. This would become a huge bottleneck, especially when the population size is large, and birth and mortality events occur frequently.
To work around this bottleneck, it is not the true population size that is used in this normalization factor, but the last known population size. This last known size can be updated by the event that synchronizes population statistics, at which time all formation event fire times will be recalculated. If the population size stays roughly constant, this is not necessary, but it will be if the population size grows or shrinks considerably during the simulation.
The hazard for this formation type is the following:
\[{\rm hazard} = F \exp\left(\alpha_0 + \alpha_1 P_{\rm man} + \alpha_2 P_{\rm woman} + \alpha_3 |P_{\rm man}-P_{\rm woman}| + \alpha_4 \frac{(t-t_{\rm birth,man})+(t-t_{\rm birth,woman})}{2} + \alpha_5 \left|(t-t_{\rm birth,man})-(t-t_{\rm birth,woman})-D_{\rm pref}\right| + \alpha_6 (E_{\rm man}+E_{\rm woman}) + \alpha_7 |E_{\rm man}-E_{\rm woman}| + \alpha_{\rm dist} |\vec{R}_{\rm man}-\vec{R}_{\rm woman}| + \beta(t-t_{\rm ref})\right)\]
Note that this is again a time dependent exponential hazard of the form \(\exp(A + Bt)\).
In this expression, \(P_{\rm man}\) and \(P_{\rm woman}\) are the number of partners the man and woman in the relationship have. The value \(D_{\rm pref}\) represents the preferred age difference between a man and a woman, and \(E_{\rm man}\) and \(E_{\rm woman}\) are parameters that can differ per person, describing their eagerness to form a relationship. The distance between the locations \(\vec{R}_{\rm man}\) and \(\vec{R}_{\rm woman}\) of the partners involved can be taken into account as well.
The value of \(t_{\rm ref}\) is the time at which the relationship between the two persons became possible. If no relationship existed between the two people earlier, this is the time at which the youngest person reached the debut age. On the other hand, if a relationship between these partners did exist before, it is the time at which that relationship got dissolved. The factor \(F\) is the normalization factor discussed earlier.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
formation.hazard.simple.alpha_0
(0.1):
The value of \(\alpha_0\) in the expression for the hazard, allowing one to establish a baseline value.
formation.hazard.simple.alpha_1
(0):
The value of \(\alpha_1\) in the hazard formula, corresponding to a weight for the number of relationships the man in the relationship has.
formation.hazard.simple.alpha_2
(0):
The value of \(\alpha_2\) in the hazard formula, corresponding to a weight for the number of relationships the woman in the relationship has.
formation.hazard.simple.alpha_3
(0):
The value of \(\alpha_3\) in the hazard expression, by which the influence of the difference in number of partners can be specified.
formation.hazard.simple.alpha_4
(0):
The value of \(\alpha_4\) in the expression for the hazard, a weight for the average age of the partners.
formation.hazard.simple.alpha_5
(0):
The factor \(\alpha_5\) controls the relative importance of how much the age gap between man and woman differs from the preferred age difference \(D_{\rm pref}\).
formation.hazard.simple.alpha_6
(0):
Weight for the sum of the eagerness parameters of both partners.
formation.hazard.simple.alpha_7
(0):
Weight for the difference of the eagerness parameters of both partners.
formation.hazard.simple.alpha_dist
(0):
This configures the weight \(\alpha_{\rm dist}\) of the geographical distance between the partners.
formation.hazard.simple.Dp
(0):
This configures the preferred age difference \(D_{\rm pref}\) in the hazard expression. Note that to take this into account, \(\alpha_5\) should also be set to a non-zero value.
formation.hazard.simple.beta
(0):
Corresponds to \(\beta\) in the hazard expression and allows you to take the time since the relationship became possible into account.
formation.hazard.simple.t_max
(200):
As explained in the section about ‘time limited’ hazards, an exponential function needs some kind of threshold value (after which it stays constant) to be able to perform the necessary calculations. This configuration value is a measure of this threshold.
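To make the role of the normalization factor concrete, here is a hypothetical sketch in which \(F\) is taken to be one over the last known population size as seen through the eyecaps; the program's exact expression for \(F\) may differ, but the scaling behaviour is the point:

```python
import math

def simple_formation_hazard(t, t_ref, last_known_popsize, eyecap_fraction,
                            P_man, P_woman, age_man, age_woman, E_man, E_woman,
                            alpha_0=0.1, alpha_1=0.0, alpha_2=0.0, alpha_3=0.0,
                            alpha_4=0.0, alpha_5=0.0, alpha_6=0.0, alpha_7=0.0,
                            Dp=0.0, beta=0.0):
    # F models the normalization: hazards scale inversely with the
    # (last known) number of possible partners, so a larger population
    # does not automatically mean more relationships
    F = 1.0 / (last_known_popsize * eyecap_fraction)
    return F * math.exp(alpha_0 + alpha_1 * P_man + alpha_2 * P_woman
                        + alpha_3 * abs(P_man - P_woman)
                        + alpha_4 * (age_man + age_woman) / 2.0
                        + alpha_5 * abs(age_man - age_woman - Dp)
                        + alpha_6 * (E_man + E_woman)
                        + alpha_7 * abs(E_man - E_woman)
                        + beta * (t - t_ref))
```

Doubling the last known population size halves the per-pair hazard, which is exactly the compensation the text describes.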
The agegap formation hazard¶
The agegap
formation hazard is more complex than the previous hazard, providing
some additional functionality. With this hazard it’s possible to simulate the
previous simple
hazard, but not the other way around. The general look of the
hazard is the same as before, but with some important differences:
\[\begin{aligned}{\rm hazard} = F \exp\Big(&\alpha_{\rm baseline} + \alpha_{\rm numrel,man} P_{\rm man} + \alpha_{\rm numrel,woman} P_{\rm woman} + \alpha_{\rm numrel,diff}|P_{\rm man}-P_{\rm woman}| \\
&+ \alpha_{\rm meanage}\frac{A_{\rm man}(t)+A_{\rm woman}(t)}{2} + \alpha_{\rm eagerness,sum}(E_{\rm man}+E_{\rm woman}) + \alpha_{\rm eagerness,diff}|E_{\rm man}-E_{\rm woman}| \\
&+ \alpha_{\rm gap,factor,man}\left|A_{\rm man}(t)-A_{\rm woman}(t)-D_{p,{\rm man}}-\alpha_{\rm gap,agescale,man}A_{\rm man}(t)\right| \\
&+ \alpha_{\rm gap,factor,woman}\left|A_{\rm man}(t)-A_{\rm woman}(t)-D_{p,{\rm woman}}-\alpha_{\rm gap,agescale,woman}A_{\rm woman}(t)\right| \\
&+ \alpha_{\rm dist}|\vec{R}_{\rm man}-\vec{R}_{\rm woman}| + \beta(t-t_{\rm ref})\Big)\end{aligned}\]
In this equation the following notation is used for clarity:
\[A(t) = t - t_{\rm birth}\]
i.e., \(A(t)\) represents the age of someone. As you can see from the expression, it is now possible to specify a preferred age difference on a per-person basis. This personal preferred age difference \(D_{p,{\rm man}}\) or \(D_{p,{\rm woman}}\) can be controlled by specifying a one dimensional probability distribution, as explained in the person settings. Apart from more variation in the age gap, the preferred age gap for a man or a woman can also vary in time, based on the age of one of the partners. The importance of such a change can be controlled using the \(\alpha_{\rm gap,agescale,man}\) and \(\alpha_{\rm gap,agescale,woman}\) parameters.
In front of the hazard, there is again a factor \(F\), just like with the simple
hazard. As explained there, it serves as a normalization factor
and for a population which can change much in size, an event that synchronizes population statistics
may be important to schedule regularly.
The values \(P_{\rm man}\) and \(P_{\rm woman}\) are the number of partners
the man and woman in the relationship have, and \(E_{\rm man}\) and
\(E_{\rm woman}\) are parameters that can be different for each person describing
their eagerness to form a relationship.
The distance between
the locations \(\vec{R}_{\rm man}\) and \(\vec{R}_{\rm woman}\) of the partners
involved can be taken into account as well.
As with the simple
hazard, the value of \(t_{\rm ref}\) is the time
at which the relationship between the two persons became possible. If no relationship
existed between the two people earlier, this is the time at which the youngest person
reached the debut age. On the other hand, if a relationship between
these partners did exist before, it is the time at which that relationship got
dissolved.
Calculating the mapping from internal time to real-world time for this hazard
can no longer be done using the exponential time dependent hazard we encountered
e.g. in the simple
hazard. The reason is the time dependence that is now present
inside the terms with the absolute values. To see how the mapping is done in this
case, you can look at the calculations in this document:
age gap hazard calculations
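For contrast, for a purely \(\exp(A + Bt)\) hazard the internal-to-real time mapping can be done in closed form; a small illustrative sketch of that simpler case:

```python
import math

def internal_from_real(t1, t0, A, B):
    # internal time elapsed: integral of exp(A + B*t) from t0 to t1 (B != 0)
    return math.exp(A) * (math.exp(B * t1) - math.exp(B * t0)) / B

def real_time_from_internal(dT, t0, A, B):
    # invert the integral above: solve for the real time t1 at which
    # an internal time interval dT has elapsed
    return math.log(B * dT * math.exp(-A) + math.exp(B * t0)) / B
```

The two functions are exact inverses of each other; the agegap hazard needs the more involved procedure from the document linked above precisely because its absolute-value terms break this closed form.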
The following IPython notebook provides some simple examples of this agegap
hazard: agegap_hazard_examples.ipynb.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
formation.hazard.agegap.baseline
(0.1):
The value of \(\alpha_{\rm baseline}\) in the expression for the hazard, allowing one to establish a baseline value.
formation.hazard.agegap.numrel_man
(0):
The value of \(\alpha_{\rm numrel,man}\) in the hazard formula, corresponding to a weight for the number of relationships the man in the relationship has.
formation.hazard.agegap.numrel_woman
(0):
The value of \(\alpha_{\rm numrel,woman}\) in the hazard formula, corresponding to a weight for the number of relationships the woman in the relationship has.
formation.hazard.agegap.numrel_diff
(0):
The value of \(\alpha_{\rm numrel,diff}\) in the hazard expression, by which the influence of the difference in number of partners can be specified.
formation.hazard.agegap.meanage
(0):
The value of \(\alpha_{\rm meanage}\) in the expression for the hazard, a weight for the average age of the partners.
formation.hazard.agegap.eagerness_sum
(0):
Weight \(\alpha_{\rm eagerness,sum}\) for the sum of the eagerness parameters of both partners.
formation.hazard.agegap.eagerness_diff
(0):
Weight \(\alpha_{\rm eagerness,diff}\) for the difference of the eagerness parameters of both partners.
formation.hazard.agegap.gap_factor_man
(0):
With this setting you set \(\alpha_{\rm gap,factor,man}\), specifying the influence of the age gap from the man’s point of view.
formation.hazard.agegap.gap_agescale_man
(0):
This controls \(\alpha_{\rm gap,agescale,man}\), which allows you to vary the preferred age gap with the age of the man in the relationship.
formation.hazard.agegap.gap_factor_woman
(0):
With this setting you set \(\alpha_{\rm gap,factor,woman}\), specifying the influence of the age gap from the woman’s point of view.
formation.hazard.agegap.gap_agescale_woman
(0):
This controls \(\alpha_{\rm gap,agescale,woman}\), which allows you to vary the preferred age gap with the age of the woman in the relationship.
formation.hazard.agegap.distance
(0):
This configures the weight \(\alpha_{\rm dist}\) of the geographical distance between the partners.
formation.hazard.agegap.beta
(0):
Corresponds to \(\beta\) in the hazard expression and allows you to take the time since the relationship became possible into account.
formation.hazard.agegap.t_max
(200):
Even though this hazard is no longer a simple time dependent exponential, it will still be necessary to provide some kind of cut-off, as explained in the section about ‘time limited’ hazards. This configuration value is a measure of this threshold.
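To see the effect of the gap_agescale parameters, consider this hypothetical sketch of the per-person preferred age gap, modelled as a base gap plus an age-proportional shift, as the parameter descriptions suggest:

```python
def preferred_age_gap(Dp, agescale, age):
    # hypothetical per-person preferred gap: the personal base gap Dp
    # (drawn from a configured distribution) shifted proportionally
    # to the person's current age by the gap_agescale weight
    return Dp + agescale * age
```

With gap_agescale_man set to 0.3, a 20-year-old and a 40-year-old man with the same base gap of 2 years would prefer gaps of about 8 and 14 years respectively, so preferred partner age differences grow as people age.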
The agegapry formation hazard¶
While the ‘agegap’ hazard already allows for a great deal of
flexibility, it is not possible to vary the importance of the age gap terms as
people get older. The following hazard is very similar to the agegap
one,
but this does allow the importance of the age gap to change over time:
\[\begin{aligned}{\rm hazard} = F \exp\Big(&\alpha_{\rm baseline} + \left(\alpha_{\rm numrel,man} + \alpha_{\rm numrel,scale,man} G_{\rm man}(t_{\rm ry})\right) P_{\rm man} + \left(\alpha_{\rm numrel,woman} + \alpha_{\rm numrel,scale,woman} G_{\rm woman}(t_{\rm ry})\right) P_{\rm woman} \\
&+ \alpha_{\rm numrel,diff}|P_{\rm man}-P_{\rm woman}| + \alpha_{\rm meanage}\frac{A_{\rm man}(t)+A_{\rm woman}(t)}{2} \\
&+ \alpha_{\rm eagerness,sum}(E_{\rm man}+E_{\rm woman}) + \alpha_{\rm eagerness,diff}|E_{\rm man}-E_{\rm woman}| \\
&+ G_{\rm man}(t_{\rm ry}) + G_{\rm woman}(t_{\rm ry}) + \alpha_{\rm dist}|\vec{R}_{\rm man}-\vec{R}_{\rm woman}| + \beta(t-t_{\rm ref})\Big)\end{aligned}\]
In this equation, the terms \(G_{\rm man}\) and \(G_{\rm woman}\) for the age gap between partners in a relationship are given by the following expressions:
\[G_{\rm man}(t_{\rm ry}) = \left(\alpha_{\rm gap,factor,man,const} + \alpha_{\rm gap,factor,man,exp}\exp\left(\alpha_{\rm gap,factor,man,age} A_{\rm man}(t_{\rm ry})\right)\right)\left|A_{\rm man}(t_{\rm ry})-A_{\rm woman}(t_{\rm ry})-g_{\rm man}(t_{\rm ry})\right|\]
\[G_{\rm woman}(t_{\rm ry}) = \left(\alpha_{\rm gap,factor,woman,const} + \alpha_{\rm gap,factor,woman,exp}\exp\left(\alpha_{\rm gap,factor,woman,age} A_{\rm woman}(t_{\rm ry})\right)\right)\left|A_{\rm man}(t_{\rm ry})-A_{\rm woman}(t_{\rm ry})-g_{\rm woman}(t_{\rm ry})\right|\]
where \(g_{\rm man}(t_{\rm ry})\) and \(g_{\rm woman}(t_{\rm ry})\) specify the preferred age gaps themselves, which can change over time:
\[g_{\rm man}(t_{\rm ry}) = D_{p,{\rm man}} + \alpha_{\rm gap,agescale,man} A_{\rm man}(t_{\rm ry})\]
\[g_{\rm woman}(t_{\rm ry}) = D_{p,{\rm woman}} + \alpha_{\rm gap,agescale,woman} A_{\rm woman}(t_{\rm ry})\]
In these equations again the following notation is used:
\[A(t) = t - t_{\rm birth}\]
i.e., \(A(t)\) represents the age of someone.
Looking at these full age gap terms \(G_{\rm man}\) and \(G_{\rm woman}\), you can see
that they are similar to the ones from
the agegap
hazard, but the prefactor is no longer a simple configurable constant.
By tuning several parameters, the importance of these age gap terms can now be made
age-dependent.
However, this age-dependence is in fact only approximate, because \(t_{\rm ry}\) is used in these expressions instead of the simulation time \(t\): to reduce the complexity of the hazard and keep the performance up, inside the age gap terms the strict time dependency on \(t\) is replaced by an approximate dependency on a reference year \(t_{\rm ry}\). By only changing this reference time at certain intervals (see the reference year synchronization event), the time dependency of the hazard becomes much more straightforward. Note that certain terms still have a dependency on the simulation time \(t\), causing this hazard to be of the form \(\exp(A + Bt)\).
By setting \(\alpha_{\rm numrel,scale,man}\) or \(\alpha_{\rm numrel,scale,woman}\), using the same approximation the importance of the number of partners can be made dependent on the age gap. The meaning of the other quantities in the hazard is the same as in the ‘agegap’ hazard.
An IPython notebook that illustrates how a funnel-like distribution of the
formed relationships can be generated using this agegapry
hazard, can be found
here: agegapry_hazard_funnel.ipynb.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
formation.hazard.agegapry.baseline
(0.1):
The value of \(\alpha_{\rm baseline}\) in the expression for the hazard, allowing one to establish a baseline value.
formation.hazard.agegapry.numrel_man
(0):
The value of \(\alpha_{\rm numrel,man}\) in the hazard formula, corresponding to a weight for the number of relationships the man in the relationship has.
formation.hazard.agegapry.numrel_scale_man
(0):
The value of \(\alpha_{\rm numrel,scale,man}\) in the formula for the hazard.
formation.hazard.agegapry.numrel_woman
(0):
The value of \(\alpha_{\rm numrel,woman}\) in the hazard formula, corresponding to a weight for the number of relationships the woman in the relationship has.
formation.hazard.agegapry.numrel_scale_woman
(0):
The value of \(\alpha_{\rm numrel,scale,woman}\) in the formula for the hazard.
formation.hazard.agegapry.numrel_diff
(0):
The value of \(\alpha_{\rm numrel,diff}\) in the hazard expression, by which the influence of the difference in number of partners can be specified.
formation.hazard.agegapry.meanage
(0):
The value of \(\alpha_{\rm meanage}\) in the expression for the hazard, a weight for the average age of the partners.
formation.hazard.agegapry.eagerness_sum
(0):
Weight \(\alpha_{\rm eagerness,sum}\) for the sum of the eagerness parameters of both partners.
formation.hazard.agegapry.eagerness_diff
(0):
Weight \(\alpha_{\rm eagerness,diff}\) for the difference of the eagerness parameters of both partners.
formation.hazard.agegapry.gap_factor_man_const
(0):
The value of \(\alpha_{\rm gap,factor,man,const}\) in the age gap term \(G_{\rm man}(t_{\rm ry})\).
formation.hazard.agegapry.gap_factor_man_exp
(0):
The value of \(\alpha_{\rm gap,factor,man,exp}\) in the age gap term \(G_{\rm man}(t_{\rm ry})\).
formation.hazard.agegapry.gap_factor_man_age
(0):
The value of \(\alpha_{\rm gap,factor,man,age}\) in the age gap term \(G_{\rm man}(t_{\rm ry})\).
formation.hazard.agegapry.gap_agescale_man
(0):
This controls \(\alpha_{\rm gap,agescale,man}\), which allows you to vary the preferred age gap with the age of the man in the relationship.
formation.hazard.agegapry.gap_factor_woman_const
(0):
The value of \(\alpha_{\rm gap,factor,woman,const}\) in the age gap term \(G_{\rm woman}(t_{\rm ry})\).
formation.hazard.agegapry.gap_factor_woman_age
(0):
The value of \(\alpha_{\rm gap,factor,woman,age}\) in the age gap term \(G_{\rm woman}(t_{\rm ry})\).
formation.hazard.agegapry.gap_factor_woman_exp
(0):
The value of \(\alpha_{\rm gap,factor,woman,exp}\) in the age gap term \(G_{\rm woman}(t_{\rm ry})\).
formation.hazard.agegapry.gap_agescale_woman
(0):
This controls \(\alpha_{\rm gap,agescale,woman}\), which allows you to vary the preferred age gap with the age of the woman in the relationship.
formation.hazard.agegapry.distance
(0):
This configures the weight \(\alpha_{\rm dist}\) of the geographical distance between the partners.
formation.hazard.agegapry.beta
(0):
Corresponds to \(\beta\) in the hazard expression and allows you to take the time since the relationship became possible into account.
formation.hazard.agegapry.t_max
(200):
As explained in the section about ‘time limited’ hazards, an exponential function needs some kind of threshold value (after which it stays constant) to be able to perform the necessary calculations. This configuration value is a measure of this threshold.
formation.hazard.agegapry.maxageref.diff
(1):
As explained above, the age gap terms do not use the real time dependency \(t\), but refer to a reference time \(t_{\rm ry}\) that needs to be synchronized periodically using the synchronize reference year event. The program will abort if it detects that the last reference time synchronization was more than this amount of time ago, which by default is one year.
MSM Formation event¶
The MSM formation event is very similar to the heterosexual formation event. If MSM relationships are enabled in the simulation, depending on the ‘eyecap’ setting, MSM formation events will be scheduled for a number of man/man pairs. When such an event fires, a relationship between these two men will be formed. Apart from scheduling an MSM dissolution event, in case just one of the partners is infected with HIV, a transmission event will be scheduled as well.
Just like the heterosexual relationship formation event, this event is also hazard based and uses the same hazard types, which can be configured using formationmsm.hazard.type: there are the ‘simple’ hazard, the ‘agegap’ hazard and the ‘agegapry’ hazard.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
- formationmsm.hazard.type (agegap): This parameter specifies which formation hazard will be used. Allowed values are simple, agegap and agegapry.
The simple formation hazard (MSM version)¶
The hazard for the simple
formation event is shown below and is nearly identical
to the one of the MSM dissolution event. Apart from a few extra terms
in the expression for the hazard, the most important difference
is the factor \(F\) in front, a normalization factor which has the same meaning as in
the corresponding hazard from the heterosexual relationship formation
event.
The hazard for this formation type is the following:
Note that this is again a time dependent exponential hazard of the form
In this expression, \(P_{\rm man1}\) and \(P_{\rm man2}\) are the numbers of partners of the men in the relationship, and \(E_{\rm man1}\) and \(E_{\rm man2}\) are parameters, possibly different for each person, that describe their eagerness to form a relationship. The distance between the locations \(\vec{R}_{\rm man1}\) and \(\vec{R}_{\rm man2}\) of the partners involved can be taken into account as well.
The value of \(t_{\rm ref}\) is the time at which the relationship between the two persons became possible. If no relationship existed between the two people earlier, this is the time at which the youngest person reached the debut age. On the other hand, if a relationship between these partners did exist before, it is the time at which that relationship got dissolved.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
- formationmsm.hazard.simple.alpha_0 (0.1): The value of \(\alpha_0\) in the expression for the hazard, allowing one to establish a baseline value.
- formationmsm.hazard.simple.alpha_12 (0): The value of \(\alpha_{12}\) in the hazard formula, corresponding to a weight for the number of relationships the men in the relationship have.
- formationmsm.hazard.simple.alpha_3 (0): The value of \(\alpha_3\) in the hazard expression, by which the influence of the difference in number of partners can be specified.
- formationmsm.hazard.simple.alpha_4 (0): The value of \(\alpha_4\) in the expression for the hazard, a weight for the average age of the partners.
- formationmsm.hazard.simple.alpha_5 (0): The factor \(\alpha_5\) controls the relative importance of the age gap between the partners.
- formationmsm.hazard.simple.alpha_6 (0): Weight for the sum of the eagerness parameters of both partners.
- formationmsm.hazard.simple.alpha_7 (0): Weight for the difference of the eagerness parameters of both partners.
- formationmsm.hazard.simple.alpha_dist (0): This configures the weight \(\alpha_{\rm dist}\) of the geographical distance between the partners.
- formationmsm.hazard.simple.beta (0): Corresponds to \(\beta\) in the hazard expression and allows you to take the time since the relationship became possible into account.
- formationmsm.hazard.simple.t_max (200): As explained in the section about ‘time limited’ hazards, an exponential function needs some kind of threshold value (after which it stays constant) to be able to perform the necessary calculations. This configuration value is a measure of this threshold.
The agegap formation hazard (MSM version)¶
The agegap formation hazard is again very similar to its heterosexual counterpart:
In this equation the following notation is used for clarity:
i.e., \(A(t)\) represents the age of someone. As you can see from the expression, it is now possible to specify a preferred age difference on a per-person basis. This personal preferred age difference \(D_{p,{\rm man1}}\) or \(D_{p,{\rm man2}}\) can be controlled by specifying a one-dimensional probability distribution, as explained in the person settings. Note that the preferred age gaps used here are the MSM-specific ones. Apart from allowing more variation in these age gaps, the preferred age gap can also vary in time, based on the age of one of the partners. The importance of such a change can be controlled using the \(\alpha_{\rm gap,agescale}\) parameter.
In front of the hazard there is again a factor \(F\), just like with the simple hazard. The values \(P_{\rm man1}\) and \(P_{\rm man2}\) are the numbers of partners the men in the relationship have, and \(E_{\rm man1}\) and \(E_{\rm man2}\) are parameters, possibly different for each person, that describe their eagerness to form a relationship. Note that the eagerness values used here are the MSM eagerness values. The distance between the locations \(\vec{R}_{\rm man1}\) and \(\vec{R}_{\rm man2}\) of the partners involved can be taken into account as well.
As with the simple
hazard, the value of \(t_{\rm ref}\) is the time
at which the relationship between the two persons became possible. If no relationship
existed between the two people earlier, this is the time at which the youngest person
reached the debut age. On the other hand, if a relationship between
these partners did exist before, it is the time at which that relationship got
dissolved.
Calculating the mapping from internal time to real-world time for this hazard
can no longer be done using the exponential time dependent hazard we encountered
e.g. in the simple
hazard. The reason is the time dependence that is now present
inside the terms with the absolute values. More information can be found in the
documentation for the heterosexual ‘agegap’ hazard.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
- formationmsm.hazard.agegap.baseline (0.1): The value of \(\alpha_{\rm baseline}\) in the expression for the hazard, allowing one to establish a baseline value.
- formationmsm.hazard.agegap.numrel_sum (0): The value of \(\alpha_{\rm numrel,sum}\) in the hazard formula, corresponding to a weight for the number of relationships the men in the relationship have.
- formationmsm.hazard.agegap.numrel_diff (0): The value of \(\alpha_{\rm numrel,diff}\) in the hazard expression, by which the influence of the difference in number of partners can be specified.
- formationmsm.hazard.agegap.meanage (0): The value of \(\alpha_{\rm meanage}\) in the expression for the hazard, a weight for the average age of the partners.
- formationmsm.hazard.agegap.eagerness_sum (0): Weight \(\alpha_{\rm eagerness,sum}\) for the sum of the eagerness parameters of both partners. Note that the relevant value here is the MSM eagerness.
- formationmsm.hazard.agegap.eagerness_diff (0): Weight \(\alpha_{\rm eagerness,diff}\) for the difference of the eagerness parameters of both partners. Note that the relevant value here is the MSM eagerness.
- formationmsm.hazard.agegap.gap_factor (0): This corresponds to \(\alpha_{\rm gap,factor}\), the weight for the age gap terms.
- formationmsm.hazard.agegap.gap_agescale (0): This corresponds to \(\alpha_{\rm gap,agescale}\), by which you can control how the preferred age gap changes over time.
- formationmsm.hazard.agegap.distance (0): This configures the weight \(\alpha_{\rm dist}\) of the geographical distance between the partners.
- formationmsm.hazard.agegap.beta (0): Corresponds to \(\beta\) in the hazard expression and allows you to take the time since the relationship became possible into account.
- formationmsm.hazard.agegap.t_max (200): Even though this hazard is no longer a simple time dependent exponential, it will still be necessary to provide some kind of cut-off, as explained in the section about ‘time limited’ hazards. This configuration value is a measure of this threshold.
The agegapry formation hazard (MSM version)¶
The agegapry formation hazard is again very similar to its heterosexual counterpart:
In this equation, the terms \(G_{\rm man1}\) and \(G_{\rm man2}\) for the age gap between partners in a relationship are given by the following expressions:
where \(g_{\rm man1}(t_{\rm ry})\) and \(g_{\rm man2}(t_{\rm ry})\) specify the preferred age gaps themselves, which can change over time:
In these equations again the following notation is used:
i.e., \(A(t)\) represents the age of someone. As with the heterosexual ‘agegapry’ hazard, the age gap prefactor is no longer a simple configurable constant: by tuning several parameters, the importance of these age gap terms can now be made age-dependent.
However, this age-dependence is again approximate, because \(t_{\rm ry}\) is used in these expressions instead of the simulation time \(t\): see the documentation of its heterosexual counterpart for more information. By setting \(\alpha_{\rm numrel,scale}\), the importance of the number of partners can also be made dependent on the age gap, using the same approximation. The meaning of the other quantities in the hazard is the same as in the ‘agegap’ hazard.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
- formationmsm.hazard.agegapry.baseline (0.1): The value of \(\alpha_{\rm baseline}\) in the expression for the hazard, allowing one to establish a baseline value.
- formationmsm.hazard.agegapry.numrel_sum (0): The value of \(\alpha_{\rm numrel,sum}\) in the hazard formula, corresponding to a weight for the number of relationships the men in the relationship have.
- formationmsm.hazard.agegapry.numrel_scale (0): The value of \(\alpha_{\rm numrel,scale}\) in the formula for the hazard.
- formationmsm.hazard.agegapry.numrel_diff (0): The value of \(\alpha_{\rm numrel,diff}\) in the hazard expression, by which the influence of the difference in number of partners can be specified.
- formationmsm.hazard.agegapry.meanage (0): The value of \(\alpha_{\rm meanage}\) in the expression for the hazard, a weight for the average age of the partners.
- formationmsm.hazard.agegapry.eagerness_sum (0): Weight \(\alpha_{\rm eagerness,sum}\) for the sum of the eagerness parameters of both partners. Note that the relevant value here is the MSM eagerness.
- formationmsm.hazard.agegapry.eagerness_diff (0): Weight \(\alpha_{\rm eagerness,diff}\) for the difference of the eagerness parameters of both partners. Note that the relevant value here is the MSM eagerness.
- formationmsm.hazard.agegapry.gap_factor_const (0): The value of \(\alpha_{\rm gap,factor,const}\) in the age gap term \(G_{\rm manX}(t_{\rm ry})\).
- formationmsm.hazard.agegapry.gap_factor_exp (0): The value of \(\alpha_{\rm gap,factor,exp}\) in the age gap term \(G_{\rm manX}(t_{\rm ry})\).
- formationmsm.hazard.agegapry.gap_factor_age (0): The value of \(\alpha_{\rm gap,factor,age}\) in the age gap term \(G_{\rm manX}(t_{\rm ry})\).
- formationmsm.hazard.agegapry.gap_agescale (0): This controls \(\alpha_{\rm gap,agescale}\), which allows you to vary the preferred age gap with the age of the men in the relationship.
- formationmsm.hazard.agegapry.distance (0): This configures the weight \(\alpha_{\rm dist}\) of the geographical distance between the partners.
- formationmsm.hazard.agegapry.beta (0): Corresponds to \(\beta\) in the hazard expression and allows you to take the time since the relationship became possible into account.
- formationmsm.hazard.agegapry.t_max (200): As explained in the section about ‘time limited’ hazards, an exponential function needs some kind of threshold value (after which it stays constant) to be able to perform the necessary calculations. This configuration value is a measure of this threshold.
- formationmsm.hazard.agegapry.maxageref.diff (1): As explained before, the age gap terms do not use the real time dependency \(t\), but refer to a reference time \(t_{\rm ry}\) that needs to be synchronized periodically using the synchronize reference year event. The program will abort if it detects that the last reference time synchronization was more than this amount of time ago, which by default is one year.
HIV seeding event¶
When introducing the initial population in the simulation, the persons are not yet infected with HIV. To start the infection, an HIV seeding event is scheduled, and when this is triggered (controlled by hivseed.time), a certain number of people will be marked as HIV infected.
Only the people that have the right age (as specified by hivseed.age.min and hivseed.age.max) and the right gender (as specified by hivseed.gender) are possible ‘seeders’, and depending on the setting of hivseed.type either a fixed number of people will be used, or each person will have a certain probability of being a seeder. In case a fixed number is requested but this number cannot be reached in the current simulation, the program can be instructed to terminate, depending on the hivseed.stop.short setting.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
- hivseed.time (0): This specifies the time at which the seeding event takes place; the default is at the start of the simulation. Set this to a negative value to disable HIV seeding.
- hivseed.type (‘fraction’): This value specifies how the seeding will occur: either fraction, to specify that each person in the group of possible seeders will have a certain probability of being a seeder, or amount, to specify that a fixed number of seeders should be chosen.
- hivseed.age.min (0): People who are possible seeders must be at least this old.
- hivseed.age.max (1000): People who are possible seeders must be at most this old.
- hivseed.gender (‘any’): People who are possible seeders must have this gender. Can be either any (the default), male or female.
- hivseed.fraction (0.2): This is only used if hivseed.type is set to fraction, and specifies the probability each possible seeder has of actually becoming HIV infected.
- hivseed.amount (1): If hivseed.type is amount, this number of people will be chosen from the group of possible seeders and marked as being HIV infected.
- hivseed.stop.short (‘yes’): In case a specific number of seeders should be chosen but that number is not available (e.g. due to a too restrictive allowed age range), the program will terminate if this is set to yes. It will continue despite not having the requested number of seeders if set to no.
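As a concrete illustration, these options can be filled in from Python using the same dictionary-based configuration pattern shown in the simulation intervention examples in this document; the following is only a sketch of the settings themselves, not a complete simulation script:

```python
# Sketch: requesting a fixed number of HIV seeders via the configuration
# dictionary (option names and defaults as documented above).
cfg = {}
cfg["hivseed.time"] = 0            # seed at the start of the simulation
cfg["hivseed.type"] = "amount"     # choose a fixed number of seeders
cfg["hivseed.amount"] = 10
cfg["hivseed.age.min"] = 20        # seeders must be between 20 and 30
cfg["hivseed.age.max"] = 30
cfg["hivseed.stop.short"] = "yes"  # abort if 10 seeders cannot be found
```

These settings would then be passed to the run command of the PySimpactCyan object, as in the Python examples elsewhere in this document.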
HSV2 seeding event¶
When introducing the initial population in the simulation, the persons are not infected with HSV2. To start such an infection, an HSV2 seeding event is scheduled, and when this is triggered (controlled by hsv2seed.time), a certain number of people will be marked as HSV2 infected. They can then pass on their infection status through the HSV2 transmission event.
If the event fires, only the people that have the right age (as specified by hsv2seed.age.min and hsv2seed.age.max) and the right gender (as specified by hsv2seed.gender) are possible ‘seeders’, and depending on the setting of hsv2seed.type either a fixed number of people will be used, or each person will have a certain probability of being a seeder. In case a fixed number is requested but this number cannot be reached in the current simulation, the program can be instructed to terminate, depending on the hsv2seed.stop.short setting.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
- hsv2seed.time (-1): This specifies the time at which the seeding event takes place; by default, HSV2 seeding is disabled.
- hsv2seed.type (‘fraction’): This value specifies how the seeding will occur: either fraction, to specify that each person in the group of possible seeders will have a certain probability of being a seeder, or amount, to specify that a fixed number of seeders should be chosen.
- hsv2seed.age.min (0): People who are possible seeders must be at least this old.
- hsv2seed.age.max (1000): People who are possible seeders must be at most this old.
- hsv2seed.gender (‘any’): People who are possible seeders must have this gender. Can be either any (the default), male or female.
- hsv2seed.fraction (0.2): This is only used if hsv2seed.type is set to fraction, and specifies the probability each possible seeder has of actually becoming HSV2 infected.
- hsv2seed.amount (1): If hsv2seed.type is amount, this number of people will be chosen from the group of possible seeders and marked as being HSV2 infected.
- hsv2seed.stop.short (‘yes’): In case a specific number of seeders should be chosen but that number is not available (e.g. due to a too restrictive allowed age range), the program will terminate if this is set to yes. It will continue despite not having the requested number of seeders if set to no.
Simulation intervention event¶
It’s possible that you’d want certain configuration values to change during a simulation. For example, to lower the CD4 threshold that’s used to decide if a person will be offered antiretroviral treatment or not. Changing parameters in a very general way can be done using this simulation intervention event which, when triggered, reads files containing changed configuration settings and applies them.
Note that while the changed settings will certainly affect new events that are introduced into the simulation, it will depend on the particular event type whether events already scheduled in the system will be affected or not. If the event has a fixed time at which it will take place, this time will _not_ be changed due to the intervention event. However, if the event is hazard-based and parameters of the hazard were changed, then this will definitely have an effect on the event’s fire time.
Using this simulation intervention mechanism is easiest using R or Python, and this is described next. Manually specifying this in the configuration file is possible as well, as is described later.
Using R¶
To use simulation interventions in R, for each such intervention event you need to create a list with the configuration values that you'd like to change, just as when you create the configuration settings that you pass to the simpact.run command. Additionally, such a list should contain an entry called time, which contains the simulation time at which the modified settings need to be introduced. Multiple interventions are of course allowed, so what you pass to the simpact.run parameter called intervention is a list of these lists.
For example, suppose that we already have some configuration settings in cfg
,
but that we’d like to set the CD4 threshold for treatment to 500 at simulation
time 5, and to 1000 at simulation time 10. We’d first create a list for the first
intervention event:
iv1 <- list()
iv1["time"] <- 5
iv1["monitoring.cd4.threshold"] <- 500
and also a similar list for the second intervention:
iv2 <- list()
iv2["time"] <- 10
iv2["monitoring.cd4.threshold"] <- 1000
The full intervention configuration is then a list of these lists:
iv <- list(iv1, iv2)
which is what we’ll pass to the simpact.run
command:
res <- simpact.run(cfg, "/tmp/simpacttest", intervention=iv)
Using Python¶
To use simulation interventions in Python, for each such intervention event you need to create a dictionary with the configuration values that you'd like to change, just as when you create the configuration settings that you pass to the PySimpactCyan run command. Additionally, such a dictionary should contain an entry called time, which contains the simulation time at which the modified settings need to be introduced. Multiple interventions are of course allowed, so what you pass to the run parameter called interventionConfig is a list of these dictionaries.
For example, suppose that we already have some configuration settings in cfg
,
but that we’d like to set the CD4 threshold for treatment to 500 at simulation
time 5, and to 1000 at simulation time 10. We’d first create a dictionary for the first
intervention event:
iv1 = { }
iv1["time"] = 5
iv1["monitoring.cd4.threshold"] = 500
and also a similar dictionary for the second intervention:
iv2 = { }
iv2["time"] = 10
iv2["monitoring.cd4.threshold"] = 1000
The full intervention configuration is then a list of these dictionaries:
iv = [iv1, iv2]
which is what we’ll pass to the run
command (assuming our PySimpactCyan
object is called simpact
):
res = simpact.run(cfg, "/tmp/simpacttest", interventionConfig=iv)
Manual configuration¶
In case you're using the command line approach to run your simulations and need to set up simulation interventions manually, you can enable this using the option intervention.enabled. Note that this setting, as well as the other related settings, is not used when the R or Python interface is employed; in that case, use the relevant mechanism described above.
If this intervention mechanism is enabled, you’ll need to prepare one or more files which are similar to the full config file, but which only contain the settings that should be changed. For example, suppose that we’d like to set the CD4 threshold for treatment to 500 at simulation time 5, and to 1000 at simulation time 10. In that case we need two files, one with the line
monitoring.cd4.threshold = 500
and another file with the line
monitoring.cd4.threshold = 1000
Let’s call these files cd4_A.txt
and cd4_B.txt
respectively. In the
main config file, we need to make sure that the times for these intervention
events are mentioned using the intervention.times
option, which should contain
a list of increasing times:
intervention.times = 5,10
To specify which file should be inspected at each time, we need to use the
intervention.baseconfigname
and intervention.fileids
settings. The first
one contains a template for the file name for each intervention, in which
the %
sign is replaced by the corresponding string from intervention.fileids
.
In our example, we could set
intervention.baseconfigname = cd4_%.txt
intervention.fileids = A,B
In case the last option is left blank, the %
sign will be replaced by what
was specified in intervention.times
, so the program would look for cd4_5.txt
and cd4_10.txt
.
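If you prefer to generate these small intervention files from a script, a minimal Python sketch (using the file names and the setting from the example above) could look as follows:

```python
# Sketch: generating the two intervention files from the example above.
# Each file contains only the settings that change at that time.
thresholds = {"A": 500, "B": 1000}  # file id -> new CD4 threshold
for fileid, value in thresholds.items():
    with open("cd4_%s.txt" % fileid, "w") as f:
        f.write("monitoring.cd4.threshold = %d\n" % value)
```

This produces cd4_A.txt and cd4_B.txt with one changed setting each, matching the intervention.baseconfigname and intervention.fileids values shown above.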
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
- intervention.enabled (‘no’): Indicates if simulation interventions should be used at certain times. Possible values are no (the default) and yes. The following options are only used in case this is set to yes.
- intervention.baseconfigname (no default): This is only used if intervention.enabled is yes, and contains a template for the filename in which the modifications to the main config file are stored. In this template, % will be replaced by the relevant string from intervention.fileids, or by the time from intervention.times in case the intervention.fileids option is left empty.
- intervention.times (no default): This is only used if intervention.enabled is yes, and contains a comma separated list of times (positive and increasing) at which simulation interventions should be executed.
- intervention.fileids (no default): This is only used if intervention.enabled is yes, and should be either empty or a comma separated list of strings that should replace the % sign in intervention.baseconfigname. If empty, the time from intervention.times will be used to replace % instead.
HIV infection monitoring event¶
When a person has been diagnosed as being infected with HIV, monitoring events are scheduled to follow up on the progress of the disease by inspecting the person's CD4 count. If this CD4 count is below the threshold set in monitoring.cd4.threshold, the person will be offered antiretroviral treatment. Depending on the person's willingness to accept treatment, treatment will then be started.
If treatment is started, the person's set-point viral load value will be lowered according to the setting in monitoring.fraction.log_viralload. In this case no further monitoring events will be scheduled; instead, the person will be at risk of dropping out of treatment, and the corresponding event will be scheduled.
On the other hand, if the person's CD4 count was not below the threshold, or the person was not willing to start treatment, a new monitoring event will be scheduled a while later. The precise interval used here depends on the person's CD4 count and the configuration settings. In monitoring.interval.piecewise.cd4s and monitoring.interval.piecewise.times you can specify comma separated lists of (increasing) CD4 values and their corresponding intervals. If the CD4 value lies in between specified values, linear interpolation will be used. If the CD4 count is less than the left-most value in this series, the interval specified in monitoring.interval.piecewise.left will be used. If it is larger than the right-most CD4 value, the interval from monitoring.interval.piecewise.right is used instead.
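To make the interpolation rule concrete, here is a minimal Python sketch of this lookup; it is an illustration using the documented default option values, not the program's actual code:

```python
# Sketch of the piecewise-linear monitoring interval lookup described
# above. Defaults mirror the documented option values.
def monitoring_interval(cd4, cd4s=(200.0, 350.0), times=(0.25, 0.25),
                        left=0.16666, right=0.5):
    if cd4 < cd4s[0]:          # below the left-most CD4 value
        return left
    if cd4 > cd4s[-1]:         # above the right-most CD4 value
        return right
    for i in range(len(cd4s) - 1):
        if cd4s[i] <= cd4 <= cd4s[i + 1]:
            # linear interpolation between neighbouring entries
            f = (cd4 - cd4s[i]) / (cd4s[i + 1] - cd4s[i])
            return times[i] + f * (times[i + 1] - times[i])
    return times[-1]           # cd4 equals a listed value exactly
```

With the defaults, a CD4 count of 100 yields the two-month interval (0.16666) and a count of 500 yields the six-month interval (0.5); any count between 200 and 350 yields 0.25, since both listed intervals are equal.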
After dropping out of treatment, a new diagnosis event will be scheduled which then leads to new monitoring events. If this is the case, the person will always be eligible for treatment, i.e. once a person has received treatment he’s always a candidate to start treatment again. Only the person’s willingness still matters then.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
- monitoring.cd4.threshold (350.0): This is the threshold value for a person's CD4 count: if the count is below this value, treatment will be offered (which the person still can decline).
- monitoring.fraction.log_viralload (0.7): If the person is eligible and willing to start treatment, ART will be started. The effect of this is that the person's set-point viral load will be lowered by this fraction on a logarithmic scale. Calling this fraction \(f\), this corresponds to \(V_{\rm sp,new} = (V_{\rm sp})^f\).
- monitoring.interval.piecewise.cd4s (‘200,350’): This is a comma separated list of increasing CD4 values, and is used when looking up the monitoring interval for a certain CD4 count.
- monitoring.interval.piecewise.times (‘0.25,0.25’): This is a comma separated list of monitoring time intervals that correspond to the CD4 values specified in monitoring.interval.piecewise.cd4s.
- monitoring.interval.piecewise.left (0.16666): If the CD4 count is less than the left-most value specified in monitoring.interval.piecewise.cd4s, then this interval is used (defaults to two months).
- monitoring.interval.piecewise.right (0.5): If the CD4 count is more than the right-most value specified in monitoring.interval.piecewise.cd4s, then this interval is used (defaults to six months).
Mortality event¶
To make sure that everyone in the simulation has a limited lifespan, regardless of being infected with HIV, there’s always a mortality event scheduled for each person. This time of death is based on a Weibull distribution and for a certain person, this is a fixed number that no longer changes during the simulation.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
- mortality.normal.weibull.shape (4.0): This specifies the shape parameter of the Weibull distribution.
- mortality.normal.weibull.scale (70.0): This specifies a scale parameter to base the Weibull distribution on, but a difference between men and women still needs to be taken into account (see the next option).
- mortality.normal.weibull.genderdiff (5.0): For a woman, half this value is added to the scale parameter specified in mortality.normal.weibull.scale; for a man, the same amount is subtracted.
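The sampling itself can be sketched as follows, using Python's standard random.weibullvariate (which takes the scale parameter first and the shape second); this is an illustration of the described distribution, not the program's actual implementation:

```python
import random

# Sketch: sampling a time of death from the Weibull distribution
# described above, with the documented default parameters. Per the
# genderdiff option, women use scale + genderdiff/2 and men use
# scale - genderdiff/2.
def sample_time_of_death(is_woman, shape=4.0, scale=70.0,
                         genderdiff=5.0, rng=random):
    adjusted = scale + genderdiff / 2.0 if is_woman else scale - genderdiff / 2.0
    return rng.weibullvariate(adjusted, shape)
```

With the defaults, a woman's lifespan is drawn from a Weibull distribution with scale 72.5 and a man's from one with scale 67.5, both with shape 4.0.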
Periodic logging event¶
In case you would like to keep track of how many people there are in the population, or how many are being treated, you can enable this event. The file specified in periodiclogging.outfile.logperiodic will be used to write the following properties to: the time the event fired, the size of the population and the number of people receiving treatment at that time. The interval between such logging events is controlled using the periodiclogging.interval setting. If periodiclogging.starttime is negative (the default), the first event will take place after the first interval has passed. Otherwise, the first event is scheduled to take place at the specified time.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
- periodiclogging.interval (-1): This setting specifies the interval that is used to schedule the periodic logging events. When one event fires, the next one is scheduled. If this is set to a negative value, no further events will be scheduled.
- periodiclogging.starttime (-1): If negative, the first event will take place after the first interval has passed. If zero or positive, the first event will get executed at the corresponding time.
- periodiclogging.outfile.logperiodic (‘${SIMPACT_OUTPUT_PREFIX}periodiclog.csv’): This specifies the file to which the logging occurs. By default, the value of the config variable or environment variable SIMPACT_OUTPUT_PREFIX will be prepended to periodiclog.csv to yield the complete filename.
Relocation event¶
In case a non-trivial 2D location is assigned to each person, and the formation hazard depends on the geographical distance between possible partners or if an ‘eyecap’ setting is used, a relocation event may become interesting. Such an event changes the 2D location that’s assigned to a person, and writes a log entry to be able to keep track of a person.
If enabled (see relocation.enabled), the hazard used is a time-dependent exponential hazard:
Here, a baseline value \(a\) can be configured using the relocation.hazard.a setting, and the age of the person can be taken into account using the value of \(b\) (the relocation.hazard.b setting).
The effect of a relocation is slightly different depending on the ‘eyecap’ fraction used. In case this is one, and everyone can have a relationship with everyone else, this just changes the location of the person. By controlling the importance of the geographical distance between partners in the formation event hazards, this can still affect which precise relationships will be formed. When only a fraction of the population can be seen however, triggering of the relocation event will cause all existing scheduled formation events to get cancelled. After choosing a new location of the person, a new set of interests will be generated and formation events for these persons will get scheduled.
Note that existing, formed relationships are currently not affected by this relocation event.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
- relocation.enabled (‘no’): This controls if relocation events should be scheduled or not.
- relocation.hazard.a (no default): Sets the value of \(a\) in the relocation hazard above.
- relocation.hazard.b (no default): Sets the value of \(b\) in the relocation hazard above.
- relocation.hazard.t_max (200): As explained in the section about ‘time limited’ hazards, an exponential function needs some kind of threshold value (after which it stays constant) to be able to perform the necessary calculations. This configuration value is a measure of this threshold.
Synchronize population statistics¶
As described in the formation event, it’s possible that an event needs to use the current population size in calculating its hazard. For a large population, there will also be many birth and mortality events, and recalculating hazards for every change in population size will slow the simulation down considerably.
If the population size does not change much during the simulation, it may be adequate to just use the population size at the beginning of the simulation. On the other hand, if the number of people in the simulation tends to grow or shrink considerably, this will be a poor approximation. In that case, this event can be useful, which allows you to resynchronize the stored population size. This is a global event, meaning that afterwards event fire times for _all_ scheduled events will be recalculated, so don’t use this more than necessary.
If this event is needed, the interval between such synchronization events can be specified using the syncpopstats.interval configuration option. When one event fires, the next one is scheduled to go off the specified amount of time later.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
syncpopstats.interval
(-1):
This specifies the interval between these synchronization events. A negative value means that it’s disabled.
Synchronize reference year¶
With this event, a reference time can be saved in the simulation. This is used by the ‘agegapry’ formation hazard and the HIV transmission hazard, to reduce the complexity of those hazards. Scheduling of the event can be disabled by setting the interval to a negative value (the default).
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
syncrefyear.interval
(-1):
Specifies the time interval with which a reference simulation time should be saved. A negative value disables this.
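For example, both synchronization events could be enabled through the plain key-value configuration used by the bindings (a sketch; the interval values are illustrative):

```python
# Enable both periodic synchronization events (values are illustrative).
cfg = {
    "syncpopstats.interval": 1.0,  # resynchronize the stored population size yearly
    "syncrefyear.interval": 1.0,   # save a reference simulation time yearly
}

# A negative value would leave the corresponding event disabled.
assert all(v > 0 for v in cfg.values())
```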
HIV transmission event¶
When a relationship is formed between two people of which one is HIV infected, or when a relationship between two uninfected people exists and one of them gets infected, an HIV transmission event is scheduled. The hazard for this event is the following:
\[
{\rm hazard} = \exp\Big(a + b V^{-c} + d_1 P_{\rm infected} + d_2 P_{\rm uninfected} + W f_1 \exp\big(f_2 (A(t_{\rm ry}) - A_{\rm debut})\big) + e_1 {HSV2}_{\rm infected} + e_2 {HSV2}_{\rm uninfected} + g_1 b_{\rm 0j} + g_2 b_{\rm 1j}\Big)
\]
In this hazard, the value of
\(V\) is the current viral load of the person, which can differ
from the set-point viral load. The number of partners of the person
who is already infected is specified by \(P_{\rm infected}\), while the
number of partners of the person who will become infected when the event
fires, is \(P_{\rm uninfected}\). The value of \(W\) is \(1\) if the uninfected
person is a woman, and \(0\) otherwise. By configuring the weights \(f_1\) and \(f_2\),
it becomes possible to change the susceptibility of a woman depending on her
age. Note that this age is only specified approximately, by using a reference
time \(t_{\rm ry}\) instead of the actual time \(t\). This reference time can be
updated using the reference year synchronization event. \({HSV2}_{\rm infected}\) and \({HSV2}_{\rm uninfected}\) indicate whether the HIV-infected and the uninfected person, respectively, are infected with HSV2: each value is 1 in case of an HSV2 infection and 0 otherwise. The values of \(b_{\rm 0j}\) and \(b_{\rm 1j}\) specify the susceptibility of the uninfected person to both diseases and to HIV only, respectively. They can be set using person.hiv.b0.dist.type
and person.hiv.b1.dist.type
from the HIV related person settings.
The values \(a\), \(b\), \(c\), \(d_1\), \(d_2\), \(e_1\), \(e_2\), \(f_1\), \(f_2\), \(g_1\) and \(g_2\) can be configured as specified below;
the \(A_{\rm debut}\) parameter is the debut age.
The form of this hazard was originally inspired by the article by [Hargrove et al]. The default parameters that are mentioned below are based on a fit to the data from the [Fraser et al] article.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
hivtransmission.param.a
(-1.3997):
This refers to the value of \(a\) in the expression for the hazard, providing a baseline value.
hivtransmission.param.b
(-12.0220):
This refers to the value of \(b\) in the expression for the hazard. Together with the value of \(c\), this specifies the influence of the current viral load of the infected person.
hivtransmission.param.c
(0.1649):
This refers to the value of \(c\) in the expression for the hazard. Together with the value of \(b\), this specifies the influence of the current viral load of the infected person.
hivtransmission.param.d1
(0):
This refers to the value of \(d_1\) in the expression for the hazard, providing a weight based on the number of partners of the infected person.
hivtransmission.param.d2
(0):
This refers to the value of \(d_2\) in the expression for the hazard, providing a weight based on the number of partners of the uninfected person.
hivtransmission.param.e1
(0):
This refers to the value of \(e_1\) in the expression for the hazard and specifies the influence of the HIV-infected person being HSV2-infected.
hivtransmission.param.e2
(0):
This refers to the value of \(e_2\) in the expression for the hazard and specifies the influence of the HIV-uninfected person being HSV2-infected.
hivtransmission.param.f1
(0):
This refers to the value of \(f_1\) in the expression for the hazard.
hivtransmission.param.f2
(0):
This refers to the value of \(f_2\) in the expression for the hazard.
hivtransmission.param.g1
(0):
This refers to the value of \(g_1\) in the expression for the hazard. Set this parameter equal to 1 if you want to include the influence of susceptibility to both infections.
hivtransmission.param.g2
(0):
This refers to the value of \(g_2\) in the expression for the hazard. Set this parameter equal to 1 if you want to include the influence of susceptibility to HIV only.
hivtransmission.maxageref.diff
(1):
As explained above, the hazard does not use the real time dependency \(t\), but refers to a reference time \(t_{\rm ry}\) that needs to be synchronized periodically using the synchronize reference year event. The program will abort if it detects that the last reference time synchronization was more than this amount of time ago, which by default is one year.
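The way these parameters interact can be sketched in Python. The log-linear form below is an interpretation assembled from the parameter descriptions above (with the defaults for \(a\), \(b\) and \(c\) filled in), so treat it as an illustration rather than the program’s exact code:

```python
import math

def hiv_transmission_hazard(V, P_infected=0, P_uninfected=0, woman=False,
                            age=30.0, hsv2_infected=0, hsv2_uninfected=0,
                            b0=0.0, b1=0.0,
                            a=-1.3997, b=-12.0220, c=0.1649,
                            d1=0.0, d2=0.0, e1=0.0, e2=0.0,
                            f1=0.0, f2=0.0, g1=0.0, g2=0.0,
                            A_debut=15.0):
    """Sketch of the hazard: each configured weight scales one log-linear term."""
    W = 1.0 if woman else 0.0  # 1 if the uninfected person is a woman
    log_h = (a + b * V**(-c)                       # baseline + viral load effect
             + d1 * P_infected + d2 * P_uninfected  # partner counts
             + W * f1 * math.exp(f2 * (age - A_debut))  # age-dependent female susceptibility
             + e1 * hsv2_infected + e2 * hsv2_uninfected  # HSV2 co-infection
             + g1 * b0 + g2 * b1)                   # person-specific susceptibility
    return math.exp(log_h)
```

With the default \(b < 0\) and \(0 < c < 1\), a higher current viral load \(V\) makes the term \(b V^{-c}\) less negative, so the hazard increases with \(V\).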
HSV2 transmission event¶
When a person is HSV2 infected and a relationship is formed, an HSV2 transmission event will be scheduled. A time dependent exponential hazard is used:
\[
{\rm hazard} = \exp\big(a_i + b (t - t_{\rm HSV2-infected}) + c M_i + d H_i + e_1 b_{\rm 0j} + e_2 b_{\rm 2j}\big)
\]
The value of \(a_i\) can be set using person.hsv2.a.dist.type
from the
HSV2 related person settings. This value is taken from
the person that’s already infected. The \(b\) value can be configured using
hsv2transmission.hazard.b
, and \(t_{\rm HSV2-infected}\) is the time at which
the infected person acquired the HSV2 infection. \(M_{\rm i}\) represents the gender effect and is taken from the person that’s already HSV2-infected. Its value is 1 for a male and 0 for a female. The value \(H_{\rm i}\) is an indicator for the HSV2-infected person being HIV-infected. Its value is 1 for HIV-infected and 0 for HIV-uninfected. The values of \(b_{\rm 0j}\) and \(b_{\rm 2j}\) specify the susceptibility of the uninfected person to both diseases and HSV2 only respectively. The value of \(b_{\rm 0j}\) can be set using person.hiv.b0.dist.type
from the HIV related person settings. The value of \(b_{\rm 2j}\) can be set using person.hsv2.b2.dist.type
from the HSV2 related person settings.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
hsv2transmission.hazard.b
(0):
This configures the value of \(b\) in the hazard above.
hsv2transmission.hazard.c
(0):
This configures the value of \(c\) for the gender effect in the hazard above.
hsv2transmission.hazard.d
(0):
This configures the value of \(d\) for the HIV effect in the hazard above.
hsv2transmission.hazard.e1
(0):
This refers to the value of \(e_1\) in the expression for the hazard. Set this parameter equal to 1 if you want to include the influence of susceptibility to both infections.
hsv2transmission.hazard.e2
(0):
This refers to the value of \(e_2\) in the expression for the hazard. Set this parameter equal to 1 if you want to include the influence of susceptibility to HSV2 only.
hsv2transmission.hazard.t_max
(200):
As explained in the section about ‘time limited’ hazards, an exponential function needs some kind of threshold value (after which it stays constant) to be able to perform the necessary calculations. This configuration value is a measure of this threshold.
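Analogously, here is a minimal Python sketch of how these parameters could combine, assuming a log-linear form assembled from the descriptions above (an interpretation for illustration, not the program’s exact code):

```python
import math

def hsv2_transmission_hazard(t, t_infected, a_i, male_infector=True,
                             hiv_infected=False, b0=0.0, b2=0.0,
                             b=0.0, c=0.0, d=0.0, e1=0.0, e2=0.0):
    """Time dependent exponential hazard sketch for HSV2 transmission.

    a_i, male_infector and hiv_infected refer to the person that is
    already HSV2-infected; b0 and b2 to the uninfected person.
    """
    M = 1.0 if male_infector else 0.0  # gender effect indicator
    H = 1.0 if hiv_infected else 0.0   # HIV status indicator
    return math.exp(a_i + b * (t - t_infected) + c * M + d * H
                    + e1 * b0 + e2 * b2)
```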

MaxART simulation¶
Introduction¶
The Simpact Cyan simulation framework also has extensions for simulations in the context of the MaxART study. For the most part, the usage and settings are the same as in the main Simpact Cyan program, and the reader is therefore strongly encouraged to consult the core Simpact Cyan reference documentation. This document only contains information about settings that deviate from the ones specified in the core documentation.
Notes regarding the installation are also the same as in the core documentation. The necessary MaxART-specific software is automatically included when the relevant Simpact Cyan program is installed.
Selecting the MaxART-specific simulation¶
When running from the command line, instead of
using the executables simpact-cyan-release
or simpact-cyan-debug
, you should
use maxart-release
or maxart-debug
.
It is however usually far more convenient to run the simulation from R
or to run from Python. In case the
R environment is used, the user must specify that MaxART related simulations
are to be performed by setting the simulation prefix to maxart
:
simpact.set.simulation("maxart")
When the Python method is used, the function setSimulationPrefix
must be called.
For example, if the simulation object has the name simpact
, the call would be:
simpact.setSimulationPrefix("maxart")
Output¶
In addition to the output files of the general simulation, there
is one additional MaxART-specific output file possible, specified by the
maxart.outfile.logsteps
option. This output file will contain a table with
the different health care facilities, the time at which
a study step took place, and an indication of the stage that the facility was in
during a specific step. This can be one of:
C
for the control stage
T
for the transitioning stage
I
for the intervention stage
Post
for when the study has ended
An example can be seen in the following notebook: maxart-facilitysteps.ipynb
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
maxart.outfile.logsteps
(‘${SIMPACT_OUTPUT_PREFIX}maxartsteplog.csv’):
The name of the CSV file that will contain information about which facility was in which stage of the study during the simulation.
Simulation details¶
The simulation proceeds in largely the same fashion as described in the core documentation, but there are a few alterations for the MaxART context. The study is specific to the Hhohho region of Swaziland, and the person settings have been set up so that a person’s location is based on the population density in the Hhohho region. The health care facilities can be described by various settings, including their names, their locations, and the randomization used in the MaxART trial. For testing purposes, it has been made straightforward to avoid using the true randomization settings used in the study.
The start of the MaxART study in the simulation can be specified using the start of study event. Once the study has started, the first step event is scheduled to take place a specific amount of time later. Until that time, all health care facilities are marked as being in the ‘control’, or ‘standard of care’, phase. When the first step event fires, two facilities [*] are marked as being in a ‘transition period’, and a new step event is scheduled with the same interval. The firing of a step event also advances any facilities that were previously in the ‘transition period’ to the ‘treatment for all’ phase. When no more facilities are available, the end of study event is scheduled to take place a specific time interval later.
Deciding if an HIV infected person may be treated is done in the HIV infection monitoring event. This is again similar to the core program, but the CD4 threshold used to decide if a person is eligible for treatment can be set differently for the various study stages. The default settings use a count of 350, except in the transition and treatment for all periods, in which case there is no specific threshold anymore.
[*] The number of facilities actually depends on the information in the randomization file, but the default is two.
Person settings¶
The configurable person settings are the same as in the main program, except for the geographical location of a person. In this case, the defaults are set up in such a way that a person’s location is chosen based on the population density of the Hhohho region of Swaziland. The following iPython notebook illustrates this: maxart-popdens.ipynb
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
person.geo.dist2d.type
(‘discrete’ with settings for the Hhohho region of Swaziland):
This two dimensional distribution is used to assign a geographic location to each person. The densfile
parameter is set to SWZ10adjv4.tif
, which contains information about the population density in Swaziland. To limit the geographical distribution to the one from the Hhohho region, the mask file hhohho_mask.tiff
is used.
Participating health care facilities¶
The health care facilities that participate in the MaxART study are specified
in a CSV file (facilities.geo.coords
) that lists the name, the longitude and
the latitude of each facility. Because the person coordinates use an X and Y
distance relative to some corner of a map of Swaziland, these geographic
coordinates of the facilities cannot be used directly. Instead, they will be
transformed to X and Y positions based on the facilities.geo.start.latitude
,
facilities.geo.start.longitude
and facilities.geo.start.corner
settings.
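Such a transformation can be sketched in Python. The kilometers-per-degree conversion and the cosine correction below are assumptions chosen for illustration; the program’s own projection may well differ:

```python
import math

KM_PER_DEGREE = 111.0  # rough km per degree of latitude (assumption)

def facility_xy(lat, lon, start_lat=-25.7172, start_lon=30.7901, corner="top"):
    """Convert a facility's GPS coordinates to X/Y distances from the start point."""
    # X grows towards the east; longitude degrees shrink with cos(latitude).
    x = (lon - start_lon) * KM_PER_DEGREE * math.cos(math.radians(start_lat))
    # For 'top', Y grows towards the south of the start latitude;
    # for 'bottom', towards the north.
    y = (start_lat - lat) * KM_PER_DEGREE
    if corner == "bottom":
        y = -y
    return x, y
```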
If facilities.outfile.facilityxypos
is specified, the resulting X and Y
values will be written to a CSV file, so that they can be compared to the
person locations. This is illustrated in the following iPython notebook:
maxart-popdens.ipynb
The order in which the facilities are used in the study is specified in the
CSV file given by the facilities.randomization
setting.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
facilities.geo.coords
(‘maxart-facilities.csv’ from the data directory):
This is the name of the CSV file that specifies the names of the facilities in the study, together with their GPS coordinates. These coordinates must be transformed to X and Y values so that they can be related to the location of each person, and the values needed for this transformation are specified in the following three options.
facilities.geo.start.latitude
(-25.7172):
Together with facilities.geo.start.longitude
, this specifies the origin of the X-Y coordinate system that should be used to relate the facility locations to the person locations.
facilities.geo.start.longitude
(30.7901):
Together with facilities.geo.start.latitude
, this specifies the origin of the X-Y coordinate system that should be used to relate the facility locations to the person locations.
facilities.geo.start.corner
(‘top’):
This value can be “top” or “bottom”, and specifies whether Y distances should be positive when the location of a facility is more to the south (for ‘top’) than the latitude in facilities.geo.start.latitude
, or more to the north (for ‘bottom’).
facilities.outfile.facilityxypos
(not written by default):
If specified, the coordinates resulting from the transformation above will be written to this CSV file.
facilities.randomization
(‘maxart-randomization.csv’ from the data directory):
This specifies the randomization of the health care facilities to be used in the simulation. For testing purposes, some fake randomization files can be downloaded here:
Events¶
All of the events described in the main documentation are still available. Below, only the events that have been altered or added will be described.
HIV infection monitoring event¶
When a person has been diagnosed as being infected with HIV, monitoring events are scheduled to follow up on the progress of the disease by inspecting the person’s CD4 count. If this CD4 count is below a certain configurable threshold, the person will be offered antiretroviral treatment. Depending on the person’s willingness to accept treatment, treatment will then be started.
The threshold can be set differently depending on the stage a facility
is in. Different values can be set before the study starts and after it ends
(monitoring.cd4.threshold.prestudy
and monitoring.cd4.threshold.poststudy
),
and during the MaxART study it can be set differently for the control stage,
the transition stage and the intervention stage (the treatment for all period)
(monitoring.cd4.threshold.instudy.controlstage
, monitoring.cd4.threshold.instudy.transitionstage
and monitoring.cd4.threshold.instudy.interventionstage
).
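The stage-dependent thresholds can be summarized with a small lookup; the stage labels below are just names for this sketch, and the ‘inf’ setting maps naturally to Python’s `float("inf")`:

```python
# Default CD4 treatment thresholds per study stage (labels are illustrative).
CD4_THRESHOLDS = {
    "prestudy": 350.0,
    "control": 350.0,
    "transition": float("inf"),    # 'inf': every diagnosed person is eligible
    "intervention": float("inf"),
    "poststudy": 350.0,
}

def eligible_for_treatment(cd4_count, stage):
    """Treatment is offered when the CD4 count is below the stage's threshold."""
    return cd4_count < CD4_THRESHOLDS[stage]
```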
Note that it is currently assumed that a person will receive such monitoring at the health care facility that is geographically the closest one. This is illustrated in the following iPython notebook: maxart-monitoringfacilities.ipynb
If treatment is started, the person’s set-point viral load value will be
lowered according to the setting in monitoring.fraction.log_viralload
.
In this case no further monitoring events will be scheduled, but instead
the person will be at risk of dropping out of treatment and
the corresponding event will be scheduled.
On the other hand, if the person’s CD4 count was not below the threshold
or the person was not willing to start treatment,
a new monitoring event will be scheduled a while later. The precise interval
used here depends on the person’s CD4 count and the configuration
settings. In monitoring.interval.piecewise.cd4s
and monitoring.interval.piecewise.times
you can specify comma separated lists of (increasing) CD4 values and their corresponding
intervals. If the CD4 value lies in between specified values, linear interpolation
will be used. If the CD4 count is less than the left-most value in this series,
the interval specified in monitoring.interval.piecewise.left
will be used.
If it is larger than the right-most CD4 value, the interval from
monitoring.interval.piecewise.right
is used instead.
After dropping out of treatment, a new diagnosis event will be scheduled, which then leads to new monitoring events. In that case, the person will always be eligible for treatment: once a person has received treatment, he or she is always a candidate to start treatment again. Only the person’s willingness still matters then.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
monitoring.cd4.threshold.prestudy
(350):
This is the threshold value for a person’s CD4 count _before_ the start of the study: if the count is below this value, treatment will be offered.
monitoring.cd4.threshold.poststudy
(350):
This is the threshold value for a person’s CD4 count _after_ the end of the study: if the count is below this value, treatment will be offered.
monitoring.cd4.threshold.instudy.controlstage
(350):
This is the threshold value for a person’s CD4 count during the MaxART study, when the person is at a facility in the control stage. If the count is below this value, treatment will be offered.
monitoring.cd4.threshold.instudy.transitionstage
(‘inf’):
This is the threshold value for a person’s CD4 count during the MaxART study, when the person is at a facility in the transition stage. If the count is below this value, treatment will be offered.
monitoring.cd4.threshold.instudy.interventionstage
(‘inf’):
This is the threshold value for a person’s CD4 count during the MaxART study, when the person is at a facility in the intervention stage. If the count is below this value, treatment will be offered.
monitoring.fraction.log_viralload
(0.7):
If the person is eligible and willing to start treatment, ART will be started. The effect of this is that the person’s set-point viral load will be lowered by this fraction on a logarithmic scale. Calling this fraction \(f\), this corresponds to \(V_{\rm sp,new} = (V_{\rm sp})^f\).
monitoring.interval.piecewise.cd4s
(‘200,350’):
This is a comma separated list of increasing CD4 values, and is used when looking up the monitoring interval for a certain CD4 count.
monitoring.interval.piecewise.times
(‘0.25,0.25’):
This is a comma separated list of monitoring time intervals that correspond to the CD4 values specified in monitoring.interval.piecewise.cd4s
.
monitoring.interval.piecewise.left
(0.16666):
If the CD4 count is less than the left-most value specified in monitoring.interval.piecewise.cd4s
, then this interval is used (defaults to two months).
monitoring.interval.piecewise.right
(0.5):
If the CD4 count is more than the right-most value specified in monitoring.interval.piecewise.cd4s
, then this interval is used (defaults to six months).
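The interval lookup described above amounts to piecewise-linear interpolation with clamped ends. A sketch using the default values (intervals in years):

```python
def monitoring_interval(cd4, cd4s=(200.0, 350.0), times=(0.25, 0.25),
                        left=0.16666, right=0.5):
    """Monitoring interval for a CD4 count, per the piecewise settings."""
    if cd4 < cd4s[0]:
        return left      # below the left-most CD4 value
    if cd4 > cd4s[-1]:
        return right     # above the right-most CD4 value
    # Linear interpolation between the surrounding CD4 values.
    for i in range(len(cd4s) - 1):
        if cd4s[i] <= cd4 <= cd4s[i + 1]:
            frac = (cd4 - cd4s[i]) / (cd4s[i + 1] - cd4s[i])
            return times[i] + frac * (times[i + 1] - times[i])
```

With the defaults, any count between 200 and 350 yields the same three-month interval, since both interpolation endpoints are 0.25.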
Start of study event¶
To mark the start of the MaxART study in the simulation, this event can be triggered at
a specific time (maxart.starttime
). This has an effect on the threshold that will be
used in the HIV monitoring event: until this start of study event has
been fired, the ‘pre-study’ threshold will be used.
When this event has been fired, the participating health care facilities are all marked as being in the control stage. A study step event will be scheduled a specific time later, to advance a number of facilities (depending on the randomization) to the transition stage.
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
maxart.starttime
(5):
This is the simulation time at which this event will be fired, indicating the start of the MaxART study. To disable this, set to a negative value.
Time step within study¶
When this event fires for the first time, a number of health care facilities (depending on the randomization) are advanced from the control stage to the transition stage. At this point, a new time step event is scheduled to take place, which will advance these facilities to the intervention stage, and will place other facilities (again depending on the randomization) in the transition stage. When no more facilities can be placed in the transition stage, an end of study event will be scheduled.
The interval between the start of the study, the first time step event and
subsequent time step events, and between the last time step event and the
end of study event, is configured using the maxart.stepinterval
option.
The following iPython notebook shows these steps for one of the fake randomizations provided in the health care facilities section: maxart-facilitysteps.ipynb
Here is an overview of the relevant configuration options, their defaults (between parentheses), and their meaning:
maxart.stepinterval
(0.33333):
This is the interval that will be used in between the time steps of the study (defaults to four months). This same interval will be used between the start of the study and the first step event, and between the last step event and end of study event.
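The resulting stepped-wedge progression can be sketched as a pure function of time. Here a ‘group’ is the set of facilities that advance together (two by default); the number of groups is a placeholder, since in the real program it follows from the randomization file:

```python
def facility_stage(group, t, start=5.0, step=0.33333, n_groups=7):
    """Stage ('C', 'T', 'I' or 'Post') of a randomization group at time t.

    group: 0-based position of the facility's group in the randomization order.
    n_groups: total number of groups (placeholder value for this sketch).
    """
    if t < start:
        return "C"                    # study has not started yet
    k = int((t - start) // step)      # number of step events fired so far
    if k >= n_groups + 2:
        return "Post"                 # end of study fires one interval after the last advance
    if group >= k:
        return "C"                    # not yet reached by a step event
    if group == k - 1:
        return "T"                    # advanced to transition by the latest step
    return "I"                        # advanced to intervention by an earlier step
```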
End of study event¶
When a time step event fires and no more health care facilities can be advanced from the control stage to the transition stage, this indicates that the study is almost done, and an end of study event will be scheduled. This event will fire the same amount of time later as the interval between the steps.
There are no configurable options for this event; it only serves to mark the end of the MaxART study in the simulation. Once the study is done, the ‘post-study’ CD4 threshold will be used, as configured in the HIV monitoring event.
References¶
- Anderson, David F., A modified next reaction method for simulating chemical systems with time dependent propensities and delays, The Journal of Chemical Physics, Volume 127, Issue 21
- Arnaout, Ramy A., et al, A simple relationship between viral load and survival time in HIV-1 infection, Proc Natl Acad Sci U S A. 1999 Sep 28; 96(20): 11549-11553.
- Hargrove, John, et al, How Should We Best Estimate the Mean Recency Duration for the BED Method?, PLoS One 7(11): e49661.
- Fraser, C., et al, Variation in HIV-1 set-point viral load: Epidemiological analysis and an evolutionary hypothesis, Proceedings of the National Academy of Sciences of the United States of America. 2007;104(44):17441-17446.