5. biPOD Supplementary Materials and Methods
a5_supplementary_materials.Rmd
Methods
Multi-step logistic and exponential growth models
A general growth process can be seen as a continuous-time Markov process [8,11], where represents the number of individuals in the population of cancerous cell at time , and indicates the instant of time in which the first cancerous cell is born, i.e. . The transition rates for such problem are defined as:
Where and are both positive functions and indicates the birth and death rates of the population when . Similarly, one can denote with and with the individual birth and death rate at time . Assuming that with and given , one is interested in computing
which represents the probability that the population reaches a population size equal to at time conditioned on the population size at . Quite trivially, $P_0m=0 \hspace{1mm} \forall m>0$, meaning that 0 is an absorbing state for the Markov process. The probability generating function for this process is
and, as shown in [12], can be rewritten as
where the quantities and depends on the functional form of the individuals birth and death rates:
Hence, it is possible to easily compute the expectation and the variance of :
Exponential growth
The first growth pattern we consider can be expressed through a simple yet ubiquitous differential equation:
where represents the growth rate. The solution of the aforementioned equation yields
indicating that a population starting from will evolve exponentially over time with no upper bounds. In such growth pattern, the transition rates can be written as
It’s natural to observe that the birth and death rates are linear with respect to , meaning that the probability of a birth (or death) happening in an infinitesimal time-interval is proportional to the previous size of the population. Let be the individual net growth rate at time and assume that is a piece-wise constant function, representing different net growth rates for a multi-step growth process. Without loss of generality, consider a process with a single known time-point in which the net growth rate changes from to ; in such case the form of will be
It’s trivial to see that this can be generalised to any number of time-points, i.e. to any number of different growth rates, and that the expected mean conditioned on can be computed as
Logistic growth
The logistic model is a generalisation of the exponential one which takes into account the possibility that the maximum population size is constrained. In this case we define the carrying capacity as such upper bound, which can be caused by many environmental factors such as the availability of resources or space. The generalised differential equation becomes
The second term modulates the effective growth rate affecting the population: a bigger population size leads to a slower growth. By letting , the logistic equation tends to the exponential one. The equation can easily be solved by separation of variables and which yields as solution
One can also merge the two terms of the differential equation and substitute them with a time-dependent growth rate:
In this case, the solution for is straightforward:
Now, similarly as what has been done in the exponential case where indicated the net growth rate per capita and was defined as a piece-wise constant function, once again represents the net growth rate per capita which depends on the initial conditions, the carrying capacity and the functional form of . Once again, we assume to be piece-wise constant and we denote as ti its changing points. Hence, we are modelling growth patterns in which the net growth rate changes over the time not only depending on the population size, but also on the value of the piece-wise constant function . Now, without loss of generality let’s consider a single changing point so that one can first compute and the the expected mean conditioned on the starting conditions (i.e. the value of )
where we define as .
Start of the process
We discussed so far the situation in which since the most common situation is that the first observation of a growth process happens much later than the actual start of the process. Nonetheless, one could also be interested in obtaining an estimate, or at least an upper bound, for . This can be done by simply shifting by the values of all the instant of times (i.e. with the change of variables ). The only modification to the model is that now .
Population dynamics change-points detection
In general, the change-points defining the intervals of time of the piece-wise functions that we consider might not be known and, therefore, a method to infer them is needed. The main assumption we make is that the underlying growth rates do change sign at the breakpoints. In other words, at a given change point, the population should change its status from “growing” to “shrinking” or viceversa. It follows that the change-points will be found at the locations in which the true function describing the process will present either a minimum or a maximum, or, equivalently, in which the first derivative will be zero. Given the intrinsic noise present in the observations, a point might look like a maximum/minimum of the function without actually being one. Because of this, we make use of a spline approximation to smooth out the noise and to propose a set of candidates change-points which will be used as priors during the inference of the breakpoints.
The cubic spline estimate is the function that minimize [13]
where is a smoothing parameter that controls the trade-off between the “fitness” (first term) and the “roughness” (second term) of the approximation. It’s easy to notice that produces the interpolating splines whereas produces a linear least squares estimate. Once the approximation is obtained, one can easily compute the points in which its first derivative changes sign. We denote those points as . Obviously, the choice of affects the multiplicity of the minima/maxima that will be found. Therefore, can either be user-defined or can be tuned in order to obtain a user-defined number of change-points. Once the set of have been found, we define a prior over them. For each we choose a truncated normal distribution with the following characteristics:
Moreover, the prior distributions are constrained by the fact that the inferred change-points should be in increasing order and therefore we enforce that their intervals do not overlap. Hence, considering , might change according to
Then, the inference is carried out in order to obtain posterior distributions for the change points and the medians of such posteriors can be used as “true” change points for the first task.
Nested growth models for competing populations
A typical scenario is the one in which a tumor is treated, its mass shrinks and it later regrows. In this case, the observed dynamics is U-shaped. For such cases, we assume the presence of two populations and ; the first is sensitive to the treatment, whereas the second one is resistant to it. This means that a positive and a negative growth rates will be exhibited by the two populations, respectively.
The sensitive population is assumed to be present at the start of treatment with the initial condition and is governed by the differential equation
For the second population is assumed that the first resistant cell is born at . Therefore, the differential equation governing the dynamics of the resistant population will be
where the Heaviside function is defined as
Trivially, the observed counts will be the sum of the two populations and can be obtained analytically by integrating the two equations above and summing them. For this gives:
Likelihood compuation
For all the presented models and for a given set of observations $X =\{(x_n, y_n\)}$ we consider each of them as drawn from a poisson process with parameter equal to the expected value of the population where the analytical form of the expected value will depend on the considered task. This means that for a given task with parameters , the likelihood will be computed as:
Models’ PGMs
The probabilistic graphical models for the tree different task will be presented and the prior distributions employed in each of them will be described in detail.
Task 1
Given the trajectory $X =\{(x_n, y_n\)}$ of a population divided in time windows, the parameters of interest will be the growth rates , the instant of time in which the population was born and, if the population is undergoing a logistic growth, the carrying capacity .
The priors for those values are:
We choose a standard normal for the growth rates, a broad normal for with the constraint that must be smaller or equal to the time of the first observation, and a broad normal for the carrying capacity with the constraint that must be greater or equal to the largest observation.
Task 2
Given the trajectory $X =\{(x_n, y_n\)}$ of a population, the parameters of interest will be the change points, where is either given by the user or is the results of the choice of the parameter in the spline approximation.
The priors for those values are:
Essentially, for each proposed by the spline approximation we choose a flat normal distribution constrained in the interval .
Task 3
Given the trajectory $X =\{(x_n, y_n\)}$ where a resistant and a sensitive population are assumed to co-exist, the parameters of interest will the two growth rates and , the size of the sensitive population at the instant of time of the first observation and the instant of time in which the resistant population is born .
The priors for those values are:
Essentially, we choose a standard normal distribution for the two growth rates, a very broad normal for with the constraint that if at the beginning the resistant population is not born yet, the maximum error we are committing is 10%, and a very flat normal distribution centered on the beginning of the process for .