heteromotility/hmstats module

  1. Calculate general motility features (average speed, distance, …)
  2. Calculate features related to mean squared displacement
  3. Calculate features related to random walk similarity
heteromotility.hmstats.average_xy(coor1, coor2)

Finds the average value of two XY coordinates

heteromotility.hmstats.distance(coor1, coor2)

Euclidean distance

class heteromotility.hmstats.GeneralFeatures(cell_ids, move_thresh=5)

Bases: object

Calculates features of speed, distance, path shape, and turning behavior.

cell_ids

dict – keyed by cell ids, values are lists of coordinate tuples.

moving_tau

int – threshold for movement

tau_range

iterable, ints – range of window sizes to use to establish cell direction by regression.

interval_range

iterable, ints – range of time intervals to use between a cell’s current and ‘next’ direction when calculating turning features.

total_distance

dict – total distance each cell travels. keyed by cell id, values are floats.

net_distance

dict – net distance each cell traveled, dist(final_pos - init_pos) keyed by cell id, values are floats.

linearity

dict – Pearson’s r**2 linearity metric on cell paths. keyed by cell id, values are floats.

spearmanrsq

dict – Spearman’s rho**2 monotonicity metric on cell paths. keyed by cell id, values are floats.

progressivity

dict – net_distance/total_distance, metric of directional persistence. keyed by cell id, values are floats.

min_speed

dict – minimum cell speed. keyed by cell id, values are floats.

max_speed

dict – maximum cell speed. keyed by cell id, values are floats.

avg_moving_speed

dict – average moving speed above a certain threshold speed, defined as the movement cutoff. useful to deconfound the cell’s average speed while moving from the dwell time in an immotile state. keyed by threshold, values are dicts keyed by cell id, valued with floats.

time_moving

dict – proportion of time spent moving above a certain threshold speed, defined as the movement cutoff. keyed by threshold, values are dicts keyed by cell id, valued with floats.

turn_stats

dict – proportion of the time a cell turns to the right of its past direction. useful to determine directional bias. keyed by tau from tau_range, values are dicts keyed by interval from interval_range. final dicts have float values, [0, 1].

theta_stats

dict – average turning angle magnitude. keyed by tau from tau_range, values are dicts keyed by interval from interval_range. final dicts have float values, [0, 1].

calc_total_distance(cell_ids)
calc_net_distance(cell_ids)
calc_linearity(cell_ids)
calc_spearman(cell_ids)
calc_progressivity(cell_ids, net_distance, total_distance)
calc_minmax_speed(cell_ids)

Calculate max/min speeds as the distance traveled across 5 frames and append as a float to list max_speed/min_speed Speeds in the unit of pixels per hour

calc_moving_stats(cell_ids, move_speed, tau)

Calculates the proportion of time a cell spends moving_speeds, min, max, and average speeds while moving

Parameters:
  • cell_ids (dict keyed by cell id, containing lists of tupled XY coors) –
  • move_speed (minimum speed to be considered "moving") –
  • tau (time lag to use when determining movement speed) –
calc_moving_variable_threshold(cell_ids, thresholds=range(1, 11), tau=5)
turn_direction(cell_path, tau=10, interval=5)

Returns the proportion of time a cell turns left as float 0.0-1.0.

Parameters:
  • cell_path (list) – timeseries of tuples, each tuple holding an XY position i.e. cell_path = [(x1,y1), (x2,y2)…(xn,yn)]
  • tau (int) – desired time lag between a point of interest and a point in the distance (p_n+tau) to determine a cell’s turning behavior. Must be > interval.
  • interval (int) – number of points ahead and behind the point of interest to consider when find a regression line to represent the cell’s direction at p_n.
Returns:

  • turns (list) – binaries reflecting turns left (0) or right (1)
  • thetas (list) – angles of each turn in radians

Notes

(1) Estabish direction of object at p_n: Given a time series XY of x,y coordinates, where p_t denotes a point at time t, take a given point p_n determine the ‘direction’ of motion at p_n by plotting a linear reg. on points [p_n-i, p_n+i] for some interval value i This linear regression function is R(x) = slope*x + b

(2) Determine if p_n+tau is left or right For a point p_n+tau, for a variable time lag tau, determine if p_n+tau lies left or right of R

p_n+tau,Y > R(p_n+tau) and initial direction = right : left turn p_n+tau,Y < R(p_n+tau) and initial direction = right : right turn …

(3) Calculate magnitude of turn theta Calculate the angle of the turn theta as the angle between

R(x) from (1) R2, line connecting p_n to p_n+tau

v1 = (1, slope of R(x)) v2 = (1, slope of R2)

theta = arccos( v1 dot v2 / ||v1|| * ||v2||)

N.B. If a cell moves in a perfectly straight line, such that point of interest p_n has x coordinate == p_n+tau, the function skips this p_n i.e. Skips if the cell has moved in a perfectly linear line, or circled back on itself

all_turn_stats(cell_ids, tau_range=range(9, 12), interval_range=range(5, 7))
Parameters:
  • cell_ids (dict) – keyed by cell id, values are lists of tuple coordinates.
  • tau_range (iterable) – range of time lags to try for calculation.
  • interval_range (iterable) – range of interval distances to use for calculation of the linregress about point of interest p_n.
Returns:

  • turn_stats (dict) – keyed by tau, containing dict keyed by interval, containing dict keyed by cell, containing proportion of time a cell turns right
  • theta_stats (dict) – keyed by tau, containing dict keyed by interval, containing dict keyed by cell, containing a list of three elements [avg_theta, min_theta, max_theta]

class heteromotility.hmstats.MSDFeatures(cell_ids)

Bases: object

Calculates mean squared displacement related features.

cell_ids

dict – keyed by cell ids, values are lists of coordinate tuples.

msd_distributions

dict – MSDs for each cell across a range of time lags tau. keyed by cell ids, values are lists of floats.

log_distributions

dict – log transform of msd_distributions. keyed by cell ids, values are lists of floats.

alphas

dict – alpha exponent for each cells motion. keyed by cell ids, values are floats.

Notes

MSD(tau) = (1 / tau) * sum( (x(t + tau) + x(t))^2 )

Plotting a distribution of Tau vs MSD will generate a curve. The exponential nature of this curve will describe the type of motion the particle is experiencing.

1 < alpha –> impeded diffusion Linear curve (alpha = 1) –> diffusion 1 < alpha < 2 –> super diffusion alpha = 2 –> ballistic motion

calc_msd(path, tau)

Calculates mean squared displacement for a cell path for a given time lag tau

Parameters:
  • path (list) – tuples of sequential XY coordinates.
  • tau (int) – time lag to consider when calculating MSD.
Returns:

msd – mean squared displacement of path given time lag tau.

Return type:

float

calc_msd_distribution(path, max_tau)

Calculates the distribution of MSDs for a range of time lag values tau.

Parameters:
  • path (list) – tuples containing sequential XY coordinates.
  • max_tau (int) – maximum time lag tau to consider for MSD calculation.
Returns:

  • distribution (list) – MSDs indexed by tau.
  • log_distribution (list) – log transform of distribution.

Notes

If a cell is stationary indefinitely, it effectively has a MSD of 0. However, this calls a math domain error, so checks are in place that ensure the final alpha calculation will be 0 without raising exceptions.

Here – returns both distributions as a string ‘flat’ if MSD calc is 0 for a given range tau

msd_dist_all_cells(cell_ids, tau=31)

Calculates MSD distributions for all cells in dict cell_ids

Parameters:
  • cell_ids (dictionary keyed by cell_id, contains lists of) – tuples of sequential XY coordinates
  • tau (maximum time lag to calculate for MSD distributions) – n.b. if path < 30 t’s long, uses len(path) as max tau
Returns:

  • msd_distributions (dict keyed by cell_id containing list) – of MSD distributions, indexed by tau
  • log_distributions (dict with log transformed lists of) – MSD distributions
  • Notable behavior
  • ——————-
  • checks for distributions == ‘flat’ string and sets values for
  • that cell as ‘flat’, which is checked to provide an alpha of 0

calc_alphas(log_distributions)

Calculate the alpha coefficient value as the slope of a log(MSD) v log(tau) plot

Parameters:log_distributions (dict) – keyed by cell_id containing lists of log transformed MSDs, indexed by tau
Returns:alphas – keyed by cell_id, with values as the alpha coeff from log(MSD(tau)) = alpha*log(tau)
Return type:dict

Notes

Checkes if log_distributions[cell_id] is == ‘flat’ If so, sets alpha at the appropriate “flat” slope of 0

class heteromotility.hmstats.RWFeatures(cell_ids, gf)

Bases: object

Calculates features relative to a random walk and self-similarity measures.

cell_ids

dict – keyed by cell ids, values are lists of coordinate tuples.

gf

hmsims.GeneralFeatures object.

diff_linearity

dict – difference in linearity r**2 between observed cell and a random walk. keyed by cell ids, values are floats.

diff_net_dist

dict – difference in net distance between an observed cell and a random walk. keyed by cell ids, values are floats.

cell_kurtosis

dict – measured displacement distribution kurtosis. keyed by cell ids, values are floats.

diff_kurtosis

dict – difference between cell_kurtosis and a random walk kurtosis. keyed by cell ids, values are floats.

nongaussalpha

dict – non-Gaussian parameters alpha_2 of the displacement distribution. keyed by cell ids, values are floats.

disp_var

dict – variance of the displacement distribution. keyed by cell ids, values are floats.

hurst_RS

dict – Hurst coefficients of the displacement time series, as estimated by Mandelbrot’s original Rescaled Range method. keyed by cell ids, values are floats.

autocorr_max_tau

int – maximum tau for autocorrelation calculation.

autocorr

dict – autocorrelation coefficients for displacement time series. keyed by cell ids, values are floats.

random_walk(origin, N, speed_mu, speed_sigma=None)
Parameters:
  • origin (starting point for the model) –
  • N (number of steps to model) –
  • speed_mu (mean speed for the model) –
  • speed_sigma (stdev of the normal distribution for step sizes) –
Returns:

  • model_net_dist (float)
  • model_linearity (float)
  • model_path (list)

Notes

Net distance of random walks is based on tau = time_unit rate = sigma = step_size = time_unit * velocity_x n = number of steps (can be exchanged for t if =) <x(t)^2> = n * sigma^2 <x(t)^2>^0.5 = sigma * sqrt(n) = step_size * sqrt(n) <(x,y)^2> = <r^2> = <x(t)^2> + <y(t)^2> = 2 * n * sigma^2 <r^2>^0.5 = sqrt(2) * step_size * sqrt(n)

step_size = rate = c = sqrt(x_step^2 + y_step^2) = sqrt(2*step^2) x_step = y_step = rate/sqrt(2) (Random Walks in Biology, 1992, Howard C. Berg)

compare_to_randwalk(cell, u, cell_linearity, cell_net_dist, cell_avg_speed)

Compares an observed path to a simulated random walk with the same displacement mean

run_comparative_randwalk(cell_ids, linearity, net_distance, avg_speed)

Runs comparisons to random walks for all cells in cell_ids

make_displacement_dist(cell_ids, tau)

Generates displacement distributions for all cells in cell_ids

kurtosis_comparison(cell_ids, displacements)

Compares cell kurtosis to random walk kurtosis

vartau_kurtosis_comparison(cell_ids, max_tau=10)

Calculate cell_kurtosis and diff_kurtosis for a range of possible time intervals tau.

moving_kurt_comparison(cell_ids, displacements, speed)

Compare kurtosis only while cell is moving

moving_kurt(cell_ids, max_tau=10, moving_range=range(1, 11))
Parameters:
  • cell_ids (dict) – keyed by cell_id, values are lists of corrdinate tuples.
  • max_tau (int) – maximum time lag to consider for kurtosis calculations.
  • moving_range (iterable) – range of speeds to consider as a threshold for movement.
Returns:

  • cell_moving_kurt (triple dict raw kurtosis of all cells in cell_ids) – keyed by speed threshold, tau, and cell_id cell_moving_kurt = {

    speed1 : {
    tau1 :

    {cell_id1 : kurt …}

    } …

    }

  • diff_moving_kurt (kurtosis of all cells, normalized by Rayleigh kurt)

  • structured as above

displacement_props(cell_ids)

Calculates variance and skewness of the displacement distribution for each cell in cell_ids

Parameters:cell_ids (dict of lists containing tuples of sequential XY coordinates) –
Returns:
  • var (dict keyed by cell_id with variance of displacement distribution)
  • skew (dict keyed by cell_ids with skew of displacement distribution)
calc_ngaussalpha(X)

Calculates the non-Gaussian parameter alpha_2 of a given displacement distribution, X

Parameters:X (array-like) – distribution of displacements.
Returns:alpha_2 – non-Gaussian coefficient, floating point values.
Return type:float

Notes

alpha_2 = <dx^4> / 3*<dx^2>^2 - 1

effectively, a ratio of coeff* (kurtosis / variance)

For a Gaussian distribution, alpha_2 = 0 For non-Gaussian distributions, alpha_2 increases with the length of the tails

Levy-like motion would be expected to have alpha_2 > 0, while diffusive motion would be expected to have alpha_2 == 0

Rayleigh alpha_2 ~= -0.33

nongauss_coeff(cell_ids)

Calculates non-Gaussian coefficient alpha_2 for all cells in cell_ids.

Parameters:cell_ids (dict) – lists containing tuples of sequential XY coordinates.
Returns:nongauss_coeff – keyed by cell_id, nonGauss coeff values.
Return type:dict
largest_pow2(num)

Finds argmax_n 2**n < num

rescaled_range(X, n)

Finds rescaled range <R(n)/S(n)> for all sub-series of size n in X

hurst_mandelbrot(cell_ids)

Calculates the Hurst coefficient H for each cell in cell_ids.

Notes

for E[R(n)/S(n)] = Cn**H as n –> inf H : 0.5 - 1 ; long-term positive autocorrelation H : 0.5 ; fractal Brownian motion H : 0-0.5 ; long-term negative autocorrelation

N.B. SEQUENCES MUST BE >= 18 units long, otherwise linear regression for log(R/S) vs log(n) will have < 3 points and cannot be performed

dfa_all(cell_ids)

Performs detrended fluctuation analysis on cell displacement series.

Parameters:
  • cell_ids (dict of lists keyed by cell_id) –
  • list represents a cell. lists contain sequential tuples (ea.) –
  • XY coordinates of a cell at a given timepoint (containing) –
Returns:

  • dfa_alpha (dict keyed by cell_id, values are the alpha coefficient)
  • calculated by detrended fluctuation analysis

References

Ping et. al., Mosaic organization of DNA nucleotides, 1994, Phys Rev E

autocorr_all(cell_ids, max_tau=10)

Estimates the autocorrelation coefficient for each series of cell displacements over a range of time lags.

Parameters:
  • cell_ids (dict of lists keyed by cell_id) –
  • list represents a cell. lists contain sequential tuples (ea.) –
  • XY coordinates of a cell at a given timepoint (containing) –
Returns:

  • autocorr (dict of lists, containing autocorrelation coeffs for)
  • sequential time lags
  • qstats (dict of lists containing Q-Statistics (Ljung-Box))
  • pvals (dict of lists containing p-vals, as calculated from Q-Statistics)

Notes

Estimation method: https://en.wikipedia.org/wiki/Autocorrelation#Estimation

R(tau) = 1/(n-tau)*sigma**2 [sum(X_t - mu)*(X_t+tau - mu)] | t = [1,n-tau]

X as a time series, mu as the mean of X, sigma**2 as variance of X tau as a given time lag (sometimes referred to as k in literature)

Implementation uses statsmodels.tsa.stattools.acf()

n.b. truncated to taus [1,10], to expand to more time lags, simply alter the indexing being loaded into the return dicts

partial_acorr_all(cell_ids, max_tau=10)

Estimates the partial autocorrelation coefficient for each series of cell displacements over a range of time lags.

Parameters:
  • cell_ids (dict) – keyed by cell_id, values are lists of coordinate tuples.
  • max_tau (int) – maximum tau to estimate partial autocorrelation.
Returns:

partial_acorr – keyed by cell id, values are lists, containing autocorrelation coeffs for sequential time lags

Return type:

dict

Notes

n.b. truncated to taus [1,10], to expand to more time lags, simply alter the indexing being loaded into the return dicts. Implementation uses statsmodels.tsa.stattools.pacf()