heteromotility/hmstats module¶
- Calculate general motility features (average speed, distance, …)
- Calculate features related to mean squared displacement
- Calculate features related to random walk similarity
-
heteromotility.hmstats.average_xy(coor1, coor2)¶ Finds the average value of two XY coordinates
-
heteromotility.hmstats.distance(coor1, coor2)¶ Euclidean distance
-
class
heteromotility.hmstats.GeneralFeatures(cell_ids, move_thresh=5)¶ Bases:
objectCalculates features of speed, distance, path shape, and turning behavior.
-
cell_ids¶ dict – keyed by cell ids, values are lists of coordinate tuples.
-
moving_tau¶ int – threshold for movement
-
tau_range¶ iterable, ints – range of window sizes to use to establish cell direction by regression.
-
interval_range¶ iterable, ints – range of time intervals to use between a cell’s current and ‘next’ direction when calculating turning features.
-
total_distance¶ dict – total distance each cell travels. keyed by cell id, values are floats.
-
net_distance¶ dict – net distance each cell traveled, dist(final_pos - init_pos) keyed by cell id, values are floats.
-
linearity¶ dict – Pearson’s r**2 linearity metric on cell paths. keyed by cell id, values are floats.
-
spearmanrsq¶ dict – Spearman’s rho**2 monotonicity metric on cell paths. keyed by cell id, values are floats.
-
progressivity¶ dict – net_distance/total_distance, metric of directional persistence. keyed by cell id, values are floats.
-
min_speed¶ dict – minimum cell speed. keyed by cell id, values are floats.
-
max_speed¶ dict – maximum cell speed. keyed by cell id, values are floats.
-
avg_moving_speed¶ dict – average moving speed above a certain threshold speed, defined as the movement cutoff. useful to deconfound the cell’s average speed while moving from the dwell time in an immotile state. keyed by threshold, values are dicts keyed by cell id, valued with floats.
-
time_moving¶ dict – proportion of time spent moving above a certain threshold speed, defined as the movement cutoff. keyed by threshold, values are dicts keyed by cell id, valued with floats.
-
turn_stats¶ dict – proportion of the time a cell turns to the right of its past direction. useful to determine directional bias. keyed by tau from tau_range, values are dicts keyed by interval from interval_range. final dicts have float values, [0, 1].
-
theta_stats¶ dict – average turning angle magnitude. keyed by tau from tau_range, values are dicts keyed by interval from interval_range. final dicts have float values, [0, 1].
-
calc_total_distance(cell_ids)¶
-
calc_net_distance(cell_ids)¶
-
calc_linearity(cell_ids)¶
-
calc_spearman(cell_ids)¶
-
calc_progressivity(cell_ids, net_distance, total_distance)¶
-
calc_minmax_speed(cell_ids)¶ Calculate max/min speeds as the distance traveled across 5 frames and append as a float to list max_speed/min_speed Speeds in the unit of pixels per hour
-
calc_moving_stats(cell_ids, move_speed, tau)¶ Calculates the proportion of time a cell spends moving_speeds, min, max, and average speeds while moving
Parameters: - cell_ids (dict keyed by cell id, containing lists of tupled XY coors) –
- move_speed (minimum speed to be considered "moving") –
- tau (time lag to use when determining movement speed) –
-
calc_moving_variable_threshold(cell_ids, thresholds=range(1, 11), tau=5)¶
-
turn_direction(cell_path, tau=10, interval=5)¶ Returns the proportion of time a cell turns left as float 0.0-1.0.
Parameters: - cell_path (list) – timeseries of tuples, each tuple holding an XY position i.e. cell_path = [(x1,y1), (x2,y2)…(xn,yn)]
- tau (int) – desired time lag between a point of interest and a point in the distance (p_n+tau) to determine a cell’s turning behavior. Must be > interval.
- interval (int) – number of points ahead and behind the point of interest to consider when find a regression line to represent the cell’s direction at p_n.
Returns: - turns (list) – binaries reflecting turns left (0) or right (1)
- thetas (list) – angles of each turn in radians
Notes
(1) Estabish direction of object at p_n: Given a time series XY of x,y coordinates, where p_t denotes a point at time t, take a given point p_n determine the ‘direction’ of motion at p_n by plotting a linear reg. on points [p_n-i, p_n+i] for some interval value i This linear regression function is R(x) = slope*x + b
(2) Determine if p_n+tau is left or right For a point p_n+tau, for a variable time lag tau, determine if p_n+tau lies left or right of R
p_n+tau,Y > R(p_n+tau) and initial direction = right : left turn p_n+tau,Y < R(p_n+tau) and initial direction = right : right turn …(3) Calculate magnitude of turn theta Calculate the angle of the turn theta as the angle between
R(x) from (1) R2, line connecting p_n to p_n+tau
v1 = (1, slope of R(x)) v2 = (1, slope of R2)
theta = arccos( v1 dot v2 / ||v1|| * ||v2||)
N.B. If a cell moves in a perfectly straight line, such that point of interest p_n has x coordinate == p_n+tau, the function skips this p_n i.e. Skips if the cell has moved in a perfectly linear line, or circled back on itself
-
all_turn_stats(cell_ids, tau_range=range(9, 12), interval_range=range(5, 7))¶ Parameters: - cell_ids (dict) – keyed by cell id, values are lists of tuple coordinates.
- tau_range (iterable) – range of time lags to try for calculation.
- interval_range (iterable) – range of interval distances to use for calculation of the linregress about point of interest p_n.
Returns: - turn_stats (dict) – keyed by tau, containing dict keyed by interval, containing dict keyed by cell, containing proportion of time a cell turns right
- theta_stats (dict) – keyed by tau, containing dict keyed by interval, containing dict keyed by cell, containing a list of three elements [avg_theta, min_theta, max_theta]
-
-
class
heteromotility.hmstats.MSDFeatures(cell_ids)¶ Bases:
objectCalculates mean squared displacement related features.
-
cell_ids¶ dict – keyed by cell ids, values are lists of coordinate tuples.
-
msd_distributions¶ dict – MSDs for each cell across a range of time lags tau. keyed by cell ids, values are lists of floats.
-
log_distributions¶ dict – log transform of msd_distributions. keyed by cell ids, values are lists of floats.
-
alphas¶ dict – alpha exponent for each cells motion. keyed by cell ids, values are floats.
Notes
MSD(tau) = (1 / tau) * sum( (x(t + tau) + x(t))^2 )
Plotting a distribution of Tau vs MSD will generate a curve. The exponential nature of this curve will describe the type of motion the particle is experiencing.
1 < alpha –> impeded diffusion Linear curve (alpha = 1) –> diffusion 1 < alpha < 2 –> super diffusion alpha = 2 –> ballistic motion
-
calc_msd(path, tau)¶ Calculates mean squared displacement for a cell path for a given time lag tau
Parameters: - path (list) – tuples of sequential XY coordinates.
- tau (int) – time lag to consider when calculating MSD.
Returns: msd – mean squared displacement of path given time lag tau.
Return type: float
-
calc_msd_distribution(path, max_tau)¶ Calculates the distribution of MSDs for a range of time lag values tau.
Parameters: - path (list) – tuples containing sequential XY coordinates.
- max_tau (int) – maximum time lag tau to consider for MSD calculation.
Returns: - distribution (list) – MSDs indexed by tau.
- log_distribution (list) – log transform of distribution.
Notes
If a cell is stationary indefinitely, it effectively has a MSD of 0. However, this calls a math domain error, so checks are in place that ensure the final alpha calculation will be 0 without raising exceptions.
Here – returns both distributions as a string ‘flat’ if MSD calc is 0 for a given range tau
-
msd_dist_all_cells(cell_ids, tau=31)¶ Calculates MSD distributions for all cells in dict cell_ids
Parameters: - cell_ids (dictionary keyed by cell_id, contains lists of) – tuples of sequential XY coordinates
- tau (maximum time lag to calculate for MSD distributions) – n.b. if path < 30 t’s long, uses len(path) as max tau
Returns: - msd_distributions (dict keyed by cell_id containing list) – of MSD distributions, indexed by tau
- log_distributions (dict with log transformed lists of) – MSD distributions
- Notable behavior
- ——————-
- checks for distributions == ‘flat’ string and sets values for
- that cell as ‘flat’, which is checked to provide an alpha of 0
-
calc_alphas(log_distributions)¶ Calculate the alpha coefficient value as the slope of a log(MSD) v log(tau) plot
Parameters: log_distributions (dict) – keyed by cell_id containing lists of log transformed MSDs, indexed by tau Returns: alphas – keyed by cell_id, with values as the alpha coeff from log(MSD(tau)) = alpha*log(tau) Return type: dict Notes
Checkes if log_distributions[cell_id] is == ‘flat’ If so, sets alpha at the appropriate “flat” slope of 0
-
-
class
heteromotility.hmstats.RWFeatures(cell_ids, gf)¶ Bases:
objectCalculates features relative to a random walk and self-similarity measures.
-
cell_ids¶ dict – keyed by cell ids, values are lists of coordinate tuples.
-
gf¶ hmsims.GeneralFeatures object.
-
diff_linearity¶ dict – difference in linearity r**2 between observed cell and a random walk. keyed by cell ids, values are floats.
-
diff_net_dist¶ dict – difference in net distance between an observed cell and a random walk. keyed by cell ids, values are floats.
-
cell_kurtosis¶ dict – measured displacement distribution kurtosis. keyed by cell ids, values are floats.
-
diff_kurtosis¶ dict – difference between cell_kurtosis and a random walk kurtosis. keyed by cell ids, values are floats.
-
nongaussalpha¶ dict – non-Gaussian parameters alpha_2 of the displacement distribution. keyed by cell ids, values are floats.
-
disp_var¶ dict – variance of the displacement distribution. keyed by cell ids, values are floats.
-
hurst_RS¶ dict – Hurst coefficients of the displacement time series, as estimated by Mandelbrot’s original Rescaled Range method. keyed by cell ids, values are floats.
-
autocorr_max_tau¶ int – maximum tau for autocorrelation calculation.
-
autocorr¶ dict – autocorrelation coefficients for displacement time series. keyed by cell ids, values are floats.
-
random_walk(origin, N, speed_mu, speed_sigma=None)¶ Parameters: - origin (starting point for the model) –
- N (number of steps to model) –
- speed_mu (mean speed for the model) –
- speed_sigma (stdev of the normal distribution for step sizes) –
Returns: - model_net_dist (float)
- model_linearity (float)
- model_path (list)
Notes
Net distance of random walks is based on tau = time_unit rate = sigma = step_size = time_unit * velocity_x n = number of steps (can be exchanged for t if =) <x(t)^2> = n * sigma^2 <x(t)^2>^0.5 = sigma * sqrt(n) = step_size * sqrt(n) <(x,y)^2> = <r^2> = <x(t)^2> + <y(t)^2> = 2 * n * sigma^2 <r^2>^0.5 = sqrt(2) * step_size * sqrt(n)
step_size = rate = c = sqrt(x_step^2 + y_step^2) = sqrt(2*step^2) x_step = y_step = rate/sqrt(2) (Random Walks in Biology, 1992, Howard C. Berg)
-
compare_to_randwalk(cell, u, cell_linearity, cell_net_dist, cell_avg_speed)¶ Compares an observed path to a simulated random walk with the same displacement mean
-
run_comparative_randwalk(cell_ids, linearity, net_distance, avg_speed)¶ Runs comparisons to random walks for all cells in cell_ids
-
make_displacement_dist(cell_ids, tau)¶ Generates displacement distributions for all cells in cell_ids
-
kurtosis_comparison(cell_ids, displacements)¶ Compares cell kurtosis to random walk kurtosis
-
vartau_kurtosis_comparison(cell_ids, max_tau=10)¶ Calculate cell_kurtosis and diff_kurtosis for a range of possible time intervals tau.
-
moving_kurt_comparison(cell_ids, displacements, speed)¶ Compare kurtosis only while cell is moving
-
moving_kurt(cell_ids, max_tau=10, moving_range=range(1, 11))¶ Parameters: - cell_ids (dict) – keyed by cell_id, values are lists of corrdinate tuples.
- max_tau (int) – maximum time lag to consider for kurtosis calculations.
- moving_range (iterable) – range of speeds to consider as a threshold for movement.
Returns: cell_moving_kurt (triple dict raw kurtosis of all cells in cell_ids) – keyed by speed threshold, tau, and cell_id cell_moving_kurt = {
- speed1 : {
- tau1 :
{cell_id1 : kurt …}
} …
}
diff_moving_kurt (kurtosis of all cells, normalized by Rayleigh kurt)
structured as above
-
displacement_props(cell_ids)¶ Calculates variance and skewness of the displacement distribution for each cell in cell_ids
Parameters: cell_ids (dict of lists containing tuples of sequential XY coordinates) – Returns: - var (dict keyed by cell_id with variance of displacement distribution)
- skew (dict keyed by cell_ids with skew of displacement distribution)
-
calc_ngaussalpha(X)¶ Calculates the non-Gaussian parameter alpha_2 of a given displacement distribution, X
Parameters: X (array-like) – distribution of displacements. Returns: alpha_2 – non-Gaussian coefficient, floating point values. Return type: float Notes
alpha_2 = <dx^4> / 3*<dx^2>^2 - 1
effectively, a ratio of coeff* (kurtosis / variance)
For a Gaussian distribution, alpha_2 = 0 For non-Gaussian distributions, alpha_2 increases with the length of the tails
Levy-like motion would be expected to have alpha_2 > 0, while diffusive motion would be expected to have alpha_2 == 0
Rayleigh alpha_2 ~= -0.33
-
nongauss_coeff(cell_ids)¶ Calculates non-Gaussian coefficient alpha_2 for all cells in cell_ids.
Parameters: cell_ids (dict) – lists containing tuples of sequential XY coordinates. Returns: nongauss_coeff – keyed by cell_id, nonGauss coeff values. Return type: dict
-
largest_pow2(num)¶ Finds argmax_n 2**n < num
-
rescaled_range(X, n)¶ Finds rescaled range <R(n)/S(n)> for all sub-series of size n in X
-
hurst_mandelbrot(cell_ids)¶ Calculates the Hurst coefficient H for each cell in cell_ids.
Notes
for E[R(n)/S(n)] = Cn**H as n –> inf H : 0.5 - 1 ; long-term positive autocorrelation H : 0.5 ; fractal Brownian motion H : 0-0.5 ; long-term negative autocorrelation
N.B. SEQUENCES MUST BE >= 18 units long, otherwise linear regression for log(R/S) vs log(n) will have < 3 points and cannot be performed
-
dfa_all(cell_ids)¶ Performs detrended fluctuation analysis on cell displacement series.
Parameters: - cell_ids (dict of lists keyed by cell_id) –
- list represents a cell. lists contain sequential tuples (ea.) –
- XY coordinates of a cell at a given timepoint (containing) –
Returns: - dfa_alpha (dict keyed by cell_id, values are the alpha coefficient)
- calculated by detrended fluctuation analysis
References
Ping et. al., Mosaic organization of DNA nucleotides, 1994, Phys Rev E
-
autocorr_all(cell_ids, max_tau=10)¶ Estimates the autocorrelation coefficient for each series of cell displacements over a range of time lags.
Parameters: - cell_ids (dict of lists keyed by cell_id) –
- list represents a cell. lists contain sequential tuples (ea.) –
- XY coordinates of a cell at a given timepoint (containing) –
Returns: - autocorr (dict of lists, containing autocorrelation coeffs for)
- sequential time lags
- qstats (dict of lists containing Q-Statistics (Ljung-Box))
- pvals (dict of lists containing p-vals, as calculated from Q-Statistics)
Notes
Estimation method: https://en.wikipedia.org/wiki/Autocorrelation#Estimation
R(tau) = 1/(n-tau)*sigma**2 [sum(X_t - mu)*(X_t+tau - mu)] | t = [1,n-tau]
X as a time series, mu as the mean of X, sigma**2 as variance of X tau as a given time lag (sometimes referred to as k in literature)
Implementation uses statsmodels.tsa.stattools.acf()
n.b. truncated to taus [1,10], to expand to more time lags, simply alter the indexing being loaded into the return dicts
-
partial_acorr_all(cell_ids, max_tau=10)¶ Estimates the partial autocorrelation coefficient for each series of cell displacements over a range of time lags.
Parameters: - cell_ids (dict) – keyed by cell_id, values are lists of coordinate tuples.
- max_tau (int) – maximum tau to estimate partial autocorrelation.
Returns: partial_acorr – keyed by cell id, values are lists, containing autocorrelation coeffs for sequential time lags
Return type: dict
Notes
n.b. truncated to taus [1,10], to expand to more time lags, simply alter the indexing being loaded into the return dicts. Implementation uses statsmodels.tsa.stattools.pacf()
-