Statistical Methods#

Difference-in-Differences#

sctrial.stats.did.AggregateFunc#

Supported aggregation functions for grouped feature summaries.

alias of Literal['mean', 'median', 'pct_pos']

sctrial.stats.did.AggregateMode#

Supported aggregation modes for DiD analysis.

alias of Literal['cell', 'participant_visit', 'participant_visit_celltype']

class sctrial.stats.did.DiDConfig(aggregate: Literal['cell', 'participant_visit', 'participant_visit_celltype'] = 'participant_visit', layer: str | None = None, standardize: bool = True, agg: Literal['mean', 'median', 'pct_pos'] = 'mean', covariates: list[str] | None = None, use_bootstrap: bool = False, n_boot: int = 999, seed: int = 42, exclude_crossovers: bool = True)[source]#

Bases: object

Configuration for DiD analysis.

Use this dataclass to bundle common DiD settings and pass to did_table.

agg: Literal['mean', 'median', 'pct_pos'] = 'mean'#

Aggregation function applied after grouping.

aggregate: Literal['cell', 'participant_visit', 'participant_visit_celltype'] = 'participant_visit'#

Aggregation mode for features (cell-level or participant-level).

covariates: list[str] | None = None#

Optional covariate columns to include as fixed effects.

exclude_crossovers: bool = True#

Whether to drop crossover cells if crossover_col is set.

layer: str | None = None#

Expression layer to use for genes (None uses adata.X).

n_boot: int = 999#

Number of bootstrap permutations.

seed: int = 42#

Random seed for bootstrap reproducibility.

standardize: bool = True#

Whether to z-score outcomes before model fitting.

use_bootstrap: bool = False#

If True, use wild cluster bootstrap for p-values.

class sctrial.stats.did.DidFitResult[source]#

Bases: TypedDict

Structured return for did_fit.

sctrial.stats.did.did_fit(df: DataFrame, y: str, unit: str, time: str, arm_bin: str, covariates: list[str] | None = None, cov_type: str = 'cluster', standardize: bool = True, use_bootstrap: bool = False, n_boot: int = 999, seed: int = 42) DidFitResult[source]#

Fit fixed-effects Difference-in-Differences (DiD) model.

Mathematical Model

The DiD model with participant fixed effects:

\[Y_{it} = \alpha_i + \beta_1 \cdot \text{Post}_t + \beta_2 \cdot (\text{Treat}_i \times \text{Post}_t) + \epsilon_{it}\]

where \(Y_{it}\) is the outcome for participant i at time t, \(\alpha_i\) is a participant-specific intercept (fixed effect), \(\text{Post}_t\) is an indicator for follow-up visit, \(\text{Treat}_i\) is the treatment arm indicator, and \(\beta_2\) is the DiD coefficient (causal estimand of interest).

Null Hypothesis

H₀: β₂ = 0 (no differential treatment effect over time).

The DiD estimator:

\[\hat{\beta}_2 = (\bar{Y}_{T,post} - \bar{Y}_{T,pre}) - (\bar{Y}_{C,post} - \bar{Y}_{C,pre})\]

Statistical Assumptions

  • Parallel trends: In absence of treatment, both groups would follow same trajectory. Cannot be tested directly but can check pre-trends.

  • No anticipation: Treatment effect only after treatment starts.

  • SUTVA: No spillover between participants.

  • Requires at least 4 unique units (participants) to estimate fixed effects. Returns NaN for all estimates if n_units < 4.

  • Features with near-zero variance (std < 1e-8) return NaN.

  • Cluster-robust standard errors account for within-participant correlation.

  • If n_cells column is present, Weighted Least Squares (WLS) is used. Weights are proportional to n_cells (inverse-variance weighting for participant-level means, since Var(mean) = σ²/n).

Parameters:
  • df (DataFrame) – Long-format data with columns for unit, time, arm_bin, y, and covariates.

  • y (str) – Name of the outcome column.

  • unit (str) – Name of the participant/unit column.

  • time (str) – Name of the time variable (numeric 0/1).

  • arm_bin (str) – Name of the treatment indicator column (0/1).

  • covariates (list of str, optional) – Additional covariate columns to include as fixed effects.

  • cov_type (str) – Covariance type for standard errors (‘cluster’ recommended).

  • standardize (bool) – If True, z-score the outcome before fitting (recommended for interpretable effect sizes).

  • use_bootstrap (bool) – If True, use Wild Cluster Bootstrap for p-values (recommended when n_participants < 15).

  • n_boot (int) – Number of bootstrap iterations (999 or 1999 for publication).

  • seed (int) – Random seed for reproducibility.

Returns:

Dictionary with keys beta_DiD, se_DiD, p_DiD, beta_time, p_time, n_units, cov_type_used. When use_bootstrap=True, also includes p_DiD_boot, se_DiD_boot, ci_lo_boot, ci_hi_boot.

Return type:

DidFitResult
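For the two-period case, the fixed-effects regression coefficient equals the closed-form difference in mean change scores shown above. A minimal numpy sketch on synthetic data (not the sctrial implementation) cross-checks the two:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8  # participants: first 4 treated, last 4 control
treat = np.repeat([1, 1, 1, 1, 0, 0, 0, 0], 2)   # per-observation arm indicator
post = np.tile([0, 1], n)                         # pre/post within each participant
alpha = rng.normal(0, 1, n)                       # participant fixed effects
y = alpha.repeat(2) + 0.3 * post + 0.9 * treat * post + rng.normal(0, 0.1, 2 * n)

# Closed form: (mean change, treated) - (mean change, control)
delta = y[post == 1] - y[post == 0]               # one change score per participant
beta_hand = delta[:4].mean() - delta[4:].mean()

# OLS with participant dummies (fixed effects) + post + interaction
X = np.column_stack([np.eye(n).repeat(2, axis=0), post, treat * post])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0][-1]   # interaction coefficient

assert np.isclose(beta_hand, beta_ols)
```

With two visits per participant, the within (fixed-effects) estimator and the change-score regression are numerically identical, which is why `aggregate='participant_visit'` reduces to a paired analysis.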

sctrial.stats.did.did_table(adata: AnnData, features: Sequence[str], design: TrialDesign, visits: tuple[str, str], exclude_crossovers: bool = True, celltype: str | None = None, aggregate: Literal['cell', 'participant_visit', 'participant_visit_celltype'] = 'participant_visit', layer: str | None = None, standardize: bool = True, agg: Literal['mean', 'median', 'pct_pos'] = 'mean', covariates: list[str] | None = None, use_bootstrap: bool = False, n_boot: int = 999, seed: int = 42, config: DiDConfig | None = None) DataFrame[source]#

Run Difference-in-Differences (DiD) for a list of features.

This function implements a fixed-effects DiD model to test for treatment-induced longitudinal changes. It is optimized for ‘panels’ of features (tens to hundreds), such as module scores or selected gene sets.

Statistical Model:

y ~ visit + visit:arm + covariates + C(participant)

The visit:arm interaction coefficient (beta_DiD) is the primary estimand.

Parameters:
  • adata – AnnData object containing expression data and metadata.

  • features – List of features to test. Can be gene names in adata.var_names or observation-level scores in adata.obs.columns.

  • design – A TrialDesign object specifying the metadata columns.

  • visits – A tuple of (baseline, followup) visit labels.

  • exclude_crossovers – If True, excludes observations where design.crossover_col is True. Recommended for primary randomized analysis.

  • config – Optional DiDConfig object. If provided, its values override the corresponding keyword arguments (aggregate, layer, standardize, agg, covariates, bootstrap, seed, and exclude_crossovers).

  • celltype – If provided, subsets the analysis to a specific cell type.

  • aggregate

    Aggregation mode:

    • 'cell': Fit model on individual cells (not recommended for p-values, as it treats cells as independent).

    • 'participant_visit': Average features per participant-visit before fitting. This is the recommended approach for clinical inference.

    • 'participant_visit_celltype': Average per participant-visit-celltype.

  • layer – Layer to extract gene expression from. If None, uses adata.X.

  • standardize – If True, z-scores the outcome variable before fitting to provide standardized effect sizes.

  • agg – Aggregation function: ‘mean’, ‘median’, or ‘pct_pos’.

  • covariates – List of additional columns in adata.obs to include as fixed effects in the model (e.g., [‘age’, ‘sex’, ‘batch’]). Covariates must be numeric or constant within each participant-visit group. Non-numeric covariates are aggregated with “first” and will raise an error if they vary within a participant-visit (or participant-visit-celltype).

  • use_bootstrap – If True, uses Wild Cluster Bootstrap to calculate p-values. Recommended for small sample sizes (e.g. < 15 participants per group).

  • n_boot – Number of bootstrap permutations.

  • seed – Random seed for bootstrap.

Returns:

Table with one row per feature containing beta_DiD, p_DiD, and FDR-corrected significance.

Return type:

pd.DataFrame

Examples

>>> res = did_table(adata, features=["ms_OXPHOS"], design=design, visits=("V1", "V2"))
>>> print(res[["feature", "beta_DiD", "p_DiD"]])
sctrial.stats.did.did_table_by_celltype(adata: AnnData, features: Sequence[str], design: TrialDesign, visits: tuple[str, str], celltypes: Sequence[str] | None = None, exclude_crossovers: bool = True, aggregate: Literal['cell', 'participant_visit', 'participant_visit_celltype'] = 'participant_visit', layer: str | None = None, standardize: bool = True, agg: Literal['mean', 'median', 'pct_pos'] = 'mean', covariates: list[str] | None = None, use_bootstrap: bool = False, n_boot: int = 999, seed: int = 42) DataFrame[source]#

Run did_table stratified by cell type.

Parameters:
  • adata – AnnData object.

  • features – Genes or module scores to test.

  • design – A TrialDesign object.

  • visits – Two visit labels for comparison.

  • celltypes – Subset of cell types to analyze. If None, uses all in design.celltype_col.

  • exclude_crossovers – Exclude cells marked as crossovers.

  • aggregate – Level of aggregation.

  • layer – Expression layer.

  • standardize – Standardize expression to unit variance.

  • agg – Aggregation function.

  • covariates – Obs columns to include as covariates.

  • use_bootstrap – Use Wild Cluster Bootstrap for p-values.

  • n_boot – Number of bootstrap iterations.

  • seed – Random seed.

Returns:

Table with DiD results for each gene and cell type.

Return type:

pd.DataFrame

sctrial.stats.did.did_table_parallel(adata: AnnData, features: Sequence[str], design: TrialDesign, visits: tuple[str, str], exclude_crossovers: bool = True, celltype: str | None = None, aggregate: Literal['cell', 'participant_visit', 'participant_visit_celltype'] = 'participant_visit', layer: str | None = None, standardize: bool = True, agg: Literal['mean', 'median', 'pct_pos'] = 'mean', covariates: list[str] | None = None, use_bootstrap: bool = False, n_boot: int = 999, seed: int = 42, n_jobs: int = -1, backend: str = 'loky', batch_size: int | None = None, config: DiDConfig | None = None) DataFrame[source]#

Parallelized DiD across features using joblib.

This mirrors did_table but parallelizes feature-level model fits. It is most useful for large feature panels (hundreds to thousands of genes).

Parameters:
  • n_jobs – Number of parallel jobs (joblib convention). Use -1 for all cores.

  • backend – Joblib backend (“loky”, “multiprocessing”, or “threading”).

  • batch_size – Optional joblib batch size. If None, joblib chooses.

sctrial.stats.did.get_did_aggregated_df(adata: AnnData, features: Sequence[str], design: TrialDesign, visits: tuple[str, str], *, layer: str | None = None, exclude_crossovers: bool = True, celltype: str | None = None, aggregate: Literal['cell', 'participant_visit', 'participant_visit_celltype'] = 'participant_visit', agg: Literal['mean', 'median', 'pct_pos'] = 'mean', covariates: list[str] | None = None) tuple[DataFrame, str, str, str][source]#

Build aggregated DataFrame for DiD, for use in permutation tests.

Returns (df_use, unit, time, arm_bin) where df_use has one row per participant-visit with feature values and arm_bin. Permute arm_bin at the participant level and call did_fit for each permutation.

Parameters:
  • adata – AnnData object.

  • features – List of genes or module scores.

  • design – A TrialDesign object.

  • visits – Tuple of (pre, post) visit labels.

  • layer – Layer to use for gene expression.

  • exclude_crossovers – Whether to drop crossover cells if crossover_col is set.

  • celltype – The cell type to analyze.

  • aggregate – Aggregation mode (see did_table).

  • agg – Aggregation function.

  • covariates – Optional covariate columns to include as fixed effects. Non-numeric covariates must be constant within participant-visit.

Returns:

Tuple containing the aggregated DataFrame, the unit column name, the time column name, and the arm_bin column name.

Return type:

tuple[pd.DataFrame, str, str, str]
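The permutation recipe described above can be sketched with plain numpy on a participant-level table. This is a hypothetical stand-in: in practice df_use, unit, time, and arm_bin come from get_did_aggregated_df and each permuted label set would go through did_fit:

```python
import numpy as np

rng = np.random.default_rng(1)
arm = np.array([1] * 5 + [0] * 5)           # participant-level arm labels
delta = rng.normal(0.8 * arm, 1.0)          # per-participant post-minus-pre change

def did_stat(arm_labels):
    # Difference-in-means DiD statistic on change scores
    return delta[arm_labels == 1].mean() - delta[arm_labels == 0].mean()

observed = did_stat(arm)
# Permute arm labels at the participant level to build the null distribution
null = np.array([did_stat(rng.permutation(arm)) for _ in range(999)])
# Two-sided permutation p-value with add-one correction
p_perm = (1 + np.sum(np.abs(null) >= abs(observed))) / (1 + len(null))
assert 0 < p_perm <= 1
```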

Bayesian DiD#

sctrial.stats.bayes.did_table_bayes(adata: AnnData, features: Sequence[str], design: TrialDesign, visits: tuple[str, str], *, exclude_crossovers: bool = True, celltype: str | None = None, aggregate: Literal['cell', 'participant_visit', 'participant_visit_celltype'] = 'participant_visit', layer: str | None = None, standardize: bool = True, agg: Literal['mean', 'median', 'pct_pos'] = 'mean', covariates: list[str] | None = None, prior_scale: float = 1.0, sigma_scale: float = 1.0, draws: int = 1000, tune: int = 1000, chains: int = 4, target_accept: float = 0.9, max_treedepth: int = 12, seed: int = 42) DataFrame[source]#

Bayesian DiD with participant random intercepts.

Fits a simple hierarchical model:

y_ijt = alpha_i + beta_time * time + beta_arm * arm + beta_did * time*arm + X_cov * beta_cov + eps

Prior specification#

  • alpha_i ~ Normal(0, prior_scale) (participant random intercepts)

  • beta_time ~ Normal(0, prior_scale)

  • beta_arm ~ Normal(0, prior_scale)

  • beta_did ~ Normal(0, prior_scale)

  • sigma ~ HalfNormal(sigma_scale)

These weakly-informative priors are designed to stabilize estimation for small samples while remaining agnostic about effect direction. Use prior_scale and sigma_scale to widen or tighten them.

Parameters:
  • adata – AnnData object containing expression data.

  • features – Feature names (genes or obs columns) to test.

  • design – A TrialDesign specifying participant, visit, and arm columns.

  • visits – Tuple of (baseline, followup) visit labels.

  • exclude_crossovers – Whether to exclude crossover participants.

  • celltype – If provided, subset to this cell type before analysis.

  • aggregate – Aggregation mode ("participant_visit" or "cell").

  • layer – Expression layer for gene features (None uses adata.X).

  • standardize – Whether to z-score outcomes before model fitting.

  • agg – Aggregation function ("mean", "median", or "pct_pos").

  • covariates – Optional covariate columns to include as fixed effects.

  • prior_scale – Scale (standard deviation) for all Normal priors on coefficients and random intercepts. Increase to make priors more diffuse.

  • sigma_scale – Scale for the HalfNormal prior on the observation noise sigma.

  • draws – Number of posterior draws (after tuning).

  • tune – Number of tuning (warm-up) iterations per chain.

  • chains – Number of MCMC chains.

  • target_accept – Target acceptance rate for the NUTS sampler.

  • max_treedepth – Maximum tree depth for the NUTS sampler.

  • seed – Random seed for reproducibility.

Returns:

Table with columns:

  • feature : Feature name.

  • beta_DiD : Posterior mean of the DiD effect.

  • ci_low : 2.5th percentile of the posterior.

  • ci_high : 97.5th percentile of the posterior.

  • p_bayes : Two-sided posterior tail probability (not a frequentist p-value).

  • n_units : Number of paired units used.

  • FDR_bayes : BH-adjusted p_bayes.

Return type:

pd.DataFrame
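Given posterior draws of the DiD coefficient, the reported summaries follow mechanically; a numpy sketch with simulated draws standing in for the real posterior (illustrative, not the library's code):

```python
import numpy as np

rng = np.random.default_rng(0)
draws = rng.normal(0.5, 0.25, size=4000)   # stand-in for posterior draws of beta_did

beta_did = draws.mean()                                    # posterior mean
ci_low, ci_high = np.percentile(draws, [2.5, 97.5])        # equal-tailed 95% interval
p_bayes = 2 * min((draws > 0).mean(), (draws < 0).mean())  # two-sided tail probability
assert ci_low < beta_did < ci_high
```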

sctrial.stats.bayes.prior_predictive_check(adata: AnnData, features: list[str], design: TrialDesign, visits: tuple[str, str], *, prior_scale: float = 1.0, sigma_scale: float = 1.0, n_samples: int = 500, seed: int = 42) dict[source]#

Run prior predictive check for the Bayesian DiD model.

Samples from the prior predictive distribution to verify that the chosen priors produce plausible outcome values before fitting the model. This is useful for calibrating prior_scale and sigma_scale.

Parameters:
  • adata – AnnData object containing expression data.

  • features – Feature names (genes or obs columns) to check.

  • design – A TrialDesign specifying participant, visit, and arm columns.

  • visits – Tuple of (baseline, followup) visit labels.

  • prior_scale – Scale (standard deviation) for all Normal priors on coefficients and random intercepts.

  • sigma_scale – Scale for the HalfNormal prior on the observation noise sigma.

  • n_samples – Number of prior predictive samples to draw.

  • seed – Random seed for reproducibility.

Returns:

Keys:

  • prior_predictive : np.ndarray – prior predictive samples for the outcome variable (shape depends on number of observations and n_samples).

  • observed_range : tuple[float, float] – (min, max) of the observed data across all requested features.

  • prior_covers_data : bool – whether the 95 % prior predictive interval covers the full observed data range.

Return type:

dict
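The check itself amounts to: sample coefficients and noise from the priors, simulate outcomes, and ask whether the simulated spread covers the observed range. A self-contained numpy sketch under the stated priors (synthetic "observed" data; the real function works on adata):

```python
import numpy as np

rng = np.random.default_rng(0)
prior_scale, sigma_scale, n_samples = 1.0, 1.0, 500
observed = rng.normal(0, 1, size=200)   # stand-in for standardized feature values

# Draw from the priors, then simulate outcomes for each prior sample
beta = rng.normal(0, prior_scale, size=n_samples)
sigma = np.abs(rng.normal(0, sigma_scale, size=n_samples))   # HalfNormal(sigma_scale)
prior_pred = rng.normal(beta[:, None], sigma[:, None], size=(n_samples, 200))

lo, hi = np.percentile(prior_pred, [2.5, 97.5])
prior_covers_data = bool(lo <= observed.min() and hi >= observed.max())
```

If prior_covers_data is False, widening prior_scale or sigma_scale is the usual remedy before fitting.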

Within-Arm and Between-Arm Comparisons#

sctrial.stats.comparisons.between_arm_comparison(adata: AnnData, visit: str, features: Sequence[str], design: TrialDesign, aggregate: Literal['cell', 'participant_visit', 'participant_visit_celltype'] = 'participant_visit', layer: str | None = None, agg: Literal['mean', 'median', 'pct_pos'] = 'mean', standardize: bool = True, method: Literal['ols', 'wilcoxon'] = 'ols', covariates: list[str] | None = None) DataFrame[source]#

Between-arm contrast at a fixed visit.

This function tests if treatment arms differ at a specific visit. This is a cross-sectional comparison (no participant fixed effects).

Parameters:
  • adata – AnnData object.

  • visit – The visit label to analyze.

  • features – List of genes or module scores.

  • design – A TrialDesign object.

  • aggregate – Aggregation mode (see did_table).

  • layer – Layer to use for gene expression.

  • agg – Aggregation function.

  • standardize – Whether to z-score the outcome variable (only for ‘ols’).

  • method

    • 'ols': Ordinary Least Squares.

    • 'wilcoxon': Wilcoxon rank-sum test (Mann-Whitney U).

  • covariates – Optional covariate columns to include as fixed effects for OLS.

Returns:

Table with columns:

  • feature : Name of the feature.

  • beta_arm : Effect size (treated − control difference in means).

  • se_arm : Standard error of the arm coefficient. For OLS this is the analytical SE from the model fit; for Wilcoxon it is the pooled SE of the difference in means (√(s₁²/n₁ + s₂²/n₂)).

  • ci_lo_arm / ci_hi_arm : 95 % confidence interval bounds. For OLS: beta ± t_crit × se with residual df. For Wilcoxon: beta ± t_crit × se with Welch–Satterthwaite df.

  • p_arm : P-value for the between-arm comparison.

  • FDR_arm : Benjamini–Hochberg corrected p-value.

  • n_units : Number of unique participants.

Return type:

pd.DataFrame
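The Wilcoxon-branch SE and Welch-Satterthwaite df described above can be reproduced by hand; a numpy sketch on synthetic participant-level values (hypothetical data, not the sctrial code path):

```python
import numpy as np

rng = np.random.default_rng(0)
treated = rng.normal(1.0, 1.0, size=12)   # participant-level values, treated arm
control = rng.normal(0.0, 1.0, size=10)   # control arm

n1, n2 = len(treated), len(control)
s1, s2 = treated.var(ddof=1), control.var(ddof=1)

beta_arm = treated.mean() - control.mean()
se_arm = np.sqrt(s1 / n1 + s2 / n2)       # pooled SE of the difference in means
# Welch-Satterthwaite degrees of freedom for the CI
df = (s1 / n1 + s2 / n2) ** 2 / (
    (s1 / n1) ** 2 / (n1 - 1) + (s2 / n2) ** 2 / (n2 - 1)
)
# df always lies between min(n1, n2) - 1 and n1 + n2 - 2
assert min(n1, n2) - 1 <= df <= n1 + n2 - 2
```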

sctrial.stats.comparisons.compare_gene_in_celltype(adata: AnnData, gene: str, celltypes: str | Sequence[str], *, group_col: str, group1: str, group2: str, participant_col: str = 'participant_id', celltype_col: str = 'celltype', layer: str | None = 'counts', log1p: bool = True, expr_threshold: float = 0.0, min_cells_per_patient: int = 10, min_patients_per_group: int = 3) tuple[dict, DataFrame][source]#

Compare one gene between two groups within specified cell types.

This aggregates expression per participant (avoids pseudoreplication) and tests group differences using Mann-Whitney U on participant-level means.

Parameters:
  • adata – AnnData object.

  • gene – Gene name.

  • celltypes – Cell types to analyze.

  • group_col – Column name in adata.obs to use for grouping.

  • group1 – First group to compare.

  • group2 – Second group to compare.

  • participant_col – Column name in adata.obs to use for participant IDs.

  • celltype_col – Column name in adata.obs to use for cell types.

  • layer – Layer name in adata.layers to use for expression data.

  • log1p – Whether to log1p the expression data.

  • expr_threshold – Expression threshold to use for calculating the percentage of expressing cells. This is the minimum expression level to be considered expressing.

  • min_cells_per_patient – Minimum number of cells per participant to include in the analysis.

  • min_patients_per_group – Minimum number of participants per group to include in the analysis.

Returns:

A tuple of (results, summary): a dictionary with the comparison results and a DataFrame with the participant-level summaries.

Return type:

tuple[dict, pd.DataFrame]
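The pseudoreplication-avoiding step is simply a per-participant aggregation before any test is run. A minimal pandas sketch with hypothetical column names mirroring the defaults:

```python
import pandas as pd

# Toy cell-level table: many cells per participant
cells = pd.DataFrame({
    "participant_id": ["P1"] * 30 + ["P2"] * 30 + ["P3"] * 30,
    "group": ["drug"] * 60 + ["placebo"] * 30,
    "expr": range(90),
})

# Collapse to one mean per participant; group tests then run on these rows,
# not on the thousands of correlated single cells
per_patient = cells.groupby(["participant_id", "group"], as_index=False)["expr"].mean()
assert len(per_patient) == 3   # one row per participant, no pseudoreplication
```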

sctrial.stats.comparisons.get_within_arm_aggregated_df(adata: AnnData, arm: str, features: Sequence[str], design: TrialDesign, visits: tuple[str, str], *, layer: str | None = None, aggregate: Literal['cell', 'participant_visit', 'participant_visit_celltype'] = 'participant_visit', agg: Literal['mean', 'median', 'pct_pos'] = 'mean', covariates: list[str] | None = None) tuple[DataFrame, str][source]#

Build aggregated DataFrame for within-arm comparison, for permutation tests.

Returns (df_use, unit) where df_use has one row per participant-visit with feature values and visit_num. Permute visit_num within each participant and run OLS to get null betas.

Parameters:
  • adata – AnnData object.

  • arm – The arm to analyze (e.g., design.arm_treated).

  • features – List of genes or module scores.

  • design – A TrialDesign object.

  • visits – Tuple of (pre, post) visit labels.

  • layer – Layer to use for gene expression.

  • aggregate – Aggregation mode (see did_table).

  • agg – Aggregation function.

  • covariates – Optional covariate columns to include as fixed effects. Non-numeric covariates must be constant within participant-visit.

Returns:

Tuple containing the aggregated DataFrame and the unit column name.

Return type:

tuple[pd.DataFrame, str]
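For two visits, permuting visit_num within each participant is equivalent to randomly flipping the sign of each participant's change score. A numpy sketch of that null distribution (synthetic change scores; the real workflow uses the df_use returned here):

```python
import numpy as np

rng = np.random.default_rng(2)
delta = rng.normal(0.5, 1.0, size=12)   # per-participant post-minus-pre changes

observed = delta.mean()                 # beta_time in the two-visit case
# Within-participant visit swap == Rademacher sign flip of each change score
signs = rng.choice([-1, 1], size=(999, delta.size))
null = (signs * delta).mean(axis=1)
p_perm = (1 + np.sum(np.abs(null) >= abs(observed))) / (1 + 999)
assert 0 < p_perm <= 1
```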

sctrial.stats.comparisons.resolve_gene_name(adata: AnnData, gene_query: str) str[source]#

Resolve a gene name in var_names, case-insensitive if needed.

Parameters:
  • adata – AnnData object.

  • gene_query – Gene name to resolve (case-insensitive).

Returns:

The resolved gene name (exact match or case-insensitive match).

Return type:

str

Raises:

ValueError – If gene_query is not found in adata.var_names or if there are multiple case-insensitive matches.
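The resolution logic can be sketched as a stand-alone function over a plain list of names (hypothetical simplified version; the real function reads adata.var_names):

```python
def resolve_name(names: list[str], query: str) -> str:
    if query in names:                      # exact match wins
        return query
    matches = [n for n in names if n.lower() == query.lower()]
    if len(matches) != 1:                   # not found, or ambiguous
        raise ValueError(f"{query!r}: {len(matches)} case-insensitive matches")
    return matches[0]

assert resolve_name(["CD3E", "Cd3d"], "cd3e") == "CD3E"
```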

sctrial.stats.comparisons.within_arm_comparison(adata: AnnData, arm: str, features: Sequence[str], design: TrialDesign, visits: tuple[str, str], aggregate: Literal['cell', 'participant_visit', 'participant_visit_celltype'] = 'participant_visit', layer: str | None = None, agg: Literal['mean', 'median', 'pct_pos'] = 'mean', standardize: bool = True, covariates: list[str] | None = None, use_bootstrap: bool = False, n_boot: int = 999, seed: int = 42) DataFrame[source]#

Paired within-arm pre->post contrast.

This function tests for longitudinal changes within a single treatment arm using a fixed-effects model (equivalent to a paired t-test but flexible for single-cell data).

Parameters:
  • adata – AnnData object.

  • arm – The arm to analyze (e.g., design.arm_treated).

  • features – List of genes or module scores.

  • design – A TrialDesign object.

  • visits – Tuple of (pre, post) visit labels.

  • aggregate – Aggregation mode (see did_table).

  • layer – Layer to use for gene expression.

  • agg – Aggregation function.

  • standardize – Whether to z-score the outcome variable.

  • covariates – Optional covariate columns to include as fixed effects. Non-numeric covariates must be constant within participant-visit.

  • use_bootstrap – Whether to use wild cluster bootstrap for p-values and CIs. Recommended when the number of participants (clusters) is small (< 10). When enabled, bootstrap p-values replace the analytical p-values as the primary p_time column.

  • n_boot – Number of bootstrap iterations (999 or 1999 for publication).

  • seed – Random seed for bootstrap reproducibility.

Returns:

Table with columns:

  • feature : Name of the feature.

  • beta_time : Estimated pre→post change (visit coefficient).

  • se_time : Cluster-robust standard error of the time coefficient.

  • ci_lo_time / ci_hi_time : 95 % confidence interval bounds (beta ± t_crit × se with residual df from the OLS fit).

  • p_time : P-value for the time coefficient (cluster-robust Wald test; bootstrap if use_bootstrap=True).

  • n_units : Number of unique participants.

  • FDR_time : Benjamini–Hochberg corrected p-value.

When use_bootstrap=True, additional columns are included:

  • p_time_boot : Bootstrap p-value.

  • se_time_boot : Bootstrap SE from coefficient distribution.

  • ci_lo_boot / ci_hi_boot : Bootstrap-t 95 % CI.

Return type:

pd.DataFrame
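The wild cluster bootstrap behind use_bootstrap can be sketched for the paired case: impose the null, then re-sign each participant's (cluster's) contribution with a Rademacher weight and recompute the test statistic. Illustrative numpy version on synthetic change scores, not the sctrial implementation:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10
delta = rng.normal(0.6, 1.0, size=n)     # per-participant change scores (clusters)

# Observed t statistic for H0: mean change = 0
t_obs = delta.mean() / (delta.std(ddof=1) / np.sqrt(n))

t_boot = np.empty(999)
for b in range(999):
    w = rng.choice([-1.0, 1.0], size=n)  # one Rademacher weight per cluster
    d_b = w * delta                      # null-imposed resample
    t_boot[b] = d_b.mean() / (d_b.std(ddof=1) / np.sqrt(n))

p_boot = (1 + np.sum(np.abs(t_boot) >= abs(t_obs))) / (1 + 999)
assert 0 < p_boot <= 1
```

With few clusters this bootstrap-t p-value is generally better calibrated than the analytical cluster-robust Wald p-value, which motivates the small-sample recommendation above.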

sctrial.stats.comparisons.within_arm_fit_beta(df: DataFrame, feat: str, unit: str, *, standardize: bool = True) float[source]#

Fit within-arm model and return beta_time. Used for permutation tests.

Parameters:
  • df – DataFrame with feature values and visit_num.

  • feat – Name of the feature column to analyze.

  • unit – Name of the unit column.

  • standardize – Whether to z-score the outcome variable.

Returns:

The beta_time.

Return type:

float

Cell-Type Abundance#

sctrial.stats.abundance.abundance_did(adata: AnnData, design: TrialDesign, visits: tuple[str, str], exclude_crossovers: bool = True, transform: str = 'arcsin_sqrt', min_units: int = 5, covariates: list[str] | None = None, use_bootstrap: bool = False, n_boot: int = 999, seed: int = 42) DataFrame[source]#

Test treatment-induced cell-type abundance changes via DiD on proportions.

This function calculates cell-type proportions per participant-visit and fits a DiD model to test for treatment-induced compositional shifts.

Statistical Assumptions#

  • Requires min_units paired participants (default 5) per cell type.

  • Requires both treatment arms to be represented among paired participants.

  • Cell types with no variation in the transformed outcome are skipped.

  • Uses cluster-robust standard errors (clustered by participant) to account for within-participant correlation across visits.

  • The arcsin-sqrt transform is variance-stabilizing for proportions and is recommended for compositional data.

Parameters:
  • adata – AnnData object.

  • design – A TrialDesign object. Must have celltype_col defined.

  • visits – Tuple of (baseline, followup) visit labels.

  • exclude_crossovers – Whether to exclude crossover cells.

  • transform

    Mathematical transformation for proportions:

    • 'arcsin_sqrt': arcsin(sqrt(p)), variance-stabilizing for proportions.

    • 'logit': log(p / (1-p)), useful for extreme proportions.

    • 'none': use raw proportions (not recommended).

  • min_units – Minimum number of paired participants required for a cell type to be tested. Cell types with fewer paired participants are skipped.

  • covariates – Additional columns in adata.obs to include as fixed effects. Must be constant within participant-visit (e.g., age, sex).

  • use_bootstrap – If True, uses Wild Cluster Bootstrap for p-values. Recommended for small sample sizes (< 15 participants per arm).

  • n_boot – Number of bootstrap permutations.

  • seed – Random seed.

Returns:

Table with one row per cell type containing:

  • celltype : Cell type name.

  • n_participants : Number of paired participants.

  • beta_DiD : Treatment effect (interaction term).

  • se_DiD : Cluster-robust standard error.

  • p_DiD : P-value for the treatment effect.

  • FDR_DiD : Benjamini-Hochberg FDR-corrected p-value.

Return type:

pd.DataFrame

Interpretation notes: the arcsin-sqrt transform stabilizes variance but is not on the original proportion scale. A positive beta_DiD indicates an increase in proportion in the treated arm relative to control; to interpret effect magnitude in raw proportions, inspect group-level proportions directly.

Examples

>>> ab_res = abundance_did(adata, design, visits=("V1", "V2"))
>>> print(ab_res)
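The two supported transforms are one-liners in numpy; this sketch just shows their behavior on a grid of proportions:

```python
import numpy as np

p = np.array([0.01, 0.1, 0.25, 0.5, 0.9])
z = np.arcsin(np.sqrt(p))          # 'arcsin_sqrt' transform used on proportions
logit = np.log(p / (1 - p))        # 'logit' alternative for extreme proportions

assert np.all(np.diff(z) > 0)      # monotone in p, so signs of effects are preserved
assert z.min() >= 0 and z.max() <= np.pi / 2   # bounded range [0, pi/2]
```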

Effect Sizes#

Effect size calculations for trial-aware inference.

This module provides standardized effect size measures (Cohen’s d, Hedges’ g) and confidence intervals for DiD analyses.

Statistical Background#

Cohen’s d and Hedge’s g are standardized effect size measures that express the difference between groups in standard deviation units, enabling comparison across studies with different scales.

Cohen’s d vs Bootstrap CI: These serve complementary purposes:

  • Cohen’s d: Standardized effect size for meta-analysis and interpretation.

  • Bootstrap CI: Uncertainty quantification on any estimator.

Recommendation: Report BOTH standardized effect sizes (for interpretation/meta-analysis) AND confidence intervals (for statistical inference). Bootstrap CI on Cohen’s d provides the best of both worlds.

Hedges’ g vs Cohen’s d: Hedges’ g applies a small-sample correction factor J = 1 - 3/(4·df - 1), with df = n₁ + n₂ - 2, that reduces upward bias in small samples. Use Hedges’ g when n < 20 per group; for larger samples the two are nearly identical.

Mathematical Definitions#

Cohen’s d for DiD:

d = β_DiD / s_pooled

where s_pooled = sqrt(((n₁-1)s₁² + (n₂-1)s₂²) / (n₁+n₂-2))

Hedge’s g:

g = d × J

where J = 1 - 3/(4·df - 1) and df = n₁ + n₂ - 2 (Hedges’ correction factor)

95% CI (Wald-type, using the Hedges & Olkin large-sample variance):

SE(d) ≈ sqrt((n₁+n₂)/(n₁×n₂) + d²/(2(n₁+n₂-2)))
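The definitions above, worked through numerically with numpy (illustrative values):

```python
import numpy as np

beta_did, s_pooled = 0.8, 1.0
n1, n2 = 10, 12
df = n1 + n2 - 2

d = beta_did / s_pooled                          # Cohen's d for DiD
J = 1 - 3 / (4 * df - 1)                         # Hedges' correction factor
g = d * J                                        # Hedges' g
se_d = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * df))   # SE(d)

assert abs(g) < abs(d)   # the correction always shrinks d toward zero
```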

sctrial.stats.effect_size.add_effect_sizes_to_did(df: DataFrame, beta_col: str = 'beta_DiD', se_col: str = 'se_DiD', n_col: str = 'n_units', resid_sd_col: str = 'resid_sd', method: Literal['cohens_d', 'hedges_g'] = 'hedges_g') DataFrame[source]#

Add standardized effect sizes to a DiD results DataFrame.

This function computes Cohen’s d or Hedges’ g from the DiD coefficients and adds columns for the effect size and its confidence interval.

When the resid_sd column is present (populated by did_table), effect sizes are computed directly as beta / resid_sd. Otherwise, a balanced-design approximation is used as a fallback.

Parameters:
  • df – DataFrame with DiD results (from did_table).

  • beta_col – Name of the coefficient column.

  • se_col – Name of the standard error column.

  • n_col – Name of the sample size column.

  • resid_sd_col – Name of the residual standard deviation column (from OLS/WLS fit).

  • method – “cohens_d” or “hedges_g” (recommended for small samples).

Returns:

Input DataFrame with added columns:

  • effect_size : Cohen’s d or Hedges’ g.

  • effect_size_lower : Lower 95% CI bound.

  • effect_size_upper : Upper 95% CI bound.

  • effect_size_interpretation : “small”, “medium”, or “large”.

Return type:

pd.DataFrame

Examples

>>> res = did_table(adata, features=genes, design=design, visits=visits)
>>> res = add_effect_sizes_to_did(res)
>>> print(res[["feature", "beta_DiD", "effect_size", "effect_size_interpretation"]])
sctrial.stats.effect_size.cohens_d(group1: ndarray, group2: ndarray, pooled: bool = True) float[source]#

Calculate Cohen’s d effect size between two groups.

Cohen’s d expresses the difference between group means in standard deviation units:

d = (mean₁ - mean₂) / s_pooled

Parameters:
  • group1 – First group values (e.g., treatment group).

  • group2 – Second group values (e.g., control group).

  • pooled – If True (default), use pooled standard deviation. If False, use only group1’s standard deviation (Glass’s delta).

Returns:

Cohen’s d effect size. Positive values indicate group1 > group2.

Return type:

float

Notes

Interpretation guidelines (Cohen, 1988):

  • |d| ≈ 0.2: small effect

  • |d| ≈ 0.5: medium effect

  • |d| ≈ 0.8: large effect

These are rough guidelines; practical significance depends on context.

sctrial.stats.effect_size.cohens_d_from_did(delta_treated: ndarray, delta_control: ndarray) float[source]#

Calculate Cohen’s d for DiD from participant-level change scores.

This computes the exact Cohen’s d for the difference in change scores:

d = (mean(Δ_treated) - mean(Δ_control)) / s_pooled

where s_pooled is the pooled SD of participant-level deltas.

Parameters:
  • delta_treated – Participant-level change scores (post - pre) for the treated arm.

  • delta_control – Participant-level change scores (post - pre) for the control arm.

Returns:

Cohen’s d effect size based on change-score distributions.

Return type:

float

sctrial.stats.effect_size.cohens_d_from_did_approx(beta_did: float, residual_std: float) float[source]#

Approximate Cohen’s d from a DiD regression coefficient (deprecated).

This uses the regression residual SD instead of pooled SD of change scores, which can mis-scale effect sizes when designs are imbalanced or heteroscedastic.

Parameters:
  • beta_did – DiD regression coefficient.

  • residual_std – Residual standard deviation from the DiD model.

Returns:

Approximate Cohen’s d effect size.

Return type:

float

Warns:

DeprecationWarning – Always raised; use cohens_d_from_did instead.

Notes

Deprecated: use cohens_d_from_did(delta_treated, delta_control) instead.

sctrial.stats.effect_size.effect_size_ci(d: float, n1: int, n2: int, alpha: float = 0.05, method: Literal['nct', 'bootstrap'] = 'nct') tuple[float, float][source]#

Calculate confidence interval for an effect size.

Parameters:
  • d – Effect size (Cohen’s d or Hedges’ g).

  • n1 – Sample size of group 1.

  • n2 – Sample size of group 2.

  • alpha – Significance level (default 0.05 for 95% CI).

  • method

    • “nct”: Wald-type CI using the Hedges & Olkin (1985) large-sample variance approximation with t critical values. Fast and adequate for most sample sizes.

    • “bootstrap”: Not implemented here; use bootstrap_effect_size_ci instead.

Returns:

(lower, upper) confidence interval bounds.

Return type:

tuple[float, float]

Notes

Uses the Hedges & Olkin (1985) variance formula:

SE(d) = √( (n₁+n₂)/(n₁·n₂) + d² / (2·(n₁+n₂-2)) )

with t critical values at df = n₁ + n₂ - 2.
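The SE formula above translates directly into code. A stdlib-only sketch (not sctrial's implementation) is shown below; it uses a z critical value from statistics.NormalDist for self-containedness, whereas the library uses t critical values at df = n₁ + n₂ - 2.

```python
from statistics import NormalDist
import math

def effect_size_ci_sketch(d, n1, n2, alpha=0.05):
    # SE(d) from the Hedges & Olkin (1985) large-sample variance (see above).
    # z critical value used here for a stdlib-only sketch; the library
    # uses t critical values at df = n1 + n2 - 2.
    df = n1 + n2 - 2
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * df))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return d - z * se, d + z * se
```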

sctrial.stats.effect_size.hedges_g(group1: ndarray, group2: ndarray) float[source]#

Calculate Hedges’ g effect size (bias-corrected Cohen’s d).

Hedges’ g applies a correction factor J to Cohen’s d that reduces upward bias in small samples:

g = d × J,  J ≈ 1 - 3/(4(n₁+n₂-2) - 1) = 1 - 3/(4·df - 1)

Parameters:
  • group1 – First group values.

  • group2 – Second group values.

Returns:

Hedges’ g effect size.

Return type:

float

Notes

Uses the exact small-sample correction via the gamma function:

J = Γ(df/2) / (√(df/2) × Γ((df-1)/2))

which is more accurate than the common approximation for small df. Use Hedges’ g when sample sizes are small (n < 20 per group). For larger samples, Cohen’s d and Hedges’ g are nearly identical.
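The exact correction and the common approximation can be compared directly (an illustrative sketch using the gamma-function formula above; helper names are hypothetical):

```python
import math

def hedges_j_exact(df):
    # Exact small-sample correction via the gamma function (formula above)
    return math.gamma(df / 2) / (math.sqrt(df / 2) * math.gamma((df - 1) / 2))

def hedges_j_approx(df):
    # Common approximation: J ≈ 1 - 3/(4·df - 1)
    return 1 - 3 / (4 * df - 1)
```

Even at df = 10 (two groups of six), the two agree to about three decimal places, which is why the approximation is adequate outside very small samples.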

Power Analysis#

Power analysis utilities for trial planning.

This module provides sample size and power calculations for both two-arm DiD and single-arm paired (pre/post) designs in single-cell studies.

Mathematical Background#

Two-arm DiD (treatment × time interaction):

Power = Φ(|δ|/SE - z_{1-α/2}), SE = σ × √(4/n)

n = 4σ²(z_{1-α/2} + z_{1-β})² / δ²

Single-arm paired (within-arm pre→post change):

Power = Φ(|δ|/SE - z_{1-α/2}), SE = σ × √(2/n)

n = 2σ²(z_{1-α/2} + z_{1-β})² / δ²

The paired design has half the variance of DiD (factor 2 vs 4) because it involves one arm instead of two.

For cluster-robust inference with ICC (intraclass correlation):

Design effect = 1 + (m-1)×ICC

where m = average cluster size.

The effective sample size is:

n_eff = n / design_effect
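The formulas above can be sketched with the stdlib normal distribution. Note this sketch follows the n-total convention of the formulas (sctrial's sample_size_did instead returns participants per arm); function names are hypothetical.

```python
from statistics import NormalDist
import math

def power_two_arm_did(n_total, delta, sigma=1.0, alpha=0.05,
                      icc=0.0, cluster_size=1.0):
    # Power = Φ(|δ|/SE - z_{1-α/2}) with SE = σ√(4/n_eff) (formulas above)
    norm = NormalDist()
    deff = 1 + (cluster_size - 1) * icc      # design effect
    n_eff = n_total / deff                   # effective sample size
    se = sigma * math.sqrt(4 / n_eff)
    return norm.cdf(abs(delta) / se - norm.inv_cdf(1 - alpha / 2))

def n_two_arm_did(delta, sigma=1.0, alpha=0.05, power=0.8):
    # n = 4σ²(z_{1-α/2} + z_{1-β})² / δ²  (total participants across both arms)
    norm = NormalDist()
    z_a, z_b = norm.inv_cdf(1 - alpha / 2), norm.inv_cdf(power)
    return math.ceil(4 * sigma**2 * (z_a + z_b) ** 2 / delta**2)
```

For example, detecting δ = 0.5 SD at 80% power requires about 126 participants in total, and clustering (large cells-per-participant with nonzero ICC) substantially reduces the achieved power at that same n.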

Key Considerations for Single-Cell Studies#

  1. Unit of analysis: Participants, not cells. Power is driven by the number of participants, not cells per participant.

  2. ICC matters: High within-participant correlation (cells from same donor are similar) inflates variance estimates. Account for this in power calculations.

  3. Cell-type stratification: Testing multiple cell types increases multiple testing burden but may reveal cell-type-specific effects.

  4. Effect size uncertainty: Use pilot data or published studies to estimate expected effect sizes. Be conservative.

sctrial.stats.power.design_effect(cluster_size: float, icc: float) float[source]#

Calculate the design effect for clustered data.

The design effect (DEFF) quantifies how much clustering inflates variance compared to simple random sampling:

DEFF = 1 + (m - 1) × ICC

where m is the average cluster size and ICC is the intraclass correlation coefficient.

Parameters:
  • cluster_size – Average number of observations per cluster (e.g., cells per participant).

  • icc – Intraclass correlation coefficient (0 to 1).

    • ICC ≈ 0: observations within clusters are independent

    • ICC ≈ 1: observations within clusters are identical

Returns:

Design effect. Values > 1 indicate variance inflation.

Return type:

float

Examples

>>> # 100 cells per participant, ICC=0.05
>>> deff = design_effect(100, 0.05)
>>> print(f"Design effect: {deff:.1f}")
Design effect: 5.95
sctrial.stats.power.effective_sample_size(n_clusters: int, cluster_size: float, icc: float) float[source]#

Calculate the effective sample size accounting for clustering.

The effective sample size is the equivalent number of independent observations after accounting for within-cluster correlation:

n_eff = (n_clusters × cluster_size) / DEFF

Parameters:
  • n_clusters – Number of clusters (e.g., participants).

  • cluster_size – Average observations per cluster.

  • icc – Intraclass correlation coefficient.

Returns:

Effective sample size.

Return type:

float

Notes

For single-cell studies, the effective sample size is typically close to the number of participants, not the number of cells, when aggregating to participant level.

sctrial.stats.power.power_curve(n_range: tuple[int, int] = (5, 50), effect_size: float = 0.5, sigma: float = 1.0, alpha: float = 0.05, icc: float = 0.0, cluster_size: float = 1.0, n_points: int = 20) tuple[ndarray, ndarray][source]#

Generate a power curve across sample sizes.

Parameters:
  • n_range – (min, max) participants per arm.

  • effect_size – Expected DiD effect size.

  • sigma – Within-group standard deviation.

  • alpha – Significance level.

  • icc – Intraclass correlation.

  • cluster_size – Average cluster size.

  • n_points – Number of points to compute.

Returns:

(sample_sizes, powers) arrays.

Return type:

tuple[np.ndarray, np.ndarray]

Examples

>>> ns, powers = power_curve(n_range=(10, 100), effect_size=0.3)
>>> import matplotlib.pyplot as plt
>>> plt.plot(ns, powers)
>>> plt.axhline(0.8, linestyle='--', label='80% power')
>>> plt.xlabel('Participants per arm')
>>> plt.ylabel('Power')
>>> plt.show()
sctrial.stats.power.power_did(n_per_group: int, effect_size: float, sigma: float = 1.0, alpha: float = 0.05, two_sided: bool = True, icc: float = 0.0, cluster_size: float = 1.0) float[source]#

Calculate statistical power for a DiD analysis.

This function computes the power to detect a given DiD effect (treatment × time interaction) with the specified sample size.

Model:

Y_it = α + β₁×Treat + β₂×Post + β₃×(Treat×Post) + ε

H₀: β₃ = 0 (no differential change)

H₁: β₃ ≠ 0 (treatment causes differential change)

Parameters:
  • n_per_group – Number of participants per arm (assumes balanced design). Total participants = 2 × n_per_group.

  • effect_size – Expected DiD effect size (β₃) in original units. Can also be Cohen’s d if sigma=1.

  • sigma – Within-group standard deviation of the outcome.

  • alpha – Significance level (default 0.05).

  • two_sided – If True (default), use two-sided test.

  • icc – Intraclass correlation for cluster adjustment.

  • cluster_size – Average cluster size (e.g., cells per participant).

Returns:

Statistical power (probability of rejecting H₀ when H₁ is true).

Return type:

float

Examples

>>> # 20 participants per arm, effect = 0.5 SD
>>> pwr = power_did(n_per_group=20, effect_size=0.5, sigma=1.0)
>>> print(f"Power: {pwr:.1%}")
Power: 69.8%
>>> # With clustering (100 cells/participant, ICC=0.05)
>>> pwr = power_did(20, 0.5, icc=0.05, cluster_size=100)
>>> print(f"Power with clustering: {pwr:.1%}")

Notes

For DiD, the variance of the interaction term is approximately:

Var(β₃) = 4σ² / n

where n is the total number of participants (not cells).

sctrial.stats.power.power_paired(n_participants: int, effect_size: float, sigma: float = 1.0, alpha: float = 0.05, two_sided: bool = True) float[source]#

Calculate statistical power for a paired pre/post comparison.

This function computes the power to detect a given within-arm longitudinal effect (post − pre) with the specified sample size.

Model:

Y_post - Y_pre = δ + ε (paired differences)

H₀: δ = 0 (no change)

H₁: δ ≠ 0 (treatment causes change)

Parameters:
  • n_participants – Number of participants with paired measurements.

  • effect_size – Expected pre-to-post effect size (δ) in original units. Can also be Cohen’s d if sigma=1.

  • sigma – Within-group standard deviation of the outcome (not the paired difference SD; the paired difference variance is 2σ²).

  • alpha – Significance level (default 0.05).

  • two_sided – If True (default), use two-sided test.

Returns:

Statistical power (probability of rejecting H₀ when H₁ is true).

Return type:

float

Examples

>>> # 6 participants, effect = 0.8 SD
>>> pwr = power_paired(n_participants=6, effect_size=0.8, sigma=1.0)
>>> print(f"Power: {pwr:.1%}")

Notes

For paired pre/post, the variance of the time coefficient is:

Var(β_time) = 2σ² / n

where n is the number of paired participants. This is half the variance of a DiD coefficient (4σ²/n) because there is only one arm instead of two.
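The factor-2 relationship can be seen in a minimal sketch of the paired power formula (illustrative, not sctrial's implementation):

```python
from statistics import NormalDist
import math

def power_paired_sketch(n, delta, sigma=1.0, alpha=0.05):
    # SE = σ√(2/n) for paired pre/post: half the DiD variance of 4σ²/n
    norm = NormalDist()
    se = sigma * math.sqrt(2 / n)
    return norm.cdf(abs(delta) / se - norm.inv_cdf(1 - alpha / 2))
```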

sctrial.stats.power.sample_size_did(effect_size: float, sigma: float = 1.0, power: float = 0.8, alpha: float = 0.05, two_sided: bool = True, icc: float = 0.0, cluster_size: float = 1.0) int[source]#

Calculate required sample size for a DiD analysis.

Determines the number of participants per arm needed to achieve the desired power for detecting a given effect size.

Parameters:
  • effect_size – Minimum detectable DiD effect size.

  • sigma – Expected within-group standard deviation.

  • power – Desired statistical power (default 0.80).

  • alpha – Significance level (default 0.05).

  • two_sided – If True (default), use two-sided test.

  • icc – Intraclass correlation for cluster adjustment.

  • cluster_size – Average cluster size.

Returns:

Required participants per arm (total = 2 × this value).

Return type:

int

Examples

>>> # Sample size to detect d=0.5 with 80% power
>>> n = sample_size_did(effect_size=0.5, sigma=1.0, power=0.80)
>>> print(f"Need {n} participants per arm ({2*n} total)")
>>> # With multiple testing correction (Bonferroni for 10 tests)
>>> n = sample_size_did(effect_size=0.5, alpha=0.05/10, power=0.80)
>>> print(f"With Bonferroni: {n} per arm")
sctrial.stats.power.sample_size_paired(effect_size: float, sigma: float = 1.0, power: float = 0.8, alpha: float = 0.05, two_sided: bool = True) int[source]#

Calculate required sample size for a paired pre/post comparison.

Determines the number of participants with paired measurements needed to achieve the desired power.

Parameters:
  • effect_size – Minimum detectable effect size (must be > 0).

  • sigma – Expected within-group standard deviation.

  • power – Desired statistical power (default 0.80).

  • alpha – Significance level (default 0.05).

  • two_sided – If True (default), use two-sided test.

Returns:

Required number of paired participants.

Return type:

int

Examples

>>> n = sample_size_paired(effect_size=0.5, sigma=1.0, power=0.80)
>>> print(f"Need {n} paired participants")
sctrial.stats.power.sensitivity_paired(n_participants: int, sigma: float = 1.0, power: float = 0.8, alpha: float = 0.05, two_sided: bool = True) float[source]#

Calculate the minimum detectable effect for a paired comparison.

Given a fixed sample size, what is the smallest effect that can be detected with the desired power?

Parameters:
  • n_participants – Number of paired participants.

  • sigma – Within-group standard deviation.

  • power – Desired power.

  • alpha – Significance level.

  • two_sided – Use two-sided test.

Returns:

Minimum detectable effect size.

Return type:

float

Examples

>>> mde = sensitivity_paired(n_participants=6, power=0.80)
>>> print(f"Minimum detectable effect: {mde:.2f} SD")

Mixed Effects Models#

Mixed effects models for trial-aware single-cell inference.

This module provides linear mixed effects model alternatives to the fixed effects approach used in the standard DiD functions.

Mathematical Background#

Fixed Effects vs Mixed Effects:

Fixed Effects (current did_table approach):

Y_it = α_i + β₁×Post + β₂×Treat×Post + ε_it

  • α_i: participant-specific intercept (absorbed as dummy variables)

  • Pros: No distributional assumptions on random effects

  • Cons: Cannot estimate participant-level variance; loses df

Mixed Effects:

Y_it = (α + u_i) + β₁×Post + β₂×Treat×Post + ε_it

u_i ~ N(0, σ²_u) (random intercept)

ε_it ~ N(0, σ²_ε)

  • Pros: Estimates variance components; more efficient when assumptions hold

  • Cons: Requires distributional assumptions; can be biased if misspecified

When to Use Each:

Use Fixed Effects when:

  • Participant effects may correlate with treatment assignment

  • Small number of time points

  • Interest is purely in within-participant changes

Use Mixed Effects when:

  • Random sample of participants from a population

  • Interest in generalizing to new participants

  • Want to estimate variance components

  • Large number of clusters with few observations each

Recommendation: For randomized trials, both approaches yield similar treatment effect estimates. Fixed effects is more robust to misspecification. Report both as sensitivity analysis.

Model Specification#

The full DiD mixed model:

Y_ijt = β₀ + β₁×Treat_j + β₂×Post_t + β₃×(Treat×Post)_jt + u_j + ε_ijt

Random effects: u_j ~ N(0, σ²_u) (participant)

Residuals: ε_ijt ~ N(0, σ²_ε)

The DiD effect is β₃, same as in fixed effects.

Optional random slopes:

Y_ijt = β₀ + β₁×Treat_j + β₂×Post_t + β₃×(Treat×Post)_jt + u_j + v_j×Post_t + ε_ijt

This allows the time effect to vary by participant.
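A random-intercept DiD of this form can be fit with statsmodels' MixedLM on a toy dataset. This is an illustrative sketch on simulated data, not sctrial's internal implementation (which may differ in fitting options and convergence handling); all variable names and effect sizes here are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated two-arm, two-visit trial with a true DiD effect of 0.4
rng = np.random.default_rng(0)
rows = []
for arm in (0, 1):
    for pid in range(20):                      # 20 participants per arm
        u = rng.normal(0, 0.5)                 # random intercept u_i
        for post in (0, 1):
            y = 0.4 * arm * post + u + rng.normal(0, 1.0)
            rows.append({"y": y, "treat": arm, "post": post,
                         "participant": f"{arm}_{pid}"})
df = pd.DataFrame(rows)

# Y ~ Treat + Post + Treat:Post + (1 | Participant)
model = smf.mixedlm("y ~ treat * post", data=df, groups=df["participant"])
fit = model.fit()
beta_did = fit.params["treat:post"]            # the DiD effect β₃
```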

sctrial.stats.mixed_effects.compare_fixed_vs_mixed(adata: AnnData, features: Sequence[str], design: TrialDesign, visits: tuple[str, str], **kwargs) DataFrame[source]#

Compare fixed effects and mixed effects DiD results.

This function runs both approaches and returns a combined DataFrame for comparison. This is useful for sensitivity analysis.

Parameters:
  • adata – AnnData object.

  • features – Features to test.

  • design – TrialDesign object.

  • visits – (baseline, followup) visit labels.

  • **kwargs – Additional arguments passed to both did_table and did_table_mixed.

Returns:

Combined results with columns:

  • feature

  • beta_fixed, se_fixed, p_fixed: Fixed effects results

  • beta_mixed, se_mixed, p_mixed, icc: Mixed effects results

  • beta_diff: Difference in estimates

  • agreement: Whether both methods agree on significance direction

Return type:

pd.DataFrame

Examples

>>> comparison = compare_fixed_vs_mixed(
...     adata, features=genes, design=design, visits=("Pre", "Post")
... )
>>> # Check agreement
>>> print(f"Methods agree: {comparison['agreement'].mean():.0%}")
sctrial.stats.mixed_effects.did_mixed(df: DataFrame, outcome: str, participant_col: str, time_col: str, arm_col: str, arm_treated: str, random_slope: bool = False, covariates: Sequence[str] | None = None, visits: tuple[str, str] | None = None) dict[source]#

Fit a single DiD mixed effects model.

Model specification:

Y ~ Treat + Post + Treat:Post + (1 | Participant)

Or with random slopes:

Y ~ Treat + Post + Treat:Post + (1 + Post | Participant)

Parameters:
  • df – DataFrame with outcome and design columns.

  • outcome – Name of the outcome variable column.

  • participant_col – Column identifying participants (random effect grouping).

  • time_col – Column with time/visit indicators (binary: 0=pre, 1=post).

  • arm_col – Column with treatment arm labels.

  • arm_treated – Label for the treated arm.

  • random_slope – If True, include random slope for time.

  • covariates – Additional covariates to include as fixed effects.

Returns:

Results dictionary with keys:

  • beta_DiD: DiD coefficient

  • se_DiD: Standard error

  • p_DiD: P-value

  • ci_lower, ci_upper: 95% CI

  • var_participant: Random intercept variance

  • var_residual: Residual variance

  • icc: Intraclass correlation

  • converged: Model convergence status

Return type:

dict

sctrial.stats.mixed_effects.did_table_mixed(adata: AnnData, features: Sequence[str], design: TrialDesign, visits: tuple[str, str], exclude_crossovers: bool = True, aggregate: Literal['cell', 'participant_visit', 'participant_visit_celltype'] = 'participant_visit', layer: str | None = None, agg: Literal['mean', 'median', 'pct_pos'] = 'mean', standardize: bool = True, random_slope: bool = False, covariates: Sequence[str] | None = None) DataFrame[source]#

Run DiD analysis using mixed effects models for multiple features.

This is the mixed effects analog of did_table. The key difference is that participant effects are modeled as random draws from a normal distribution rather than fixed parameters.

Model:

Y_it ~ Treat + Post + Treat×Post + (1 | Participant)

H₀: β_{Treat×Post} = 0 (no differential treatment effect)

Parameters:
  • adata – AnnData object containing expression data.

  • features – List of features (genes or module scores) to test.

  • design – TrialDesign object specifying column mappings.

  • visits – Tuple of (baseline, followup) visit labels.

  • exclude_crossovers – Whether to exclude crossover participants.

  • aggregate – Aggregation mode: “participant_visit” or “cell”.

  • layer – Expression layer to use (None for adata.X).

  • agg – Aggregation function (“mean”, “median”).

  • standardize – Whether to z-score outcomes before fitting.

  • random_slope – Include random slope for time (allows time effect to vary by participant).

  • covariates – Additional covariates to include.

Returns:

Results with columns:

  • feature: Feature name

  • beta_DiD, se_DiD, p_DiD, FDR_DiD: DiD statistics

  • ci_lower, ci_upper: 95% CI

  • var_participant, var_residual, icc: Variance components

  • n_units: Number of participants

  • converged: Model convergence

Return type:

pd.DataFrame

Examples

>>> res_mixed = did_table_mixed(
...     adata, features=genes, design=design, visits=("Pre", "Post")
... )
>>> print(res_mixed[["feature", "beta_DiD", "icc", "converged"]])

See also

did_table

Fixed effects version (recommended for most applications).

compare_fixed_vs_mixed

Compare both approaches.

GSEA Integration#

sctrial.stats.gsea.run_gsea_cross_sectional(adata: AnnData, gene_sets: str | dict[str, list[str]], design: TrialDesign, visit: str, layer: str | None = None, rank_by: str = 'tstat', min_units: int = 4, return_obj: bool = False, **kwargs) DataFrame | Prerank[source]#

Perform GSEA on cross-sectional data using between-arm comparison.

Similar to run_gsea_did but for cross-sectional designs (single visit). Uses between_arm_comparison to rank genes by treatment vs control difference at a fixed visit, then runs gseapy.prerank.

Parameters:
  • adata – AnnData object containing expression data.

  • gene_sets – A library name (e.g. ‘KEGG_2021_Human’) or a dictionary mapping pathway names to gene lists.

  • design – A TrialDesign object.

  • visit – The visit label to analyze (single timepoint).

  • layer – Layer to extract gene expression from.

  • rank_by – Metric for ranking genes: ‘signed_confidence’, ‘beta’, or ‘tstat’.

  • min_units – Minimum number of participants required per gene.

  • return_obj – Whether to return the full gseapy object.

  • **kwargs – Additional parameters passed to gseapy.prerank.

Returns:

Enrichment results or the gseapy result object.

Return type:

pd.DataFrame or gseapy.Prerank

sctrial.stats.gsea.run_gsea_did(adata: AnnData, gene_sets: str | dict[str, list[str]], design: TrialDesign, visits: tuple[str, str], layer: str | None = None, exclude_crossovers: bool = True, celltype: str | None = None, rank_by: str = 'signed_confidence', use_bootstrap: bool = False, n_boot: int = 999, min_units: int = 4, return_obj: bool = False, **kwargs) DataFrame | Prerank[source]#

Perform Gene Set Enrichment Analysis (GSEA) on trial-aware rankings.

This function calculates Difference-in-Differences (DiD) effect sizes for all genes, ranks them, and performs GSEA using gseapy.prerank. This approach ensures that enriched pathways represent treatment effects rather than baseline differences.

Parameters:
  • adata – AnnData object containing expression data.

  • gene_sets – A library name (e.g. ‘KEGG_2021_Human’) or a dictionary mapping pathway names to gene lists.

  • design – A TrialDesign object.

  • visits – A tuple of (baseline, followup) visit labels.

  • layer – Layer to extract gene expression from. Recommended to use a normalized layer (e.g., ‘log1p_cpm’).

  • exclude_crossovers – Whether to exclude crossover cells from the DiD ranking.

  • rank_by

    Metric for ranking genes:

    • ‘signed_confidence’: sign(beta_DiD) * -log10(p_DiD). Highlights genes with high effect and high significance.

    • ‘beta’: ranks genes solely by the DiD effect size.

    • ‘tstat’: ranks genes by the t-statistic (beta_DiD / se_DiD).

  • use_bootstrap – Whether to use Wild Cluster Bootstrap for DiD p-values (used if rank_by is ‘signed_confidence’).

  • n_boot – Number of bootstrap permutations.

  • min_units – Minimum number of paired participants required for a gene to be included in the ranking. Genes with fewer participants return NaN and are filtered out before GSEA. Default is 4.

  • return_obj – Whether to return the full gseapy object. If False (default), returns the results DataFrame (res2d).

  • **kwargs – Additional parameters passed to gseapy.prerank (e.g., permutation_num, outdir, min_size, max_size).

Returns:

A DataFrame of enrichment results (if return_obj=False) or the gseapy result object.

Return type:

pd.DataFrame or gseapy.Prerank

Examples

>>> res = run_gsea_did(adata, gene_sets="KEGG_2021_Human", design=design, visits=("V1", "V2"))
>>> print(res.head())
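The ‘signed_confidence’ ranking metric described above is simple to express directly (an illustrative sketch; sctrial computes beta_DiD and p_DiD internally before ranking):

```python
import numpy as np

def signed_confidence(beta, p):
    # sign(beta_DiD) * -log10(p_DiD): large |value| means a gene with a
    # strong, significant effect; the sign preserves effect direction
    return np.sign(beta) * -np.log10(p)
```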
sctrial.stats.gsea.run_gsea_did_by_celltype(adata: AnnData, gene_sets: str | dict[str, list[str]], design: TrialDesign, visits: tuple[str, str], celltypes: list[str] | None = None, **kwargs) dict[str, DataFrame | Prerank][source]#

Run GSEA on DiD rankings separately for each cell type.

Parameters:
  • adata – AnnData object.

  • gene_sets – A library name (e.g. ‘KEGG_2021_Human’) or a dictionary mapping pathway names to gene lists.

  • design – TrialDesign object.

  • visits – Tuple of (baseline, followup) visit labels.

  • celltypes – List of cell types to analyze. If None, uses all unique cell types in design.celltype_col.

  • **kwargs – Additional parameters passed to gseapy.prerank.

Returns:

Dictionary of GSEA results.

Return type:

dict[str, pd.DataFrame | gp.Prerank]

sctrial.stats.gsea.run_gsea_did_multi(adata: AnnData, gene_sets: dict[str, str | dict[str, list[str]]], design: TrialDesign, visits: tuple[str, str], **kwargs) dict[str, DataFrame | Prerank][source]#

Run GSEA across multiple gene-set collections.

Parameters:
  • adata – AnnData object.

  • gene_sets – Dictionary of gene-set collections.

  • design – TrialDesign object.

  • visits – Tuple of (baseline, followup) visit labels.

  • **kwargs – Additional parameters passed to gseapy.prerank (e.g., permutation_num, outdir, min_size, max_size).

Returns:

Dictionary of GSEA results.

Return type:

dict[str, pd.DataFrame | gp.Prerank]

sctrial.stats.gsea.run_gsea_pseudobulk(adata: AnnData, gene_sets: str | dict[str, list[str]], design: TrialDesign, visits: tuple[str, str], *, celltype_col: str | None = None, rank_by: str = 'signed_confidence', min_units: int = 4, return_obj: bool = False, **kwargs) DataFrame | Prerank | dict[str, Prerank][source]#

Run GSEA using pseudobulk DiD results.

Parameters:
  • adata – AnnData object.

  • gene_sets – A library name (e.g. ‘KEGG_2021_Human’) or a dictionary mapping pathway names to gene lists.

  • design – TrialDesign object.

  • visits – Tuple of (baseline, followup) visit labels.

  • celltype_col – If provided and results contain per-celltype rows, run GSEA separately for each cell type and concatenate results.

  • rank_by – Metric for ranking genes. One of 'signed_confidence' (sign(beta_DiD) * -log10(p_DiD), default), 'beta' (DiD effect size), or 'tstat' (t-statistic beta_DiD / se_DiD).

  • min_units – Minimum number of paired participants required for a gene to be included in the ranking.

  • return_obj – Whether to return the full gseapy object. If False (default), returns the results DataFrame (res2d).

  • **kwargs – Additional parameters passed to gseapy.prerank (e.g., permutation_num, outdir, min_size, max_size).

Returns:

A DataFrame of enrichment results (if return_obj=False) or the gseapy result object.

Return type:

pd.DataFrame | gp.Prerank | dict[str, gp.Prerank]

sctrial.stats.gsea.run_gsea_within_arm(adata: AnnData, gene_sets: str | dict[str, list[str]], design: TrialDesign, arm: str, visits: tuple[str, str], layer: str | None = None, rank_by: str = 'tstat', min_units: int = 4, return_obj: bool = False, **kwargs) DataFrame | Prerank[source]#

Perform GSEA on longitudinal data using within-arm comparison.

Similar to run_gsea_did but uses within_arm_comparison instead of DiD. Ranks genes by pre→post change within a single arm, then runs gseapy.prerank.

Parameters:
  • adata – AnnData object containing expression data.

  • gene_sets – A library name (e.g. ‘KEGG_2021_Human’) or a dictionary mapping pathway names to gene lists.

  • design – A TrialDesign object.

  • arm – The arm to analyze (e.g., design.arm_treated).

  • visits – Tuple of (baseline, followup) visit labels.

  • layer – Layer to extract gene expression from.

  • rank_by – Metric for ranking genes: ‘signed_confidence’, ‘beta’, or ‘tstat’.

  • min_units – Minimum number of participants required per gene.

  • return_obj – Whether to return the full gseapy object.

  • **kwargs – Additional parameters passed to gseapy.prerank.

Returns:

Enrichment results or the gseapy result object.

Return type:

pd.DataFrame or gseapy.Prerank

Pseudobulk Methods#

sctrial.stats.pseudobulk.pseudobulk_did(adata: AnnData, genes: Sequence[str], design: TrialDesign, visits: tuple[str, str], *, celltype_col: str | None = None, counts_layer: str | None = 'counts', log1p: bool = True, min_cells_per_group: int = 5, min_paired: int = 4, use_bootstrap: bool = False, n_boot: int = 999, seed: int = 42) DataFrame[source]#

Run DiD on pseudobulk expression (participant-level aggregates).

This mirrors subject-level pseudobulk DiD workflows where each participant×visit (optionally per cell type) is one observation.

Parameters:
  • adata – AnnData object.

  • genes – List of genes to analyze.

  • design – TrialDesign object.

  • visits – Tuple of (baseline, followup) visit labels.

  • celltype_col – Column name in adata.obs to use for cell types.

  • counts_layer – Layer name in adata.layers to use for expression data.

  • log1p – Whether to log1p the expression data.

  • min_cells_per_group – Minimum number of cells per group to include in the analysis.

  • min_paired – Minimum number of paired participants to include in the analysis.

  • use_bootstrap – Whether to use bootstrap to calculate p-values. Recommended for small sample sizes.

  • n_boot – Number of bootstrap permutations.

  • seed – Random seed for reproducibility.

Returns:

A DataFrame with the results of the DiD analysis, with columns:

  • feature: Name of the feature.

  • beta_DiD: DiD effect size (difference in pre→post change between arms).

  • p_DiD: P-value for the DiD test.

  • FDR_DiD: False Discovery Rate corrected p-value.

  • n_units: Number of unique units (participants) included in the analysis.

Return type:

pd.DataFrame

sctrial.stats.pseudobulk.pseudobulk_export(adata: AnnData, genes: Sequence[str], design: TrialDesign, *, visits: tuple[str, str] | None = None, celltype_col: str | None = None, counts_layer: str | None = 'counts', min_cells_per_group: int = 1, log1p: bool = True) AnnData[source]#

Export pseudobulk expression as a new AnnData object.

Parameters:
  • adata – Input AnnData.

  • genes – List of genes to include.

  • design – TrialDesign object.

  • visits – Optional (baseline, followup) visits to subset.

  • celltype_col – If provided, aggregate per participant-visit-celltype.

  • counts_layer – Layer to use for counts (default: “counts”).

  • min_cells_per_group – Minimum cells per group to include.

  • log1p – Whether to log1p-transform CPM values.

Returns:

AnnData with pseudobulk expression in .X and group metadata in .obs.

Return type:

AnnData

sctrial.stats.pseudobulk.pseudobulk_expression(adata: AnnData, genes: Sequence[str], groupby: Sequence[str], counts_layer: str | None = 'counts', scale: float = 1000000.0, log1p: bool = True, min_cells_per_group: int = 1, include_n_cells: bool = True) DataFrame[source]#

Compute pseudobulk log1p-CPM for a gene panel per group.

Parameters:
  • adata – AnnData object.

  • genes – List of genes to summarize.

  • groupby – Columns in adata.obs used for grouping (e.g., participant, visit, cell type).

  • counts_layer – Layer containing raw counts. Defaults to “counts”.

  • scale – CPM scale factor (default 1e6).

  • log1p – If True, apply log1p to CPM.

  • min_cells_per_group – Minimum cells per group to include.

  • include_n_cells – If True, include the number of cells per group.

Returns:

DataFrame with pseudobulk expression.

Return type:

pd.DataFrame
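The log1p-CPM aggregation this function performs can be sketched with pandas (illustrative only; gene and group names are hypothetical, and sctrial's implementation also handles layers and minimum-cell filtering):

```python
import numpy as np
import pandas as pd

# Raw counts for two genes across four cells
counts = pd.DataFrame(
    {"GENE_A": [5, 3, 10, 2], "GENE_B": [50, 30, 20, 40]},
    index=["c1", "c2", "c3", "c4"],
)
# Group labels, e.g. participant × visit
groups = pd.Series(["P1_V1", "P1_V1", "P2_V1", "P2_V1"], index=counts.index)

summed = counts.groupby(groups).sum()                 # sum counts per group
cpm = summed.div(summed.sum(axis=1), axis=0) * 1e6    # scale to CPM (1e6)
log1p_cpm = np.log1p(cpm)                             # log1p transform
```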

sctrial.stats.pseudobulk.pseudobulk_within_arm(adata: AnnData, genes: Sequence[str], participant_col: str, visit_col: str, visits: Sequence[str], celltype_col: str, counts_layer: str | None = 'counts', min_paired: int = 3) tuple[DataFrame, DataFrame][source]#

Compute within-arm pseudobulk deltas and Wilcoxon tests.

Parameters:
  • adata – AnnData object.

  • genes – List of genes to analyze.

  • participant_col – Column name in adata.obs to use for participant IDs.

  • visit_col – Column name in adata.obs to use for visit labels.

  • visits – Tuple of (baseline, followup) visit labels.

  • celltype_col – Column name in adata.obs to use for cell types.

  • counts_layer – Layer name in adata.layers to use for expression data.

  • min_paired – Minimum number of paired participants to include in the analysis.

Returns:

A tuple of (summary_df, delta_long_df).

summary_df columns: celltype, feature, n_units, mean_delta, median_delta, p_time, FDR_time.

delta_long_df columns: celltype, feature, participant_id, delta.

If there are no valid pairs, both DataFrames are empty.

Return type:

tuple[pd.DataFrame, pd.DataFrame]

Module Score Analysis#

sctrial.stats.module_scores.module_score_did_by_pool(pb: DataFrame, design: TrialDesign, visits: tuple[str, str], *, min_paired: int = 2, n_perm: int = 1000, seed: int = 42, fdr_within: str | None = 'module', fdr_global: bool = True, allow_unpaired: bool = False) DataFrame[source]#

Compute DiD on module scores by pool with permutation p-values.

Parameters:
  • pb – Output of module_score_pseudobulk().

  • design – TrialDesign object.

  • visits – Tuple of (baseline, followup) visit labels.

  • n_perm – Number of permutations for DiD p-values.

  • seed – Random seed.

  • fdr_within – If “module”, FDR is computed within each module across pools. If “pool”, FDR is computed within each pool across modules. If None, global FDR.

  • fdr_global – Only used when fdr_within is not None. If True (default), an additional FDR_DiD_global column is added that applies BH-FDR correction globally across all tests, mirroring the behaviour when fdr_within=None. This is useful because per-group FDR (FDR_DiD) controls the false discovery rate only within each group and does not control the overall false discovery rate across all tests.

  • allow_unpaired – If True, fit an unpaired OLS DiD (module_score ~ visit + arm + visit×arm) using all available participant-visit observations. This is a fallback when no paired participants exist and should be interpreted cautiously.

Returns:

One row per (pool, module) with columns: pool, module, mean_delta_treated, mean_delta_control, beta_DiD, p_DiD, p_treated, p_control, n_units, FDR_DiD. When fdr_within is set and fdr_global is True, an additional FDR_DiD_global column contains the globally-corrected q-values.

Return type:

pd.DataFrame
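The permutation p-value idea can be sketched on participant-level deltas: shuffle arm labels and see how often a label-permuted difference in mean deltas is at least as extreme as the observed one. This is an illustrative sketch of the approach, not sctrial's exact procedure; the helper name is hypothetical.

```python
import numpy as np

def perm_p_did(delta_treated, delta_control, n_perm=1000, seed=42):
    # Permutation p-value for the difference in mean change scores
    rng = np.random.default_rng(seed)
    dt = np.asarray(delta_treated, float)
    dc = np.asarray(delta_control, float)
    observed = dt.mean() - dc.mean()
    pooled = np.concatenate([dt, dc])
    n_t = len(dt)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                       # permute arm labels
        stat = pooled[:n_t].mean() - pooled[n_t:].mean()
        if abs(stat) >= abs(observed):
            count += 1
    return (count + 1) / (n_perm + 1)             # add-one smoothing
```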

sctrial.stats.module_scores.module_score_pseudobulk(adata: AnnData, module_cols: Sequence[str], design: TrialDesign, visits: tuple[str, str], *, pool_col: str | None = None, pool_map: dict[str, Sequence[str]] | None = None, celltype_col: str | None = None, min_cells_per_group: int = 5, exclude_crossovers: bool = True) DataFrame[source]#

Build pseudobulk module scores (participant × visit × pool × module).

Parameters:
  • adata – AnnData object.

  • module_cols – Columns in adata.obs with module scores.

  • design – TrialDesign object.

  • visits – Tuple of (baseline, followup) visit labels.

  • pool_col – Column in adata.obs to use as pool labels directly.

  • pool_map – Mapping of pool -> list of celltypes. Used if pool_col is None.

  • celltype_col – Column for fine cell types (used with pool_map).

  • min_cells_per_group – Minimum cells per participant×visit×pool to keep.

  • exclude_crossovers – Whether to exclude crossover cells.

Returns:

Long-format table with columns: participant, visit, arm, pool, module, module_score, and n_cells.

Return type:

pd.DataFrame

sctrial.stats.module_scores.module_score_within_arm_by_pool(pb: DataFrame, design: TrialDesign, visits: tuple[str, str], *, min_paired: int = 3, fdr_within: str | None = 'module') DataFrame[source]#

Within-arm paired tests on module scores by pool.

Parameters:
  • pb – Output of module_score_pseudobulk().

  • design – TrialDesign object.

  • visits – Tuple of (baseline, followup) visit labels.

  • min_paired – Minimum paired participants per pool/module.

  • fdr_within – If “module”, FDR is computed within each module across pools. If “pool”, FDR is computed within each pool across modules. If None, global FDR.

Returns:

One row per (pool, module) with columns: pool, module, mean_delta, p_time, n_units, FDR_time.

Return type:

pd.DataFrame

Time Series Analysis#

Time series analysis for multi-timepoint trials.

This module extends sctrial’s capabilities beyond 2-timepoint DiD to handle longitudinal studies with 3+ timepoints.

Mathematical Background#

For studies with >2 timepoints, several approaches are available:

  1. Generalized DiD (Event Study): Compare each post-treatment timepoint to baseline:

    Y_it = α_i + Σ_k β_k × D_it^k + ε_it

    where D_it^k = 1 if participant i is treated and at timepoint k.

  2. Linear Trend Interaction: Test if treatment changes the slope over time:

    Y_it = α_i + β₁×Time + β₂×Treat×Time + ε_it

    H₀: β₂ = 0 (treatment doesn’t change trajectory)

  3. Quadratic/Polynomial Trends: Allow for non-linear trajectories:

    Y_it = α_i + β₁×Time + β₂×Time² + β₃×Treat×Time + β₄×Treat×Time² + ε_it

  4. Spline Models: Flexible non-parametric trend modeling using basis splines.
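Approach 2 can be sketched on toy data with plain numpy OLS (illustrative only, not the sctrial API; the participant intercepts α_i are omitted here, and the true slope shift is set to 0.5):

```python
import numpy as np

# Toy data: 10 participants, 3 visits; treated slope differs by true_b2.
rng = np.random.default_rng(0)
n, times, true_b2 = 10, np.array([0.0, 1.0, 2.0]), 0.5
treat = np.repeat([0, 1], n // 2)              # arm indicator per participant

rows = []
for i in range(n):
    for t in times:
        y = 1.0 + 0.2 * t + true_b2 * treat[i] * t + rng.normal(0, 0.1)
        rows.append((1.0, t, treat[i] * t, y))  # [intercept, Time, Treat*Time], Y
data = np.array(rows)
beta, *_ = np.linalg.lstsq(data[:, :3], data[:, 3], rcond=None)
b2_hat = beta[2]                               # estimate of the slope shift
```

Testing H₀: β₂ = 0 then amounts to a standard t-test on b2_hat.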

Model Selection Guidance#

  • 3-4 timepoints: Linear trend interaction is usually sufficient

  • 5+ timepoints: Consider polynomial or spline models

  • Irregular intervals: Use actual time values, not ordinal indices

  • Treatment onset varies: Use time relative to treatment start

Key Assumptions#

  1. Parallel trends in pre-treatment period (for causal inference)

  2. No anticipation effects

  3. Treatment effect timing is correctly specified

  4. Panel is balanced or missingness is random (MCAR)

sctrial.stats.timeseries.event_study_did(adata: AnnData, features: Sequence[str], design: TrialDesign, visits: Sequence[str], reference_visit: str | None = None, layer: str | None = None, exclude_crossovers: bool = True) DataFrame[source]#

Run event study DiD comparing each visit to a reference baseline.

This is a generalization of 2-period DiD to multiple periods. For each post-baseline visit, we estimate a separate DiD effect.

Model:

Y_it = α_i + Σ_k (γ_k × Visit_k) + Σ_k (β_k × Treat × Visit_k) + ε_it

β_k captures the DiD effect at visit k relative to baseline.

Parameters:
  • adata – AnnData object.

  • features – Features to test.

  • design – TrialDesign object.

  • visits – Ordered visit labels.

  • reference_visit – Baseline visit for comparison. If None, uses first visit.

  • layer – Expression layer.

  • exclude_crossovers – Exclude crossover participants.

Returns:

Long-format results with columns:
  • feature: Feature name

  • visit: Post-baseline visit

  • beta_DiD: DiD effect vs reference

  • se_DiD: Standard error

  • p_DiD: P-value

  • n_units: Sample size

Return type:

pd.DataFrame

Examples

>>> # Compare each follow-up to baseline
>>> res = event_study_did(
...     adata, features=genes, design=design,
...     visits=["Baseline", "Week4", "Week8", "Week12"],
...     reference_visit="Baseline"
... )
>>> # Visualize event study plot
>>> for feat in genes[:3]:
...     sub = res[res["feature"] == feat]
...     plt.errorbar(sub["visit"], sub["beta_DiD"], yerr=1.96*sub["se_DiD"])
sctrial.stats.timeseries.polynomial_trend(adata: AnnData, feature: str, design: TrialDesign, visits: Sequence[str], time_values: Sequence[float] | None = None, degree: int = 2, layer: str | None = None) dict[source]#

Fit polynomial trend model for a single feature.

This function fits a polynomial model and returns detailed diagnostics including predicted trajectories.

Parameters:
  • adata – AnnData object.

  • feature – Single feature to model.

  • design – TrialDesign object.

  • visits – Ordered visit labels.

  • time_values – Numeric time values (or None for ordinal).

  • degree – Polynomial degree (1=linear, 2=quadratic, 3=cubic).

  • layer – Expression layer.

Returns:

Model results including:
  • coefficients: All model coefficients

  • predictions: Predicted values per arm×time

  • aic, bic: Model fit statistics

  • residuals: Model residuals

Return type:

dict

sctrial.stats.timeseries.test_parallel_trends(adata: AnnData, features: Sequence[str], design: TrialDesign, pre_visits: Sequence[str], layer: str | None = None) DataFrame[source]#

Test the parallel trends assumption using pre-treatment data.

A key assumption of DiD is that treatment and control groups would have followed parallel trajectories in the absence of treatment. This can be partially tested using pre-treatment data.

Model:

Y_it = α_i + β₁×Time + β₂×Treat×Time + ε_it (pre-treatment only)

H₀: β₂ = 0 (parallel pre-trends)

A significant β₂ suggests the parallel trends assumption may be violated.

Parameters:
  • adata – AnnData object.

  • features – Features to test.

  • design – TrialDesign object.

  • pre_visits – List of pre-treatment visits (2+ visits).

  • layer – Expression layer.

Returns:

Results with:
  • feature: Feature name

  • beta_pretrend: Pre-treatment trend difference

  • p_pretrend: P-value for non-parallel trends

  • warning: “Violation” if p < 0.10

Return type:

pd.DataFrame

Examples

>>> # Check parallel trends in screening visits
>>> pre_test = test_parallel_trends(
...     adata, features=genes, design=design,
...     pre_visits=["Screen1", "Screen2", "Baseline"]
... )
>>> violations = pre_test[pre_test["warning"] == "Violation"]
>>> print(f"Features with potential violations: {len(violations)}")
sctrial.stats.timeseries.trend_interaction(adata: AnnData, features: Sequence[str], design: TrialDesign, visits: Sequence[str], time_values: Sequence[float] | None = None, model: Literal['linear', 'quadratic', 'cubic'] = 'linear', aggregate: Literal['cell', 'participant_visit', 'participant_visit_celltype'] = 'participant_visit', layer: str | None = None, exclude_crossovers: bool = True) DataFrame[source]#

Test treatment × time trend interactions for multi-timepoint data.

This function fits a model where treatment modifies the trajectory over time:

Linear model:

Y_it = α_i + β₁×Time + β₂×Treat×Time + ε_it

H₀: β₂ = 0 (treatment doesn’t change the slope)

Quadratic model:

Y_it = α_i + β₁×Time + β₂×Time² + β₃×Treat×Time + β₄×Treat×Time² + ε_it

H₀: β₃ = β₄ = 0 (no treatment effect on trajectory)

Parameters:
  • adata – AnnData object with longitudinal data.

  • features – Features to test.

  • design – TrialDesign object.

  • visits – Ordered list of visit labels (3+ visits).

  • time_values – Numeric time values corresponding to visits. If None, uses ordinal indices (0, 1, 2, …).

  • model – “linear”, “quadratic”, or “cubic”.

  • aggregate – Aggregation mode.

  • layer – Expression layer.

  • exclude_crossovers – Exclude crossover participants.

Returns:

Results with columns:
  • feature: Feature name

  • beta_trend: Main time effect (in control)

  • beta_treat_trend: Treatment × time interaction (linear)

  • p_treat_trend: P-value for linear interaction

  • beta_treat_trend2: Treatment × time² interaction (if quadratic)

  • p_treat_trend2: P-value for quadratic interaction

  • n_units: Number of participants

  • n_timepoints: Number of visits

Return type:

pd.DataFrame

Examples

>>> # 4-timepoint study: Day 0, 7, 14, 28
>>> res = trend_interaction(
...     adata, features=genes, design=design,
...     visits=["D0", "D7", "D14", "D28"],
...     time_values=[0, 7, 14, 28],
...     model="linear"
... )
>>> print(res[res["p_treat_trend"] < 0.05])

Cross-Validation#

Cross-validation utilities for effect stability assessment.

This module provides leave-one-out (LOO) and k-fold cross-validation for assessing the robustness of DiD estimates.

Why Cross-Validation for DiD?#

Cross-validation in the DiD context serves different purposes than in predictive modeling:

  1. Influence diagnostics: Identify participants with outsized influence on effect estimates (potential outliers or data quality issues).

  2. Effect stability: Assess how robust the DiD estimate is to the exclusion of individual participants.

  3. Generalizability: Estimate how well the effect might replicate in new samples (though true generalization requires new data).

Leave-One-Out (LOO)#

For each participant i:
  1. Fit DiD model excluding participant i

  2. Record beta_DiD^(-i)

  3. Compare to full-sample beta_DiD

Metrics:
  • Influence: |beta_DiD - beta_DiD^(-i)| / SE(beta_DiD)

  • Cook’s D analog for DiD
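The influence metric can be sketched for a simple paired-delta DiD estimator (a toy analogue, not the sctrial implementation, which fits the full regression model):

```python
import numpy as np

def loo_influence(deltas, treated):
    """Leave-one-out influence |beta_full - beta_loo| / SE(beta_full)
    for a simple paired-delta DiD (treated-vs-control mean change)."""
    deltas = np.asarray(deltas, float)
    treated = np.asarray(treated, bool)
    def did(keep):
        return deltas[keep & treated].mean() - deltas[keep & ~treated].mean()
    full = did(np.ones(len(deltas), bool))
    se = np.sqrt(deltas[treated].var(ddof=1) / treated.sum()
                 + deltas[~treated].var(ddof=1) / (~treated).sum())
    influence = []
    for i in range(len(deltas)):
        keep = np.ones(len(deltas), bool)
        keep[i] = False                         # drop participant i
        influence.append(abs(full - did(keep)) / se)
    return np.array(influence)
```

A participant whose exclusion shifts the estimate by more than one standard error (influence > 1) is a candidate outlier.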

K-Fold Cross-Validation#

  1. Randomly partition participants into K folds

  2. For each fold k: fit DiD on participants NOT in fold k and record the estimate

  3. Report mean, SD, and CI of estimates

Interpretation Guidelines#

  • High LOO variance: Effect driven by few participants

  • Consistently signed CV estimates: Robust effect

  • Estimate changes sign across folds: Unreliable effect
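The fold procedure and the sign-consistency check can be sketched for a simple paired-delta DiD estimator (a toy analogue, not the sctrial model):

```python
import numpy as np

def kfold_did_stability(deltas, treated, k=5, seed=0):
    """K-fold stability of a simple paired-delta DiD: drop each fold,
    re-estimate, and report sign consistency with the full estimate."""
    rng = np.random.default_rng(seed)
    deltas = np.asarray(deltas, float)
    treated = np.asarray(treated, bool)
    full = deltas[treated].mean() - deltas[~treated].mean()
    folds = np.array_split(rng.permutation(len(deltas)), k)
    ests = []
    for fold in folds:
        keep = np.ones(len(deltas), bool)
        keep[fold] = False                      # hold out this fold
        ests.append(deltas[keep & treated].mean()
                    - deltas[keep & ~treated].mean())
    ests = np.asarray(ests)
    sign_consistency = float(np.mean(np.sign(ests) == np.sign(full)))
    return full, ests, sign_consistency
```

Sign consistency near 1.0 indicates a robust effect; values near 0.5 indicate the estimate flips sign across folds.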

sctrial.stats.cv.cv_summary(cv_results: DataFrame, alpha: float = 0.05) DataFrame[source]#

Generate a summary of CV results (works with both LOO and k-fold).

Parameters:
  • cv_results – Output from loo_cv_did or kfold_cv_did.

  • alpha – Significance level for classification.

Returns:

Summary per feature with:
  • feature: Feature name

  • mean_estimate: Full-sample estimate (or mean LOO estimate)

  • mean_loo: Mean of LOO estimates

  • std_loo: Standard deviation of LOO estimates

  • cv: Coefficient of variation (std/mean)

Return type:

pd.DataFrame

sctrial.stats.cv.influence_diagnostics(loo_results: DataFrame, threshold: float = 1.0) DataFrame[source]#

Summarize influence diagnostics from LOO results.

Parameters:
  • loo_results – Output from loo_cv_did.

  • threshold – Influence threshold for flagging (default 1.0 = 1 SE shift).

Returns:

Per-participant influence with columns:
  • feature: Feature name

  • excluded: ID of excluded participant

  • influence: Influence score for this participant

  • beta_DiD: DiD estimate without this participant

  • is_influential: Whether influence > threshold

Return type:

pd.DataFrame

sctrial.stats.cv.kfold_cv_did(adata: AnnData, features: Sequence[str], design: TrialDesign, visits: tuple[str, str], k: int = 5, n_repeats: int = 10, seed: int = 42, layer: str | None = None, exclude_crossovers: bool = True) DataFrame[source]#

K-fold cross-validation for DiD effect stability.

Randomly partitions participants into K folds and estimates DiD using K-1 folds. Repeating this process gives a distribution of estimates that reflects sampling variability.

Parameters:
  • adata – AnnData object.

  • features – Features to analyze.

  • design – TrialDesign object.

  • visits – (baseline, followup) visits.

  • k – Number of folds (default 5).

  • n_repeats – Number of times to repeat CV (default 10).

  • seed – Random seed.

  • layer – Expression layer.

  • exclude_crossovers – Exclude crossover participants.

Returns:

Results with one row per feature:
  • feature: Feature name

  • beta_full: Full-sample estimate

  • beta_cv_mean: Mean CV estimate

  • beta_cv_sd: SD of CV estimates

  • beta_cv_lower, beta_cv_upper: 2.5-97.5 percentiles

  • cv_stability: beta_cv_mean / beta_cv_sd (higher = more stable)

  • sign_consistency: Proportion of CV estimates with same sign as full

Return type:

pd.DataFrame

Examples

>>> cv = kfold_cv_did(adata, features=genes, design=design, visits=visits, k=5)
>>> # Check stability
>>> stable = cv[cv["sign_consistency"] > 0.9]
>>> print(f"Stable effects: {len(stable)} / {len(cv)}")
sctrial.stats.cv.loo_cv_did(adata: AnnData, features: Sequence[str], design: TrialDesign, visits: tuple[str, str], layer: str | None = None, exclude_crossovers: bool = True, aggregate: Literal['cell', 'participant_visit', 'participant_visit_celltype'] = 'participant_visit', standardize: bool = True) DataFrame[source]#

Leave-one-out cross-validation for DiD analysis.

For each participant, fits the DiD model without that participant and records the effect estimate. This reveals which participants have the largest influence on the results.

Parameters:
  • adata – AnnData object.

  • features – Features to analyze.

  • design – TrialDesign object.

  • visits – (baseline, followup) visits.

  • layer – Expression layer.

  • exclude_crossovers – Exclude crossover participants.

  • aggregate – Aggregation mode passed to did_table (default “participant_visit”).

  • standardize – Whether to z-score the outcome variable before fitting (default True).

Returns:

Long-format results with columns:
  • feature: Feature name

  • excluded: ID of excluded participant

  • beta_DiD: DiD estimate without this participant

  • se_DiD: Standard error

  • influence: |beta_full - beta_loo| / SE

Return type:

pd.DataFrame

Examples

>>> loo = loo_cv_did(adata, features=["sig_IFN"], design=design, visits=visits)
>>> # Find influential participants
>>> influential = loo[loo["influence"] > 1.0]
>>> print(f"Influential participants: {influential['excluded'].unique()}")

Heterogeneity#

sctrial.stats.heterogeneity.test_treatment_heterogeneity(adata: AnnData, features: Sequence[str], design: TrialDesign, visits: tuple[str, str], biomarker_col: str, *, threshold: float | None = None, aggregate: Literal['cell', 'participant_visit', 'participant_visit_celltype'] = 'participant_visit', layer: str | None = None, standardize: bool = True, agg: Literal['mean', 'median', 'pct_pos'] = 'mean', covariates: list[str] | None = None, fdr: Literal['feature', 'all'] = 'feature') DataFrame[source]#

Test treatment effect heterogeneity via a 3-way interaction.

Fits: outcome ~ visit + arm + biomarker + visit:arm + visit:biomarker + arm:biomarker + visit:arm:biomarker + C(participant)

The coefficient of the 3-way interaction (visit:arm:biomarker) tests whether the DiD effect differs by biomarker subgroup.
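The role of the 3-way coefficient can be illustrated on simulated data with plain numpy OLS (a toy sketch only: it omits the participant fixed effects C(participant) that the sctrial model includes, and the effect sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
visit = rng.integers(0, 2, n)        # 0 = baseline, 1 = follow-up
arm = rng.integers(0, 2, n)          # 0 = control, 1 = treated
bio = rng.integers(0, 2, n)          # biomarker_high indicator
y = (0.3 * visit * arm               # homogeneous DiD effect
     + 0.7 * visit * arm * bio       # extra effect in biomarker-high subgroup
     + rng.normal(0, 0.1, n))
X = np.column_stack([np.ones(n), visit, arm, bio,
                     visit * arm, visit * bio, arm * bio,
                     visit * arm * bio])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b_het = beta[-1]                     # visit:arm:biomarker coefficient
```

Here beta[4] recovers the DiD effect shared by both subgroups, while b_het recovers the additional effect in the biomarker-high subgroup.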

Parameters:
  • adata – AnnData object.

  • features – Features to test (genes or obs columns).

  • design – TrialDesign object.

  • visits – (baseline, followup) visit labels.

  • biomarker_col – Column in adata.obs for stratification.

  • threshold – If provided, biomarker_high = biomarker > threshold. If None, median split is used for numeric biomarkers. For binary biomarkers (0/1 or True/False), values are used directly.

  • aggregate – Aggregation mode (default: participant_visit).

  • layer – Expression layer for genes.

  • standardize – Whether to z-score outcomes before fitting.

  • agg – Aggregation function.

  • covariates – Additional covariates for the model.

  • fdr – FDR correction mode: per feature or across all results.

Returns:

Table with heterogeneity effects for each feature.

Return type:

pd.DataFrame

Examples

>>> res = test_treatment_heterogeneity(
...     adata,
...     features=["sig_IFN_Response", "sig_Cytotoxicity"],
...     design=design,
...     visits=("Pre", "Post"),
...     biomarker_col="baseline_crp",
...     threshold=5.0,
... )
>>> res[["feature", "beta_heterogeneity", "p_heterogeneity"]].head()

Sensitivity#

sctrial.stats.sensitivity.e_value_rr(estimate: float, *, ci_lower: float | None = None, ci_upper: float | None = None) tuple[float, float | None][source]#

Compute the E-value for a risk ratio estimate (VanderWeele & Ding).

Ref: VanderWeele, Tyler J., and Peng Ding. “Sensitivity analysis in observational research: introducing the E-value.” Annals of internal medicine 167.4 (2017): 268-274.

Parameters:
  • estimate – Risk ratio (RR) estimate (>0).

  • ci_lower – Optional lower confidence limit on the RR scale.

  • ci_upper – Optional upper confidence limit on the RR scale.

Returns:

E-value for the point estimate and for the confidence limit closest to 1.

Return type:

(e_value, e_value_ci)
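The point-estimate formula from VanderWeele & Ding, E = RR + sqrt(RR × (RR - 1)) after inverting estimates below 1, can be sketched in plain Python (a toy analogue of e_value_rr; the handling of confidence limits is not reproduced here):

```python
import math

def e_value_point(rr: float) -> float:
    """E-value for a risk-ratio point estimate: RR + sqrt(RR*(RR-1)),
    after inverting protective estimates (VanderWeele & Ding, 2017)."""
    if rr <= 0:
        raise ValueError("risk ratio must be > 0")
    if rr < 1:
        rr = 1.0 / rr                 # protective effects: work on 1/RR
    return rr + math.sqrt(rr * (rr - 1.0))
```

For example, RR = 2 yields an E-value of about 3.41: an unmeasured confounder would need associations of that strength with both treatment and outcome to explain away the estimate.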

Survival Analysis#

sctrial.stats.survival.hazard_regression_with_features(adata: AnnData, features: Sequence[str], design: TrialDesign, time_col: str, event_col: str, *, visit: str | None = None, layer: str | None = None, agg: str = 'mean', covariates: Sequence[str] | None = None, standardize: bool = True, fdr: bool = True) DataFrame[source]#

Cox proportional hazards regression using single-cell features.

This function aggregates features to the participant level and fits a Cox model for each feature, optionally adjusting for covariates.

Parameters:
  • adata – AnnData object.

  • features – Features to test (obs columns or gene names in var_names).

  • design – TrialDesign with participant and visit columns.

  • time_col – Column in adata.obs containing survival time.

  • event_col – Column in adata.obs indicating event (1) or censoring (0).

  • visit – Optional visit label to subset prior to aggregation (e.g., baseline).

  • layer – Expression layer to use for gene features.

  • agg – Aggregation function for features (“mean”, “median”, or “pct_pos”).

  • covariates – Optional covariate columns to include in the Cox model.

  • standardize – If True, z-score each feature before fitting.

  • fdr – If True, add FDR correction across all features.

Returns:

One row per feature with hazard ratio (HR), confidence interval, and p-value.

Return type:

pd.DataFrame

Diagnostics#

sctrial.stats.diagnostics.check_did_assumptions(fit: RegressionResultsWrapper, *, return_figures: bool = False) dict[str, Any][source]#

Run lightweight diagnostics for a fitted DiD model.

Parameters:
  • fit – Fitted statsmodels RegressionResultsWrapper.

  • return_figures – If True, include matplotlib figures for residual diagnostics.

Returns:

Dictionary with diagnostic statistics:
  • bp_pvalue: Breusch-Pagan p-value for heteroscedasticity

  • jb_pvalue: Jarque-Bera p-value for normality

  • resid_mean, resid_std: Residual mean and standard deviation

  • figures (optional): dict of matplotlib Figures

Return type:

dict

Summary Utilities#

sctrial.stats.summary.summarize_did_results(df: DataFrame, alpha: float = 0.05, p_col: str = 'p_DiD', fdr_col: str = 'FDR_DiD', effect_col: str = 'beta_DiD') str[source]#

Generate a human-readable summary of DiD analysis results.

Parameters:
  • df – DataFrame returned by did_table or abundance_did.

  • alpha – Significance threshold.

  • p_col – Name of the p-value column.

  • fdr_col – Name of the FDR-corrected p-value column.

  • effect_col – Name of the effect size (beta) column.

Returns:

Markdown-formatted string summarizing the key findings.

Return type:

str

Monte Carlo Simulation#

Monte Carlo simulation engine for DiD method comparison.

Generates synthetic cell-level scRNA-seq-like data with known ground-truth DiD effects, enabling controlled comparison of statistical methods under realistic conditions (variable cell counts, participant heterogeneity, cell-level noise).

sctrial.stats.simulation.run_method_comparison(n_participants: int = 20, n_genes: int = 50, effect_sizes: dict[str, float] | None = None, noise_sd: float = 1.0, n_iterations: int = 200, methods: list[str] | None = None, seed: int = 42, n_cells_per_participant: int = 100, n_jobs: int | None = None, **sim_kwargs) DataFrame[source]#

Run Monte Carlo comparison of DiD methods on simulated data.

Generates cell-level scRNA-seq data with known ground truth and runs each method on the same datasets. Methods that operate on cell-level data (sctrial DiD, mixed DiD) receive the full AnnData so their internal aggregation + cluster-robust SE pipeline is exercised faithfully. Methods that expect pseudobulk (OLS, Wilcoxon) receive the pre-aggregated participant-visit means.

Iterations are parallelised across CPU cores for speed.

Parameters:
  • methods (list of str) – Methods to compare. Options: "sctrial_did", "mixed_did", "pseudobulk_ols", "wilcoxon". Default: all four.

  • n_iterations (int) – Number of simulation repetitions.

  • n_jobs (int, optional) – Number of parallel workers. Defaults to min(cpu_count, 8). Set to 1 to disable parallelisation.

Returns:

DataFrame with columns: iteration, method, gene, true_beta, estimated_beta, pvalue, ci_lo, ci_hi.

Return type:

pd.DataFrame
sctrial.stats.simulation.simulate_did_data(n_participants: int = 20, n_genes: int = 50, n_cells_per_participant: int = 100, effect_sizes: dict[str, float] | None = None, noise_sd: float = 1.0, baseline_mean: float = 5.0, participant_sd: float = 0.5, time_effect: float = 0.1, seed: int = 42) dict[source]#

Generate synthetic cell-level DiD data with known ground-truth effects.

Creates cell-level expression data for a two-arm (Treated vs Control), two-timepoint (Pre vs Post) design with participant random intercepts, variable cell counts per participant-visit, and cell-level noise.

The data-generating process mirrors real scRNA-seq clinical trial data:

\[Y_{ijk} = \mu + \alpha_i + \beta_1 \text{Post}_j + \beta_2 (\text{Treat}_i \times \text{Post}_j) + \epsilon_{ijk}\]

where i indexes participants, j indexes visits, k indexes cells, \(\alpha_i \sim N(0, \sigma_{\text{participant}}^2)\), and \(\epsilon_{ijk} \sim N(0, \sigma_{\text{noise}}^2)\).
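The data-generating process above can be sketched directly in numpy, recovering the interaction effect with a paired-delta DiD on the pseudobulk means (a toy analogue of simulate_did_data; parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
n_part, beta2 = 20, 0.8                       # participants, true DiD effect
treat = np.repeat([1, 0], n_part // 2)        # equal split between arms
alpha = rng.normal(0, 0.5, n_part)            # participant random intercepts
means = np.zeros((n_part, 2))                 # participant x visit pseudobulk
for i in range(n_part):
    for post in (0, 1):                       # Pre / Post
        n_cells = max(20, rng.poisson(100))   # variable counts, floor of 20
        mu = 5.0 + alpha[i] + 0.1 * post + beta2 * treat[i] * post
        means[i, post] = (mu + rng.normal(0, 1.0, n_cells)).mean()
delta = means[:, 1] - means[:, 0]             # within-participant change
did_hat = delta[treat == 1].mean() - delta[treat == 0].mean()
```

The random intercepts α_i cancel in the within-participant deltas, so did_hat estimates β₂ directly.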

Parameters:
  • n_participants (int) – Total participants (split equally between arms).

  • n_genes (int) – Number of genes to simulate.

  • n_cells_per_participant (int) – Mean cells per participant-visit. Actual counts are drawn from Poisson(n_cells_per_participant) with a floor of 20.

  • effect_sizes (dict) – Mapping of gene_name -> true DiD effect (beta_DiD). Genes not in this dict have effect = 0 (null).

  • noise_sd (float) – Cell-level residual standard deviation.

  • baseline_mean (float) – Grand mean expression level.

  • participant_sd (float) – Between-participant standard deviation (random intercept).

  • time_effect (float) – Main effect of time (Pre->Post shift, same in both arms under null).

  • seed (int) – Random seed for reproducibility.

Returns:

dict with keys:
  • "adata": AnnData, cell-level expression matrix with obs columns participant, visit, arm

  • "pseudobulk": DataFrame, participant-visit means with an n_cells column

  • "truth": dict mapping gene_name -> true_beta_DiD

  • "params": dict of simulation parameters

Return type:

dict