Benchmark Simulation#

Hierarchical gamma-Poisson simulator for generating realistic scRNA-seq clinical trial data with known ground truth, and orchestration tools for running controlled benchmarks across multiple statistical methods.

Simulator#

class sctrial.benchmark.SimulationConfig(design: ~typing.Literal['two_arm', 'single_arm'] = 'two_arm', n_per_arm: int = 20, n_genes: int = 50, effects: dict[str, float] = <factory>, mean_cells_per_visit: int = 500, cell_count_mode: ~typing.Literal['poisson', 'lognormal'] = 'poisson', cell_count_cv: float = 0.5, arm_ratio: tuple[int, int] | None = None, missing_rate: float = 0.0, dispersion_mode: ~typing.Literal['calibrated', 'fixed', 'extreme'] = 'calibrated', dispersion_fixed: float = 10.0, participant_sd: float = 0.3, baseline_mean: float = 2.0, baseline_sd: float = 1.0, time_effect: float = 0.1, target_library_size: int = 10000, library_size_sd: float = 0.3, seed: int = 42)[source]#

Bases: object

Configuration for a single simulation run.

Parameters:
  • design ({"two_arm", "single_arm"}) – Trial design family.

  • n_per_arm (int) – Participants per arm. For two-arm designs with equal allocation, total participants = 2 × n_per_arm. For single-arm, this IS the total participant count.

  • n_genes (int) – Total genes to simulate.

  • effects (dict[str, float]) – Gene-name → true effect size. Genes not listed have β₂=0 (null).

  • mean_cells_per_visit (int) – Average cells per participant-visit.

  • cell_count_mode ({"poisson", "lognormal"}) – How cell counts vary across participant-visits.

  • cell_count_cv (float) – Coefficient of variation for lognormal cell counts (ignored if poisson).

  • arm_ratio (tuple[int, int] | None) – Ratio of treated:control (e.g., (3, 7) for unbalanced). Only for two-arm. If None, uses equal allocation.

  • missing_rate (float) – Fraction of participants missing their Post visit (0.0 = no attrition).

  • dispersion_mode ({"calibrated", "fixed", "extreme"}) – How gene-specific θ_g is drawn.

  • dispersion_fixed (float) – Fixed θ for all genes (only if dispersion_mode=”fixed”).

  • participant_sd (float) – SD of participant random intercept (σ_participant).

  • baseline_mean (float) – Mean of β₀g across genes (log-scale).

  • baseline_sd (float) – SD of β₀g across genes.

  • time_effect (float) – Global time effect β₁ (applied to all genes).

  • target_library_size (int) – Target library size per cell.

  • library_size_sd (float) – SD of log-library size.

  • seed (int) – Random seed.

sctrial.benchmark.simulate_trial(cfg: SimulationConfig) dict[source]#

Generate a complete simulated trial dataset.

Returns:

“adata” : AnnData — cell-level counts (X = raw counts, obs has metadata) “pseudobulk_means”: DataFrame — participant-visit mean expression (for sctrial/Wilcoxon) “pseudobulk_counts”: DataFrame — participant-visit summed counts (for edgeR/limma/dreamlet) “truth” : dict[str, float] — gene → true interaction effect “config” : SimulationConfig

Return type:

dict with keys

sctrial.benchmark.calibrate_from_real_data(adata_real, layer: str | None = None, count_layer: str | None = None, participant_col: str = 'participant', visit_col: str = 'visit') dict[source]#

Extract distributional parameters from real scRNA-seq data.

Uses raw counts for dispersion/library-size calibration (critical for the NB generative model), and normalized expression for ICC estimation.

Parameters:
  • adata_real (AnnData) – Real dataset.

  • layer (str, optional) – Normalized expression layer (for ICC). If None, uses .X.

  • count_layer (str, optional) – Raw count layer for dispersion and library size calibration. If None, auto-detects: checks for integer .X, then “counts” layer. If no raw counts available, estimates from normalized data.

  • participant_col (str) – Column name for participant IDs.

  • visit_col (str) – Column name for visit/timepoint.

Returns:

“mean_cells_per_visit” : float “cell_count_cv” : float “participant_icc” : float (median across genes) “dispersion_median” : float “baseline_mean” : float (log-scale, on raw counts) “baseline_sd” : float “library_size_mean” : float (log-scale, on raw counts) “library_size_sd” : float “has_raw_counts” : bool

Return type:

dict with keys

sctrial.benchmark.validate_simulator(cfg: SimulationConfig, adata_real, layer: str | None = None, participant_col: str = 'participant', visit_col: str = 'visit') dict[source]#

Compare simulated data properties against real data.

Compares on a common scale: if real data is normalized (no raw counts), the simulated raw counts are log1p-CPM-normalized before comparison. If real data has raw counts, compares counts directly.

Returns:

“cell_level” : dict — mean-variance, zero fraction, library size “pseudobulk_level” : dict — ICC, cell-count distribution, mean distribution

Return type:

dict with keys

Orchestrator#

sctrial.benchmark.run_benchmark(designs: list[str] | None = None, methods: list[str] | None = None, n_iterations: int = 200, n_jobs: int = 1, output_dir: str | Path = 'benchmark_results', resume: bool = True) DataFrame[source]#

Run the full NatMeth benchmark grid.

Parameters:
  • designs (list of str) – Design families to run. Default: [“two_arm”, “single_arm”].

  • methods (list of str) – Methods to benchmark. Default: CORE_METHODS.

  • n_iterations (int) – Monte Carlo iterations per scenario.

  • n_jobs (int) – Parallel workers. Use -1 for all cores.

  • output_dir (str or Path) – Directory for output CSVs and figures.

  • resume (bool) – If True, skip scenarios that already have results in output_dir.

Return type:

DataFrame with all results concatenated.

sctrial.benchmark.CORE_METHODS = ['sctrial_did', 'dreamlet', 'nebula', 'wilcoxon_paired']#

Built-in mutable sequence.

If no argument is given, the constructor creates a new empty list. The argument must be an iterable if specified.