Workflow and Analysis

Analysis Interfaces

High-level analysis helpers for trial-aware inference.

class sctrial.analysis.DiDAnalyzer(adata: AnnData, design: TrialDesign, results_: DataFrame | None = None)

Bases: object

High-level interface for difference-in-differences (DiD) analysis with stored results.

fit(features: list[str], *, visits: tuple[str, str], config: DiDConfig | None = None, celltype: str | None = None) → DataFrame

Run DiD and store results.

Returns a copy of the results DataFrame so that external mutations do not alter the analyzer’s internal state.

Parameters:

features – List of features to analyze.
visits – Tuple of (baseline, followup) visit labels.
config – Configuration for the DiD analysis. See sctrial.stats.did.DiDConfig for more details.
celltype – Cell type to analyze. If None, all cell types are analyzed.

Returns:

A copy of the results DataFrame.

Return type:

pd.DataFrame
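The copy-on-return behaviour described for fit() can be illustrated with a small stand-in. This is a minimal sketch, not the library's implementation: ToyAnalyzer and its placeholder rows are hypothetical, and plain lists of dicts replace the DataFrame (with pandas, the equivalent step would be DataFrame.copy()).

```python
import copy


class ToyAnalyzer:
    """Hypothetical stand-in that stores results and hands out copies."""

    def __init__(self):
        self.results_ = None

    def fit(self, features):
        # Placeholder "results": one row per feature (not a real DiD fit).
        self.results_ = [{"feature": f, "beta_DiD": 0.0} for f in features]
        # Return a deep copy so callers cannot mutate the stored rows.
        return copy.deepcopy(self.results_)


analyzer = ToyAnalyzer()
returned = analyzer.fit(["ms_OXPHOS"])
returned[0]["beta_DiD"] = 99.0  # mutates the caller's copy only
```

Because the caller only ever holds a copy, later calls that read the stored results (such as plotting or summarizing) see the values produced by the fit, not any edits made afterwards.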

plot_forest(**kwargs)

Plot a forest plot of the last DiD results.

Parameters: **kwargs – Additional keyword arguments passed to sctrial.plotting.plot_did_forest.
Returns: The axes containing the forest plot.
Return type: matplotlib.axes.Axes

summarize() → str

Summarize the last DiD results.

Returns: A summary of the DiD results.
Return type: str

Workflow API

End-to-end trial analysis workflow: TrialWorkflow and convenience pipeline.

class sctrial.workflow.TrialWorkflow(adata: AnnData, last_result: Any | None = None)

Bases: object

Fluent API for common sctrial workflows.

This class provides a minimal chainable interface for common tasks: preprocessing, gene scoring, and DiD analysis.

add_log1p_cpm_layer(counts_layer: str = 'counts') → TrialWorkflow

Add a log1p-CPM layer to the workflow AnnData.

Parameters: counts_layer – Layer name in adata.layers to use for counts data.
Returns: The workflow object with the new log1p-CPM layer added.
Return type: TrialWorkflow
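For readers unfamiliar with the transform, log1p-CPM conventionally means scaling each cell's counts to counts per million and then applying log(1 + x). The sketch below shows that arithmetic for a single cell using only the standard library. The function name and the zero-total handling are assumptions for illustration; the library's own implementation may differ in details.

```python
import math


def log1p_cpm(counts):
    """Return log1p(counts-per-million) for one cell's raw counts."""
    total = sum(counts)
    if total == 0:
        # An empty cell has no defined CPM; return zeros in this sketch.
        return [0.0 for _ in counts]
    return [math.log1p(c / total * 1_000_000) for c in counts]


cell = [1, 1, 2]          # raw counts for three genes in one cell
values = log1p_cpm(cell)  # CPM is 250000, 250000, 500000 before the log
```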

did_table(features: list[str], *, design: TrialDesign, visits: tuple[str, str], config: DiDConfig | None = None) → TrialWorkflow

Run DiD and store the result on the workflow.

Parameters:

features – List of feature names to analyze.
design – Trial design object.
visits – Tuple of (baseline, followup) visit labels.
config – Configuration for the DiD analysis. See sctrial.stats.did.DiDConfig for more details. If None, uses the default configuration.

Returns:

The workflow object with the DiD result stored in self.last_result.

Return type:

TrialWorkflow

result() → Any

Return the last result computed in the workflow.

Returns: The last result computed in the workflow (e.g., a DataFrame).
Return type: Any

score_gene_sets(gene_sets: dict[str, list[str]], *, layer: str | None = None, method: Literal['zmean', 'mean'] = 'zmean', prefix: str = 'ms_', min_genes: int = 5, overwrite: bool = False) → TrialWorkflow

Score gene sets and store module scores in adata.obs.

Parameters:

gene_sets – Dictionary mapping set names to lists of gene names. Each value must be a list (not a bare string). Duplicate gene names within a set are automatically removed.
layer – Layer name in adata.layers to use for expression data. If None, uses adata.X. For log1p-CPM workflows, use layer="log1p_cpm".
method – Scoring method. "zmean" (default) z-scores each gene across cells then averages; "mean" uses raw mean expression.
prefix – Prefix to add to column names (e.g., ms_ for module scores).
min_genes – Minimum number of genes from the set that must be present in the data. Default is 5.
overwrite – If False (default), skip gene sets that already have a column in adata.obs.

Returns:

The workflow object with the gene sets scored and module scores stored in adata.obs.

Return type:

TrialWorkflow
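The difference between the "zmean" and "mean" methods can be sketched on a tiny in-memory example. This is an illustration of the two scoring rules as described above, not the library's code: the function name, the use of the population standard deviation, the zero-variance handling, and returning None when fewer than min_genes genes are present are all assumptions.

```python
from statistics import mean, pstdev


def score_gene_set(expr, genes, method="zmean", min_genes=5):
    """Score one gene set per cell.

    expr maps gene name -> list of expression values, one per cell.
    Returns a list of per-cell scores, or None if too few genes are present.
    """
    # Drop duplicates (keeping order) and genes absent from the data.
    present = [g for g in dict.fromkeys(genes) if g in expr]
    if len(present) < min_genes:
        return None

    columns = []
    for g in present:
        values = expr[g]
        if method == "zmean":
            mu, sd = mean(values), pstdev(values)
            # Assumption: a constant gene contributes zeros rather than NaN.
            values = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
        columns.append(values)

    # Average across genes for each cell.
    return [mean(cell_values) for cell_values in zip(*columns)]


expr = {"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]}
raw = score_gene_set(expr, ["A", "B", "A"], method="mean", min_genes=2)
z = score_gene_set(expr, ["A", "B"], method="zmean", min_genes=2)
```

With "mean", the highly expressed gene B dominates the score; with "zmean", each gene is put on a common scale first, so both genes contribute equally.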

sctrial.workflow.workflow(adata: AnnData) → TrialWorkflow

Create a TrialWorkflow for fluent chaining.

Parameters: adata – AnnData object with trial data.
Returns: A TrialWorkflow object with the AnnData object set.
Return type: TrialWorkflow
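The fluent style used by TrialWorkflow rests on one convention: each step returns the workflow object itself, and a terminal accessor hands back the stored result. The toy class below sketches that pattern in isolation. ToyWorkflow and its method names are hypothetical stand-ins, not part of sctrial.

```python
class ToyWorkflow:
    """Hypothetical stand-in illustrating the chainable pattern."""

    def __init__(self, data):
        self.data = data
        self.last_result = None

    def add_step(self, name):
        # Mutating steps record their work and return self for chaining.
        self.data.setdefault("steps", []).append(name)
        return self

    def compute(self):
        # Result-producing steps store output on the object, then return self.
        self.last_result = list(self.data["steps"])
        return self

    def result(self):
        # Terminal accessor: ends the chain by returning the stored result.
        return self.last_result


out = ToyWorkflow({}).add_step("normalize").add_step("score").compute().result()
```

By the documented signatures, a real chain would presumably follow the same shape: workflow(adata), then add_log1p_cpm_layer(), score_gene_sets(...), did_table(...), and finally result().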

Convenience Functions

Convenience functions for quick trial analysis workflows.

sctrial.convenience.auto_detect_design(adata: AnnData, arm_treated: str | None = None, arm_control: str | None = None) → TrialDesign

Auto-detect trial design from common column naming patterns.

Looks for common patterns in column names:

participant: participant_id, patient_id, donor_id, subject_id, sample_id
visit: visit, timepoint, time, day, week, time_point
arm: arm, treatment_arm, arm_id, treatment, group, condition
celltype: celltype, cell_type, cluster, annotation, cell_annotation

Column matching uses exact (case-insensitive) match first, then word-boundary partial match (pattern must appear at a word boundary in the column name, so "arm" matches "arm_id" but not "farm_id").
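The two-pass matching rule can be sketched with the standard re module. This is one possible reading of the rule, not the library's code: find_column is a hypothetical helper, and treating underscores as separators on both sides of the pattern is an assumption (Python's \b would not work here, because "_" counts as a word character and "arm" would then fail to match "arm_id").

```python
import re


def find_column(columns, patterns):
    """Return the first column matching any pattern, or None."""
    # Pass 1: exact, case-insensitive match.
    lowered = {c.lower(): c for c in columns}
    for p in patterns:
        if p.lower() in lowered:
            return lowered[p.lower()]
    # Pass 2: partial match where the pattern is not glued to letters or
    # digits on either side. Underscores count as separators here.
    for p in patterns:
        rx = re.compile(rf"(?<![a-z0-9]){re.escape(p)}(?![a-z0-9])", re.IGNORECASE)
        for c in columns:
            if rx.search(c):
                return c
    return None
```

For example, find_column(["farm_id", "arm_id"], ["arm"]) skips "farm_id" and returns "arm_id", matching the behaviour described above.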

Parameters:

adata – AnnData object to analyze.
arm_treated – Optional: specify the label for treated arm. If not provided, the function tries keyword-based detection (e.g. "Treated", "Drug", "Active"). Raises if detection fails.
arm_control – Optional: specify the label for control arm. If not provided, the function tries keyword-based detection (e.g. "Control", "Placebo"). Raises if detection fails.

Returns:

Detected design (may need manual adjustment).

Return type:

TrialDesign

Examples

>>> from sctrial.convenience import auto_detect_design
>>> design = auto_detect_design(adata)
>>> print(f"Detected design: {design}")
>>> # Verify the detected design. If arm labels need adjustment,
>>> # create a new TrialDesign with corrected values:
>>> from dataclasses import replace
>>> design = replace(design, arm_treated="Drug_A", arm_control="Placebo")

Raises: ValueError – If required columns cannot be detected, or if arm labels cannot be determined (more than 2 arms, only 1 arm, or no keyword match for ambiguous labels).
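The failure cases listed for arm detection (wrong number of arms, or no keyword match) can be sketched as follows. This is an illustrative guess at the logic, not the library's implementation: detect_arms is a hypothetical helper, the keyword lists are taken only from the examples in the parameter descriptions, and case-insensitive substring matching is an assumption.

```python
TREATED_KEYWORDS = ("treated", "drug", "active")
CONTROL_KEYWORDS = ("control", "placebo")


def detect_arms(labels):
    """Return (arm_treated, arm_control) or raise ValueError."""
    unique = sorted(set(labels))
    if len(unique) != 2:
        raise ValueError(f"Expected exactly 2 arms, found {len(unique)}: {unique}")

    def has_keyword(label, keywords):
        return any(k in label.lower() for k in keywords)

    treated = [a for a in unique if has_keyword(a, TREATED_KEYWORDS)]
    control = [a for a in unique if has_keyword(a, CONTROL_KEYWORDS)]
    # Require one unambiguous hit per role; substring matching would, for
    # example, also flag "Untreated" as treated, so real code needs more care.
    if len(treated) != 1 or len(control) != 1 or treated == control:
        raise ValueError(f"Cannot infer treated/control from {unique}")
    return treated[0], control[0]
```

When labels such as "A" and "B" carry no keyword, passing arm_treated and arm_control explicitly avoids the error.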

sctrial.convenience.quick_did(adata: AnnData, module_scores: dict[str, list[str]], visits: tuple[str, str], participant_col: str = 'participant_id', visit_col: str = 'visit', arm_col: str = 'arm', arm_treated: str = 'Treated', arm_control: str = 'Control', celltype_col: str | None = None, layer: str = 'log1p_cpm', counts_layer: str = 'counts', score_method: Literal['zmean', 'mean'] = 'zmean', min_genes: int = 3, **kwargs) → DataFrame

One-line wrapper for the most common DiD workflow.

This function combines preprocessing, scoring, and DiD analysis in a single call for quick exploration.

Parameters:

adata – AnnData object with trial data.
module_scores – Dictionary mapping module names to gene lists.
visits – Tuple of (baseline, followup) visit labels. The order matters: the first element is the baseline visit and the second is the follow-up, which determines the sign of the DiD estimate.
participant_col – Column name for participant identifiers.
visit_col – Column name for visit labels.
arm_col – Column name for treatment arm.
arm_treated – Label for treated arm.
arm_control – Label for control arm.
celltype_col – Optional column name for cell types.
layer – Layer name for normalized expression. If not present, will be created.
counts_layer – Layer name for raw counts (used if layer needs to be created).
score_method – Method for gene set scoring ("mean" or "zmean").
min_genes – Minimum number of overlapping genes for a module score to be computed. Modules with fewer overlapping genes get NaN scores. Default is 3.
**kwargs – Additional arguments passed to did_table().

Returns:

DiD results table.

Return type:

pd.DataFrame

Examples

>>> from sctrial.convenience import quick_did
>>> gene_sets = {
...     "OXPHOS": ["COX7A1", "ATP5F1A", "NDUFA1"],
...     "Glycolysis": ["PKM", "LDHA", "HK2"]
... }
>>> res = quick_did(
...     adata,
...     module_scores=gene_sets,
...     visits=("V1", "V2")
... )
>>> print(res[["feature", "beta_DiD", "p_DiD", "FDR_DiD"]])
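Why the order of visits determines the sign of the estimate is easiest to see in the textbook two-by-two DiD contrast: the change in the treated arm minus the change in the control arm. The sketch below computes that contrast from group means. It is not how the library estimates beta_DiD (which may use a fitted model), and the function name, dictionary keys, and numbers are made up for illustration.

```python
from statistics import mean


def did_estimate(values, visits):
    """Difference-in-differences of group means.

    values maps (arm, visit) -> list of per-participant scores, where arm is
    "treated" or "control" (hypothetical keys for this sketch).
    """
    baseline, followup = visits
    treated_change = mean(values[("treated", followup)]) - mean(values[("treated", baseline)])
    control_change = mean(values[("control", followup)]) - mean(values[("control", baseline)])
    return treated_change - control_change


# Toy numbers, invented purely to show the sign behaviour.
scores = {
    ("treated", "V1"): [1.0, 2.0],
    ("treated", "V2"): [3.0, 4.0],
    ("control", "V1"): [1.0, 2.0],
    ("control", "V2"): [1.5, 2.5],
}
forward = did_estimate(scores, ("V1", "V2"))    # baseline V1, follow-up V2
reversed_ = did_estimate(scores, ("V2", "V1"))  # swapped order flips the sign
```

Swapping the two visit labels negates both within-arm changes, so the contrast changes sign while its magnitude stays the same.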