Workflow and Analysis
Analysis Interfaces
High-level analysis helpers for trial-aware inference.
- class sctrial.analysis.DiDAnalyzer(adata: AnnData, design: TrialDesign, results_: DataFrame | None = None)
Bases: object
High-level interface for DiD analysis with stored results.
- fit(features: list[str], *, visits: tuple[str, str], config: DiDConfig | None = None, celltype: str | None = None) → DataFrame
Run DiD and store results.
Returns a copy of the results DataFrame so that external mutations do not alter the analyzer’s internal state.
- Parameters:
features – List of features to analyze.
visits – Tuple of (baseline, followup) visit labels.
config – Configuration for the DiD analysis. See sctrial.stats.did.DiDConfig for more details.
celltype – Cell type to analyze. If None, all cell types are analyzed.
- Returns:
A copy of the results DataFrame.
- Return type:
pd.DataFrame
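For orientation, the classic 2x2 difference-in-differences contrast is the treated arm's change between the two visits minus the control arm's change. The sketch below computes that contrast from per-participant scores using only the standard library. It is an illustration under assumptions, not sctrial's estimator (which is configured through DiDConfig and is not reproduced here); the helper name did_estimate, the (arm, visit) dictionary layout, and the default arm labels are hypothetical. It also shows why the order of visits matters: swapping baseline and follow-up flips the sign of the estimate.

```python
from statistics import mean


def did_estimate(values, visits, treated="Treated", control="Control"):
    """Hypothetical helper: naive 2x2 difference-in-differences on group means.

    values maps (arm, visit) -> list of per-participant scores.
    visits is (baseline, followup); swapping the order flips the sign.
    """
    baseline, followup = visits
    delta_treated = mean(values[(treated, followup)]) - mean(values[(treated, baseline)])
    delta_control = mean(values[(control, followup)]) - mean(values[(control, baseline)])
    return delta_treated - delta_control


# Toy per-participant module scores (made-up numbers for illustration only).
scores = {
    ("Treated", "V1"): [1.0, 1.2],
    ("Treated", "V2"): [2.0, 2.2],
    ("Control", "V1"): [1.0, 1.0],
    ("Control", "V2"): [1.3, 1.3],
}

print(did_estimate(scores, ("V1", "V2")))  # ~0.7: treated change (1.0) minus control change (0.3)
print(did_estimate(scores, ("V2", "V1")))  # ~-0.7: same magnitude, opposite sign
```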
Workflow API
End-to-end trial analysis workflow: TrialWorkflow and convenience pipeline.
- class sctrial.workflow.TrialWorkflow(adata: AnnData, last_result: Any | None = None)
Bases: object
Fluent API for common sctrial workflows.
This class provides a minimal chainable interface for common tasks: preprocessing, gene scoring, and DiD analysis.
- add_log1p_cpm_layer(counts_layer: str = 'counts') → TrialWorkflow
Add a log1p-CPM layer to the workflow AnnData.
- Parameters:
counts_layer – Layer name in adata.layers to use for counts data.
- Returns:
The workflow object with the new log1p-CPM layer added.
- Return type:
TrialWorkflow
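As a rough guide to what a log1p-CPM layer contains: each cell's counts are scaled to counts per million (CPM) and then transformed with log1p, i.e. ln(1 + x). The sketch below assumes the natural logarithm and a scale factor of 10^6 and works on plain Python lists rather than adata.layers. How add_log1p_cpm_layer handles edge cases such as all-zero cells is not documented here, so the zero-library branch is an assumption.

```python
import math


def log1p_cpm(counts_by_cell):
    """Hypothetical sketch: log1p(CPM) for a cells x genes list of raw counts.

    CPM rescales each cell so its counts sum to one million; log1p is ln(1 + x).
    """
    out = []
    for counts in counts_by_cell:
        total = sum(counts)
        if total == 0:
            # Assumption: an all-zero cell stays all-zero instead of dividing by zero.
            out.append([0.0] * len(counts))
            continue
        out.append([math.log1p(c * 1e6 / total) for c in counts])
    return out


cells = [[10, 0, 90], [5, 5, 0]]
for row in log1p_cpm(cells):
    print([round(v, 3) for v in row])
```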
- did_table(features: list[str], *, design: TrialDesign, visits: tuple[str, str], config: DiDConfig | None = None) → TrialWorkflow
Run DiD and store the result on the workflow.
- Parameters:
features – List of feature names to analyze.
design – Trial design object.
visits – Tuple of (baseline, followup) visit labels.
config – Configuration for the DiD analysis. See sctrial.stats.did.DiDConfig for more details. If None, uses the default configuration.
- Returns:
The workflow object with the DiD result stored in self.last_result.
- Return type:
TrialWorkflow
- result() → Any
Return the last result computed in the workflow.
- Returns:
The last result computed in the workflow (e.g., a DataFrame).
- Return type:
Any
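The chaining behaviour described for TrialWorkflow follows a common pattern: transform steps modify the wrapped data and return self, analysis steps store their output (here, last_result) instead of returning it, and a terminal result() call hands the stored value back. The class below is a hypothetical, self-contained stand-in that illustrates this pattern; its method names are invented and it does not reproduce sctrial's internals.

```python
class MiniWorkflow:
    """Hypothetical stand-in for the fluent pattern; not sctrial's TrialWorkflow."""

    def __init__(self, obs):
        self.obs = obs            # column name -> list of values, like adata.obs
        self.last_result = None   # filled in by analysis steps

    def add_score(self, name, source, scale):
        # Transform step: writes a new column and returns self so calls can chain.
        self.obs[name] = [v * scale for v in self.obs[source]]
        return self

    def mean_table(self, features):
        # Analysis step: stores its output on the object, as did_table() does.
        self.last_result = {f: sum(self.obs[f]) / len(self.obs[f]) for f in features}
        return self

    def result(self):
        # Terminal call: returns the most recently stored result.
        return self.last_result


res = (
    MiniWorkflow({"expr": [1.0, 2.0, 3.0]})
    .add_score("ms_expr", "expr", scale=2.0)
    .mean_table(["expr", "ms_expr"])
    .result()
)
print(res)  # {'expr': 2.0, 'ms_expr': 4.0}
```

With the documented sctrial methods, the same shape is workflow(adata), then add_log1p_cpm_layer(), score_gene_sets(...), did_table(...), and finally result(). Keeping results on the object is what lets every step return the workflow, at the cost of result() exposing only the most recent analysis.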
- score_gene_sets(gene_sets: dict[str, list[str]], *, layer: str | None = None, method: Literal['zmean', 'mean'] = 'zmean', prefix: str = 'ms_', min_genes: int = 5, overwrite: bool = False) → TrialWorkflow
Score gene sets and store module scores in adata.obs.
- Parameters:
gene_sets – Dictionary mapping set names to lists of gene names. Each value must be a list (not a bare string). Duplicate gene names within a set are automatically removed.
layer – Layer name in adata.layers to use for expression data. If None, uses adata.X. For log1p-CPM workflows, use layer="log1p_cpm".
method – Scoring method. "zmean" (default) z-scores each gene across cells and then averages; "mean" uses raw mean expression.
prefix – Prefix to add to column names (e.g., ms_ for module scores).
min_genes – Minimum number of genes from the set that must be present in the data. Default is 5.
overwrite – If False (default), skip gene sets that already have a column in adata.obs.
- Returns:
The workflow object, with module scores for the scored gene sets stored in adata.obs.
- Return type:
TrialWorkflow
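To make the difference between the two scoring methods concrete, the sketch below scores one gene set per cell from a plain gene -> per-cell-values mapping. With "mean", a highly expressed gene can dominate the score; with "zmean", each gene is first z-scored across cells so every gene contributes on the same scale. Several details are assumptions, not taken from sctrial: the z-score uses the population standard deviation, constant genes contribute 0, a bare string raises TypeError, and sets with fewer than min_genes present genes return NaN scores (mirroring what quick_did documents for its own min_genes). The function name score_module is hypothetical.

```python
from statistics import mean, pstdev


def score_module(expr, genes, method="zmean", min_genes=5):
    """Hypothetical sketch of per-cell module scoring ("zmean" or "mean")."""
    if method not in ("zmean", "mean"):
        raise ValueError(f"unknown method {method!r}")
    if isinstance(genes, str):
        raise TypeError("gene list must be a list, not a bare string")
    present = [g for g in dict.fromkeys(genes) if g in expr]  # dedupe, keep order
    n_cells = len(next(iter(expr.values())))
    if len(present) < min_genes:
        # Assumption: too few genes in the data -> NaN for every cell.
        return [float("nan")] * n_cells
    per_gene = []
    for g in present:
        vals = expr[g]
        if method == "zmean":
            mu, sd = mean(vals), pstdev(vals)
            # Assumption: a constant gene contributes 0 instead of dividing by zero.
            vals = [(v - mu) / sd if sd > 0 else 0.0 for v in vals]
        per_gene.append(vals)
    # Average across genes, cell by cell.
    return [mean(cell_vals) for cell_vals in zip(*per_gene)]


expr = {"A": [0.0, 2.0, 4.0], "B": [10.0, 10.0, 40.0]}
print(score_module(expr, ["A", "B", "B", "X"], method="mean", min_genes=2))  # [5.0, 6.0, 22.0]
print(score_module(expr, ["A", "B"], min_genes=2))  # B no longer dominates
```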
- sctrial.workflow.workflow(adata: AnnData) → TrialWorkflow
Create a TrialWorkflow for fluent chaining.
- Parameters:
adata – AnnData object with trial data.
- Returns:
A TrialWorkflow object with the AnnData object set.
- Return type:
TrialWorkflow
Convenience Functions
Convenience functions for quick trial analysis workflows.
- sctrial.convenience.auto_detect_design(adata: AnnData, arm_treated: str | None = None, arm_control: str | None = None) → TrialDesign
Auto-detect trial design from common column naming patterns.
Looks for common patterns in column names:
participant: participant_id, patient_id, donor_id, subject_id, sample_id
visit: visit, timepoint, time, day, week, time_point
arm: arm, treatment_arm, arm_id, treatment, group, condition
celltype: celltype, cell_type, cluster, annotation, cell_annotation
Column matching uses an exact (case-insensitive) match first, then a word-boundary partial match: the pattern must appear at a word boundary in the column name, so "arm" matches "arm_id" but not "farm_id".
- Parameters:
adata – AnnData object to analyze.
arm_treated – Optional label for the treated arm. If not provided, the function tries keyword-based detection (e.g. “Treated”, “Drug”, “Active”) and raises if detection fails.
arm_control – Optional label for the control arm. If not provided, the function tries keyword-based detection (e.g. “Control”, “Placebo”) and raises if detection fails.
- Returns:
Detected design (may need manual adjustment).
- Return type:
TrialDesign
Examples
>>> design = auto_detect_design(adata)
>>> print(f"Detected design: {design}")
>>> # Verify the detected design. If arm labels need adjustment,
>>> # create a new TrialDesign with corrected values:
>>> from dataclasses import replace
>>> design = replace(design, arm_treated="Drug_A", arm_control="Placebo")
- Raises:
ValueError – If required columns cannot be detected, or if arm labels cannot be determined (more than 2 arms, only 1 arm, or no keyword match for ambiguous labels).
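The column-matching rule above has a subtlety that is easy to get wrong: in Python's re module, \b treats the underscore as a word character, so a plain r"\barm\b" would not match "arm_id". The sketch below is one hypothetical way to obtain the documented behaviour (explicit lookarounds that treat "_" as a separator), together with a keyword-based arm detector that raises in the documented failure cases (not exactly two arms, or no unambiguous keyword match). The helper names, the pattern priority order, the case-insensitive substring test, and the keyword tuples (a subset taken from the examples above) are assumptions, not sctrial's code.

```python
import re

# Candidate names per design role, as listed in the documentation above.
PATTERNS = {
    "participant": ["participant_id", "patient_id", "donor_id", "subject_id", "sample_id"],
    "visit": ["visit", "timepoint", "time", "day", "week", "time_point"],
    "arm": ["arm", "treatment_arm", "arm_id", "treatment", "group", "condition"],
    "celltype": ["celltype", "cell_type", "cluster", "annotation", "cell_annotation"],
}

# Assumed keyword subsets, taken from the examples in the parameter docs.
TREATED_KEYWORDS = ("treated", "drug", "active")
CONTROL_KEYWORDS = ("control", "placebo")


def match_column(columns, patterns):
    """Hypothetical matcher: exact case-insensitive match, then word-boundary match."""
    lowered = {c.lower(): c for c in columns}
    for p in patterns:  # pass 1: exact, case-insensitive
        if p in lowered:
            return lowered[p]
    for p in patterns:  # pass 2: pattern bounded by string edges or non-alphanumerics
        # Python's \b counts "_" as a word character, so r"\barm\b" would miss
        # "arm_id"; explicit lookarounds treat "_" as a separator instead.
        rx = re.compile(rf"(?<![a-z0-9]){re.escape(p)}(?![a-z0-9])")
        for c in columns:
            if rx.search(c.lower()):
                return c
    return None


def detect_arms(labels):
    """Hypothetical arm detector: returns (treated, control) or raises ValueError."""
    arms = sorted(set(labels))
    if len(arms) != 2:
        raise ValueError(f"expected exactly 2 arms, found {len(arms)}: {arms}")

    def unique_hit(keywords):
        hits = [a for a in arms if any(k in a.lower() for k in keywords)]
        return hits[0] if len(hits) == 1 else None

    treated = unique_hit(TREATED_KEYWORDS)
    control = unique_hit(CONTROL_KEYWORDS)
    if treated is None or control is None or treated == control:
        raise ValueError(f"cannot tell treated from control among {arms}")
    return treated, control


print(match_column(["farm_id", "Arm_ID"], ["arm"]))  # Arm_ID
print(detect_arms(["Drug_A", "Placebo", "Drug_A"]))  # ('Drug_A', 'Placebo')
```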
- sctrial.convenience.quick_did(adata: AnnData, module_scores: dict[str, list[str]], visits: tuple[str, str], participant_col: str = 'participant_id', visit_col: str = 'visit', arm_col: str = 'arm', arm_treated: str = 'Treated', arm_control: str = 'Control', celltype_col: str | None = None, layer: str = 'log1p_cpm', counts_layer: str = 'counts', score_method: Literal['zmean', 'mean'] = 'zmean', min_genes: int = 3, **kwargs) → DataFrame
One-line wrapper for the most common DiD workflow.
This function combines preprocessing, scoring, and DiD analysis in a single call for quick exploration.
- Parameters:
adata – AnnData object with trial data.
module_scores – Dictionary mapping module names to gene lists.
visits – Tuple of (baseline, followup) visit labels. The order matters: the first element is the baseline visit and the second is the follow-up, which determines the sign of the DiD estimate.
participant_col – Column name for participant identifiers.
visit_col – Column name for visit labels.
arm_col – Column name for treatment arm.
arm_treated – Label for treated arm.
arm_control – Label for control arm.
celltype_col – Optional column name for cell types.
layer – Layer name for normalized expression. If not present, it is created from counts_layer.
counts_layer – Layer name for raw counts (used only if layer needs to be created).
score_method – Method for gene set scoring (‘zmean’, the default, or ‘mean’).
min_genes – Minimum number of overlapping genes for a module score to be computed. Modules with fewer overlapping genes get NaN scores. Default is 3.
**kwargs – Additional arguments passed to did_table().
- Returns:
DiD results table.
- Return type:
pd.DataFrame
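The layer and counts_layer parameters imply a create-if-missing step: an existing layer with the name given by layer is used as-is, and only otherwise is it derived from counts_layer. The sketch below isolates that logic with a plain dict standing in for adata.layers; ensure_layer is a hypothetical name, and raising KeyError when both layers are absent is an assumption. Under this reading, a stale layer with the same name would be reused rather than recomputed.

```python
def ensure_layer(layers, layer, counts_layer, transform):
    """Hypothetical helper: reuse layers[layer] if present, else derive it from counts.

    `layers` is a plain dict standing in for adata.layers; `transform` is any
    counts -> normalized-values function (for example, a log1p-CPM routine).
    """
    if layer not in layers:
        if counts_layer not in layers:
            # Assumption: fail loudly when there is nothing to build the layer from.
            raise KeyError(f"neither {layer!r} nor {counts_layer!r} is in layers")
        layers[layer] = transform(layers[counts_layer])
    return layers[layer]


calls = []


def scale_by_ten(counts):
    # Toy transform that records each call so reuse is observable.
    calls.append(counts)
    return [c * 10 for c in counts]


layers = {"counts": [1, 2]}
ensure_layer(layers, "log1p_cpm", "counts", scale_by_ten)  # computed from "counts"
ensure_layer(layers, "log1p_cpm", "counts", scale_by_ten)  # reused, not recomputed
print(len(calls), layers["log1p_cpm"])  # 1 [10, 20]
```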
Examples
>>> gene_sets = {
...     "OXPHOS": ["COX7A1", "ATP5F1A", "NDUFA1"],
...     "Glycolysis": ["PKM", "LDHA", "HK2"]
... }
>>> res = quick_did(
...     adata,
...     module_scores=gene_sets,
...     visits=("V1", "V2")
... )
>>> print(res[["feature", "beta_DiD", "p_DiD", "FDR_DiD"]])