Troubleshooting Guide#
This guide covers common issues and their solutions when using sctrial.
Installation Issues#
Import Errors#
Problem: ImportError: No module named 'sctrial'
Solution:
# Ensure sctrial is installed
pip install sctrial
# For development installation
pip install -e .
Problem: ImportError: matplotlib is required for plotting
Solution:
# Install with plotting dependencies
pip install 'sctrial[plots]'
# Or install all optional dependencies
pip install 'sctrial[plots,gsea]'
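Before reinstalling, it can help to confirm which modules Python can actually see in the active environment, since a package installed into a different environment produces the same ImportError. The following is a minimal sketch using only the standard library; the helper name find_missing_modules is hypothetical and not part of sctrial.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the names from module_names that cannot be imported here."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


# Module names taken from the error messages above; extend as needed
missing = find_missing_modules(["sctrial", "matplotlib"])
print("Missing modules:", missing or "none")
```

If a module is reported missing right after a successful pip install, pip and the running interpreter are probably pointing at different environments.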
Data Preparation Issues#
Missing Features Error#
Error: KeyError: Features not found in obs or var_names: ['ms_OXPHOS', 'ms_Glycolysis']
Cause: Module scores haven’t been calculated yet, or feature names have typos.
Solution:
Check that you’ve run score_gene_sets() before DiD analysis:
import sctrial as st
# First score gene sets
gene_sets = {"OXPHOS": ["COX7A1", "ATP5F1A"]}
adata = st.score_gene_sets(adata, gene_sets, prefix="ms_")
# Then run DiD
res = st.did_table(adata, features=["ms_OXPHOS"], design=design, visits=("V1", "V2"))
Check for typos in feature names:
# List available features
print("Obs columns:", adata.obs.columns.tolist()[:10])
print("Var names:", adata.var_names.tolist()[:10])
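When the lists are long, scanning them by eye for a typo is error-prone. As a rough sketch, the standard library's difflib can propose the closest available names for each feature that was not found. The helper suggest_feature_names is hypothetical (not an sctrial function), and the names below are toy values.

```python
import difflib


def suggest_feature_names(requested, available, cutoff=0.6):
    """Map each missing requested feature to its closest available names."""
    available = list(available)
    known = set(available)
    suggestions = {}
    for name in requested:
        if name not in known:
            # get_close_matches ranks candidates by string similarity
            suggestions[name] = difflib.get_close_matches(
                name, available, n=3, cutoff=cutoff
            )
    return suggestions


# Toy names for illustration; in practice pass
# list(adata.obs.columns) + list(adata.var_names) as `available`
available = ["ms_OXPHOS", "ms_Glycolysis", "participant_id"]
print(suggest_feature_names(["ms_OXPHOS", "ms_Glycolisis"], available))
```

An empty suggestion list for a feature usually means it was never computed, rather than misspelled.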
Missing Required Columns#
Error: KeyError: Required column 'participant_id' not found in adata.obs
Cause: Column names in your data don’t match the TrialDesign specification.
Solution:
Use auto_detect_design() to detect column names automatically:
# Auto-detect design
design = st.auto_detect_design(adata)
# Verify and adjust if needed
print(design)
Or manually specify the correct column names:
design = st.TrialDesign(
    participant_col="donor_id",  # Your actual column name
    visit_col="timepoint",
    arm_col="treatment",
    arm_treated="Drug_A",
    arm_control="Placebo",
)
Check your column names:
print(adata.obs.columns.tolist())
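To see exactly which design column is the mismatch, the names you pass to TrialDesign can be compared against the columns that exist. This is a sketch on plain Python containers; missing_design_columns is a hypothetical helper and the obs column list is a toy example.

```python
def missing_design_columns(design_columns, obs_columns):
    """Return {role: column} for design columns absent from obs_columns."""
    present = set(obs_columns)
    return {
        role: column
        for role, column in design_columns.items()
        if column not in present
    }


# Column names copied from the TrialDesign example above
design_columns = {
    "participant_col": "donor_id",
    "visit_col": "timepoint",
    "arm_col": "treatment",
}
# Toy column list; in practice use adata.obs.columns
obs_columns = ["donor_id", "visit", "treatment", "n_genes"]
print(missing_design_columns(design_columns, obs_columns))
```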
Statistical Issues#
Insufficient Participants Warning#
Warning: n_units < 4, returning NaN
Cause: DiD requires at least 4 paired participants for statistical estimation.
Solution:
Check how many paired participants you have:
n_paired = st.count_paired(adata.obs, "visit", ["V1", "V2"])
print(f"Paired participants: {n_paired}")
Verify visit filtering:
# Check visit distribution
print(adata.obs.groupby(["participant_id", "visit"]).size())
If you truly have fewer than 4 paired participants:
Consider pooling data from multiple similar cohorts
Use within-arm analysis instead: st.within_arm_comparison()
Consult a statistician about alternative approaches
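To find out which participants are actually paired, not just how many, the pairing can be reproduced directly with pandas. This sketch is independent of st.count_paired, whose internals are not shown here. It assumes "paired" means having at least one cell at every requested visit, and the helper name paired_participants is hypothetical.

```python
import pandas as pd


def paired_participants(obs, participant_col, visit_col, visits):
    """Return IDs of participants with at least one cell at every visit."""
    subset = obs[obs[visit_col].isin(visits)]
    # Count how many distinct requested visits each participant appears in
    visits_seen = subset.groupby(participant_col)[visit_col].nunique()
    return sorted(visits_seen[visits_seen == len(set(visits))].index)


# Toy obs table: P1 and P2 have both visits, P3 only has V1
obs = pd.DataFrame({
    "participant_id": ["P1", "P1", "P2", "P2", "P3"],
    "visit": ["V1", "V2", "V1", "V2", "V1"],
})
paired = paired_participants(obs, "participant_id", "visit", ["V1", "V2"])
print(f"Paired participants: {len(paired)} -> {paired}")
```

Running the same check separately on each arm shows whether the shortfall comes from the treated or the control group.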
All Results are NaN#
Problem: DiD results show all NaN values
Possible Causes & Solutions:
Zero variance in features:
# Check feature variance
import numpy as np
feature_var = adata.obs["ms_OXPHOS"].var()
print(f"Feature variance: {feature_var}")
# If variance is very low (< 1e-8), feature is constant
# Solution: Remove or investigate why feature has no variation
Insufficient paired participants (see above)
All values in one visit:
# Check visit distribution
print(adata.obs[design.visit_col].value_counts())
# Ensure both visits have data
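The variance check above looks at one feature at a time. A small loop can screen every feature you plan to analyze, sketched here with pandas on a toy table. The 1e-8 threshold follows the comment above, and find_constant_features is a hypothetical helper, not part of sctrial.

```python
import pandas as pd


def find_constant_features(obs, features, tol=1e-8):
    """Return features whose variance is missing or below tol."""
    flagged = []
    for name in features:
        variance = obs[name].var()
        # NaN variance means fewer than two non-missing values
        if pd.isna(variance) or variance < tol:
            flagged.append(name)
    return flagged


# Toy data: one constant score and one that varies
obs = pd.DataFrame({
    "ms_OXPHOS": [0.5, 0.5, 0.5, 0.5],
    "ms_Glycolysis": [0.1, 0.4, 0.2, 0.9],
})
print(find_constant_features(obs, ["ms_OXPHOS", "ms_Glycolysis"]))
```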
Non-Significant Results#
Problem: No features reach statistical significance
Not necessarily a problem! This could indicate:
No true treatment effect - This is a valid scientific finding
Insufficient power - Try:
# Use bootstrap for small samples
res = st.did_table(
    adata, features=features, design=design,
    visits=("V1", "V2"),
    use_bootstrap=True,  # More robust for small N
    n_boot=999,  # 999 is usually sufficient; use 9999 for final results
)
High variability - Check effect sizes (beta_DiD) even if p-values are high
Need cell-type stratification:
# Try cell-type-specific analysis
res = st.did_table_by_celltype(
    adata, features=features, design=design,
    visits=("V1", "V2"),
)
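When judging effect sizes, it helps to keep in mind what a difference-in-differences estimate measures. The sketch below computes the textbook two-period estimate (change in the treated arm minus change in the control arm) on toy participant-level means. It illustrates the quantity only and is not a description of how did_table computes beta_DiD.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Textbook two-period DiD: treated change minus control change."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Toy participant-level means of one feature at V1 (pre) and V2 (post)
effect = did_estimate(
    treated_pre=[1.0, 1.2, 0.8, 1.0],
    treated_post=[1.5, 1.7, 1.3, 1.5],
    control_pre=[1.0, 0.9, 1.1, 1.0],
    control_post=[1.1, 1.0, 1.2, 1.1],
)
print(f"DiD estimate: {effect:.2f}")
```

A large estimate with a high p-value points toward limited power rather than the absence of an effect.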
Memory Issues#
Out of Memory Errors#
Error: MemoryError or kernel crashes
Solutions:
Use sparse layers (often the biggest win):
import scipy.sparse as sp
# Ensure counts are sparse
if not sp.issparse(adata.layers["counts"]):
    adata.layers["counts"] = sp.csr_matrix(adata.layers["counts"])
Analyze fewer features at once (batch processing):
import pandas as pd
# Process in batches
features = list(adata.var_names)
batch_size = 1000
results = []
for i in range(0, len(features), batch_size):
    batch = features[i:i + batch_size]
    res = st.did_table(adata, features=batch, design=design, visits=("V1", "V2"))
    results.append(res)
# Combine the per-batch tables into one result
full_results = pd.concat(results, ignore_index=True)
Load in backed mode, then subset to genes of interest:
import scanpy as sc
adata = sc.read_h5ad("large_dataset.h5ad", backed='r')
adata = adata[:, genes_of_interest].to_memory()
Use hardware with sufficient memory. Do not subsample or downsample cells: doing so distorts pseudobulk means, alters cell-type composition, and changes WLS weights. See the Performance Guide for recommended hardware by dataset size.
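A back-of-the-envelope estimate shows why sparse storage is often the biggest win and whether a matrix will fit in memory at all. The sketch below assumes float32 values and 32-bit CSR indices. The dataset dimensions and the 5% density are hypothetical, so substitute your own adata.shape and non-zero count.

```python
def dense_bytes(n_obs, n_vars, itemsize=4):
    """Memory for a dense n_obs x n_vars matrix (itemsize=4 is float32)."""
    return n_obs * n_vars * itemsize


def csr_bytes(n_obs, nnz, itemsize=4, index_itemsize=4):
    """Approximate CSR memory: data + column indices + row pointers."""
    return nnz * (itemsize + index_itemsize) + (n_obs + 1) * index_itemsize


# Hypothetical dataset: 500k cells, 20k genes, 5% non-zero entries
n_obs, n_vars = 500_000, 20_000
nnz = int(n_obs * n_vars * 0.05)
print(f"Dense: {dense_bytes(n_obs, n_vars) / 1e9:.1f} GB")
print(f"CSR:   {csr_bytes(n_obs, nnz) / 1e9:.1f} GB")
```

These figures cover a single matrix; copies made during analysis and additional layers add to the total.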
Performance Issues#
Very Slow Analysis#
Problem: Analysis takes too long
Solutions:
Use pseudobulk aggregation (recommended):
# Aggregate to participant-visit level (much faster)
res = st.did_table(
    adata, features=features, design=design,
    visits=("V1", "V2"),
    aggregate="participant_visit",  # This is the default and the fastest option
)
Reduce number of features:
# Analyze only module scores, not all genes
gene_sets = {...}
adata = st.score_gene_sets(adata, gene_sets, prefix="ms_")
features = [f"ms_{k}" for k in gene_sets.keys()]
For GSEA, reduce gene universe:
import scanpy as sc
# Pre-filter genes with low expression
sc.pp.filter_genes(adata, min_counts=10)
Plotting Issues#
Matplotlib Backend Errors#
Error: RuntimeError: Invalid DISPLAY variable
Solution:
# Use non-interactive backend
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
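If the same script runs both on a workstation and on a headless server, the backend switch can be made conditional. The following sketch only detects a missing display on Linux through the DISPLAY and WAYLAND_DISPLAY environment variables. It does not catch a DISPLAY that is set but invalid, and needs_headless_backend is a hypothetical helper.

```python
import os
import sys


def needs_headless_backend(environ=None, platform=None):
    """Return True on Linux when no X11 or Wayland display is configured."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    if not platform.startswith("linux"):
        return False
    return not (environ.get("DISPLAY") or environ.get("WAYLAND_DISPLAY"))


if needs_headless_backend():
    print("No display detected: call matplotlib.use('Agg') before importing pyplot")
else:
    print("Display available (or not Linux): default backend should work")
```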
Missing Plot Elements#
Problem: Plots appear but are empty or missing data
Solution:
Check data availability:
# Verify feature exists
if "ms_OXPHOS" not in adata.obs.columns:
    print("Feature not found!")
Check visit filtering:
# Ensure visits are present
visits = ("V1", "V2")  # The same visits you pass to the plotting function
subset = adata[adata.obs[design.visit_col].isin(visits)]
print(f"Cells after filtering: {subset.n_obs}")
Data Validation#
Using the Diagnostic Tool#
Before running analysis, diagnose your data:
# Run diagnostics
diagnostics = st.diagnose_trial_data(adata, design, verbose=True)
# Check warnings
if diagnostics['warnings']:
    print("Issues found:")
    for w in diagnostics['warnings']:
        print(f" - {w}")
# Check recommendations
for r in diagnostics['recommendations']:
    print(f" 💡 {r}")
Validate Before Analysis#
# Validate adata structure
issues = st.validate_adata(adata, design, strict=False)
if issues:
    print("Validation issues:")
    for issue in issues:
        print(f" ⚠️ {issue}")
else:
    print("✓ Data validated successfully!")
Common Error Messages#
“No numeric features found to analyze”#
Cause: All requested features are either missing or non-numeric.
Solution: Check that features are in adata.var_names or adata.obs.columns and are numeric.
“Covariate varies within participant-visit”#
Cause: You’re using a covariate that changes within participant-visits (e.g., age that increments).
Solution: Use only covariates that are constant within participant-visit groups, or aggregate them appropriately.
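To find which covariates trigger this error, each one can be checked for more than one distinct value inside any participant-visit group. This is a sketch with pandas on a toy table; varying_covariates is a hypothetical helper, and the column names are illustrative.

```python
import pandas as pd


def varying_covariates(obs, covariates, participant_col, visit_col):
    """Return covariates taking more than one value within a participant-visit."""
    groups = obs.groupby([participant_col, visit_col], observed=True)
    flagged = []
    for cov in covariates:
        # nunique > 1 in any group means the covariate is not constant there
        if (groups[cov].nunique() > 1).any():
            flagged.append(cov)
    return flagged


# Toy table: "sex" is constant per group, "n_genes" varies cell to cell
obs = pd.DataFrame({
    "participant_id": ["P1", "P1", "P1", "P2", "P2"],
    "visit": ["V1", "V1", "V2", "V1", "V1"],
    "sex": ["F", "F", "F", "M", "M"],
    "n_genes": [1200, 950, 1100, 800, 1300],
})
print(varying_covariates(obs, ["sex", "n_genes"], "participant_id", "visit"))
```

Cell-level quantities flagged this way need to be summarized per participant-visit (for example with a mean) before they can serve as covariates.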
Getting Help#
If you’re still stuck:
Check the tutorials:
Browse the Tutorials for end-to-end notebook walkthroughs
Search issues: Check GitHub Issues
Report a bug:
# Include this information
import sctrial as st
print(f"sctrial version: {st.__version__}")
print(f"AnnData shape: {adata.shape}")
print(f"Design: {design}")
# Copy the full error traceback
Ask for help: Open a new issue on GitHub with:
Minimal reproducible example
Error message and full traceback
Environment information (OS, Python version, package versions)
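The environment information can be gathered in one step with the standard library. In this sketch the package list is only a suggestion, and environment_report is a hypothetical helper; packages that are not installed are reported as such instead of raising an error.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages):
    """Collect OS, Python, and installed package versions as text lines."""
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
    ]
    for name in packages:
        try:
            lines.append(f"{name}: {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{name}: not installed")
    return lines


# Package list is a suggestion; add whatever your analysis depends on
packages = ["sctrial", "anndata", "scanpy", "pandas", "numpy"]
print("\n".join(environment_report(packages)))
```

Paste the printed lines into the issue alongside the traceback.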