Evaluation Engine¶
The evaluation engine is the backend service that powers the workbench’s
analysis features. It evaluates search performance with positive/negative
control gene sets, computes classification and rank metrics, and runs
cross-validation and enrichment analysis. The workbench UI at /workbench
consumes these endpoints.
flowchart LR
A["Gene Set + Controls"] --> G{targetGeneIds?}
G -- yes --> H["Set Intersection<br/>(no WDK call)"]
G -- no --> B["Run Search on WDK"]
B --> C["Evaluate Controls"]
H --> C
C --> D["Metrics<br/>P/R/F1"]
C --> E["Cross-Validation"]
C --> F["Enrichment"]
style A fill:#2563eb,color:#fff
style G fill:#f59e0b,color:#000
style H fill:#10b981,color:#fff
style C fill:#7c3aed,color:#fff
Evaluation Modes¶
The evaluation engine supports two evaluation modes:
- Gene-ID mode (workbench gene sets):
  When targetGeneIds is provided in the experiment config, the engine skips WDK search re-execution and evaluates using pure set intersection against the control genes. This is the correct path for workbench gene sets, which already contain materialized gene IDs.
- Search re-execution mode (strategy evaluation):
  When targetGeneIds is absent, the engine runs the WDK search using searchName and parameters from the config and evaluates the results against controls. This is the correct path when evaluating a search configuration itself, e.g., when the AI agent builds a strategy and needs to test its performance before the results have been materialized into a gene set.
Important
The benchmark and evaluate panels in the workbench both send
targetGeneIds from the active gene set. This ensures metrics are
computed against the actual gene set contents, not a potentially stale
re-execution of search parameters.
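The choice between the two modes can be sketched as follows. This is a minimal illustration, not the engine's code: `ExperimentConfigSketch`, `resolve_result_ids`, `count_control_hits`, and the `run_search` callable are hypothetical names, and the field names are assumed Python-style counterparts of `targetGeneIds`, `searchName`, and `parameters`.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ExperimentConfigSketch:
    """Hypothetical, trimmed-down config; field names are assumptions."""
    positive_controls: list[str]
    negative_controls: list[str]
    target_gene_ids: Optional[list[str]] = None  # stands in for targetGeneIds
    search_name: str = ""
    parameters: dict[str, str] = field(default_factory=dict)


def resolve_result_ids(
    config: ExperimentConfigSketch,
    run_search: Callable[[str, dict[str, str]], list[str]],
) -> set[str]:
    """Pick the evaluation mode and return the gene IDs to evaluate."""
    if config.target_gene_ids is not None:
        # Gene-ID mode: the IDs are already materialized, so no search runs.
        return set(config.target_gene_ids)
    # Search re-execution mode: run_search stands in for the WDK call.
    return set(run_search(config.search_name, config.parameters))


def count_control_hits(
    result_ids: set[str], config: ExperimentConfigSketch
) -> dict[str, int]:
    """Pure set intersection against the positive and negative controls."""
    return {
        "positive_hits": len(result_ids & set(config.positive_controls)),
        "negative_hits": len(result_ids & set(config.negative_controls)),
    }
```

In this sketch an empty `target_gene_ids` list still counts as "provided"; whether the engine treats an empty list that way is not stated here.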
Execution Endpoints¶
| Method | Endpoint | Description |
|---|---|---|
| POST | | Create and run a single experiment (SSE) |
| POST | | Run across multiple organisms (SSE) |
| POST | | Run against multiple control sets (SSE) |
| POST | | Seed demo strategies and control sets (SSE) |
Analysis Endpoints¶
Cross-experiment (not scoped to a single experiment):
| Method | Endpoint | Description |
|---|---|---|
| POST | | Pairwise gene set overlap (Jaccard, shared/unique genes) |
| POST | | Compare enrichment results across experiments |
Per-experiment (scoped to {experiment_id}):
| Method | Endpoint | Description |
|---|---|---|
| POST | | Run cross-validation on an existing experiment |
| POST | | Run enrichment analysis |
| POST | | Re-run evaluation (e.g. after changing controls) |
| POST | | Custom enrichment request |
| POST | | Threshold sweep for a parameter |
| GET | | Download experiment report (HTML) |
CRUD and Results¶
| Method | Endpoint | Description |
|---|---|---|
| GET | | List experiments (optional site filter) |
| GET | | Get one experiment |
| PATCH | | Update (e.g. name) |
| DELETE | | Delete an experiment |
Results browsing (per-experiment):
| Method | Endpoint | Description |
|---|---|---|
| GET | | List available result attributes |
| GET | | Paginated result records |
| POST | | Get single record detail |
| GET | | Distribution data for an attribute |
| POST | | Refine/filter result records |
Workbench chat (per-experiment conversational AI):
| Method | Endpoint | Description |
|---|---|---|
| POST | | Start workbench chat stream (SSE) |
| GET | | Get chat message history |
Persistence¶
Experiments are stored in the experiments table (see
veupath_chatbot.persistence.models.ExperimentRow): id, site_id,
name, status, data (full JSON), batch_id, benchmark_id, created_at, updated_at.
The experiment store (veupath_chatbot.services.experiment.store)
keeps an in-memory cache and persists every mutation to PostgreSQL.
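The write-through pattern described here can be illustrated with a small synchronous sketch. `WriteThroughSketch` and its `persist` hook are hypothetical stand-ins; the real store persists to PostgreSQL and also exposes async variants, which this example does not attempt to reproduce.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class WriteThroughSketch(Generic[T]):
    """In-memory cache that forwards every mutation to a persistence hook."""

    def __init__(self, persist: Callable[[str, Optional[T]], None]) -> None:
        self._cache: dict[str, T] = {}
        self._persist = persist  # hypothetical hook; a None value means "delete"

    def save(self, key: str, value: T) -> None:
        self._cache[key] = value   # fast path for later reads
        self._persist(key, value)  # write-through to durable storage

    def get(self, key: str) -> Optional[T]:
        return self._cache.get(key)

    def delete(self, key: str) -> None:
        self._cache.pop(key, None)
        self._persist(key, None)
```

Because every `save` and `delete` reaches the hook, the durable copy never lags behind the cache, which is what lets experiments survive a restart.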
Control Sets¶
Reusable positive/negative gene sets are managed at /api/v1/control-sets
(CRUD). They can be referenced when creating experiments (e.g.
control_set_id). See veupath_chatbot.persistence.models.ControlSet.
Experiment Streaming (CQRS)¶
Purpose: Background task launchers for experiment execution using a CQRS event model. Events are persisted to Redis Streams; operations are tracked in PostgreSQL. This is how long-running experiments (single, batch, benchmark) are kicked off and their progress communicated to the frontend via SSE.
Background task launchers for experiment execution — CQRS version.
Events are persisted to Redis Streams. Operations are registered in PostgreSQL.
- async veupath_chatbot.services.experiment.core.streaming.start_experiment(config, *, user_id=None)[source]¶
Launch a single experiment as a background task. Returns operation ID.
- Return type:
Service Layer¶
Core experiment service, orchestration, and store.
Experiment execution orchestrator.
Coordinates the full experiment lifecycle: evaluation, metrics computation, optional cross-validation, and optional enrichment analysis.
Each phase is a private function that mutates experiment and persists
intermediate state to the store. The public run_experiment() function
orchestrates phase sequencing, lifecycle management, and error handling.
- async veupath_chatbot.services.experiment.service.run_experiment(config, *, user_id=None, progress_callback=None)[source]¶
Execute a full experiment and persist the result.
- Parameters:
config (ExperimentConfig) – Experiment configuration.
user_id (str | None) – Owning user ID (for IDOR protection).
progress_callback (Callable[[JSONObject], Awaitable[None]] | None) – Optional async callback for SSE progress events.
- Returns:
Completed experiment with all results.
- Return type:
Experiment store with write-through DB persistence.
Provides CRUD operations for experiment lifecycle management. Keeps an in-memory dict for fast synchronous access during experiment execution, and persists every mutation to PostgreSQL so experiments survive API restarts.
- class veupath_chatbot.services.experiment.store.ExperimentStore[source]¶
Bases: WriteThruStore[Experiment]
Experiment repository with in-memory cache and DB write-through.
Inherits save/get/delete/aget/adelete from WriteThruStore. Adds domain-specific listing methods.
- list_by_benchmark(benchmark_id)[source]¶
Return all experiments belonging to a benchmark suite (in-memory).
- Return type:
- async alist_all(site_id=None, user_id=None)[source]¶
List experiments: merges DB rows with in-memory (fresher) state.
- Return type:
- veupath_chatbot.services.experiment.store.get_experiment_store()[source]¶
Get the global experiment store singleton.
- Return type:
Shared helpers for experiment execution and analysis.
Provides gene-list extraction utilities and the progress callback type alias.
- veupath_chatbot.services.experiment.helpers.ProgressCallback¶
Emits an SSE-friendly progress event dict.
alias of Callable[[JSONObject], Awaitable[None]]
- veupath_chatbot.services.experiment.helpers.safe_int(val, default=0)[source]¶
Safely convert a value to int, returning default on failure.
- Return type:
- veupath_chatbot.services.experiment.helpers.safe_float(val, default=0.0)[source]¶
Safely convert a value to float, returning default on failure.
Non-finite values (inf, -inf, nan) are replaced with default because they are not valid JSON values and PostgreSQL rejects them in JSON columns.
- Return type:
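The behaviour described for safe_float can be sketched as below. This is an illustration under the stated contract, not the project's source; the name `safe_float_sketch` is hypothetical.

```python
import math


def safe_float_sketch(val, default=0.0):
    """Convert val to float; fall back to default on failure or non-finite input."""
    try:
        result = float(val)
    except (TypeError, ValueError, OverflowError):
        # None, unparseable strings, and ints too large for a float end up here.
        return default
    # inf, -inf and nan are valid Python floats but have no JSON representation.
    return result if math.isfinite(result) else default
```

For example, `safe_float_sketch("1.5")` yields `1.5`, while `safe_float_sketch("nan")` and `safe_float_sketch(None)` both yield the default.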
- veupath_chatbot.services.experiment.helpers.extract_wdk_id(payload, key='id')[source]¶
Extract an integer ID from a WDK JSON response.
WDK formatters (StepFormatter, StrategyService, etc.) emit entity IDs as Java longs (always int in JSON) under a known key (typically "id" or "strategyId").
- veupath_chatbot.services.experiment.helpers.coerce_step_id(payload)[source]¶
Extract step ID from a WDK step-creation response.
- Parameters:
payload (JSONObject | None) – WDK step-creation response.
- Returns:
Step ID.
- Raises:
ValueError – If step ID not found.
- Return type:
- async veupath_chatbot.services.experiment.helpers.extract_and_enrich_genes(*, site_id, result, negative_controls=None)[source]¶
Extract gene lists from a control-test result and enrich with WDK metadata.
Single entry point that replaces duplicated extract + enrich blocks.
Deserialize JSON dicts back into Experiment dataclass trees.
Simple sub-types are deserialized via the generic from_json converter.
Only Experiment / ExperimentConfig require hand-written logic due
to conditional field defaults and enrichment deduplication.
- veupath_chatbot.services.experiment._deserialize.experiment_from_json(d)[source]¶
Reconstruct an Experiment from its JSON representation.
- Parameters:
- Returns:
Fully hydrated Experiment dataclass.
- Return type:
WDK strategy materialization for experiments.
Creates, persists, and cleans up WDK strategies from experiment configs, including step tree materialization for multi-step and import modes.
- async veupath_chatbot.services.experiment.materialization.cleanup_experiment_strategy(experiment)[source]¶
Delete the persisted WDK strategy when an experiment is deleted.
- Parameters:
experiment (Experiment) – Experiment whose WDK strategy should be cleaned up.
Classification¶
Purpose: Gene record classification by experiment membership (TP/FP/FN/TN).
Adds _classification field to WDK records based on gene ID membership in
positive and negative control sets.
Gene record classification by experiment category membership.
Classifies WDK result records as TP / FP / FN / TN based on whether their gene ID appears in the experiment’s curated gene sets. Handles WDK transcript ID version suffixes (e.g. “GENE.1” -> “GENE”).
- veupath_chatbot.services.experiment.classification.classify_records(records, tp_ids, fp_ids, fn_ids, tn_ids)[source]¶
Add a _classification field to records based on gene ID membership.
For each record, extracts the primary key and checks membership in the four gene-set categories. WDK transcript IDs may include a version suffix (e.g. "PF3D7_0100100.1"); the function also checks the base ID with the suffix stripped.
- Parameters:
- Returns:
New list of records, each with a _classification field.
- Return type:
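A minimal sketch of this classification step follows. Several details are assumptions made for illustration: records are plain dicts with the gene ID under a hypothetical `gene_id` key (the real function extracts the WDK primary key), the version suffix is taken to be a trailing `.<digits>`, and ties are resolved in TP, FP, FN, TN order.

```python
import re

# Assumed suffix format: a dot followed by digits at the end of the ID.
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def classify_id(gene_id, tp_ids, fp_ids, fn_ids, tn_ids):
    """Return "TP"/"FP"/"FN"/"TN" for a gene ID, or None if it is in no set."""
    base_id = _VERSION_SUFFIX.sub("", gene_id)
    categories = (("TP", tp_ids), ("FP", fp_ids), ("FN", fn_ids), ("TN", tn_ids))
    for label, ids in categories:
        # Check the ID as given first, then with the version suffix stripped.
        if gene_id in ids or base_id in ids:
            return label
    return None


def classify_records_sketch(records, tp_ids, fp_ids, fn_ids, tn_ids, id_key="gene_id"):
    """Return new record dicts, each with a `_classification` field added."""
    return [
        {**rec, "_classification": classify_id(rec[id_key], tp_ids, fp_ids, fn_ids, tn_ids)}
        for rec in records
    ]
```

Building new dicts rather than mutating the inputs matches the documented "new list of records" return value.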
Evaluation Service¶
Purpose: Re-evaluation and threshold sweep service. Pure business logic for recomputing experiment metrics with updated controls or parameters.
Evaluation service: re-evaluate and threshold sweep.
Pure business logic extracted from the transport handler. No HTTP/SSE concerns here – callers (routers, tools, etc.) wrap the results in whatever transport format they need.
- veupath_chatbot.services.experiment.evaluation.SWEEP_CONCURRENCY = 3¶
Max parallel WDK control-test runs per sweep.
- veupath_chatbot.services.experiment.evaluation.SWEEP_TIMEOUT_S = 240¶
Server-side timeout for the entire sweep.
- veupath_chatbot.services.experiment.evaluation.SWEEP_POINT_TIMEOUT_S = 90¶
Per-point timeout; prevents one slow point from blocking all.
- async veupath_chatbot.services.experiment.evaluation.re_evaluate(exp)[source]¶
Re-run control evaluation against the (possibly modified) strategy.
Updates the experiment in-place (metrics + gene lists) and persists it. Returns the full experiment JSON.
- Return type:
- veupath_chatbot.services.experiment.evaluation.compute_sweep_values(*, sweep_type, values, min_value, max_value, steps)[source]¶
Compute the list of parameter values for a sweep.
- Parameters:
- Returns:
List of stringified sweep values.
- Raises:
ValidationError – On invalid inputs.
- Return type:
- veupath_chatbot.services.experiment.evaluation.validate_sweep_parameter(exp, param_name)[source]¶
Ensure param_name exists in the experiment config.
For single-step experiments, checks exp.config.parameters. For tree-mode experiments, walks the step tree looking for the parameter in any leaf node’s parameters dict.
- Raises:
ValidationError – If the parameter is missing.
- veupath_chatbot.services.experiment.evaluation.format_metrics_dict(m)[source]¶
Format an ExperimentMetrics into a JSON-friendly dict.
- Return type:
- async veupath_chatbot.services.experiment.evaluation.run_sweep_point(*, exp, param_name, value, is_categorical)[source]¶
Run a single sweep point: modify the parameter and evaluate.
For tree-mode experiments, clones the step tree and injects the swept parameter value into every node that contains it, then calls run_controls_against_tree(). For single-step experiments, modifies the flat parameter dict and calls run_positive_negative_controls().
- Returns:
Dict with value, metrics (or None), and optionally error.
- Return type:
- async veupath_chatbot.services.experiment.evaluation.cleanup_before_sweep(site_id)[source]¶
Best-effort cleanup of leaked internal control-test strategies.
- async veupath_chatbot.services.experiment.evaluation.generate_sweep_events(*, exp, param_name, sweep_type, sweep_values)[source]¶
Run the full sweep and yield SSE-formatted events.
Yields sweep_point events as each point completes, then a final sweep_complete event with all sorted results.
- Return type:
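The concurrency and per-point timeout constants above suggest a bounded-parallelism pattern. The sketch below shows one way to express it with `asyncio`; it is an assumption about the approach, not the service's code. `run_sweep_sketch` and `evaluate_point` are hypothetical names, the sketch collects all results instead of yielding SSE events, and the overall sweep timeout is left out.

```python
import asyncio

SWEEP_CONCURRENCY = 3        # value taken from the constants documented above
SWEEP_POINT_TIMEOUT_S = 90   # value taken from the constants documented above


async def run_sweep_sketch(values, evaluate_point, *,
                           concurrency=SWEEP_CONCURRENCY,
                           point_timeout=SWEEP_POINT_TIMEOUT_S):
    """Evaluate every sweep value with bounded parallelism and a per-point timeout."""
    semaphore = asyncio.Semaphore(concurrency)

    async def run_one(value):
        async with semaphore:  # at most `concurrency` evaluations run at once
            try:
                metrics = await asyncio.wait_for(
                    evaluate_point(value), timeout=point_timeout
                )
                return {"value": value, "metrics": metrics}
            except asyncio.TimeoutError:
                # A slow point is reported as failed instead of blocking the rest.
                return {"value": value, "metrics": None, "error": "timeout"}

    # gather preserves input order, so results line up with `values`.
    return await asyncio.gather(*(run_one(v) for v in values))
```

The result dicts mirror the documented shape of a sweep point: `value`, `metrics` (or `None`), and optionally `error`.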
Metrics and Evaluation¶
Key Metrics
Where \(TP\) = true positives (returned genes in positive controls), \(FP\) = false positives (returned genes in negative controls), \(FN\) = false negatives (positive control genes not returned).
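With these counts, the standard definitions of precision, recall, and F1 (assumed here to be the ones the engine reports, in line with the P/R/F1 label in the flowchart) are:

\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]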
Classification metrics, rank metrics, and statistical utilities.
Metrics engine for computing exhaustive classification metrics.
Computes all standard binary classification metrics from the raw
intersection counts returned by run_positive_negative_controls().
- veupath_chatbot.services.experiment.metrics.compute_confusion_matrix(*, positive_hits, total_positives, negative_hits, total_negatives)[source]¶
Derive a confusion matrix from control-test intersection counts.
- Parameters:
- Returns:
Populated confusion matrix.
- Return type:
- veupath_chatbot.services.experiment.metrics.compute_metrics(cm, *, total_results=0)[source]¶
Compute all classification metrics from a confusion matrix.
- Parameters:
cm (ConfusionMatrix) – Confusion matrix.
total_results (int) – Total number of results returned by the search.
- Returns:
Full metrics object.
- Return type:
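As a rough illustration of how the intersection counts could map onto a confusion matrix and the basic metrics, consider the sketch below. The mapping (hits among negatives counted as FP, missed negatives as TN) and the zero-denominator convention are assumptions; `ConfusionMatrixSketch`, `confusion_from_counts`, and `basic_metrics` are hypothetical names, and the real metrics object carries more fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrixSketch:
    tp: int
    fp: int
    fn: int
    tn: int


def confusion_from_counts(*, positive_hits, total_positives,
                          negative_hits, total_negatives):
    """Derive the four cells from control-test intersection counts."""
    return ConfusionMatrixSketch(
        tp=positive_hits,                    # positives the search returned
        fp=negative_hits,                    # negatives the search returned
        fn=total_positives - positive_hits,  # positives the search missed
        tn=total_negatives - negative_hits,  # negatives correctly left out
    )


def _ratio(numerator, denominator):
    # Assumed convention: an undefined ratio is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def basic_metrics(cm):
    precision = _ratio(cm.tp, cm.tp + cm.fp)
    recall = _ratio(cm.tp, cm.tp + cm.fn)
    f1 = _ratio(2 * precision * recall, precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}
```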
- veupath_chatbot.services.experiment.metrics.evaluate_gene_ids_against_controls(*, gene_ids, positive_controls, negative_controls, site_id='', record_type='')[source]¶
Evaluate a gene set against controls using pure set intersection.
No WDK calls: the gene set already has its results. Returns the same dict shape that metrics_from_control_result() and extract_and_enrich_genes() consume.
- Return type:
- veupath_chatbot.services.experiment.metrics.metrics_from_control_result(result)[source]¶
Build metrics from the dict returned by run_positive_negative_controls().
- Parameters:
result (JSONObject) – Raw control-test result dict.
- Returns:
Full metrics.
- Return type:
Rank-based evaluation metrics (Precision@K, Recall@K, Enrichment@K).
These metrics treat gene lists as ranked outputs rather than binary classifiers, which better matches how researchers use strategy results (“how many known positives are in my top K?”).
- veupath_chatbot.services.experiment.rank_metrics.compute_rank_metrics(result_ids, positive_ids, negative_ids, k_values=None)[source]¶
Compute rank-based metrics from an ordered result list.
All computation is pure Python — no API calls.
- Parameters:
- Returns:
Rank metrics object.
- Return type:
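The rank metrics can be sketched in a few lines of pure Python. Precision@K and Recall@K follow their usual definitions; the Enrichment@K formula used here (top-K precision divided by the positive rate over the whole result list) is an assumption, since the exact definition is not given above. The function name is hypothetical.

```python
def rank_metrics_at_k(result_ids, positive_ids, k):
    """Precision@K, Recall@K and Enrichment@K for an ordered result list."""
    positives = set(positive_ids)
    top_k = result_ids[:k]
    hits = sum(1 for gene_id in top_k if gene_id in positives)

    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(positives) if positives else 0.0

    # Assumed definition: precision in the top K relative to the positive
    # rate over the whole result list (1.0 means no better than the list average).
    total_hits = sum(1 for gene_id in result_ids if gene_id in positives)
    base_rate = total_hits / len(result_ids) if result_ids else 0.0
    enrichment = precision / base_rate if base_rate else 0.0

    return {"precision": precision, "recall": recall, "enrichment": enrichment}
```

With results `["a", "b", "c", "d"]`, positives `["a", "c"]`, and `k=1`, this gives precision 1.0, recall 0.5, and enrichment 2.0.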
- async veupath_chatbot.services.experiment.rank_metrics.fetch_ordered_result_ids(site_id, step_id, max_results=5000, sort_attribute=None, sort_direction='ASC')[source]¶
Fetch ordered gene IDs from a persisted WDK strategy step.
When sort_attribute is provided the results are sorted by reportConfig.sorting via get_step_records(); otherwise the default WDK ordering is used (via get_step_answer()).
Shared statistical utilities for experiment analysis.
- veupath_chatbot.services.experiment.stats.hypergeometric_log_sf(x, n, k, m)[source]¶
Approximate log survival function for hypergeometric distribution.
Uses a normal approximation of P(X >= x) for speed. Returns 0.0 (i.e. p=1.0) when the observed count is at or below the mean.
Parameters¶
- x:
Number of observed successes.
- n:
Population size (background).
- k:
Number of success states in the population (result set size).
- m:
Number of draws (gene set size).
- Return type:
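A normal approximation of this kind could look like the sketch below. It uses the parameter meanings listed above; the continuity correction, the use of the natural logarithm, and the handling of degenerate inputs are assumptions made for this example. `hypergeom_log_sf_normal` is a hypothetical name.

```python
import math


def hypergeom_log_sf_normal(x, n, k, m):
    """Approximate natural-log P(X >= x) for a hypergeometric variable.

    x: observed successes, n: population size,
    k: success states in the population, m: number of draws.
    """
    if n <= 1 or k <= 0 or m <= 0:
        return 0.0
    p = k / n
    mean = m * p
    if x <= mean:
        return 0.0  # log(1.0): at or below expectation
    # Hypergeometric variance, including the finite-population correction.
    variance = m * p * (1 - p) * (n - m) / (n - 1)
    if variance <= 0:
        return 0.0
    z = (x - 0.5 - mean) / math.sqrt(variance)  # continuity correction (assumed)
    sf = 0.5 * math.erfc(z / math.sqrt(2))      # upper tail of the standard normal
    # erfc underflows to 0.0 for very large z; report that as -inf.
    return math.log(sf) if sf > 0 else float("-inf")
```

The approximation trades accuracy in the far tail for speed, so very small p-values from it should be read as orders of magnitude, not exact figures.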
Analysis Features¶
Cross-validation, enrichment, overlap, comparison, robustness, and reporting.
K-fold cross-validation for overfitting detection.
Splits positive and negative control gene lists into k folds, evaluates each held-out fold, and aggregates metrics to detect overfitting.
- veupath_chatbot.services.experiment.cross_validation.ProgressCallback¶
Async callback(fold_index, total_folds) for progress reporting.
- veupath_chatbot.services.experiment.cross_validation.FoldEvaluator¶
Async callback(holdout_pos, holdout_neg) → control-test result dict.
alias of Callable[[list[str] | None, list[str] | None], Coroutine[Any, Any, JSONObject]]
- async veupath_chatbot.services.experiment.cross_validation.run_cross_validation(*, site_id, record_type, controls_search_name, controls_param_name, positive_controls, negative_controls, controls_value_format='newline', search_name=None, parameters=None, tree=None, k=5, full_metrics=None, progress_callback=None)[source]¶
Run k-fold cross-validation on control gene lists.
When tree is provided, evaluates each fold against the full strategy tree. Otherwise, evaluates using the single-step search_name + parameters.
- Return type:
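The fold-splitting step can be illustrated as follows. The striding scheme and the absence of shuffling are assumptions for this sketch, and `k_fold_splits` is a hypothetical helper; the real function additionally evaluates each held-out fold against WDK and aggregates the metrics.

```python
def k_fold_splits(items, k=5):
    """Yield (train, holdout) pairs; fold i holds out every k-th item from index i."""
    folds = [items[i::k] for i in range(k)]
    for i, holdout in enumerate(folds):
        # The training part is everything that is not in the held-out fold.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, holdout
```

Applied separately to the positive and negative control lists, each gene is held out exactly once across the k folds. A large gap between full-set metrics and held-out metrics is the overfitting signal this analysis looks for.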
Enrichment analysis via WDK step analysis API.
Wraps VEuPathDB’s native GO, pathway, and word enrichment analyses that are available through the step analysis endpoint.
- Plugin names (from stepAnalysisPlugins.xml):
  - go-enrichment → GoEnrichmentPlugin
  - pathway-enrichment → PathwaysEnrichmentPlugin
  - word-enrichment → WordEnrichmentPlugin
- GO enrichment parameters (from GoEnrichmentPlugin.java):
  - goAssociationsOntologies: “Molecular Function” / etc.
  - goEvidenceCodes: evidence code filter
  - goSubset: GO slim subset
  - pValueCutoff: p-value threshold
  - organism: organism filter
Parameters are fetched from the WDK analysis form defaults so required
fields like organism and pValueCutoff are always populated.
- veupath_chatbot.services.experiment.enrichment.infer_enrichment_type(wdk_analysis_name, params, result)[source]¶
Infer the EnrichmentAnalysisType from a WDK analysis name.
For GO enrichment, uses the goAssociationsOntologies parameter or the goOntologies field in the result to determine which GO branch.
- Return type:
Literal[‘go_function’, ‘go_component’, ‘go_process’, ‘pathway’, ‘word’]
- veupath_chatbot.services.experiment.enrichment.is_enrichment_analysis(wdk_analysis_name)[source]¶
Return True if the WDK analysis name is an enrichment plugin.
- Return type:
- veupath_chatbot.services.experiment.enrichment.upsert_enrichment_result(results, new)[source]¶
Replace an existing result of the same analysis_type, or append.
Mutates results in-place so callers don’t accumulate duplicate tabs when the same enrichment analysis is re-run.
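The replace-or-append behaviour can be sketched with plain dicts. Using dicts with an `analysis_type` key is an assumption for illustration (the real function works on enrichment result objects), and `upsert_by_analysis_type` is a hypothetical name.

```python
def upsert_by_analysis_type(results, new):
    """Replace the entry with the same analysis_type in place, else append."""
    for index, existing in enumerate(results):
        if existing["analysis_type"] == new["analysis_type"]:
            results[index] = new  # keep the original position of the tab
            return
    results.append(new)
```

Re-running the same analysis therefore overwrites its earlier result instead of adding a second entry.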
- veupath_chatbot.services.experiment.enrichment.parse_enrichment_from_raw(wdk_analysis_name, params, result)[source]¶
Parse a raw WDK analysis result into an EnrichmentResult.
Used by the generic analyses/run endpoint to return structured enrichment data instead of raw JSON.
- Return type:
- veupath_chatbot.services.experiment.enrichment.encode_vocab_params(params, form_meta)[source]¶
Encode vocabulary param values as JSON arrays using form metadata.
WDK’s AbstractEnumParam.convertToTerms() requires all single-pick-vocabulary and multi-pick-vocabulary param values to be JSON-encoded arrays. This function ensures that encoding is applied after merging defaults with user params, so user-supplied plain strings don’t bypass the encoding.
Params whose type is not in the form metadata, or whose type is not a vocabulary type, are returned unchanged.
- Return type:
- async veupath_chatbot.services.experiment.enrichment.run_enrichment_analysis(*, site_id, record_type, search_name, parameters, analysis_type)[source]¶
Run a single enrichment analysis on a search result set.
Creates a temporary WDK strategy, runs the analysis, parses results, and cleans up.
- Return type:
- async veupath_chatbot.services.experiment.enrichment.run_enrichment_on_step(*, site_id, step_id, analysis_type)[source]¶
Run enrichment on an already-persisted WDK step.
Used for multi-step experiments where the strategy already exists.
- Return type:
Custom gene set enrichment analysis against experiment results.
- class veupath_chatbot.services.experiment.custom_enrichment.CustomEnrichmentResult[source]¶
Bases: TypedDict
Return shape of run_custom_enrichment().
- veupath_chatbot.services.experiment.custom_enrichment.run_custom_enrichment(exp, gene_ids, gene_set_name)[source]¶
Test enrichment of a custom gene set against the experiment results.
Computes overlap, fold enrichment, p-value (hypergeometric), and odds ratio.
- Return type:
Cross-experiment enrichment comparison.
- class veupath_chatbot.services.experiment.enrichment_compare.EnrichmentRow[source]¶
Bases: TypedDict
Shape of one term row in the enrichment comparison.
- class veupath_chatbot.services.experiment.enrichment_compare.EnrichmentCompareResult[source]¶
Bases: TypedDict
Return shape of compare_enrichment_across().
- rows: list[EnrichmentRow]¶
- veupath_chatbot.services.experiment.enrichment_compare.compare_enrichment_across(experiments, experiment_ids, analysis_type=None)[source]¶
Compare enrichment results across experiments.
Builds a term-by-experiment matrix of fold-enrichment scores. Optionally filters to a single analysis type.
- Return type:
Gene set overlap analysis across experiments.
- class veupath_chatbot.services.experiment.overlap.PairwiseOverlap[source]¶
Bases: TypedDict
Shape of one pairwise comparison entry.
- class veupath_chatbot.services.experiment.overlap.PerExperimentSummary[source]¶
Bases: TypedDict
Shape of one per-experiment summary entry.
- class veupath_chatbot.services.experiment.overlap.GeneMembership[source]¶
Bases: TypedDict
Shape of one gene membership entry.
- class veupath_chatbot.services.experiment.overlap.OverlapResult[source]¶
Bases: TypedDict
Return shape of compute_gene_set_overlap().
- pairwise: list[PairwiseOverlap]¶
- perExperiment: list[PerExperimentSummary]¶
- geneMembership: list[GeneMembership]¶
- veupath_chatbot.services.experiment.overlap.compute_gene_set_overlap(experiments, experiment_ids)[source]¶
Compute pairwise gene set overlap between experiments.
For each experiment the result gene set is the union of TP and FP genes. Returns Jaccard similarity, shared/unique genes, and membership counts.
- Return type:
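The pairwise part of this computation might look like the sketch below. It assumes each experiment's gene set has already been reduced to a Python set (per the text, the union of TP and FP genes); `pairwise_jaccard` and the output keys are hypothetical and do not reproduce the `PairwiseOverlap` shape.

```python
from itertools import combinations


def pairwise_jaccard(gene_sets):
    """Map each pair of experiment IDs to overlap statistics."""
    out = {}
    for (id_a, a), (id_b, b) in combinations(gene_sets.items(), 2):
        union = a | b
        out[(id_a, id_b)] = {
            # Jaccard similarity: shared genes over all genes in either set.
            "jaccard": len(a & b) / len(union) if union else 0.0,
            "shared": sorted(a & b),
            "unique_a": sorted(a - b),
            "unique_b": sorted(b - a),
        }
    return out
```

Two experiments with gene sets `{g1, g2}` and `{g2, g3}` share one gene out of three, giving a Jaccard similarity of 1/3.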
Bootstrap robustness and uncertainty estimation.
Resamples control sets with replacement and recomputes rank metrics to derive confidence intervals and stability scores — all pure Python, no additional WDK API calls required.
- veupath_chatbot.services.experiment.robustness.compute_robustness(result_ids, positive_ids, negative_ids, *, n_bootstrap=200, k_values=None, seed=42, alternative_negatives=None, include_rank_metrics=True)[source]¶
Compute bootstrap confidence intervals for classification (and optionally rank) metrics.
- Parameters:
result_ids (list[str]) – Ordered gene IDs from the strategy result.
n_bootstrap (int) – Number of bootstrap iterations.
k_values (list[int] | None) – K values for Precision/Recall/Enrichment@K.
seed (int) – Random seed for reproducibility.
alternative_negatives (dict[str, list[str]] | None) – Optional map of label -> negative IDs for negative-set sensitivity analysis.
include_rank_metrics (bool) – When False, skip rank metric CIs and top-K stability; only classification CIs are computed.
- Returns:
Bootstrap robustness result.
- Return type:
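The resampling idea can be shown on a single metric. The sketch below bootstraps recall over the positive controls and reports a percentile interval; the choice of recall, the percentile method, and the `alpha` parameter are assumptions for illustration, while the defaults for `n_bootstrap` and `seed` mirror the signature above. `bootstrap_recall_ci` is a hypothetical name and expects `n_bootstrap >= 1`.

```python
import random


def bootstrap_recall_ci(result_ids, positive_ids, *,
                        n_bootstrap=200, seed=42, alpha=0.05):
    """Percentile bootstrap interval for recall over resampled positive controls."""
    results = set(result_ids)
    positives = list(positive_ids)
    if not positives:
        return (0.0, 0.0)

    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    samples = []
    for _ in range(n_bootstrap):
        # Resample the control set with replacement, same size as the original.
        resample = rng.choices(positives, k=len(positives))
        samples.append(sum(g in results for g in resample) / len(resample))

    samples.sort()
    lo_index = int((alpha / 2) * (n_bootstrap - 1))
    hi_index = int((1 - alpha / 2) * (n_bootstrap - 1))
    return (samples[lo_index], samples[hi_index])
```

A wide interval indicates that the metric depends heavily on which control genes happen to be in the set, which is the instability this analysis is meant to surface.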
Self-contained HTML report generation for experiments.
Generates a single-file HTML document with embedded styles, tables, and inline SVG charts. No external dependencies required.
- veupath_chatbot.services.experiment.report.generate_experiment_report(experiment)[source]¶
Generate a self-contained HTML report for an experiment.
- Parameters:
experiment (Experiment) – Full experiment object with results.
- Returns:
Complete HTML string.
- Return type:
Multi-step tree-knob optimization.
Tunes threshold parameters and boolean operators across a strategy tree using Optuna, optimizing for rank-based objectives (Precision@K, Enrichment@K) with optional list-size constraints.
- async veupath_chatbot.services.experiment.tree_knobs.optimize_tree_knobs(*, site_id, record_type, base_tree, threshold_knobs, operator_knobs, positive_controls, negative_controls, controls_search_name, controls_param_name, controls_value_format, objective='precision_at_50', budget=50, max_list_size=None)[source]¶
Run Optuna optimization over tree knobs.
- Parameters:
base_tree (JSONObject) – PlanStepNode-shaped dict (the template tree).
threshold_knobs (list[ThresholdKnob]) – Numeric parameter knobs on leaf steps.
operator_knobs (list[OperatorKnob]) – Boolean operator knobs on combine nodes.
objective (str) – Target metric name (e.g. precision_at_50).
budget (int) – Maximum number of Optuna trials.
max_list_size (int | None) – Optional upper bound on result list size.
- Returns:
Optimization result with best trial and history.
- Return type:
AI Analysis¶
AI-powered analysis helpers and tool definitions.
Helper functions for experiment analysis AI tools.
Utility functions for extracting WDK record data, classifying genes, searching records, and fetching result IDs.
- veupath_chatbot.services.experiment.ai_analysis_helpers.classify_gene(gene_id, tp_ids, fp_ids, fn_ids, tn_ids)[source]¶
Return the classification label for a gene ID.
- Parameters:
- Returns:
One of "TP", "FP", "FN", "TN", or None.
- Return type:
str | None
- veupath_chatbot.services.experiment.ai_analysis_helpers.record_matches(attrs, query_lower, attribute)[source]¶
Check if a record’s attributes match a text query.
- Parameters:
attrs (JSONObject) – Record attribute dict.
query_lower (str) – Lowercased search query.
attribute (str | None) – Specific attribute to search in, or None for all.
- Returns:
True if any matching attribute value is found.
- Return type:
- async veupath_chatbot.services.experiment.ai_analysis_helpers.build_primary_key(api, site_id, record_type, gene_id)[source]¶
Build a complete WDK primary key for a gene ID.
WDK requires all primary key columns (e.g. source_id + project_id for gene records). This helper fetches the record type info and fills missing columns from site configuration.
- Parameters:
api (StrategyAPI) – Strategy API instance.
site_id (str) – VEuPathDB site identifier.
record_type (str) – WDK record type.
gene_id (str) – Gene identifier (the source_id value).
- Returns:
List of {name, value} dicts forming the complete PK.
- Return type:
- async veupath_chatbot.services.experiment.ai_analysis_helpers.fetch_group_records(api, record_type, gene_ids, limit=20, site_id=None)[source]¶
Fetch records for a list of gene IDs.
- Parameters:
- Returns:
List of dicts with geneId and attributes.
- Return type:
- async veupath_chatbot.services.experiment.ai_analysis_helpers.collect_all_result_ids(api, step_id)[source]¶
Fetch all result gene IDs from a WDK step by paginating.
- Parameters:
api (StrategyAPI) – Strategy API instance.
step_id (int) – WDK step ID.
- Returns:
Set of all gene IDs in the step’s results.
- Return type:
AI tools for deep experiment result analysis.
Provides function-calling tools that let the AI assistant access experiment data: paginate through records, look up individual genes, get attribute distributions, compare gene groups, and search results.
The agent class is built dynamically via build_analysis_agent_class()
so that the services layer never needs a static import from
veupath_chatbot.ai. The configured experiment-agent base class
is injected at startup.
- veupath_chatbot.services.experiment.ai_analysis_tools.configure(*, experiment_agent_cls)[source]¶
Wire the experiment agent base class.
Called once at application startup from the composition root.
- class veupath_chatbot.services.experiment.ai_analysis_tools.ExperimentAnalysisAgent(engine, site_id, experiment_id, system_prompt, chat_history=None)[source]¶
Bases: RefinementToolsMixin, _AnalysisToolsMixin, Kani
AI agent with data-access and strategy-refinement tools.
Combines analysis tools (data browsing, gene lookup, distributions) with refinement tools (add steps, filter, re-evaluate) and the experiment assistant’s catalog/research tools (inherited via the injected base class).
The base class is set dynamically at startup; if not configured, instantiation falls back to plain Kani.
Experiment wizard AI assistant — prompt construction and orchestration.
Builds step-specific system prompts, creates a lightweight experiment assistant agent, and streams its response.
AI-layer dependencies (engine factory, agent classes) are injected at
startup via configure() so that the services layer never imports
from veupath_chatbot.ai.
- veupath_chatbot.services.experiment.assistant.configure(*, create_engine_fn, experiment_agent_cls)[source]¶
Wire AI-layer implementations into the experiment assistant.
Called once at application startup from the composition root.
- veupath_chatbot.services.experiment.assistant.build_system_prompt(step, site_id, context)[source]¶
Build the step-specific system prompt with injected context.
- Parameters:
step (Literal['search', 'parameters', 'controls', 'run', 'results', 'analysis']) – Current wizard step.
site_id (str) – VEuPathDB site identifier.
context (JSONObject) – Wizard state (search, params, controls, etc.).
- Returns:
Formatted system prompt string.
- Return type:
- async veupath_chatbot.services.experiment.assistant.run_assistant(site_id, step, message, context, history=None, model_override=None, provider_override=None, reasoning_effort=None)[source]¶
Create an experiment assistant and stream its response.
- Parameters:
site_id (str) – VEuPathDB site identifier.
step (Literal['search', 'parameters', 'controls', 'run', 'results', 'analysis']) – Current wizard step.
message (str) – User message.
context (JSONObject) – Wizard state context.
history (list[JSONObject] | None) – Previous conversation messages.
model_override (str | None) – Model catalog ID override (default: openai/gpt-4.1-nano).
provider_override (Literal['openai', 'anthropic', 'google', 'ollama', 'mock'] | None) – Provider override.
reasoning_effort (Literal['none', 'low', 'medium', 'high'] | None) – Reasoning effort override.
- Returns:
Async iterator of SSE-compatible event dicts.
- Return type:
AI Refinement Tools¶
Purpose: AI tools for experiment strategy refinement. Function-calling
tools decorated with @ai_function that allow the workbench agent to
add search steps, combine results with gene lists, and trigger re-evaluation.
AI tools for experiment strategy refinement.
Provides function-calling tools that let the AI assistant refine the experiment strategy: add new search steps, combine with gene ID lists, and re-evaluate control metrics after refinement.
- class veupath_chatbot.services.experiment.ai_refinement_tools.RefinementToolsMixin[source]¶
Bases: object
Mixin providing strategy-refinement @ai_function methods.
Classes using this mixin must provide:
- site_id: str
- _get_experiment() -> Experiment | None (async)
- async refine_with_search(search_name, parameters, operator='INTERSECT')[source]¶
Add a new search step and combine it with current experiment results.
Creates a WDK search step, then combines it with the experiment’s current results using the specified boolean operator. The experiment strategy is updated so subsequent queries reflect the refined results. Call re_evaluate_controls afterwards to see the impact on metrics.
- Return type:
- async refine_with_gene_ids(gene_ids, operator='INTERSECT')[source]¶
Combine experiment results with a gene ID list.
Creates a gene ID search step using the experiment’s controls search configuration, then combines it with the current results. Use INTERSECT to filter results to only these genes, UNION to add them, or MINUS to exclude them. Call re_evaluate_controls afterwards to see the impact on metrics.
- Return type:
- async re_evaluate_controls()[source]¶
Re-run control evaluation against the current (possibly refined) strategy.
Computes updated classification metrics by checking which positive and negative control genes appear in the current result set. Use this after refining the strategy to see the impact on performance.
- Return type:
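The refinement operators and the control re-evaluation described above can be pictured as plain set arithmetic. The following is a minimal sketch, not the mixin's implementation: the real methods create WDK steps, while `combine` and `control_counts` here are hypothetical helpers that work on in-memory gene-ID sets.

```python
def combine(current: set[str], other: set[str], operator: str = "INTERSECT") -> set[str]:
    """Apply one boolean operator to two gene-ID sets (illustrative helper)."""
    if operator == "INTERSECT":
        return current & other  # keep only genes present in both sets
    if operator == "UNION":
        return current | other  # add the new genes
    if operator == "MINUS":
        return current - other  # exclude the new genes
    raise ValueError(f"unknown operator: {operator}")


def control_counts(
    result_ids: set[str], positives: set[str], negatives: set[str]
) -> dict[str, int]:
    """Count which control genes appear in the current result set."""
    return {
        "true_positives": len(result_ids & positives),
        "false_negatives": len(positives - result_ids),
        "false_positives": len(result_ids & negatives),
        "true_negatives": len(negatives - result_ids),
    }


current = {"g1", "g2", "g3"}
refined = combine(current, {"g2", "g3", "g4"}, "INTERSECT")
counts = control_counts(refined, positives={"g2", "g5"}, negatives={"g3", "g6"})
```

Re-running `control_counts` after each `combine` call mirrors the advice to call re_evaluate_controls after every refinement.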
Step Analysis¶
Multi-step strategy analysis: per-step evaluation, operator comparison, contribution analysis, and parameter sensitivity.
Step decomposition analysis for multi-step strategies.
Replaces the Optuna-based tree optimization with four interpretable analysis phases that give researchers actionable, per-step insights:
Per-step evaluation – evaluate each leaf independently.
Operator comparison – try all operators at each combine node.
Step contribution (ablation) – measure the impact of removing each leaf.
Parameter sensitivity – sweep numeric params across their range.
- async veupath_chatbot.services.experiment.step_analysis.run_controls_against_tree(*, site_id, record_type, tree, controls_search_name, controls_param_name, controls_value_format, positive_controls=None, negative_controls=None)[source]¶
Materialise a PlanStepNode tree, intersect with controls, return metrics.
Creates a temporary WDK strategy containing the full tree, adds an intersection step with each control set on top of the root, queries the result counts, then deletes everything.
Returns the same shape as run_positive_negative_controls() so metrics_from_control_result() can consume it directly.
- Return type:
- async veupath_chatbot.services.experiment.step_analysis.run_step_analysis(*, site_id, record_type, tree, controls_search_name, controls_param_name, controls_value_format, positive_controls, negative_controls, baseline_result, phases=None, progress_callback=None)[source]¶
Run all requested step analysis phases.
- Parameters:
tree (JSONObject) – PlanStepNode-shaped dict.
baseline_result (JSONObject) – Raw result from the initial tree evaluation.
phases (list[str] | None) – Which phases to run. Defaults to all four.
- Returns:
Aggregated StepAnalysisResult.
- Return type:
Main entry point: run_step_analysis coordinates all four analysis phases.
- async veupath_chatbot.services.experiment.step_analysis.orchestrator.run_step_analysis(*, site_id, record_type, tree, controls_search_name, controls_param_name, controls_value_format, positive_controls, negative_controls, baseline_result, phases=None, progress_callback=None)[source]¶
Run all requested step analysis phases.
- Parameters:
tree (JSONObject) – PlanStepNode-shaped dict.
baseline_result (JSONObject) – Raw result from the initial tree evaluation.
phases (list[str] | None) – Which phases to run. Defaults to all four.
- Returns:
Aggregated StepAnalysisResult.
- Return type:
Phase 1: Per-step evaluation – evaluate each leaf independently.
- async veupath_chatbot.services.experiment.step_analysis.phase_step_eval.evaluate_steps(*, site_id, record_type, tree, controls_search_name, controls_param_name, controls_value_format, positive_controls, negative_controls, progress_callback=None)[source]¶
Evaluate each leaf step against controls, preserving ancestor transforms.
For each leaf, the evaluation includes any transform chain above it (e.g. GenesByOrthologs) so that cross-organism searches are converted before being compared against controls.
- Parameters:
tree (JSONObject) – PlanStepNode-shaped dict.
- Returns:
One StepEvaluation per leaf.
- Return type:
Phase 2: Operator comparison – try all operators at each combine node.
- async veupath_chatbot.services.experiment.step_analysis.phase_operators.compare_operators(*, site_id, record_type, tree, controls_search_name, controls_param_name, controls_value_format, positive_controls, negative_controls, progress_callback=None)[source]¶
For each combine node, evaluate INTERSECT, UNION, MINUS and recommend.
- Parameters:
tree (JSONObject) – PlanStepNode-shaped dict.
- Returns:
One OperatorComparison per combine node.
- Return type:
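To illustrate the operator comparison phase, here is a minimal sketch that scores INTERSECT, UNION, and MINUS on two in-memory gene sets. It rests on assumptions: the real phase evaluates operators through WDK, and the rule used here to pick a recommendation (highest F1 over the control genes) is a stand-in, since the source does not state the actual criterion. `score` and `compare` are hypothetical names.

```python
OPERATORS = {
    "INTERSECT": lambda a, b: a & b,
    "UNION": lambda a, b: a | b,
    "MINUS": lambda a, b: a - b,
}


def score(result: set[str], positives: set[str], negatives: set[str]) -> dict[str, float]:
    """Recall, false positive rate, and F1 of a result set against controls."""
    tp = len(result & positives)
    fp = len(result & negatives)
    recall = tp / len(positives) if positives else 0.0
    fpr = fp / len(negatives) if negatives else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"recall": recall, "false_positive_rate": fpr, "f1_score": f1}


def compare(left: set[str], right: set[str], positives: set[str], negatives: set[str]):
    """Score every operator at one combine node and pick the best F1 (assumed rule)."""
    variants = {
        name: score(op(left, right), positives, negatives)
        for name, op in OPERATORS.items()
    }
    best = max(variants, key=lambda name: variants[name]["f1_score"])
    return variants, best


variants, best = compare(
    left={"a", "b", "c"},
    right={"b", "c", "d"},
    positives={"b", "c"},
    negatives={"a", "d"},
)
```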
Phase 3: Step contribution (ablation) – measure impact of removing each leaf.
- async veupath_chatbot.services.experiment.step_analysis.phase_contribution.analyze_contributions(*, site_id, record_type, tree, controls_search_name, controls_param_name, controls_value_format, positive_controls, negative_controls, baseline_metrics, progress_callback=None)[source]¶
Ablation analysis: remove each leaf and measure the impact.
- Parameters:
baseline_metrics (JSONObject) – Metrics from the full tree evaluation.
- Returns:
One StepContribution per leaf.
- Return type:
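The ablation idea can be sketched on a toy tree. This is an illustration under stated assumptions, not the engine's code: trees are modelled as tuples (`("leaf", step_id)` or `(operator, left, right)`) instead of PlanStepNode dicts, leaf results are pre-computed sets, removing a leaf promotes its sibling into the parent's place, and deltas are taken as ablated minus baseline. All helper names are hypothetical.

```python
def evaluate(tree, leaf_results):
    """Resolve a toy tree to a gene-ID set."""
    if tree[0] == "leaf":
        return leaf_results[tree[1]]
    operator, left, right = tree
    a, b = evaluate(left, leaf_results), evaluate(right, leaf_results)
    return {"INTERSECT": a & b, "UNION": a | b, "MINUS": a - b}[operator]


def leaf_ids(tree):
    """List leaf step IDs in left-to-right order."""
    if tree[0] == "leaf":
        return [tree[1]]
    return leaf_ids(tree[1]) + leaf_ids(tree[2])


def ablate(tree, step_id):
    """Return the tree without one leaf; the sibling replaces the combine node."""
    if tree[0] == "leaf":
        return None if tree[1] == step_id else tree
    operator, left, right = tree
    new_left, new_right = ablate(left, step_id), ablate(right, step_id)
    if new_left is None:
        return new_right
    if new_right is None:
        return new_left
    return (operator, new_left, new_right)


def rates(result, positives, negatives):
    """Recall and false positive rate against the control sets."""
    recall = len(result & positives) / len(positives) if positives else 0.0
    fpr = len(result & negatives) / len(negatives) if negatives else 0.0
    return recall, fpr


def contributions(tree, leaf_results, positives, negatives):
    """Measure how recall and FPR move when each leaf is removed."""
    base_recall, base_fpr = rates(evaluate(tree, leaf_results), positives, negatives)
    rows = []
    for step_id in leaf_ids(tree):
        reduced = ablate(tree, step_id)
        result = evaluate(reduced, leaf_results) if reduced is not None else set()
        recall, fpr = rates(result, positives, negatives)
        rows.append({
            "step_id": step_id,
            "recall_delta": recall - base_recall,
            "fpr_delta": fpr - base_fpr,
        })
    return rows


rows = contributions(
    ("INTERSECT", ("leaf", "s1"), ("leaf", "s2")),
    {"s1": {"a", "b", "c"}, "s2": {"b", "c", "d"}},
    positives={"b", "c", "e"},
    negatives={"a", "d"},
)
```

In this toy case removing either leaf leaves recall unchanged but raises the false positive rate, so both steps act as filters.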
Phase 4: Parameter sensitivity – sweep numeric params across their range.
- async veupath_chatbot.services.experiment.step_analysis.phase_sensitivity.sweep_parameters(*, site_id, record_type, tree, controls_search_name, controls_param_name, controls_value_format, positive_controls, negative_controls, progress_callback=None)[source]¶
Sweep numeric params on each leaf across their WDK-declared range.
Respects paired min/max bound parameters, deduplicates identical searches across leaves, and only recommends changes when the improvement is meaningful.
- Parameters:
tree (JSONObject) – PlanStepNode-shaped dict.
- Returns:
One ParameterSensitivity per numeric param.
- Return type:
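A parameter sweep with a "meaningful improvement" gate might look like the sketch below. The assumptions are explicit: `run_search` is a hypothetical callable standing in for a WDK search, the sweep uses evenly spaced points, F1 over the control genes is the score, and the `min_gain` threshold of 0.05 is an invented value. The source does not say how the engine decides that an improvement is meaningful.

```python
from typing import Callable, Optional


def f1_for(result: set[str], positives: set[str], negatives: set[str]) -> float:
    """F1 computed over control genes only."""
    tp = len(result & positives)
    fp = len(result & negatives)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / len(positives) if positives else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def sweep(
    run_search: Callable[[float], set[str]],
    lo: float,
    hi: float,
    steps: int,
    positives: set[str],
    negatives: set[str],
) -> list[tuple[float, float]]:
    """Evaluate the search at evenly spaced values between lo and hi."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    points = []
    for i in range(steps):
        value = lo + (hi - lo) * i / (steps - 1)
        points.append((value, f1_for(run_search(value), positives, negatives)))
    return points


def recommend(
    points: list[tuple[float, float]], current_f1: float, min_gain: float = 0.05
) -> Optional[float]:
    """Suggest a new value only when the F1 gain reaches min_gain (assumed rule)."""
    best_value, best_f1 = max(points, key=lambda p: p[1])
    return best_value if best_f1 - current_f1 >= min_gain else None


# Toy search: a gene is returned when its score meets the threshold.
scores = {"g1": 0.9, "g2": 0.7, "g3": 0.4, "g4": 0.2}
points = sweep(
    lambda t: {g for g, s in scores.items() if s >= t},
    0.0, 1.0, 5,
    positives={"g1", "g2"},
    negatives={"g3", "g4"},
)
```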
Control evaluation logic: run trees/steps against control sets and extract metrics.
- async veupath_chatbot.services.experiment.step_analysis._evaluation.run_controls_against_tree(*, site_id, record_type, tree, controls_search_name, controls_param_name, controls_value_format, positive_controls=None, negative_controls=None)[source]¶
Materialise a PlanStepNode tree, intersect with controls, return metrics.
Creates a temporary WDK strategy containing the full tree, adds an intersection step with each control set on top of the root, queries the result counts, then deletes everything.
Returns the same shape as run_positive_negative_controls() so metrics_from_control_result() can consume it directly.
- Return type:
Tree traversal and manipulation helpers for step analysis.
Types¶
Dataclasses for experiment configuration, metrics, enrichment, and results.
Shared data types for the Experiment Lab.
This package consolidates all experiment-related dataclasses, type aliases, and serialization helpers. All public symbols are re-exported here.
- class veupath_chatbot.services.experiment.types.ConfusionMatrix(true_positives, false_positives, true_negatives, false_negatives)[source]¶
Bases: object
2x2 confusion matrix counts.
- __init__(true_positives, false_positives, true_negatives, false_negatives)¶
- class veupath_chatbot.services.experiment.types.CrossValidationResult(k, folds, mean_metrics, std_metrics=<factory>, overfitting_score=0.0, overfitting_level='low')[source]¶
Bases: object
Aggregated cross-validation result.
- folds: list[FoldMetrics]¶
- mean_metrics: ExperimentMetrics¶
- __init__(k, folds, mean_metrics, std_metrics=<factory>, overfitting_score=0.0, overfitting_level='low')¶
- class veupath_chatbot.services.experiment.types.ExperimentMetrics(confusion_matrix, sensitivity, specificity, precision, f1_score, mcc, balanced_accuracy, negative_predictive_value=0.0, false_positive_rate=0.0, false_negative_rate=0.0, youdens_j=0.0, total_results=0, total_positives=0, total_negatives=0)[source]¶
Bases: object
Full classification metrics derived from a confusion matrix.
- confusion_matrix: ConfusionMatrix¶
- __init__(confusion_matrix, sensitivity, specificity, precision, f1_score, mcc, balanced_accuracy, negative_predictive_value=0.0, false_positive_rate=0.0, false_negative_rate=0.0, youdens_j=0.0, total_results=0, total_positives=0, total_negatives=0)¶
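As a reference for how these fields relate to the confusion matrix, the sketch below applies the standard textbook definitions. It is not the project's implementation: `Counts` and `derive_metrics` are hypothetical names, and returning 0.0 when a denominator is zero is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class Counts:
    """Stand-in for the four confusion matrix counts."""
    true_positives: int
    false_positives: int
    true_negatives: int
    false_negatives: int


def _ratio(numerator: float, denominator: float) -> float:
    # Assumed convention: an undefined ratio is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def derive_metrics(c: Counts) -> dict[str, float]:
    tp, fp = c.true_positives, c.false_positives
    tn, fn = c.true_negatives, c.false_negatives
    sensitivity = _ratio(tp, tp + fn)  # recall / true positive rate
    specificity = _ratio(tn, tn + fp)  # true negative rate
    precision = _ratio(tp, tp + fp)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1_score": _ratio(2 * precision * sensitivity, precision + sensitivity),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "negative_predictive_value": _ratio(tn, tn + fn),
        "false_positive_rate": _ratio(fp, fp + tn),
        "false_negative_rate": _ratio(fn, fn + tp),
        "youdens_j": sensitivity + specificity - 1,
    }


metrics = derive_metrics(Counts(8, 2, 18, 2))
```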
- class veupath_chatbot.services.experiment.types.FoldMetrics(fold_index, metrics, positive_control_ids=<factory>, negative_control_ids=<factory>)[source]¶
Bases: object
Metrics for a single cross-validation fold.
- metrics: ExperimentMetrics¶
- __init__(fold_index, metrics, positive_control_ids=<factory>, negative_control_ids=<factory>)¶
- class veupath_chatbot.services.experiment.types.GeneInfo(id, name=None, organism=None, product=None)[source]¶
Bases: object
Minimal gene metadata.
- __init__(id, name=None, organism=None, product=None)¶
- class veupath_chatbot.services.experiment.types.EnrichmentResult(analysis_type, terms, total_genes_analyzed=0, background_size=0, error=None)[source]¶
Bases: object
Results for a single enrichment analysis type.
- terms: list[EnrichmentTerm]¶
- __init__(analysis_type, terms, total_genes_analyzed=0, background_size=0, error=None)¶
- class veupath_chatbot.services.experiment.types.EnrichmentTerm(term_id, term_name, gene_count, background_count, fold_enrichment, odds_ratio, p_value, fdr, bonferroni, genes=<factory>)[source]¶
Bases: object
Single enriched term from WDK analysis.
- __init__(term_id, term_name, gene_count, background_count, fold_enrichment, odds_ratio, p_value, fdr, bonferroni, genes=<factory>)¶
- class veupath_chatbot.services.experiment.types.BootstrapResult(n_iterations=0, metric_cis=<factory>, rank_metric_cis=<factory>, top_k_stability=0.0, negative_set_sensitivity=<factory>)[source]¶
Bases: object
Robustness assessment via bootstrap resampling.
- metric_cis: dict[str, ConfidenceInterval]¶
- rank_metric_cis: dict[str, ConfidenceInterval]¶
- negative_set_sensitivity: list[NegativeSetVariant]¶
- __init__(n_iterations=0, metric_cis=<factory>, rank_metric_cis=<factory>, top_k_stability=0.0, negative_set_sensitivity=<factory>)¶
- class veupath_chatbot.services.experiment.types.ConfidenceInterval(lower=0.0, mean=0.0, upper=0.0, std=0.0)[source]¶
Bases: object
Bootstrap confidence interval for a single metric.
- __init__(lower=0.0, mean=0.0, upper=0.0, std=0.0)¶
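One common way to obtain the lower, mean, upper, and std values is a percentile bootstrap. The sketch below assumes that method; the source does not say which bootstrap variant the engine uses. `bootstrap_ci` is a hypothetical helper that resamples per-gene outcomes (1.0 for a hit, 0.0 for a miss) and summarises the resampled means.

```python
import random
import statistics


def bootstrap_ci(
    values: list[float], n_iterations: int = 1000, alpha: float = 0.05, seed: int = 0
) -> dict[str, float]:
    """Percentile bootstrap interval for the mean of `values` (assumed method)."""
    if not values:
        raise ValueError("values must not be empty")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_iterations)
    )
    lower_index = int((alpha / 2) * n_iterations)
    upper_index = min(n_iterations - 1, int((1 - alpha / 2) * n_iterations))
    return {
        "lower": means[lower_index],
        "mean": statistics.fmean(means),
        "upper": means[upper_index],
        "std": statistics.pstdev(means),
    }


# Recall-style input: 1.0 where a positive control was captured, else 0.0.
ci = bootstrap_ci([1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0])
```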
- class veupath_chatbot.services.experiment.types.NegativeSetVariant(label, rank_metrics, negative_count=0)[source]¶
Bases: object
Rank metrics evaluated with an alternative negative control set.
- rank_metrics: RankMetrics¶
- __init__(label, rank_metrics, negative_count=0)¶
- class veupath_chatbot.services.experiment.types.RankMetrics(precision_at_k=<factory>, recall_at_k=<factory>, enrichment_at_k=<factory>, pr_curve=<factory>, list_size_vs_recall=<factory>, total_results=0)[source]¶
Bases: object
Rank-based evaluation metrics computed over an ordered result list.
- __init__(precision_at_k=<factory>, recall_at_k=<factory>, enrichment_at_k=<factory>, pr_curve=<factory>, list_size_vs_recall=<factory>, total_results=0)¶
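For orientation, precision and recall at k can be computed from an ordered result list as in the sketch below. It uses the usual definitions and is not the engine's code: `precision_recall_at_k` is a hypothetical helper, and dividing precision by the number of items actually inspected (which matters when the list is shorter than k) is an assumed convention.

```python
def precision_recall_at_k(
    ranked_ids: list[str], positives: set[str], ks: list[int]
) -> dict[int, tuple[float, float]]:
    """Map each cutoff k to (precision@k, recall@k)."""
    out = {}
    for k in ks:
        top = ranked_ids[:k]
        hits = sum(1 for gene_id in top if gene_id in positives)
        precision = hits / len(top) if top else 0.0
        recall = hits / len(positives) if positives else 0.0
        out[k] = (precision, recall)
    return out


at_k = precision_recall_at_k(
    ["g1", "g2", "g3", "g4", "g5"], positives={"g1", "g3", "g9"}, ks=[2, 5]
)
```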
- class veupath_chatbot.services.experiment.types.OperatorKnob(combine_node_id, options=<factory>)[source]¶
Bases: object
A combine-node operator that can be switched during optimization.
- __init__(combine_node_id, options=<factory>)¶
- class veupath_chatbot.services.experiment.types.OptimizationSpec(name, type, min=None, max=None, step=None, choices=None)[source]¶
Bases: object
Describes a single parameter to optimise.
- __init__(name, type, min=None, max=None, step=None, choices=None)¶
- class veupath_chatbot.services.experiment.types.ThresholdKnob(step_id, param_name, min_val, max_val, step_size=None)[source]¶
Bases: object
A numeric parameter on a leaf step that can be tuned.
- __init__(step_id, param_name, min_val, max_val, step_size=None)¶
- class veupath_chatbot.services.experiment.types.TreeOptimizationResult(best_trial=None, all_trials=<factory>, total_time_seconds=0.0, objective='')[source]¶
Bases: object
Result of multi-step tree-knob optimization.
- best_trial: TreeOptimizationTrial | None¶
- all_trials: list[TreeOptimizationTrial]¶
- __init__(best_trial=None, all_trials=<factory>, total_time_seconds=0.0, objective='')¶
- class veupath_chatbot.services.experiment.types.TreeOptimizationTrial(trial_number, parameters=<factory>, score=0.0, rank_metrics=None, list_size=0)[source]¶
Bases: object
One trial during tree-knob optimization.
- rank_metrics: RankMetrics | None¶
- __init__(trial_number, parameters=<factory>, score=0.0, rank_metrics=None, list_size=0)¶
- class veupath_chatbot.services.experiment.types.OperatorComparison(combine_node_id, current_operator, variants=<factory>, recommendation='', recommended_operator='', precision_at_k_delta=<factory>)[source]¶
Bases: object
Comparison of operators at a single combine node.
- variants: list[OperatorVariant]¶
- __init__(combine_node_id, current_operator, variants=<factory>, recommendation='', recommended_operator='', precision_at_k_delta=<factory>)¶
- class veupath_chatbot.services.experiment.types.OperatorVariant(operator, positive_hits, negative_hits, total_results, recall, false_positive_rate, f1_score)[source]¶
Bases: object
Metrics for one boolean operator at a combine node.
- __init__(operator, positive_hits, negative_hits, total_results, recall, false_positive_rate, f1_score)¶
- class veupath_chatbot.services.experiment.types.ParameterSensitivity(step_id, param_name, current_value, sweep_points=<factory>, recommended_value=0.0, recommendation='')[source]¶
Bases: object
Sensitivity sweep for one numeric parameter on one leaf step.
- sweep_points: list[ParameterSweepPoint]¶
- __init__(step_id, param_name, current_value, sweep_points=<factory>, recommended_value=0.0, recommendation='')¶
- class veupath_chatbot.services.experiment.types.ParameterSweepPoint(value, positive_hits, negative_hits, total_results, recall, fpr, f1)[source]¶
Bases: object
One data point in a parameter sensitivity sweep.
- __init__(value, positive_hits, negative_hits, total_results, recall, fpr, f1)¶
- class veupath_chatbot.services.experiment.types.StepAnalysisResult(step_evaluations=<factory>, operator_comparisons=<factory>, step_contributions=<factory>, parameter_sensitivities=<factory>)[source]¶
Bases: object
Container for all deterministic step analysis results.
- step_evaluations: list[StepEvaluation]¶
- operator_comparisons: list[OperatorComparison]¶
- step_contributions: list[StepContribution]¶
- parameter_sensitivities: list[ParameterSensitivity]¶
- __init__(step_evaluations=<factory>, operator_comparisons=<factory>, step_contributions=<factory>, parameter_sensitivities=<factory>)¶
- class veupath_chatbot.services.experiment.types.StepContribution(step_id, search_name, baseline_recall, ablated_recall, recall_delta, baseline_fpr, ablated_fpr, fpr_delta, verdict, enrichment_delta=0.0, narrative='')[source]¶
Bases: object
Ablation analysis for one leaf step.
- __init__(step_id, search_name, baseline_recall, ablated_recall, recall_delta, baseline_fpr, ablated_fpr, fpr_delta, verdict, enrichment_delta=0.0, narrative='')¶
- class veupath_chatbot.services.experiment.types.StepEvaluation(step_id, search_name, display_name, result_count, positive_hits, positive_total, negative_hits, negative_total, recall, false_positive_rate, captured_positive_ids=<factory>, captured_negative_ids=<factory>, tp_movement=0, fp_movement=0, fn_movement=0)[source]¶
Bases: object
Per-leaf-step evaluation against controls.
- __init__(step_id, search_name, display_name, result_count, positive_hits, positive_total, negative_hits, negative_total, recall, false_positive_rate, captured_positive_ids=<factory>, captured_negative_ids=<factory>, tp_movement=0, fp_movement=0, fn_movement=0)¶
- class veupath_chatbot.services.experiment.types.BatchExperimentConfig(base_config, organism_param_name, target_organisms=<factory>)[source]¶
Bases: object
Configuration for running the same search across multiple organisms.
- base_config: ExperimentConfig¶
- target_organisms: list[BatchOrganismTarget]¶
- __init__(base_config, organism_param_name, target_organisms=<factory>)¶
- class veupath_chatbot.services.experiment.types.BatchOrganismTarget(organism, positive_controls=None, negative_controls=None)[source]¶
Bases: object
Per-organism overrides for a cross-organism batch experiment.
- __init__(organism, positive_controls=None, negative_controls=None)¶
- class veupath_chatbot.services.experiment.types.Experiment(id, config, user_id=None, status='pending', metrics=None, cross_validation=None, enrichment_results=<factory>, true_positive_genes=<factory>, false_negative_genes=<factory>, false_positive_genes=<factory>, true_negative_genes=<factory>, error=None, total_time_seconds=None, created_at='', completed_at=None, batch_id=None, benchmark_id=None, control_set_label=None, is_primary_benchmark=False, optimization_result=None, wdk_strategy_id=None, wdk_step_id=None, notes=None, step_analysis=None, rank_metrics=None, robustness=None, tree_optimization=None)[source]¶
Bases: object
Full experiment with config and results.
- config: ExperimentConfig¶
- metrics: ExperimentMetrics | None¶
- cross_validation: CrossValidationResult | None¶
- enrichment_results: list[EnrichmentResult]¶
- optimization_result: JSONObject | None¶
- step_analysis: StepAnalysisResult | None¶
- rank_metrics: RankMetrics | None¶
- robustness: BootstrapResult | None¶
- tree_optimization: TreeOptimizationResult | None¶
- __init__(id, config, user_id=None, status='pending', metrics=None, cross_validation=None, enrichment_results=<factory>, true_positive_genes=<factory>, false_negative_genes=<factory>, false_positive_genes=<factory>, true_negative_genes=<factory>, error=None, total_time_seconds=None, created_at='', completed_at=None, batch_id=None, benchmark_id=None, control_set_label=None, is_primary_benchmark=False, optimization_result=None, wdk_strategy_id=None, wdk_step_id=None, notes=None, step_analysis=None, rank_metrics=None, robustness=None, tree_optimization=None)¶
- class veupath_chatbot.services.experiment.types.ExperimentConfig(site_id, record_type, search_name, parameters, positive_controls, negative_controls, controls_search_name, controls_param_name, controls_value_format='newline', enable_cross_validation=False, k_folds=5, enrichment_types=<factory>, name='', description='', optimization_specs=None, optimization_budget=30, optimization_objective='balanced_accuracy', parameter_display_values=None, mode='single', step_tree=None, source_strategy_id=None, optimization_target_step=None, enable_step_analysis=False, step_analysis_phases=<factory>, control_set_id=None, threshold_knobs=None, operator_knobs=None, tree_optimization_objective='precision_at_50', tree_optimization_budget=50, max_list_size=None, sort_attribute=None, sort_direction='ASC', parent_experiment_id=None, target_gene_ids=None)[source]¶
Bases: object
Full configuration for an experiment run.
Supports three modes:
single (default): one search + parameters.
multi-step: a recursive step_tree of search/combine/transform nodes.
import: import an existing Pathfinder strategy by source_strategy_id.
- parameters: JSONObject¶
- optimization_specs: list[OptimizationSpec] | None¶
- optimization_objective: Literal['f1', 'f_beta', 'recall', 'precision', 'specificity', 'balanced_accuracy', 'mcc', 'youdens_j', 'custom']¶
- threshold_knobs: list[ThresholdKnob] | None¶
- operator_knobs: list[OperatorKnob] | None¶
- __init__(site_id, record_type, search_name, parameters, positive_controls, negative_controls, controls_search_name, controls_param_name, controls_value_format='newline', enable_cross_validation=False, k_folds=5, enrichment_types=<factory>, name='', description='', optimization_specs=None, optimization_budget=30, optimization_objective='balanced_accuracy', parameter_display_values=None, mode='single', step_tree=None, source_strategy_id=None, optimization_target_step=None, enable_step_analysis=False, step_analysis_phases=<factory>, control_set_id=None, threshold_knobs=None, operator_knobs=None, tree_optimization_objective='precision_at_50', tree_optimization_budget=50, max_list_size=None, sort_attribute=None, sort_direction='ASC', parent_experiment_id=None, target_gene_ids=None)¶
- veupath_chatbot.services.experiment.types.experiment_summary_to_json(exp)[source]¶
Serialize an experiment to a lightweight summary dict.
- Return type:
- veupath_chatbot.services.experiment.types.experiment_to_json(exp)[source]¶
Serialize a full Experiment to a JSON-compatible dict.
- Return type:
- veupath_chatbot.services.experiment.types.from_json(data, cls)[source]¶
Construct a cls dataclass from a camelCase JSON dict.
Nested dataclasses, lists, dicts, and tuples are coerced using type-hint introspection. Missing keys fall back to field defaults.
- Return type:
T
- veupath_chatbot.services.experiment.types.to_json(obj, *, _round=4)[source]¶
Serialize a dataclass (or scalar) to a JSON-compatible value.
Dataclass fields are emitted with camelCase keys.
Floats are rounded to _round decimal places (default 4). Override per-field via field(metadata={"round": N}).
Lists, tuples, and dicts are handled recursively.
- Return type:
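The serialization rules described for to_json (camelCase keys, float rounding with a per-field override, recursive containers) can be approximated as follows. This is a sketch reconstructed from the description above, not the library's function: `camel` and `to_json_sketch` are hypothetical names, and details such as how dict keys are treated are assumptions.

```python
from dataclasses import dataclass, field, fields, is_dataclass


def camel(name: str) -> str:
    """Convert snake_case to camelCase."""
    head, *rest = name.split("_")
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


def to_json_sketch(obj, digits: int = 4):
    """Recursively convert dataclasses and containers to JSON-compatible values."""
    if is_dataclass(obj) and not isinstance(obj, type):
        return {
            # A field's metadata may override the rounding precision.
            camel(f.name): to_json_sketch(getattr(obj, f.name), f.metadata.get("round", digits))
            for f in fields(obj)
        }
    if isinstance(obj, float):
        return round(obj, digits)
    if isinstance(obj, (list, tuple)):
        return [to_json_sketch(item, digits) for item in obj]
    if isinstance(obj, dict):
        # Assumption: dict keys are passed through unchanged.
        return {key: to_json_sketch(value, digits) for key, value in obj.items()}
    return obj


@dataclass
class Demo:
    f1_score: float
    total_results: int
    p_value: float = field(default=0.0, metadata={"round": 6})


payload = to_json_sketch(Demo(0.123456789, 3, 0.000012345))
```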
Experiment and ExperimentConfig dataclasses.
- class veupath_chatbot.services.experiment.types.experiment.ExperimentConfig(site_id, record_type, search_name, parameters, positive_controls, negative_controls, controls_search_name, controls_param_name, controls_value_format='newline', enable_cross_validation=False, k_folds=5, enrichment_types=<factory>, name='', description='', optimization_specs=None, optimization_budget=30, optimization_objective='balanced_accuracy', parameter_display_values=None, mode='single', step_tree=None, source_strategy_id=None, optimization_target_step=None, enable_step_analysis=False, step_analysis_phases=<factory>, control_set_id=None, threshold_knobs=None, operator_knobs=None, tree_optimization_objective='precision_at_50', tree_optimization_budget=50, max_list_size=None, sort_attribute=None, sort_direction='ASC', parent_experiment_id=None, target_gene_ids=None)[source]¶
Bases: object
Full configuration for an experiment run.
Supports three modes:
single (default): one search + parameters.
multi-step: a recursive step_tree of search/combine/transform nodes.
import: import an existing Pathfinder strategy by source_strategy_id.
- parameters: JSONObject¶
- optimization_specs: list[OptimizationSpec] | None¶
- optimization_objective: Literal['f1', 'f_beta', 'recall', 'precision', 'specificity', 'balanced_accuracy', 'mcc', 'youdens_j', 'custom']¶
- threshold_knobs: list[ThresholdKnob] | None¶
- operator_knobs: list[OperatorKnob] | None¶
- __init__(site_id, record_type, search_name, parameters, positive_controls, negative_controls, controls_search_name, controls_param_name, controls_value_format='newline', enable_cross_validation=False, k_folds=5, enrichment_types=<factory>, name='', description='', optimization_specs=None, optimization_budget=30, optimization_objective='balanced_accuracy', parameter_display_values=None, mode='single', step_tree=None, source_strategy_id=None, optimization_target_step=None, enable_step_analysis=False, step_analysis_phases=<factory>, control_set_id=None, threshold_knobs=None, operator_knobs=None, tree_optimization_objective='precision_at_50', tree_optimization_budget=50, max_list_size=None, sort_attribute=None, sort_direction='ASC', parent_experiment_id=None, target_gene_ids=None)¶
- class veupath_chatbot.services.experiment.types.experiment.BatchOrganismTarget(organism, positive_controls=None, negative_controls=None)[source]¶
Bases: object
Per-organism overrides for a cross-organism batch experiment.
- __init__(organism, positive_controls=None, negative_controls=None)¶
- class veupath_chatbot.services.experiment.types.experiment.BatchExperimentConfig(base_config, organism_param_name, target_organisms=<factory>)[source]¶
Bases: object
Configuration for running the same search across multiple organisms.
- base_config: ExperimentConfig¶
- target_organisms: list[BatchOrganismTarget]¶
- __init__(base_config, organism_param_name, target_organisms=<factory>)¶
- class veupath_chatbot.services.experiment.types.experiment.Experiment(id, config, user_id=None, status='pending', metrics=None, cross_validation=None, enrichment_results=<factory>, true_positive_genes=<factory>, false_negative_genes=<factory>, false_positive_genes=<factory>, true_negative_genes=<factory>, error=None, total_time_seconds=None, created_at='', completed_at=None, batch_id=None, benchmark_id=None, control_set_label=None, is_primary_benchmark=False, optimization_result=None, wdk_strategy_id=None, wdk_step_id=None, notes=None, step_analysis=None, rank_metrics=None, robustness=None, tree_optimization=None)[source]¶
Bases: object
Full experiment with config and results.
- config: ExperimentConfig¶
- metrics: ExperimentMetrics | None¶
- cross_validation: CrossValidationResult | None¶
- enrichment_results: list[EnrichmentResult]¶
- optimization_result: JSONObject | None¶
- step_analysis: StepAnalysisResult | None¶
- rank_metrics: RankMetrics | None¶
- robustness: BootstrapResult | None¶
- tree_optimization: TreeOptimizationResult | None¶
- __init__(id, config, user_id=None, status='pending', metrics=None, cross_validation=None, enrichment_results=<factory>, true_positive_genes=<factory>, false_negative_genes=<factory>, false_positive_genes=<factory>, true_negative_genes=<factory>, error=None, total_time_seconds=None, created_at='', completed_at=None, batch_id=None, benchmark_id=None, control_set_label=None, is_primary_benchmark=False, optimization_result=None, wdk_strategy_id=None, wdk_step_id=None, notes=None, step_analysis=None, rank_metrics=None, robustness=None, tree_optimization=None)¶
Core type aliases and Literal types for the Experiment Lab.
Classification metrics dataclasses for the Experiment Lab.
- class veupath_chatbot.services.experiment.types.metrics.ConfusionMatrix(true_positives, false_positives, true_negatives, false_negatives)[source]¶
Bases: object
2x2 confusion matrix counts.
- __init__(true_positives, false_positives, true_negatives, false_negatives)¶
- class veupath_chatbot.services.experiment.types.metrics.ExperimentMetrics(confusion_matrix, sensitivity, specificity, precision, f1_score, mcc, balanced_accuracy, negative_predictive_value=0.0, false_positive_rate=0.0, false_negative_rate=0.0, youdens_j=0.0, total_results=0, total_positives=0, total_negatives=0)[source]¶
Bases: object
Full classification metrics derived from a confusion matrix.
- confusion_matrix: ConfusionMatrix¶
- __init__(confusion_matrix, sensitivity, specificity, precision, f1_score, mcc, balanced_accuracy, negative_predictive_value=0.0, false_positive_rate=0.0, false_negative_rate=0.0, youdens_j=0.0, total_results=0, total_positives=0, total_negatives=0)¶
- class veupath_chatbot.services.experiment.types.metrics.GeneInfo(id, name=None, organism=None, product=None)[source]¶
Bases: object
Minimal gene metadata.
- __init__(id, name=None, organism=None, product=None)¶
- class veupath_chatbot.services.experiment.types.metrics.FoldMetrics(fold_index, metrics, positive_control_ids=<factory>, negative_control_ids=<factory>)[source]¶
Bases: object
Metrics for a single cross-validation fold.
- metrics: ExperimentMetrics¶
- __init__(fold_index, metrics, positive_control_ids=<factory>, negative_control_ids=<factory>)¶
- class veupath_chatbot.services.experiment.types.metrics.CrossValidationResult(k, folds, mean_metrics, std_metrics=<factory>, overfitting_score=0.0, overfitting_level='low')[source]¶
Bases: object
Aggregated cross-validation result.
- folds: list[FoldMetrics]¶
- mean_metrics: ExperimentMetrics¶
- __init__(k, folds, mean_metrics, std_metrics=<factory>, overfitting_score=0.0, overfitting_level='low')¶
Enrichment analysis dataclasses for the Experiment Lab.
- class veupath_chatbot.services.experiment.types.enrichment.EnrichmentTerm(term_id, term_name, gene_count, background_count, fold_enrichment, odds_ratio, p_value, fdr, bonferroni, genes=<factory>)[source]¶
Bases: object
Single enriched term from WDK analysis.
- __init__(term_id, term_name, gene_count, background_count, fold_enrichment, odds_ratio, p_value, fdr, bonferroni, genes=<factory>)¶
- class veupath_chatbot.services.experiment.types.enrichment.EnrichmentResult(analysis_type, terms, total_genes_analyzed=0, background_size=0, error=None)[source]¶
Bases: object
Results for a single enrichment analysis type.
- terms: list[EnrichmentTerm]¶
- __init__(analysis_type, terms, total_genes_analyzed=0, background_size=0, error=None)¶
Optimization-related dataclasses for the Experiment Lab.
- class veupath_chatbot.services.experiment.types.optimization.OptimizationSpec(name, type, min=None, max=None, step=None, choices=None)[source]¶
Bases: object
Describes a single parameter to optimise.
- __init__(name, type, min=None, max=None, step=None, choices=None)¶
- class veupath_chatbot.services.experiment.types.optimization.ThresholdKnob(step_id, param_name, min_val, max_val, step_size=None)[source]¶
Bases: object
A numeric parameter on a leaf step that can be tuned.
- __init__(step_id, param_name, min_val, max_val, step_size=None)¶
- class veupath_chatbot.services.experiment.types.optimization.OperatorKnob(combine_node_id, options=<factory>)[source]¶
Bases: object
A combine-node operator that can be switched during optimization.
- __init__(combine_node_id, options=<factory>)¶
- class veupath_chatbot.services.experiment.types.optimization.TreeOptimizationTrial(trial_number, parameters=<factory>, score=0.0, rank_metrics=None, list_size=0)[source]¶
Bases: object
One trial during tree-knob optimization.
- rank_metrics: RankMetrics | None¶
- __init__(trial_number, parameters=<factory>, score=0.0, rank_metrics=None, list_size=0)¶
- class veupath_chatbot.services.experiment.types.optimization.TreeOptimizationResult(best_trial=None, all_trials=<factory>, total_time_seconds=0.0, objective='')[source]¶
Bases: object
Result of multi-step tree-knob optimization.
- best_trial: TreeOptimizationTrial | None¶
- all_trials: list[TreeOptimizationTrial]¶
- __init__(best_trial=None, all_trials=<factory>, total_time_seconds=0.0, objective='')¶
Rank-based evaluation dataclasses for the Experiment Lab.
- class veupath_chatbot.services.experiment.types.rank.RankMetrics(precision_at_k=<factory>, recall_at_k=<factory>, enrichment_at_k=<factory>, pr_curve=<factory>, list_size_vs_recall=<factory>, total_results=0)[source]¶
Bases: object
Rank-based evaluation metrics computed over an ordered result list.
- __init__(precision_at_k=<factory>, recall_at_k=<factory>, enrichment_at_k=<factory>, pr_curve=<factory>, list_size_vs_recall=<factory>, total_results=0)¶
- class veupath_chatbot.services.experiment.types.rank.ConfidenceInterval(lower=0.0, mean=0.0, upper=0.0, std=0.0)[source]¶
Bases: object
Bootstrap confidence interval for a single metric.
- __init__(lower=0.0, mean=0.0, upper=0.0, std=0.0)¶
- class veupath_chatbot.services.experiment.types.rank.NegativeSetVariant(label, rank_metrics, negative_count=0)[source]¶
Bases: object
Rank metrics evaluated with an alternative negative control set.
- rank_metrics: RankMetrics¶
- __init__(label, rank_metrics, negative_count=0)¶
- class veupath_chatbot.services.experiment.types.rank.BootstrapResult(n_iterations=0, metric_cis=<factory>, rank_metric_cis=<factory>, top_k_stability=0.0, negative_set_sensitivity=<factory>)[source]¶
Bases: object
Robustness assessment via bootstrap resampling.
- metric_cis: dict[str, ConfidenceInterval]¶
- rank_metric_cis: dict[str, ConfidenceInterval]¶
- negative_set_sensitivity: list[NegativeSetVariant]¶
- __init__(n_iterations=0, metric_cis=<factory>, rank_metric_cis=<factory>, top_k_stability=0.0, negative_set_sensitivity=<factory>)¶
Step analysis dataclasses for multi-step experiment decomposition.
- class veupath_chatbot.services.experiment.types.step_analysis.StepEvaluation(step_id, search_name, display_name, result_count, positive_hits, positive_total, negative_hits, negative_total, recall, false_positive_rate, captured_positive_ids=<factory>, captured_negative_ids=<factory>, tp_movement=0, fp_movement=0, fn_movement=0)[source]¶
Bases: object
Per-leaf-step evaluation against controls.
- __init__(step_id, search_name, display_name, result_count, positive_hits, positive_total, negative_hits, negative_total, recall, false_positive_rate, captured_positive_ids=<factory>, captured_negative_ids=<factory>, tp_movement=0, fp_movement=0, fn_movement=0)¶
- class veupath_chatbot.services.experiment.types.step_analysis.OperatorVariant(operator, positive_hits, negative_hits, total_results, recall, false_positive_rate, f1_score)[source]¶
Bases: object
Metrics for one boolean operator at a combine node.
- __init__(operator, positive_hits, negative_hits, total_results, recall, false_positive_rate, f1_score)¶
- class veupath_chatbot.services.experiment.types.step_analysis.OperatorComparison(combine_node_id, current_operator, variants=<factory>, recommendation='', recommended_operator='', precision_at_k_delta=<factory>)[source]¶
Bases: object
Comparison of operators at a single combine node.
- variants: list[OperatorVariant]¶
- __init__(combine_node_id, current_operator, variants=<factory>, recommendation='', recommended_operator='', precision_at_k_delta=<factory>)¶
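The operator comparison can be pictured as set algebra over the two child result sets. The following is a minimal sketch under stated assumptions, not the module's implementation: `evaluate_operator` is a hypothetical helper, the operator names `INTERSECT`, `UNION`, and `MINUS` are assumed, and precision is computed over control genes only.

```python
from dataclasses import dataclass


@dataclass
class OperatorVariant:
    # Local stand-in mirroring the documented fields.
    operator: str
    positive_hits: int
    negative_hits: int
    total_results: int
    recall: float
    false_positive_rate: float
    f1_score: float


def evaluate_operator(operator, left, right, positives, negatives):
    """Hypothetical helper: score one boolean operator by set algebra."""
    combined = {
        "INTERSECT": left & right,
        "UNION": left | right,
        "MINUS": left - right,
    }[operator]
    tp = len(combined & positives)
    fp = len(combined & negatives)
    recall = tp / len(positives) if positives else 0.0
    fpr = fp / len(negatives) if negatives else 0.0
    # Assumption: precision counts only control genes, since other
    # genes in the result have no ground-truth label.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return OperatorVariant(operator, tp, fp, len(combined), recall, fpr, f1)


left = {"g1", "g2", "g3", "g4"}
right = {"g3", "g4", "g5"}
positives = {"g1", "g3", "g4"}
negatives = {"g2", "g5"}

variants = [
    evaluate_operator(op, left, right, positives, negatives)
    for op in ("INTERSECT", "UNION", "MINUS")
]
# One possible recommendation rule: pick the operator with the best F1.
best = max(variants, key=lambda v: v.f1_score)
```

In this toy data the intersection keeps two of three positives and no negatives, so it wins on F1; a real recommendation may weigh recall and false-positive rate differently.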
- class veupath_chatbot.services.experiment.types.step_analysis.StepContribution(step_id, search_name, baseline_recall, ablated_recall, recall_delta, baseline_fpr, ablated_fpr, fpr_delta, verdict, enrichment_delta=0.0, narrative='')[source]¶
Bases: object
Ablation analysis for one leaf step.
- __init__(step_id, search_name, baseline_recall, ablated_recall, recall_delta, baseline_fpr, ablated_fpr, fpr_delta, verdict, enrichment_delta=0.0, narrative='')¶
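Ablation here means re-scoring the strategy with one leaf step removed and comparing against the baseline. A rough sketch follows, assuming for simplicity that leaf steps are combined by union and that each delta is defined as ablated minus baseline; `recall_and_fpr` and `ablation_deltas` are hypothetical helpers, and only the returned key names come from the documented fields.

```python
def recall_and_fpr(results, positives, negatives):
    """Recall and false-positive rate of a result set against controls."""
    recall = len(results & positives) / len(positives) if positives else 0.0
    fpr = len(results & negatives) / len(negatives) if negatives else 0.0
    return recall, fpr


def ablation_deltas(step_results, ablate_step, positives, negatives):
    """Hypothetical helper: metric change when one leaf step is dropped.

    Assumes leaf steps are combined by union; a real strategy tree may
    mix several boolean operators.
    """
    baseline = set().union(*step_results.values())
    remaining = [ids for sid, ids in step_results.items() if sid != ablate_step]
    ablated = set().union(*remaining)

    base_recall, base_fpr = recall_and_fpr(baseline, positives, negatives)
    abl_recall, abl_fpr = recall_and_fpr(ablated, positives, negatives)
    return {
        "baseline_recall": base_recall,
        "ablated_recall": abl_recall,
        "recall_delta": abl_recall - base_recall,  # assumed sign convention
        "baseline_fpr": base_fpr,
        "ablated_fpr": abl_fpr,
        "fpr_delta": abl_fpr - base_fpr,
    }


step_results = {"s1": {"g1", "g2"}, "s2": {"g3", "g9"}}
positives = {"g1", "g2", "g3"}
negatives = {"g8", "g9"}
deltas = ablation_deltas(step_results, "s2", positives, negatives)
```

Dropping `s2` in this example loses one positive and one negative, so both recall and false-positive rate fall; how such a trade-off maps onto a `verdict` is not specified here.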
- class veupath_chatbot.services.experiment.types.step_analysis.ParameterSweepPoint(value, positive_hits, negative_hits, total_results, recall, fpr, f1)[source]¶
Bases: object
One data point in a parameter sensitivity sweep.
- __init__(value, positive_hits, negative_hits, total_results, recall, fpr, f1)¶
- class veupath_chatbot.services.experiment.types.step_analysis.ParameterSensitivity(step_id, param_name, current_value, sweep_points=<factory>, recommended_value=0.0, recommendation='')[source]¶
Bases: object
Sensitivity sweep for one numeric parameter on one leaf step.
- sweep_points: list[ParameterSweepPoint]¶
- __init__(step_id, param_name, current_value, sweep_points=<factory>, recommended_value=0.0, recommendation='')¶
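A sensitivity sweep can be thought of as re-running one step at several parameter values and scoring each run against the controls. The sketch below is an assumption-laden illustration: `run_sweep` and the `run_search` callable are hypothetical, picking the value with the highest F1 is only one possible rule for `recommended_value`, and the local dataclass merely mirrors the documented fields.

```python
from dataclasses import dataclass


@dataclass
class ParameterSweepPoint:
    # Local stand-in mirroring the documented fields.
    value: float
    positive_hits: int
    negative_hits: int
    total_results: int
    recall: float
    fpr: float
    f1: float


def run_sweep(values, run_search, positives, negatives):
    """Hypothetical helper: score run_search(value) for each candidate value."""
    points = []
    for value in values:
        results = run_search(value)
        tp = len(results & positives)
        fp = len(results & negatives)
        recall = tp / len(positives) if positives else 0.0
        fpr = fp / len(negatives) if negatives else 0.0
        # Assumption: precision is computed over control genes only.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        points.append(
            ParameterSweepPoint(value, tp, fp, len(results), recall, fpr, f1)
        )
    return points


# Toy stand-in for a search: keep genes whose score meets the threshold.
scores = {"g1": 0.9, "g2": 0.8, "g3": 0.6, "g4": 0.5, "g5": 0.3}
positives = {"g1", "g2", "g3"}
negatives = {"g4", "g5"}

sweep_points = run_sweep(
    [0.2, 0.55, 0.85],
    run_search=lambda t: {g for g, s in scores.items() if s >= t},
    positives=positives,
    negatives=negatives,
)
recommended_value = max(sweep_points, key=lambda p: p.f1).value
```

In the engine the search callable would correspond to a WDK search with the swept parameter substituted; the threshold filter above only stands in for that call.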
- class veupath_chatbot.services.experiment.types.step_analysis.StepAnalysisResult(step_evaluations=<factory>, operator_comparisons=<factory>, step_contributions=<factory>, parameter_sensitivities=<factory>)[source]¶
Bases: object
Container for all deterministic step analysis results.
- step_evaluations: list[StepEvaluation]¶
- operator_comparisons: list[OperatorComparison]¶
- step_contributions: list[StepContribution]¶
- parameter_sensitivities: list[ParameterSensitivity]¶
- __init__(step_evaluations=<factory>, operator_comparisons=<factory>, step_contributions=<factory>, parameter_sensitivities=<factory>)¶
JSON serialization for experiment dataclasses.
Simple sub-types (metrics, enrichment, rank, step analysis, etc.) are
serialized via the generic to_json converter. Only Experiment and
ExperimentConfig require hand-written logic due to conditional field
inclusion and summary projections.
- veupath_chatbot.services.experiment.types.serialization.experiment_to_json(exp)[source]¶
Serialize a full Experiment to a JSON-compatible dict.
- Return type:
- veupath_chatbot.services.experiment.types.serialization.experiment_summary_to_json(exp)[source]¶
Serialize an experiment to a lightweight summary dict.
- Return type:
Generic dataclass <-> camelCase JSON conversion.
Replaces the hand-written per-type serialization boilerplate with two
generic functions: to_json (serialize) and from_json (deserialize).
Float rounding (default 4 decimal places) can be overridden per-field:
from dataclasses import field
total_time_seconds: float = field(default=0.0, metadata={"round": 2})
p_value: float = field(metadata={"round": None}) # skip rounding
- veupath_chatbot.services.experiment.types.json_codec.to_json(obj, *, _round=4)[source]¶
Serialize a dataclass (or scalar) to a JSON-compatible value.
Dataclass fields are emitted with camelCase keys.
Floats are rounded to _round decimal places (default 4). Override per-field via field(metadata={"round": N}).
Lists, tuples, and dicts are handled recursively.
- Return type:
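The behaviour described for `to_json` (camelCase keys, per-field rounding via `metadata`, recursive containers) can be approximated in a few lines. The sketch below is an independent illustration rather than the library's code: `to_json_sketch`, `_camel`, and the `Timing` dataclass are hypothetical names, and edge cases the real codec may handle (enums, optional fields, `from_json`) are left out.

```python
from dataclasses import dataclass, field, fields, is_dataclass


def _camel(name):
    # snake_case -> camelCase, e.g. "total_time_seconds" -> "totalTimeSeconds"
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def to_json_sketch(obj, _round=4):
    """Hypothetical re-creation of the documented to_json behaviour."""
    if is_dataclass(obj) and not isinstance(obj, type):
        out = {}
        for f in fields(obj):
            # A per-field "round" entry overrides the default; None skips rounding.
            digits = f.metadata.get("round", _round)
            out[_camel(f.name)] = to_json_sketch(getattr(obj, f.name), digits)
        return out
    if isinstance(obj, float):
        return obj if _round is None else round(obj, _round)
    if isinstance(obj, (list, tuple)):
        return [to_json_sketch(v, _round) for v in obj]
    if isinstance(obj, dict):
        return {k: to_json_sketch(v, _round) for k, v in obj.items()}
    return obj


@dataclass
class Timing:
    # Illustrative dataclass; the field names echo the docstring example.
    total_time_seconds: float = field(default=0.0, metadata={"round": 2})
    p_value: float = field(default=0.0, metadata={"round": None})
    f1_score: float = 0.0
    fold_scores: list = field(default_factory=list)


payload = to_json_sketch(Timing(1.23456, 1.23456789e-12, 0.666666, [0.123456]))
```

Keeping `p_value` unrounded matters because very small values would otherwise collapse to `0.0` at four decimal places.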
Seed Data¶
Generate demo experiments with curated multi-step strategies and control sets
across 13 VEuPathDB databases. This is the only place the backend’s
multi-step mode is used. See Services for the full seed module reference.