Evaluation Engine

The evaluation engine is the backend service that powers the workbench’s analysis features. It evaluates search performance against positive/negative control gene sets, computes classification and rank metrics, and runs cross-validation and enrichment analysis. The workbench UI at /workbench consumes the endpoints described below.

flowchart LR
    A["Gene Set + Controls"] --> G{targetGeneIds?}
    G -- yes --> H["Set Intersection<br/>(no WDK call)"]
    G -- no --> B["Run Search on WDK"]
    B --> C["Evaluate Controls"]
    H --> C
    C --> D["Metrics<br/>P/R/F1"]
    C --> E["Cross-Validation"]
    C --> F["Enrichment"]

    style A fill:#2563eb,color:#fff
    style G fill:#f59e0b,color:#000
    style H fill:#10b981,color:#fff
    style C fill:#7c3aed,color:#fff
    

Evaluation Modes

The evaluation engine supports two evaluation modes:

Gene-ID mode (workbench gene sets):

When targetGeneIds is provided in the experiment config, the engine skips WDK search re-execution and evaluates using pure set intersection against the control genes. This is the correct path for workbench gene sets, which already contain materialized gene IDs.

Search re-execution mode (strategy evaluation):

When targetGeneIds is absent, the engine runs the WDK search using searchName and parameters from the config and evaluates the results against controls. This is the correct path when evaluating a search configuration itself — e.g., when the AI agent builds a strategy and needs to test its performance before the results have been materialized into a gene set.
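The choice between the two modes can be sketched as below. This is a minimal illustration, assuming the config is a plain dict carrying the `targetGeneIds`, `searchName` and `parameters` keys named above. The `positiveControls`/`negativeControls` keys, the `run_search` callable and the result-dict keys are hypothetical stand-ins, not the engine's actual shapes.

```python
from typing import Callable, Iterable

# Hypothetical stand-in for "run the WDK search and return its gene IDs".
SearchRunner = Callable[[str, dict], Iterable[str]]


def evaluate(config: dict, run_search: SearchRunner) -> dict:
    """Pick the evaluation mode from the config, then intersect with controls."""
    # Hypothetical config keys for the control lists (illustrative only).
    positives = set(config.get("positiveControls", []))
    negatives = set(config.get("negativeControls", []))

    target_ids = config.get("targetGeneIds")
    if target_ids is not None:
        # Gene-ID mode: the gene set is already materialized, so no WDK call.
        mode = "gene_ids"
        result_ids = set(target_ids)
    else:
        # Search re-execution mode: run searchName with its parameters.
        mode = "search"
        result_ids = set(run_search(config["searchName"], config.get("parameters", {})))

    return {
        "mode": mode,
        "positive_hits": len(result_ids & positives),
        "total_positives": len(positives),
        "negative_hits": len(result_ids & negatives),
        "total_negatives": len(negatives),
        "total_results": len(result_ids),
    }
```

Both branches end in the same intersection step, which is why gene-ID mode can skip WDK entirely: only the source of `result_ids` differs.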

Important

The benchmark and evaluate panels in the workbench both send targetGeneIds from the active gene set. This ensures metrics are computed against the actual gene set contents, not a potentially stale re-execution of search parameters.

Execution Endpoints

Method  Endpoint                        Description
POST    /api/v1/experiments/            Create and run a single experiment (SSE)
POST    /api/v1/experiments/batch       Run across multiple organisms (SSE)
POST    /api/v1/experiments/benchmark   Run against multiple control sets (SSE)
POST    /api/v1/experiments/seed        Seed demo strategies and control sets (SSE)
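The endpoints marked SSE stream their progress as Server-Sent Events. The exact event names and payloads the engine emits are not documented in this section. The sketch below therefore only shows a minimal client-side parser for the standard SSE wire format: `event:` and `data:` fields, with a blank line ending each event. The function name `parse_sse` is hypothetical.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event_name, data) pairs from an SSE text stream.

    Simplified sketch: the `id:` and `retry:` fields are ignored.
    """
    event, data = "message", []  # "message" is the SSE default event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if any data arrived.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips one leading space
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)  # multiple data lines are joined with "\n"
    # Per the spec, an unterminated trailing event is discarded.
```

Partitioning on the first colon only means JSON payloads that contain colons pass through intact, so each `data` string can be handed straight to `json.loads`.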

Analysis Endpoints

Cross-experiment (not scoped to a single experiment):

Method  Endpoint                                 Description
POST    /api/v1/experiments/overlap              Pairwise gene set overlap (Jaccard, shared/unique genes)
POST    /api/v1/experiments/enrichment-compare   Compare enrichment results across experiments

Per-experiment (scoped to the experiment identified by {id}):

Method  Endpoint                                   Description
POST    /api/v1/experiments/{id}/cross-validate    Run cross-validation on an existing experiment
POST    /api/v1/experiments/{id}/enrich            Run enrichment analysis
POST    /api/v1/experiments/{id}/re-evaluate       Re-run evaluation (e.g. after changing controls)
POST    /api/v1/experiments/{id}/custom-enrich     Custom enrichment request
POST    /api/v1/experiments/{id}/threshold-sweep   Threshold sweep for a parameter
GET     /api/v1/experiments/{id}/export            Download experiment report (HTML)

CRUD and Results

Method  Endpoint                   Description
GET     /api/v1/experiments/       List experiments (optional site filter)
GET     /api/v1/experiments/{id}   Get one experiment
PATCH   /api/v1/experiments/{id}   Update (e.g. name)
DELETE  /api/v1/experiments/{id}   Delete an experiment

Results browsing (per-experiment):

Method  Endpoint                                                Description
GET     /api/v1/experiments/{id}/results/attributes             List available result attributes
GET     /api/v1/experiments/{id}/results/records                Paginated result records
POST    /api/v1/experiments/{id}/results/record                 Get single record detail
GET     /api/v1/experiments/{id}/results/distributions/{attr}   Distribution data for an attribute
POST    /api/v1/experiments/{id}/refine                         Refine/filter result records

Workbench chat (per-experiment conversational AI):

Method  Endpoint                                 Description
POST    /api/v1/experiments/{id}/chat            Start workbench chat stream (SSE)
GET     /api/v1/experiments/{id}/chat/messages   Get chat message history

Persistence

Experiments are stored in the experiments table (see veupath_chatbot.persistence.models.ExperimentRow): id, site_id, name, status, data (full JSON), batch_id, benchmark_id, created_at, updated_at. The experiment store (veupath_chatbot.services.experiment.store) keeps an in-memory cache and persists every mutation to PostgreSQL.

Control Sets

Reusable positive/negative gene sets are managed at /api/v1/control-sets (CRUD). They can be referenced when creating experiments (e.g. control_set_id). See veupath_chatbot.persistence.models.ControlSet.

Experiment Streaming (CQRS)

Purpose: Background task launchers for experiment execution using a CQRS event model. Events are persisted to Redis Streams; operations are tracked in PostgreSQL. This is how long-running experiments (single, batch, benchmark) are kicked off and their progress communicated to the frontend via SSE.

async veupath_chatbot.services.experiment.core.streaming.start_experiment(config, *, user_id=None)

Launch a single experiment as a background task. Returns operation ID.

Return type:

str

async veupath_chatbot.services.experiment.core.streaming.start_batch_experiment(batch_config, *, user_id=None)

Launch a batch experiment as a background task. Returns operation ID.

Return type:

str

async veupath_chatbot.services.experiment.core.streaming.start_benchmark(base_config, control_sets, *, user_id=None)

Launch a benchmark suite as a background task. Returns operation ID.

Return type:

str
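The launch pattern these three functions describe can be sketched as follows: create an operation ID, schedule the work with `asyncio.create_task`, and return the ID at once while progress events are appended to a per-operation log. This is a minimal sketch only. In-memory dicts stand in for Redis Streams and the PostgreSQL operation registry, and `EVENTS`, `OPERATIONS` and `launch` are hypothetical names.

```python
import asyncio
import uuid
from typing import Any, Awaitable, Callable

Emit = Callable[[dict], Awaitable[None]]

# Hypothetical in-memory stand-ins for the Redis event stream and operations table.
EVENTS: dict[str, list[dict]] = {}
OPERATIONS: dict[str, str] = {}
_TASKS: set[asyncio.Task] = set()  # strong refs so running tasks are not GC'd


async def launch(work: Callable[[Emit], Awaitable[Any]]) -> str:
    """Start `work` in the background and return its operation ID immediately."""
    op_id = uuid.uuid4().hex
    EVENTS[op_id] = []
    OPERATIONS[op_id] = "running"

    async def emit(event: dict) -> None:
        EVENTS[op_id].append(event)  # a real system would XADD to a stream

    async def runner() -> None:
        try:
            await work(emit)
            OPERATIONS[op_id] = "completed"
        except Exception as exc:  # record failures instead of losing them
            await emit({"type": "error", "message": str(exc)})
            OPERATIONS[op_id] = "failed"

    task = asyncio.create_task(runner())
    _TASKS.add(task)
    task.add_done_callback(_TASKS.discard)
    return op_id
```

Because the caller only receives the ID, an SSE endpoint can replay the event log for that ID independently of the task that produces it.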

Service Layer

Core experiment service, orchestration, and store.

Experiment execution orchestrator.

Coordinates the full experiment lifecycle: evaluation, metrics computation, optional cross-validation, and optional enrichment analysis.

Each phase is a private function that mutates experiment and persists intermediate state to the store. The public run_experiment() function orchestrates phase sequencing, lifecycle management, and error handling.

async veupath_chatbot.services.experiment.service.run_experiment(config, *, user_id=None, progress_callback=None)

Execute a full experiment and persist the result.

Parameters:
  • config (ExperimentConfig) – Experiment configuration.

  • user_id (str | None) – Owning user ID (for IDOR protection).

  • progress_callback (Callable[[JSONObject], Awaitable[None]] | None) – Optional async callback for SSE progress events.

Returns:

Completed experiment with all results.

Return type:

Experiment

Experiment store with write-through DB persistence.

Provides CRUD operations for experiment lifecycle management. Keeps an in-memory dict for fast synchronous access during experiment execution, and persists every mutation to PostgreSQL so experiments survive API restarts.

class veupath_chatbot.services.experiment.store.ExperimentStore

Bases: WriteThruStore[Experiment]

Experiment repository with in-memory cache and DB write-through.

Inherits save/get/delete/aget/adelete from WriteThruStore. Adds domain-specific listing methods.
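The write-through pattern can be sketched as below, assuming entities are JSON-serializable dicts keyed by `id`. `sqlite3` stands in for PostgreSQL, and `WriteThroughCache` and `list_merged` are hypothetical names, not the `WriteThruStore` API. `list_merged` mirrors the behaviour described for `alist_all()`: database rows are overlaid with the fresher in-memory copies.

```python
from __future__ import annotations

import json
import sqlite3


class WriteThroughCache:
    """In-memory dict for fast reads; every mutation is also written to SQL.

    Sketch only: sqlite3 stands in for PostgreSQL.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._mem: dict[str, dict] = {}
        self._db = conn
        self._db.execute("CREATE TABLE IF NOT EXISTS items (id TEXT PRIMARY KEY, data TEXT)")

    def save(self, item: dict) -> None:
        self._mem[item["id"]] = item
        # Upsert so repeated saves overwrite the previous row.
        self._db.execute(
            "INSERT OR REPLACE INTO items (id, data) VALUES (?, ?)",
            (item["id"], json.dumps(item)),
        )
        self._db.commit()

    def get(self, item_id: str) -> dict | None:
        if item_id in self._mem:
            return self._mem[item_id]
        row = self._db.execute("SELECT data FROM items WHERE id = ?", (item_id,)).fetchone()
        if row is None:
            return None
        item = json.loads(row[0])
        self._mem[item_id] = item  # warm the cache, e.g. after an API restart
        return item

    def delete(self, item_id: str) -> None:
        self._mem.pop(item_id, None)
        self._db.execute("DELETE FROM items WHERE id = ?", (item_id,))
        self._db.commit()

    def list_merged(self) -> list[dict]:
        """DB rows overlaid with (fresher) in-memory state."""
        merged = {row_id: json.loads(data) for row_id, data in self._db.execute("SELECT id, data FROM items")}
        merged.update(self._mem)
        return list(merged.values())
```

Reads stay synchronous and cheap during execution, while a restarted process can still recover every experiment from the table.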

list_all(site_id=None, user_id=None)

List experiments from in-memory cache.

Return type:

list[Experiment]

list_by_benchmark(benchmark_id)

Return all experiments belonging to a benchmark suite (in-memory).

Return type:

list[Experiment]

async alist_all(site_id=None, user_id=None)

List experiments: merges DB rows with in-memory (fresher) state.

Return type:

list[Experiment]

async alist_by_benchmark(benchmark_id)

List experiments by benchmark: merges DB + in-memory.

Return type:

list[Experiment]

veupath_chatbot.services.experiment.store.get_experiment_store()

Get the global experiment store singleton.

Return type:

ExperimentStore

Shared helpers for experiment execution and analysis.

Provides gene-list extraction utilities and the progress callback type alias.

veupath_chatbot.services.experiment.helpers.ProgressCallback

Emits an SSE-friendly progress event dict.

alias of Callable[[JSONObject], Awaitable[None]]

veupath_chatbot.services.experiment.helpers.safe_int(val, default=0)

Safely convert a value to int, returning default on failure.

Return type:

int

veupath_chatbot.services.experiment.helpers.safe_float(val, default=0.0)

Safely convert a value to float, returning default on failure.

Non-finite values (inf, -inf, nan) are replaced with default because they are not JSON-serializable and PostgreSQL rejects them in JSON columns.

Return type:

float
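The behaviour described for `safe_int()` and `safe_float()` can be sketched as follows. This is a minimal sketch: the exact set of exceptions the real helpers catch is an assumption.

```python
import math
from typing import Any


def safe_int(val: Any, default: int = 0) -> int:
    """Convert to int, falling back to `default` on failure."""
    try:
        return int(val)
    # Assumed exception set: bad types, unparsable strings, int(inf).
    except (TypeError, ValueError, OverflowError):
        return default


def safe_float(val: Any, default: float = 0.0) -> float:
    """Convert to float; non-finite results also fall back to `default`."""
    try:
        out = float(val)
    except (TypeError, ValueError, OverflowError):
        return default
    # inf / -inf / nan are not valid JSON and PostgreSQL rejects them in JSON columns.
    return out if math.isfinite(out) else default
```

Rejecting non-finite floats at conversion time keeps every stored metric serializable, instead of failing later when the experiment JSON is persisted.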

veupath_chatbot.services.experiment.helpers.extract_wdk_id(payload, key='id')

Extract an integer ID from a WDK JSON response.

WDK formatters (StepFormatter, StrategyService, etc.) emit entity IDs as Java longs (always int in JSON) under a known key (typically "id" or "strategyId").

Parameters:
  • payload (object) – WDK response dict.

  • key (str) – JSON key containing the integer ID.

Returns:

The integer ID, or None if not found.

Return type:

int | None
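`extract_wdk_id()` and the `coerce_step_id()` helper below it can be sketched as follows. The sketch assumes that only a genuine JSON integer counts as an ID, so booleans (an `int` subclass in Python) and numeric strings are rejected; the real helper's edge-case handling may differ. It also assumes that `coerce_step_id()` reads the `"id"` key.

```python
from __future__ import annotations

from typing import Any


def extract_wdk_id(payload: Any, key: str = "id") -> int | None:
    """Return payload[key] if payload is a dict and the value is a JSON integer."""
    if not isinstance(payload, dict):
        return None
    value = payload.get(key)
    # Assumption: bool (a subclass of int in Python) is not a valid ID.
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    return None


def coerce_step_id(payload: dict | None) -> int:
    """Assumes the step-creation response carries the ID under "id"."""
    step_id = extract_wdk_id(payload)
    if step_id is None:
        raise ValueError("step ID not found in WDK response")
    return step_id
```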

veupath_chatbot.services.experiment.helpers.coerce_step_id(payload)

Extract step ID from a WDK step-creation response.

Parameters:

payload (JSONObject | None) – WDK step-creation response.

Returns:

Step ID.

Raises:

ValueError – If step ID not found.

Return type:

int

async veupath_chatbot.services.experiment.helpers.extract_and_enrich_genes(*, site_id, result, negative_controls=None)

Extract gene lists from a control-test result and enrich with WDK metadata.

Serves as the single entry point for extraction and enrichment, replacing previously duplicated extract-and-enrich blocks.

Returns:

(true_positive, false_negative, false_positive, true_negative)

Return type:

tuple[list[GeneInfo], list[GeneInfo], list[GeneInfo], list[GeneInfo]]

Deserialize JSON dicts back into Experiment dataclass trees.

Simple sub-types are deserialized via the generic from_json converter. Only Experiment / ExperimentConfig require hand-written logic due to conditional field defaults and enrichment deduplication.

veupath_chatbot.services.experiment._deserialize.experiment_from_json(d)

Reconstruct an Experiment from its JSON representation.

Parameters:

d (dict[str, Any]) – Dict produced by experiment_to_json().

Returns:

Fully hydrated Experiment dataclass.

Return type:

Experiment

WDK strategy materialization for experiments.

Creates, persists, and cleans up WDK strategies from experiment configs, including step tree materialization for multi-step and import modes.

async veupath_chatbot.services.experiment.materialization.cleanup_experiment_strategy(experiment)

Delete the persisted WDK strategy when an experiment is deleted.

Parameters:

experiment (Experiment) – Experiment whose WDK strategy should be cleaned up.

Classification

Purpose: Gene record classification by experiment membership (TP/FP/FN/TN). Adds _classification field to WDK records based on gene ID membership in positive and negative control sets.

Gene record classification by experiment category membership.

Classifies WDK result records as TP / FP / FN / TN based on whether their gene ID appears in the experiment’s curated gene sets. Handles WDK transcript ID version suffixes (e.g. “GENE.1” -> “GENE”).

veupath_chatbot.services.experiment.classification.classify_records(records, tp_ids, fp_ids, fn_ids, tn_ids)

Add _classification field to records based on gene ID membership.

For each record, extracts the primary key and checks membership in the four gene-set categories. WDK transcript IDs may include a version suffix (e.g. "PF3D7_0100100.1"); the function also checks the base ID with the suffix stripped.

Parameters:
  • records (list[JSONObject]) – WDK answer records (list of dicts).

  • tp_ids (set[str]) – True-positive gene IDs.

  • fp_ids (set[str]) – False-positive gene IDs.

  • fn_ids (set[str]) – False-negative gene IDs.

  • tn_ids (set[str]) – True-negative gene IDs.

Returns:

New list of records, each with a _classification field.

Return type:

list[JSONObject]
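The suffix-tolerant membership check can be sketched as below. This is a minimal sketch under three assumptions: each record carries its primary key as a list of `{name, value}` pairs under `"id"` with the gene ID first, a version suffix is a trailing `.<digits>`, and categories are checked in TP, FP, FN, TN order.

```python
import re

# Assumption: a version suffix is a trailing ".<digits>" ("PF3D7_0100100.1").
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def classify_records(records, tp_ids, fp_ids, fn_ids, tn_ids):
    """Return copies of `records`, each with a `_classification` field added."""
    categories = (("TP", tp_ids), ("FP", fp_ids), ("FN", fn_ids), ("TN", tn_ids))
    out = []
    for record in records:
        # Assumed record shape: {"id": [{"name": "source_id", "value": ...}, ...]}.
        pk = record.get("id") or [{}]
        gene_id = str(pk[0].get("value", ""))
        # Check both the raw ID and the ID with its version suffix stripped.
        candidates = {gene_id, _VERSION_SUFFIX.sub("", gene_id)}
        label = next((name for name, ids in categories if candidates & ids), None)
        out.append({**record, "_classification": label})
    return out
```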

Evaluation Service

Purpose: Re-evaluation and threshold sweep service. Pure business logic for recomputing experiment metrics with updated controls or parameters.

Evaluation service: re-evaluate and threshold sweep.

Pure business logic extracted from the transport handler. No HTTP/SSE concerns here – callers (routers, tools, etc.) wrap the results in whatever transport format they need.

veupath_chatbot.services.experiment.evaluation.SWEEP_CONCURRENCY = 3

Max parallel WDK control-test runs per sweep.

veupath_chatbot.services.experiment.evaluation.SWEEP_TIMEOUT_S = 240

Server-side timeout for the entire sweep, in seconds.

veupath_chatbot.services.experiment.evaluation.SWEEP_POINT_TIMEOUT_S = 90

Per-point timeout in seconds; prevents one slow point from blocking the rest of the sweep.

async veupath_chatbot.services.experiment.evaluation.re_evaluate(exp)

Re-run control evaluation against the (possibly modified) strategy.

Updates the experiment in-place (metrics + gene lists) and persists it. Returns the full experiment JSON.

Return type:

JSONObject

veupath_chatbot.services.experiment.evaluation.compute_sweep_values(*, sweep_type, values, min_value, max_value, steps)

Compute the list of parameter values for a sweep.

Parameters:
  • sweep_type (str) – "numeric" or "categorical".

  • values (list[str] | None) – Explicit values for categorical sweeps.

  • min_value (float | None) – Range start for numeric sweeps.

  • max_value (float | None) – Range end for numeric sweeps.

  • steps (int) – Number of evenly-spaced points for numeric sweeps.

Returns:

List of stringified sweep values.

Raises:

ValidationError – On invalid inputs.

Return type:

list[str]
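Value generation for a sweep can be sketched as follows. This is a minimal sketch assuming `steps` evenly spaced points that include both endpoints, and `str(float)` for stringification. `ValueError` stands in for the project's `ValidationError`, and the validation rules shown are illustrative.

```python
def compute_sweep_values(*, sweep_type, values, min_value, max_value, steps):
    """Return the stringified parameter values to evaluate."""
    # ValueError stands in for the project's ValidationError in this sketch.
    if sweep_type == "categorical":
        if not values:
            raise ValueError("categorical sweep requires explicit values")
        return [str(v) for v in values]
    if sweep_type != "numeric":
        raise ValueError(f"unknown sweep_type: {sweep_type!r}")
    if min_value is None or max_value is None or min_value >= max_value:
        raise ValueError("numeric sweep requires min_value < max_value")
    if steps < 2:
        raise ValueError("numeric sweep requires at least 2 steps")
    span = max_value - min_value
    # Assumption: endpoints included, so point i sits at i/(steps-1) of the range.
    return [str(min_value + span * i / (steps - 1)) for i in range(steps)]
```

For example, a numeric sweep from 0.0 to 1.0 with 5 steps yields `["0.0", "0.25", "0.5", "0.75", "1.0"]`.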

veupath_chatbot.services.experiment.evaluation.validate_sweep_parameter(exp, param_name)

Ensure param_name exists in the experiment config.

For single-step experiments, checks exp.config.parameters. For tree-mode experiments, walks the step tree looking for the parameter in any leaf node’s parameters dict.

Raises:

ValidationError – If the parameter is missing.

veupath_chatbot.services.experiment.evaluation.format_metrics_dict(m)

Format an ExperimentMetrics into a JSON-friendly dict.

Return type:

JSONObject

async veupath_chatbot.services.experiment.evaluation.run_sweep_point(*, exp, param_name, value, is_categorical)

Run a single sweep point: modify the parameter and evaluate.

For tree-mode experiments, clones the step tree and injects the swept parameter value into every node that contains it, then calls run_controls_against_tree(). For single-step experiments, modifies the flat parameter dict and calls run_positive_negative_controls().

Returns:

Dict with value, metrics (or None), and optionally error.

Return type:

JSONObject

async veupath_chatbot.services.experiment.evaluation.cleanup_before_sweep(site_id)

Best-effort cleanup of leaked internal control-test strategies.

async veupath_chatbot.services.experiment.evaluation.generate_sweep_events(*, exp, param_name, sweep_type, sweep_values)

Run the full sweep and yield SSE-formatted events.

Yields sweep_point events as each point completes, then a final sweep_complete event with all sorted results.

Return type:

AsyncIterator[str]
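The concurrency cap and the two timeouts above could combine as in the sketch below. An `asyncio.Semaphore(SWEEP_CONCURRENCY)` bounds the points in flight, `asyncio.wait_for` enforces the per-point limit, and `asyncio.as_completed` bounds the whole sweep while yielding points in completion order. This is a minimal sketch with several stand-ins: `evaluate_point` replaces `run_sweep_point()`, plain dicts replace SSE-formatted strings, and the final sort assumes numeric values.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

SWEEP_CONCURRENCY = 3
SWEEP_TIMEOUT_S = 240
SWEEP_POINT_TIMEOUT_S = 90

# Hypothetical stand-in for run_sweep_point(): value -> metrics dict.
PointEvaluator = Callable[[str], Awaitable[dict]]


async def sweep_events(
    values: list[str],
    evaluate_point: PointEvaluator,
    point_timeout: float = SWEEP_POINT_TIMEOUT_S,
) -> AsyncIterator[dict]:
    """Yield one sweep_point event per finished value, then sweep_complete."""
    sem = asyncio.Semaphore(SWEEP_CONCURRENCY)

    async def one(value: str) -> dict:
        async with sem:  # at most SWEEP_CONCURRENCY evaluations in flight
            try:
                metrics = await asyncio.wait_for(evaluate_point(value), point_timeout)
                return {"value": value, "metrics": metrics}
            except asyncio.TimeoutError:
                # A slow point reports an error instead of blocking the sweep.
                return {"value": value, "metrics": None, "error": "timeout"}

    tasks = [asyncio.ensure_future(one(v)) for v in values]
    finished: list[dict] = []
    try:
        # as_completed raises TimeoutError if the whole sweep overruns.
        for next_done in asyncio.as_completed(tasks, timeout=SWEEP_TIMEOUT_S):
            point = await next_done
            finished.append(point)
            yield {"event": "sweep_point", "data": point}
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished
    finished.sort(key=lambda p: float(p["value"]))  # assumes numeric values
    yield {"event": "sweep_complete", "data": finished}
```

Placing `wait_for` inside the semaphore means time spent queueing for a slot does not count against a point's own timeout.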

Metrics and Evaluation

Key Metrics

\[\text{Precision} = \frac{|TP|}{|TP| + |FP|} \qquad \text{Recall} = \frac{|TP|}{|TP| + |FN|} \qquad F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}\]

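where \(TP\) (true positives) are returned genes that are positive controls, \(FP\) (false positives) are returned genes that are negative controls, and \(FN\) (false negatives) are positive control genes that were not returned. Returned genes that belong to neither control set do not enter these counts.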
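As a worked example with illustrative counts (not taken from a real experiment), suppose a search returns 8 of 12 positive controls and 2 negative controls, so \(|TP| = 8\), \(|FN| = 4\) and \(|FP| = 2\). Then

\[\text{Precision} = \frac{8}{8 + 2} = 0.8 \qquad \text{Recall} = \frac{8}{8 + 4} \approx 0.667 \qquad F_1 = \frac{2 \cdot 8}{2 \cdot 8 + 2 + 4} = \frac{16}{22} \approx 0.727\]

The last form uses the identity \(F_1 = \frac{2|TP|}{2|TP| + |FP| + |FN|}\), which follows from substituting the precision and recall definitions into \(2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}\).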

Classification metrics, rank metrics, and statistical utilities.

Metrics engine for computing exhaustive classification metrics.

Computes all standard binary classification metrics from the raw intersection counts returned by run_positive_negative_controls().

veupath_chatbot.services.experiment.metrics.compute_confusion_matrix(*, positive_hits, total_positives, negative_hits, total_negatives)

Derive a confusion matrix from control-test intersection counts.

Parameters:
  • positive_hits (int) – Number of positive controls found in results (TP).

  • total_positives (int) – Total positive controls provided.

  • negative_hits (int) – Number of negative controls found in results (FP).

  • total_negatives (int) – Total negative controls provided.

Returns:

Populated confusion matrix.

Return type:

ConfusionMatrix

veupath_chatbot.services.experiment.metrics.compute_metrics(cm, *, total_results=0)

Compute all classification metrics from a confusion matrix.

Parameters:
  • cm (ConfusionMatrix) – Confusion matrix.

  • total_results (int) – Total number of results returned by the search.

Returns:

Full metrics object.

Return type:

ExperimentMetrics
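The mapping from control-test counts to a confusion matrix, and from there to the metrics defined above, can be sketched as follows. The sketch assumes that FN and TN are the control genes not returned (FN = total_positives - positive_hits, TN = total_negatives - negative_hits) and that a metric with a zero denominator is reported as 0.0. Field names are illustrative, and `ExperimentMetrics` carries more metrics than shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int
    fp: int
    fn: int
    tn: int


def compute_confusion_matrix(*, positive_hits, total_positives, negative_hits, total_negatives):
    return ConfusionMatrix(
        tp=positive_hits,
        fp=negative_hits,
        fn=total_positives - positive_hits,  # positives the search missed
        tn=total_negatives - negative_hits,  # negatives correctly excluded
    )


def _ratio(num: float, den: float) -> float:
    # Assumption: a zero denominator yields 0.0 rather than NaN.
    return num / den if den else 0.0


def compute_metrics(cm: ConfusionMatrix) -> dict:
    precision = _ratio(cm.tp, cm.tp + cm.fp)
    recall = _ratio(cm.tp, cm.tp + cm.fn)  # a.k.a. sensitivity
    return {
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(cm.tn, cm.tn + cm.fp),
        "f1": _ratio(2 * precision * recall, precision + recall),
    }
```

With the counts from the worked example (8 of 12 positives and 2 of 10 negatives returned), this gives precision 0.8, recall of about 0.667 and F1 of about 0.727.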

veupath_chatbot.services.experiment.metrics.evaluate_gene_ids_against_controls(*, gene_ids, positive_controls, negative_controls, site_id='', record_type='')

Evaluate a gene set against controls using pure set intersection.

No WDK calls — the gene set already has its results. Returns the same dict shape that metrics_from_control_result() and extract_and_enrich_genes() consume.

Return type:

JSONObject

veupath_chatbot.services.experiment.metrics.metrics_from_control_result(result)

Build metrics from the dict returned by run_positive_negative_controls().

Parameters:

result (JSONObject) – Raw control-test result dict.

Returns:

Full metrics.

Return type:

ExperimentMetrics

Rank-based evaluation metrics (Precision@K, Recall@K, Enrichment@K).

These metrics treat gene lists as ranked outputs rather than binary classifiers, which better matches how researchers use strategy results (“how many known positives are in my top K?”).

veupath_chatbot.services.experiment.rank_metrics.compute_rank_metrics(result_ids, positive_ids, negative_ids, k_values=None)

Compute rank-based metrics from an ordered result list.

All computation is pure Python — no API calls.

Parameters:
  • result_ids (list[str]) – Ordered gene IDs from the strategy result.

  • positive_ids (set[str]) – Known positive control gene IDs.

  • negative_ids (set[str]) – Known negative control gene IDs (unused for rank metrics but kept for interface consistency).

  • k_values (list[int] | None) – List sizes at which to compute P@K / R@K / E@K.

Returns:

Rank metrics object.

Return type:

RankMetrics
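Precision@K, Recall@K and Enrichment@K can be sketched as below. The sketch assumes Enrichment@K is Precision@K divided by the positive rate of the whole result list, which is one common definition; the engine's exact formula is not stated here. Recall@K is measured against all positive controls, including any the search never returned. The function name and default `k_values` are hypothetical.

```python
def rank_metrics(result_ids, positive_ids, k_values=(10, 25, 50, 100)):
    """Compute P@K, R@K and E@K for an ordered result list (illustrative defaults)."""
    n = len(result_ids)
    total_pos = len(positive_ids)
    # Base rate: fraction of the full result list that is a positive control.
    base_rate = sum(1 for g in result_ids if g in positive_ids) / n if n else 0.0
    out = {}
    for k in k_values:
        hits = sum(1 for g in result_ids[:k] if g in positive_ids)
        p_at_k = hits / k if k else 0.0  # divide by K even if the list is shorter
        out[k] = {
            "precision": p_at_k,
            "recall": hits / total_pos if total_pos else 0.0,
            # Assumed definition: how much richer the top K is than the list overall.
            "enrichment": p_at_k / base_rate if base_rate else 0.0,
        }
    return out
```

For a 10-gene list with 3 positives, 2 of which sit in the top 5, this gives P@5 = 0.4, R@5 of about 0.667, and E@5 = 0.4 / 0.3 (about 1.33).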

async veupath_chatbot.services.experiment.rank_metrics.fetch_ordered_result_ids(site_id, step_id, max_results=5000, sort_attribute=None, sort_direction='ASC')

Fetch ordered gene IDs from a persisted WDK strategy step.

When sort_attribute is provided the results are sorted by reportConfig.sorting via get_step_records(); otherwise the default WDK ordering is used (via get_step_answer()).

Parameters:
  • site_id (str) – VEuPathDB site ID.

  • step_id (int) – WDK step ID.

  • max_results (int) – Maximum number of IDs to retrieve.

  • sort_attribute (str | None) – WDK attribute name to sort by.

  • sort_direction (str) – "ASC" or "DESC".

Returns:

Ordered list of primary key values.

Return type:

list[str]

Shared statistical utilities for experiment analysis.

veupath_chatbot.services.experiment.stats.hypergeometric_log_sf(x, n, k, m)

Approximate log survival function for hypergeometric distribution.

Uses a normal approximation of P(X >= x) for speed. Returns 0.0 (i.e. p=1.0) when the observed count is at or below the mean.

Parameters:
  • x – Number of observed successes.

  • n – Population size (background).

  • k – Number of success states in the population (result set size).

  • m – Number of draws (gene set size).

Return type:

float
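A normal approximation to the hypergeometric upper tail, using the parameter meanings listed above, can be sketched as follows. It takes mean \(mk/n\) and variance \(m\frac{k}{n}\left(1-\frac{k}{n}\right)\frac{n-m}{n-1}\). This is a minimal sketch: the absence of a continuity correction and the handling of zero variance and zero tail probability are assumptions.

```python
import math


def hypergeometric_log_sf(x: int, n: int, k: int, m: int) -> float:
    """Approximate log P(X >= x) for X ~ Hypergeometric(population n, successes k, draws m)."""
    if n <= 1:
        return 0.0
    p = k / n
    mean = m * p
    if x <= mean:
        return 0.0  # log(1): no evidence of over-representation
    var = m * p * (1 - p) * (n - m) / (n - 1)
    if var <= 0:
        return -math.inf  # degenerate case (assumption): x above a fixed value is impossible
    # No continuity correction (assumption).
    z = (x - mean) / math.sqrt(var)
    sf = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the standard normal
    # -inf is not JSON-serializable; a real caller would clamp it.
    return math.log(sf) if sf > 0 else -math.inf
```

Working in log space keeps very small p-values distinguishable from one another, since the tail probability for strong enrichment can be far below what plain subtraction-based code would resolve.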

Analysis Features

Cross-validation, enrichment, overlap, comparison, robustness, and reporting.

K-fold cross-validation for overfitting detection.

Splits positive and negative control gene lists into k folds, evaluates each held-out fold, and aggregates metrics to detect overfitting.

veupath_chatbot.services.experiment.cross_validation.ProgressCallback

Async callback(fold_index, total_folds) for progress reporting.

alias of Callable[[int, int], Coroutine[Any, Any, None]]

veupath_chatbot.services.experiment.cross_validation.FoldEvaluator

Async callback(holdout_pos, holdout_neg) → control-test result dict.

alias of Callable[[list[str] | None, list[str] | None], Coroutine[Any, Any, JSONObject]]

async veupath_chatbot.services.experiment.cross_validation.run_cross_validation(*, site_id, record_type, controls_search_name, controls_param_name, positive_controls, negative_controls, controls_value_format='newline', search_name=None, parameters=None, tree=None, k=5, full_metrics=None, progress_callback=None)

Run k-fold cross-validation on control gene lists.

When tree is provided, evaluates each fold against the full strategy tree. Otherwise, evaluates using the single-step search_name + parameters.

Return type:

CrossValidationResult
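The fold split described for this module can be sketched as below. Each control list is shuffled with a fixed seed and dealt round-robin into k folds, so fold sizes differ by at most one. The seeding, the dealing strategy and the function names are assumptions. The real function also evaluates each held-out fold against WDK and aggregates the per-fold metrics.

```python
import random


def make_folds(ids: list[str], k: int, seed: int = 0) -> list[list[str]]:
    """Split `ids` into k disjoint folds whose sizes differ by at most one."""
    if k < 2:
        raise ValueError("k must be at least 2")
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # assumption: fixed-seed shuffle
    # Deal round-robin: element i goes to fold i % k.
    return [shuffled[i::k] for i in range(k)]


def fold_pairs(positives, negatives, k=5, seed=0):
    """Yield (fold_index, held-out positives, held-out negatives)."""
    pos_folds = make_folds(positives, k, seed)
    neg_folds = make_folds(negatives, k, seed)
    for i in range(k):
        yield i, pos_folds[i], neg_folds[i]
```

Splitting positives and negatives separately keeps both control types represented in every fold where possible, which a single split over the combined list would not guarantee.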

Enrichment analysis via WDK step analysis API.

Wraps VEuPathDB’s native GO, pathway, and word enrichment analyses that are available through the step analysis endpoint.

Plugin names (from stepAnalysisPlugins.xml):
  • go-enrichment → GoEnrichmentPlugin

  • pathway-enrichment → PathwaysEnrichmentPlugin

  • word-enrichment → WordEnrichmentPlugin

GO enrichment parameters (from GoEnrichmentPlugin.java):
  • goAssociationsOntologies — “Molecular Function” / etc.

  • goEvidenceCodes — evidence code filter

  • goSubset — GO slim subset

  • pValueCutoff — p-value threshold

  • organism — organism filter

Parameters are fetched from the WDK analysis form defaults so required fields like organism and pValueCutoff are always populated.

veupath_chatbot.services.experiment.enrichment.infer_enrichment_type(wdk_analysis_name, params, result)

Infer the EnrichmentAnalysisType from a WDK analysis name.

For GO enrichment, uses the goAssociationsOntologies parameter or the goOntologies field in the result to determine which GO branch.

Return type:

Literal['go_function', 'go_component', 'go_process', 'pathway', 'word']

veupath_chatbot.services.experiment.enrichment.is_enrichment_analysis(wdk_analysis_name)

Return True if the WDK analysis name is an enrichment plugin.

Return type:

bool

veupath_chatbot.services.experiment.enrichment.upsert_enrichment_result(results, new)

Replace an existing result of the same analysis_type, or append.

Mutates results in-place so callers don’t accumulate duplicate tabs when the same enrichment analysis is re-run.

veupath_chatbot.services.experiment.enrichment.parse_enrichment_from_raw(wdk_analysis_name, params, result)

Parse a raw WDK analysis result into an EnrichmentResult.

Used by the generic analyses/run endpoint to return structured enrichment data instead of raw JSON.

Return type:

EnrichmentResult

veupath_chatbot.services.experiment.enrichment.encode_vocab_params(params, form_meta)

Encode vocabulary param values as JSON arrays using form metadata.

WDK’s AbstractEnumParam.convertToTerms() requires all single-pick-vocabulary and multi-pick-vocabulary param values to be JSON-encoded arrays. This function ensures that encoding is applied after merging defaults with user params, so user-supplied plain strings don’t bypass the encoding.

Params whose type is not in the form metadata, or whose type is not a vocabulary type, are returned unchanged.

Return type:

JSONObject
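The encoding step can be sketched as follows. This is a minimal sketch that makes two assumptions about input shapes: `form_meta` has already been reduced to a mapping from parameter name to WDK parameter type string (here `param_types`), and a value that already parses as a JSON array is left unchanged.

```python
import json

VOCAB_TYPES = {"single-pick-vocabulary", "multi-pick-vocabulary"}


def encode_vocab_params(params: dict, param_types: dict[str, str]) -> dict:
    """JSON-encode vocabulary param values as arrays; pass others through.

    Assumption: param_types maps parameter name -> WDK parameter type.
    """
    out = {}
    for name, value in params.items():
        if param_types.get(name) not in VOCAB_TYPES:
            out[name] = value  # unknown or non-vocabulary type: unchanged
            continue
        if isinstance(value, list):
            out[name] = json.dumps(value)
            continue
        text = str(value)
        try:
            already_array = isinstance(json.loads(text), list)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            already_array = False
        # A plain string such as "Molecular Function" becomes '["Molecular Function"]'.
        out[name] = text if already_array else json.dumps([text])
    return out
```

Running this after defaults and user values are merged is what prevents a user-supplied plain string from reaching WDK unencoded.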

async veupath_chatbot.services.experiment.enrichment.run_enrichment_analysis(*, site_id, record_type, search_name, parameters, analysis_type)

Run a single enrichment analysis on a search result set.

Creates a temporary WDK strategy, runs the analysis, parses results, and cleans up.

Return type:

EnrichmentResult

async veupath_chatbot.services.experiment.enrichment.run_enrichment_on_step(*, site_id, step_id, analysis_type)

Run enrichment on an already-persisted WDK step.

Used for multi-step experiments where the strategy already exists.

Return type:

EnrichmentResult

Custom gene set enrichment analysis against experiment results.

class veupath_chatbot.services.experiment.custom_enrichment.CustomEnrichmentResult

Bases: TypedDict

Return shape of run_custom_enrichment().

geneSetName: str
geneSetSize: int
overlapCount: int
overlapGenes: list[str]
backgroundSize: int
tpCount: int
foldEnrichment: float
pValue: float
oddsRatio: float

veupath_chatbot.services.experiment.custom_enrichment.run_custom_enrichment(exp, gene_ids, gene_set_name)

Test enrichment of a custom gene set against the experiment results.

Computes overlap, fold enrichment, p-value (hypergeometric), and odds ratio.

Return type:

CustomEnrichmentResult
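The statistics named above can be sketched from a 2×2 table. This is a minimal sketch that assumes the table compares the custom gene set with the experiment's TP genes against a fixed background, and computes fold enrichment as \((\text{overlap}/\text{geneSetSize}) / (\text{tpCount}/\text{backgroundSize})\). It further assumes an exact one-sided hypergeometric p-value computed with `math.comb`, and a Haldane (+0.5) correction when a cell is zero. Which genes form the background and the TP set in the real function, and whether it uses an exact or approximate p-value, are not stated here.

```python
import math


def custom_enrichment(tp_genes: set[str], gene_set: set[str], background_size: int) -> dict:
    """Overlap statistics for a custom gene set vs. the TP genes (assumed framing)."""
    if background_size <= 0:
        raise ValueError("background_size must be positive")
    overlap = tp_genes & gene_set
    a = len(overlap)                 # in the custom set and TP
    b = len(tp_genes) - a            # TP, not in the custom set
    c = len(gene_set) - a            # in the custom set, not TP
    d = background_size - a - b - c  # neither
    if d < 0:
        raise ValueError("background is smaller than the genes observed")

    # Fold enrichment: TP rate inside the custom set vs. TP rate in the background.
    tp_rate = len(tp_genes) / background_size
    set_rate = a / len(gene_set) if gene_set else 0.0
    fold = set_rate / tp_rate if tp_rate else 0.0

    # Exact one-sided p-value: P(X >= a), X ~ Hypergeometric(N, K=|TP|, n=|set|).
    big_n, big_k, n = background_size, len(tp_genes), len(gene_set)
    p_value = sum(
        math.comb(big_k, i) * math.comb(big_n - big_k, n - i)
        for i in range(a, min(big_k, n) + 1)
    ) / math.comb(big_n, n)

    # Odds ratio of the 2x2 table, with a Haldane +0.5 correction for zero cells.
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    odds_ratio = (a * d) / (b * c)

    return {
        "overlapCount": len(overlap),
        "overlapGenes": sorted(overlap),
        "foldEnrichment": fold,
        "pValue": p_value,
        "oddsRatio": odds_ratio,
    }
```

Using exact binomial coefficients keeps the p-value precise for the small set sizes typical of custom gene lists, where a normal approximation is least reliable.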

Cross-experiment enrichment comparison.

class veupath_chatbot.services.experiment.enrichment_compare.EnrichmentRow

Bases: TypedDict

Shape of one term row in the enrichment comparison.

termKey: str
termName: str
analysisType: str
scores: dict[str, JSONValue]
maxScore: float
experimentCount: int

class veupath_chatbot.services.experiment.enrichment_compare.EnrichmentCompareResult

Bases: TypedDict

Return shape of compare_enrichment_across().

experimentIds: list[str]
experimentLabels: dict[str, str]
rows: list[EnrichmentRow]
totalTerms: int

veupath_chatbot.services.experiment.enrichment_compare.compare_enrichment_across(experiments, experiment_ids, analysis_type=None)

Compare enrichment results across experiments.

Builds a term-by-experiment matrix of fold-enrichment scores. Optionally filters to a single analysis type.

Return type:

EnrichmentCompareResult

Gene set overlap analysis across experiments.

class veupath_chatbot.services.experiment.overlap.PairwiseOverlap

Bases: TypedDict

Shape of one pairwise comparison entry.

experimentA: str
experimentB: str
labelA: str
labelB: str
sizeA: int
sizeB: int
intersection: int
union: int
jaccard: float
sharedGenes: list[str]
uniqueA: list[str]
uniqueB: list[str]

class veupath_chatbot.services.experiment.overlap.PerExperimentSummary

Bases: TypedDict

Shape of one per-experiment summary entry.

experimentId: str
label: str
totalGenes: int
uniqueGenes: int
sharedGenes: int

class veupath_chatbot.services.experiment.overlap.GeneMembership

Bases: TypedDict

Shape of one gene membership entry.

geneId: str
foundIn: int
totalExperiments: int
experiments: list[str]

class veupath_chatbot.services.experiment.overlap.OverlapResult

Bases: TypedDict

Return shape of compute_gene_set_overlap().

experimentIds: list[str]
experimentLabels: dict[str, str]
pairwise: list[PairwiseOverlap]
perExperiment: list[PerExperimentSummary]
universalGenes: list[str]
totalUniqueGenes: int
geneMembership: list[GeneMembership]

veupath_chatbot.services.experiment.overlap.compute_gene_set_overlap(experiments, experiment_ids)

Compute pairwise gene set overlap between experiments.

For each experiment the result gene set is the union of TP and FP genes. Returns Jaccard similarity, shared/unique genes, and membership counts.

Return type:

OverlapResult
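The pairwise part of the overlap computation can be sketched as below. The sketch assumes each experiment's result gene set (TP ∪ FP, as described above) has already been extracted into a dict keyed by experiment ID. The keys mirror a subset of `PairwiseOverlap`; labels, per-experiment summaries and membership counts are omitted, and the function name is hypothetical.

```python
from itertools import combinations


def pairwise_overlap(gene_sets: dict[str, set[str]]) -> list[dict]:
    """One entry per unordered pair of experiments, in sorted-ID order."""
    rows = []
    for a, b in combinations(sorted(gene_sets), 2):
        set_a, set_b = gene_sets[a], gene_sets[b]
        inter = set_a & set_b
        union = set_a | set_b
        rows.append({
            "experimentA": a,
            "experimentB": b,
            "intersection": len(inter),
            "union": len(union),
            # Jaccard = |A ∩ B| / |A ∪ B|; defined here as 0.0 for two empty sets.
            "jaccard": len(inter) / len(union) if union else 0.0,
            "sharedGenes": sorted(inter),
            "uniqueA": sorted(set_a - set_b),
            "uniqueB": sorted(set_b - set_a),
        })
    return rows
```

For two experiments returning {g1, g2, g3} and {g2, g3, g4}, the Jaccard similarity is 2/4 = 0.5.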

Bootstrap robustness and uncertainty estimation.

Resamples control sets with replacement and recomputes rank metrics to derive confidence intervals and stability scores — all pure Python, no additional WDK API calls required.

veupath_chatbot.services.experiment.robustness.compute_robustness(result_ids, positive_ids, negative_ids, *, n_bootstrap=200, k_values=None, seed=42, alternative_negatives=None, include_rank_metrics=True)

Compute bootstrap confidence intervals for classification (and optionally rank) metrics.

Parameters:
  • result_ids (list[str]) – Ordered gene IDs from the strategy result.

  • positive_ids (list[str]) – Positive control gene IDs.

  • negative_ids (list[str]) – Negative control gene IDs.

  • n_bootstrap (int) – Number of bootstrap iterations.

  • k_values (list[int] | None) – K values for Precision/Recall/Enrichment@K.

  • seed (int) – Random seed for reproducibility.

  • alternative_negatives (dict[str, list[str]] | None) – Optional map of label -> negative IDs for negative-set sensitivity analysis.

  • include_rank_metrics (bool) – When False, skip rank metric CIs and top-K stability — only classification CIs are computed.

Returns:

Bootstrap robustness result.

Return type:

BootstrapResult
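A percentile bootstrap for a single classification metric (recall) can be sketched as follows. The positive controls are resampled with replacement, and the 2.5th and 97.5th percentiles of the resampled values form a 95% interval. This is a minimal sketch: the real function also covers other metrics, rank metrics, top-K stability and alternative negative sets, and its exact percentile method is not stated here.

```python
import random
import statistics


def bootstrap_recall_ci(result_ids, positive_ids, *, n_bootstrap=200, seed=42, alpha=0.05):
    """Percentile bootstrap CI for recall = |returned positives| / |positives| (sketch: recall only)."""
    if not positive_ids:
        raise ValueError("need at least one positive control")
    returned = set(result_ids)
    rng = random.Random(seed)  # fixed seed: identical inputs give identical CIs
    samples = []
    for _ in range(n_bootstrap):
        # Resample the positive controls with replacement, same size as the original.
        resample = rng.choices(positive_ids, k=len(positive_ids))
        samples.append(sum(g in returned for g in resample) / len(resample))
    samples.sort()
    # Simple index-based percentiles of the sorted bootstrap distribution.
    lower = samples[int(alpha / 2 * (n_bootstrap - 1))]
    upper = samples[int((1 - alpha / 2) * (n_bootstrap - 1))]
    return {"mean": statistics.fmean(samples), "lower": lower, "upper": upper}
```

A wide interval signals that the headline recall depends heavily on which control genes happened to be chosen, and no WDK call is needed because only the already-fetched result IDs are reused.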

Self-contained HTML report generation for experiments.

Generates a single-file HTML document with embedded styles, tables, and inline SVG charts. No external dependencies required.

veupath_chatbot.services.experiment.report.generate_experiment_report(experiment)

Generate a self-contained HTML report for an experiment.

Parameters:

experiment (Experiment) – Full experiment object with results.

Returns:

Complete HTML string.

Return type:

str

Multi-step tree-knob optimization.

Tunes threshold parameters and boolean operators across a strategy tree using Optuna, optimizing for rank-based objectives (Precision@K, Enrichment@K) with optional list-size constraints.

async veupath_chatbot.services.experiment.tree_knobs.optimize_tree_knobs(*, site_id, record_type, base_tree, threshold_knobs, operator_knobs, positive_controls, negative_controls, controls_search_name, controls_param_name, controls_value_format, objective='precision_at_50', budget=50, max_list_size=None)

Run Optuna optimization over tree knobs.

Parameters:
  • base_tree (JSONObject) – PlanStepNode-shaped dict (the template tree).

  • threshold_knobs (list[ThresholdKnob]) – Numeric parameter knobs on leaf steps.

  • operator_knobs (list[OperatorKnob]) – Boolean operator knobs on combine nodes.

  • objective (str) – Target metric name (e.g. precision_at_50).

  • budget (int) – Maximum number of Optuna trials.

  • max_list_size (int | None) – Optional upper bound on result list size.

Returns:

Optimization result with best trial and history.

Return type:

TreeOptimizationResult

AI Analysis

AI-powered analysis helpers and tool definitions.

Helper functions for experiment analysis AI tools.

Utility functions for extracting WDK record data, classifying genes, searching records, and fetching result IDs.

veupath_chatbot.services.experiment.ai_analysis_helpers.classify_gene(gene_id, tp_ids, fp_ids, fn_ids, tn_ids)

Return the classification label for a gene ID.

Parameters:
  • gene_id (str | None) – Gene identifier to classify.

  • tp_ids (set[str]) – True positive gene IDs.

  • fp_ids (set[str]) – False positive gene IDs.

  • fn_ids (set[str]) – False negative gene IDs.

  • tn_ids (set[str]) – True negative gene IDs.

Returns:

One of "TP", "FP", "FN", "TN", or None.

Return type:

str | None

veupath_chatbot.services.experiment.ai_analysis_helpers.record_matches(attrs, query_lower, attribute)

Check if a record’s attributes match a text query.

Parameters:
  • attrs (JSONObject) – Record attribute dict.

  • query_lower (str) – Lowercased search query.

  • attribute (str | None) – Specific attribute to search in, or None for all.

Returns:

True if any matching attribute value is found.

Return type:

bool

async veupath_chatbot.services.experiment.ai_analysis_helpers.build_primary_key(api, site_id, record_type, gene_id)

Build a complete WDK primary key for a gene ID.

WDK requires all primary key columns (e.g. source_id + project_id for gene records). This helper fetches the record type info and fills missing columns from site configuration.

Parameters:
  • api (StrategyAPI) – Strategy API instance.

  • site_id (str) – VEuPathDB site identifier.

  • record_type (str) – WDK record type.

  • gene_id (str) – Gene identifier (the source_id value).

Returns:

List of {name, value} dicts forming the complete PK.

Return type:

list[JSONObject]

async veupath_chatbot.services.experiment.ai_analysis_helpers.fetch_group_records(api, record_type, gene_ids, limit=20, site_id=None)

Fetch records for a list of gene IDs.

Parameters:
  • api (StrategyAPI) – Strategy API instance.

  • record_type (str) – WDK record type.

  • gene_ids (list[str]) – Gene IDs to fetch.

  • limit (int) – Max number of genes to fetch.

  • site_id (str | None) – Site ID for PK completion (fills project_id etc.).

Returns:

List of dicts with geneId and attributes.

Return type:

list[JSONObject]

async veupath_chatbot.services.experiment.ai_analysis_helpers.collect_all_result_ids(api, step_id)

Fetch all result gene IDs from a WDK step by paginating.

Parameters:
  • api (StrategyAPI) – Strategy API instance.

  • step_id (int) – WDK step ID.

Returns:

Set of all gene IDs in the step’s results.

Return type:

set[str]
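Pagination can be sketched against a hypothetical `fetch_page(offset, limit)` coroutine rather than the real StrategyAPI, whose paging calls are not documented here. The loop assumes that a page shorter than `page_size` marks the end of the result; `collect_all_ids` and `run_demo` are hypothetical.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable

# Hypothetical page fetcher: (offset, limit) -> gene IDs on that page.
FetchPage = Callable[[int, int], Awaitable[list[str]]]


async def collect_all_ids(fetch_page: FetchPage, page_size: int = 1000) -> set[str]:
    """Request pages until one comes back short, accumulating unique IDs."""
    ids: set[str] = set()
    offset = 0
    while True:
        page = await fetch_page(offset, page_size)
        ids.update(page)
        if len(page) < page_size:
            return ids
        offset += page_size


def run_demo(total: int = 2500) -> set[str]:
    """Drive collect_all_ids against an in-memory fake result of `total` IDs."""
    data = [f"GENE_{i}" for i in range(total)]

    async def fake_fetch(offset: int, limit: int) -> list[str]:
        return data[offset : offset + limit]

    return asyncio.run(collect_all_ids(fake_fetch, page_size=1000))
```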

AI tools for deep experiment result analysis.

Provides function-calling tools that let the AI assistant access experiment data: paginate through records, look up individual genes, get attribute distributions, compare gene groups, and search results.

The agent class is built dynamically via build_analysis_agent_class() so that the services layer never needs a static import from veupath_chatbot.ai. The configured experiment-agent base class is injected at startup.

veupath_chatbot.services.experiment.ai_analysis_tools.configure(*, experiment_agent_cls)

Wire the experiment agent base class.

Called once at application startup from the composition root.

class veupath_chatbot.services.experiment.ai_analysis_tools.ExperimentAnalysisAgent(engine, site_id, experiment_id, system_prompt, chat_history=None)

Bases: RefinementToolsMixin, _AnalysisToolsMixin, Kani

AI agent with data-access and strategy-refinement tools.

Combines analysis tools (data browsing, gene lookup, distributions) with refinement tools (add steps, filter, re-evaluate) and the experiment assistant’s catalog/research tools (inherited via the injected base class).

The base class is set dynamically at startup; if not configured, instantiation falls back to plain Kani.
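The inject-or-fall-back wiring can be sketched with `type()`. `KaniStandIn`, `RefinementToolsStandIn`, `ExperimentAgentStandIn`, and `build_agent_class` are hypothetical stand-ins (the real code uses `kani.Kani`, the mixins listed above, and `build_analysis_agent_class()`); the sketch shows only the pattern, not the module's code.

```python
from __future__ import annotations


class KaniStandIn:
    """Hypothetical stand-in for kani.Kani so the sketch runs without the library."""

    def __init__(self, system_prompt: str) -> None:
        self.system_prompt = system_prompt


class RefinementToolsStandIn:
    """Hypothetical stand-in for the refinement-tools mixin."""


class ExperimentAgentStandIn(KaniStandIn):
    """Hypothetical stand-in for the experiment-agent base injected at startup."""


# Module-level slot filled once by configure() from the composition root.
_experiment_agent_cls: type | None = None


def configure(*, experiment_agent_cls: type) -> None:
    global _experiment_agent_cls
    _experiment_agent_cls = experiment_agent_cls


def build_agent_class() -> type:
    # Fall back to the plain Kani stand-in when configure() was never called.
    base = _experiment_agent_cls or KaniStandIn
    return type("AnalysisAgent", (RefinementToolsStandIn, base), {})
```

Building the class at call time, rather than at import time, is what lets the services layer avoid a static import of the AI layer.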

__init__(engine, site_id, experiment_id, system_prompt, chat_history=None)

Experiment wizard AI assistant — prompt construction and orchestration.

Builds step-specific system prompts, creates a lightweight experiment assistant agent, and streams its response.

AI-layer dependencies (engine factory, agent classes) are injected at startup via configure() so that the services layer never imports from veupath_chatbot.ai.

veupath_chatbot.services.experiment.assistant.configure(*, create_engine_fn, experiment_agent_cls)

Wire AI-layer implementations into the experiment assistant.

Called once at application startup from the composition root.

veupath_chatbot.services.experiment.assistant.build_system_prompt(step, site_id, context)

Build the step-specific system prompt with injected context.

Parameters:
  • step (Literal['search', 'parameters', 'controls', 'run', 'results', 'analysis']) – Current wizard step.

  • site_id (str) – VEuPathDB site identifier.

  • context (JSONObject) – Wizard state (search, params, controls, etc.).

Returns:

Formatted system prompt string.

Return type:

str

async veupath_chatbot.services.experiment.assistant.run_assistant(site_id, step, message, context, history=None, model_override=None, provider_override=None, reasoning_effort=None)

Create an experiment assistant and stream its response.

Parameters:
  • site_id (str) – VEuPathDB site identifier.

  • step (Literal['search', 'parameters', 'controls', 'run', 'results', 'analysis']) – Current wizard step.

  • message (str) – User message.

  • context (JSONObject) – Wizard state context.

  • history (list[JSONObject] | None) – Previous conversation messages.

  • model_override (str | None) – Model catalog ID override (default: openai/gpt-4.1-nano).

  • provider_override (Literal['openai', 'anthropic', 'google', 'ollama', 'mock'] | None) – Provider override.

  • reasoning_effort (Literal['none', 'low', 'medium', 'high'] | None) – Reasoning effort override.

Returns:

Async iterator of SSE-compatible event dicts.

Return type:

AsyncIterator[JSONObject]

AI Refinement Tools

Purpose: AI tools for experiment strategy refinement. Function-calling tools decorated with @ai_function that allow the workbench agent to add search steps, combine results with gene lists, and trigger re-evaluation.

AI tools for experiment strategy refinement.

Provides function-calling tools that let the AI assistant refine the experiment strategy: add new search steps, combine with gene ID lists, and re-evaluate control metrics after refinement.

class veupath_chatbot.services.experiment.ai_refinement_tools.RefinementToolsMixin

Bases: object

Mixin providing strategy-refinement @ai_function methods.

Classes using this mixin must provide:

  • site_id: str

  • _get_experiment() -> Experiment | None (async)

site_id: str = ''

Add a new search step and combine it with current experiment results.

Creates a WDK search step, then combines it with the experiment’s current results using the specified boolean operator. The experiment strategy is updated so subsequent queries reflect the refined results. Call re_evaluate_controls afterwards to see the impact on metrics.

Return type:

JSONObject

async refine_with_gene_ids(gene_ids, operator='INTERSECT')

Combine experiment results with a gene ID list.

Creates a gene ID search step using the experiment’s controls search configuration, then combines it with the current results. Use INTERSECT to filter results to only these genes, UNION to add them, or MINUS to exclude them. Call re_evaluate_controls afterwards to see the impact on metrics.

Return type:

JSONObject
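The three operators behave like set operations on gene IDs. This in-memory sketch (hypothetical `combine_results`) illustrates the semantics only; the real tool builds and combines WDK steps rather than manipulating Python sets.

```python
from __future__ import annotations


def combine_results(
    current: set[str], gene_ids: set[str], operator: str = "INTERSECT"
) -> set[str]:
    """Hypothetical: apply a boolean combine to in-memory ID sets."""
    if operator == "INTERSECT":
        return current & gene_ids  # keep only the listed genes
    if operator == "UNION":
        return current | gene_ids  # add the listed genes
    if operator == "MINUS":
        return current - gene_ids  # exclude the listed genes
    raise ValueError(f"unsupported operator: {operator}")
```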

async re_evaluate_controls()

Re-run control evaluation against the current (possibly refined) strategy.

Computes updated classification metrics by checking which positive and negative control genes appear in the current result set. Use this after refining the strategy to see the impact on performance.

Return type:

JSONObject
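Control evaluation reduces to set arithmetic between the current result IDs and the two control sets. The sketch below assumes that framing; `evaluate_controls` and `ControlCounts` are hypothetical, with field names borrowed from ConfusionMatrix.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ControlCounts:
    true_positives: int
    false_positives: int
    true_negatives: int
    false_negatives: int


def evaluate_controls(
    results: set[str], positives: set[str], negatives: set[str]
) -> ControlCounts:
    """Hypothetical: classify control genes by presence in the result set."""
    return ControlCounts(
        true_positives=len(positives & results),   # positives that were found
        false_negatives=len(positives - results),  # positives that were missed
        false_positives=len(negatives & results),  # negatives wrongly returned
        true_negatives=len(negatives - results),   # negatives correctly excluded
    )
```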

Step Analysis

Multi-step strategy analysis: per-step evaluation, operator comparison, contribution analysis, and parameter sensitivity.

Step decomposition analysis for multi-step strategies.

Replaces the Optuna-based tree optimization with four interpretable analysis phases that give researchers actionable, per-step insights:

  1. Per-step evaluation – evaluate each leaf independently.

  2. Operator comparison – try all operators at each combine node.

  3. Step contribution (ablation) – measure the impact of removing each leaf.

  4. Parameter sensitivity – sweep numeric params across their range.

async veupath_chatbot.services.experiment.step_analysis.run_controls_against_tree(*, site_id, record_type, tree, controls_search_name, controls_param_name, controls_value_format, positive_controls=None, negative_controls=None)

Materialise a PlanStepNode tree, intersect with controls, return metrics.

Creates a temporary WDK strategy containing the full tree, adds an intersection step with each control set on top of the root, queries the result counts, then deletes everything.

Returns the same shape as run_positive_negative_controls() so metrics_from_control_result() can consume it directly.

Return type:

JSONObject

async veupath_chatbot.services.experiment.step_analysis.run_step_analysis(*, site_id, record_type, tree, controls_search_name, controls_param_name, controls_value_format, positive_controls, negative_controls, baseline_result, phases=None, progress_callback=None)

Run all requested step analysis phases.

Parameters:
  • tree (JSONObject) – PlanStepNode-shaped dict.

  • baseline_result (JSONObject) – Raw result from the initial tree evaluation.

  • phases (list[str] | None) – Which phases to run. Defaults to all four.

Returns:

Aggregated StepAnalysisResult.

Return type:

StepAnalysisResult

Main entry point: run_step_analysis coordinates all four analysis phases.

async veupath_chatbot.services.experiment.step_analysis.orchestrator.run_step_analysis(*, site_id, record_type, tree, controls_search_name, controls_param_name, controls_value_format, positive_controls, negative_controls, baseline_result, phases=None, progress_callback=None)

Documented identically to veupath_chatbot.services.experiment.step_analysis.run_step_analysis above (same signature, parameters, and StepAnalysisResult return type).

Phase 1: Per-step evaluation – evaluate each leaf independently.

async veupath_chatbot.services.experiment.step_analysis.phase_step_eval.evaluate_steps(*, site_id, record_type, tree, controls_search_name, controls_param_name, controls_value_format, positive_controls, negative_controls, progress_callback=None)

Evaluate each leaf step against controls, preserving ancestor transforms.

For each leaf, the evaluation includes any transform chain above it (e.g. GenesByOrthologs) so that cross-organism searches are converted before being compared against controls.

Parameters:

tree (JSONObject) – PlanStepNode-shaped dict.

Returns:

One StepEvaluation per leaf.

Return type:

list[StepEvaluation]
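A sketch of leaf discovery with transform tracking, under an assumed PlanStepNode dict shape (`primaryInput`/`secondaryInput` children, `searchName` on searches and transforms); these key names and `leaves_with_transforms` are hypothetical. It also assumes that a leaf's transform chain stops at the nearest combine node above it.

```python
from __future__ import annotations

from typing import Any

Node = dict[str, Any]


def leaves_with_transforms(tree: Node) -> list[tuple[Node, list[str]]]:
    """Return (leaf, transforms) pairs, transforms ordered innermost first."""
    found: list[tuple[Node, list[str]]] = []

    def walk(node: Node, chain: list[str]) -> None:
        primary = node.get("primaryInput")
        secondary = node.get("secondaryInput")
        if primary is None:
            # Leaf search: the chain was built outermost-first on the way down.
            found.append((node, list(reversed(chain))))
        elif secondary is None:
            # Transform (e.g. GenesByOrthologs): extend the chain.
            walk(primary, chain + [node["searchName"]])
        else:
            # Combine node: transforms above it are not part of a leaf's chain.
            walk(primary, [])
            walk(secondary, [])

    walk(tree, [])
    return found
```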

Phase 2: Operator comparison – try all operators at each combine node.

async veupath_chatbot.services.experiment.step_analysis.phase_operators.compare_operators(*, site_id, record_type, tree, controls_search_name, controls_param_name, controls_value_format, positive_controls, negative_controls, progress_callback=None)

For each combine node, evaluate the INTERSECT, UNION, and MINUS operators and recommend one.

Parameters:

tree (JSONObject) – PlanStepNode-shaped dict.

Returns:

One OperatorComparison per combine node.

Return type:

list[OperatorComparison]
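How the recommendation is chosen is not documented. This sketch assumes the highest-F1 variant wins and that ties keep the current operator; `Variant` and `recommend_operator` are hypothetical, trimmed-down versions of OperatorVariant and the comparison logic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Variant:
    operator: str
    recall: float
    false_positive_rate: float
    f1_score: float


def recommend_operator(current: str, variants: list[Variant]) -> str:
    """Hypothetical: pick the highest-F1 operator, preferring `current` on ties."""
    # The tuple key ranks by F1 first; True > False breaks ties toward current.
    best = max(variants, key=lambda v: (v.f1_score, v.operator == current))
    return best.operator
```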

Phase 3: Step contribution (ablation) – measure impact of removing each leaf.

async veupath_chatbot.services.experiment.step_analysis.phase_contribution.analyze_contributions(*, site_id, record_type, tree, controls_search_name, controls_param_name, controls_value_format, positive_controls, negative_controls, baseline_metrics, progress_callback=None)

Ablation analysis: remove each leaf and measure the impact.

Parameters:

baseline_metrics (JSONObject) – Metrics from the full tree evaluation.

Returns:

One StepContribution per leaf.

Return type:

list[StepContribution]
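A sketch of turning ablation results into a StepContribution-style verdict. The sign convention (ablated minus baseline) and both thresholds are assumptions, and `ablation_verdict` is hypothetical.

```python
from __future__ import annotations

from typing import Literal

Verdict = Literal["essential", "helpful", "neutral", "harmful"]


def ablation_verdict(
    baseline_recall: float,
    ablated_recall: float,
    baseline_fpr: float,
    ablated_fpr: float,
    essential_drop: float = 0.2,  # hypothetical threshold
    noise: float = 0.02,  # hypothetical "no meaningful change" band
) -> tuple[float, float, Verdict]:
    # Deltas are "after removal minus before": a negative recall_delta means
    # the leaf was contributing positives.
    recall_delta = ablated_recall - baseline_recall
    fpr_delta = ablated_fpr - baseline_fpr
    verdict: Verdict
    if recall_delta <= -essential_drop:
        verdict = "essential"
    elif recall_delta < -noise:
        verdict = "helpful"
    elif recall_delta > noise or fpr_delta < -noise:
        # Removing the leaf gains recall or sheds false positives.
        verdict = "harmful"
    else:
        verdict = "neutral"
    return recall_delta, fpr_delta, verdict
```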

Phase 4: Parameter sensitivity – sweep numeric params across their range.

async veupath_chatbot.services.experiment.step_analysis.phase_sensitivity.sweep_parameters(*, site_id, record_type, tree, controls_search_name, controls_param_name, controls_value_format, positive_controls, negative_controls, progress_callback=None)

Sweep numeric params on each leaf across their WDK-declared range.

Respects paired min/max bound parameters, deduplicates identical searches across leaves, and only recommends changes when the improvement is meaningful.

Parameters:

tree (JSONObject) – PlanStepNode-shaped dict.

Returns:

One ParameterSensitivity per numeric param.

Return type:

list[ParameterSensitivity]
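A sketch of generating sweep values within a declared range, including the current value and honoring a paired upper bound when sweeping a lower-bound parameter. The point count, the `paired_max` argument, and `sweep_values` are assumptions, not the module's API.

```python
from __future__ import annotations


def sweep_values(
    min_val: float,
    max_val: float,
    current: float,
    n_points: int = 5,
    paired_max: float | None = None,
) -> list[float]:
    """Hypothetical: evenly spaced values across [min_val, max_val] plus current."""
    if n_points < 2 or max_val <= min_val:
        return [current]
    step = (max_val - min_val) / (n_points - 1)
    # Rounding absorbs float noise so near-duplicates collapse in the set.
    values = {round(min_val + i * step, 10) for i in range(n_points)}
    if paired_max is not None:
        # A lower-bound parameter must not exceed its paired upper bound.
        values = {v for v in values if v <= paired_max}
    values.add(current)
    return sorted(values)
```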

Control evaluation logic: run trees/steps against control sets and extract metrics.

async veupath_chatbot.services.experiment.step_analysis._evaluation.run_controls_against_tree(*, site_id, record_type, tree, controls_search_name, controls_param_name, controls_value_format, positive_controls=None, negative_controls=None)

Documented identically to veupath_chatbot.services.experiment.step_analysis.run_controls_against_tree above (same signature, behavior, and JSONObject return type).

Tree traversal and manipulation helpers for step analysis.

Types

Dataclasses for experiment configuration, metrics, enrichment, and results.

Shared data types for the Experiment Lab.

This package consolidates all experiment-related dataclasses, type aliases, and serialization helpers. All public symbols are re-exported here.

class veupath_chatbot.services.experiment.types.ConfusionMatrix(true_positives, false_positives, true_negatives, false_negatives)

Bases: object

2x2 confusion matrix counts.

true_positives: int
false_positives: int
true_negatives: int
false_negatives: int
__init__(true_positives, false_positives, true_negatives, false_negatives)
class veupath_chatbot.services.experiment.types.CrossValidationResult(k, folds, mean_metrics, std_metrics=<factory>, overfitting_score=0.0, overfitting_level='low')

Bases: object

Aggregated cross-validation result.

k: int
folds: list[FoldMetrics]
mean_metrics: ExperimentMetrics
std_metrics: dict[str, float]
overfitting_score: float
overfitting_level: str
__init__(k, folds, mean_metrics, std_metrics=<factory>, overfitting_score=0.0, overfitting_level='low')
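Aggregating per-fold metrics into mean_metrics and std_metrics might look like this sketch, which assumes plain dicts of metric values and the population standard deviation; `aggregate_folds` is hypothetical, and overfitting_score and overfitting_level are not modeled.

```python
from __future__ import annotations

from statistics import fmean, pstdev


def aggregate_folds(
    fold_metrics: list[dict[str, float]],
) -> tuple[dict[str, float], dict[str, float]]:
    """Hypothetical: per-metric mean and population std across folds."""
    names = fold_metrics[0].keys()
    mean = {name: fmean(fold[name] for fold in fold_metrics) for name in names}
    std = {name: pstdev([fold[name] for fold in fold_metrics]) for name in names}
    return mean, std
```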
class veupath_chatbot.services.experiment.types.ExperimentMetrics(confusion_matrix, sensitivity, specificity, precision, f1_score, mcc, balanced_accuracy, negative_predictive_value=0.0, false_positive_rate=0.0, false_negative_rate=0.0, youdens_j=0.0, total_results=0, total_positives=0, total_negatives=0)

Bases: object

Full classification metrics derived from a confusion matrix.

confusion_matrix: ConfusionMatrix
sensitivity: float
specificity: float
precision: float
f1_score: float
mcc: float
balanced_accuracy: float
negative_predictive_value: float
false_positive_rate: float
false_negative_rate: float
youdens_j: float
total_results: int
total_positives: int
total_negatives: int
__init__(confusion_matrix, sensitivity, specificity, precision, f1_score, mcc, balanced_accuracy, negative_predictive_value=0.0, false_positive_rate=0.0, false_negative_rate=0.0, youdens_j=0.0, total_results=0, total_positives=0, total_negatives=0)
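The ExperimentMetrics fields correspond to the standard confusion-matrix formulas below. The sketch (hypothetical `metrics_from_counts`) reports 0.0 for any ratio with a zero denominator, which is an assumption about how the engine handles empty classes.

```python
from __future__ import annotations

import math


def _safe_div(num: float, den: float) -> float:
    # Assumption: undefined ratios (zero denominator) are reported as 0.0.
    return num / den if den else 0.0


def metrics_from_counts(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    sensitivity = _safe_div(tp, tp + fn)  # recall / true positive rate
    specificity = _safe_div(tn, tn + fp)  # true negative rate
    precision = _safe_div(tp, tp + fp)  # positive predictive value
    f1 = _safe_div(2 * precision * sensitivity, precision + sensitivity)
    mcc = _safe_div(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1_score": f1,
        "mcc": mcc,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "negative_predictive_value": _safe_div(tn, tn + fn),
        "false_positive_rate": _safe_div(fp, fp + tn),
        "false_negative_rate": _safe_div(fn, fn + tp),
        "youdens_j": sensitivity + specificity - 1,
    }
```

For example, 8 TP, 2 FP, 18 TN, and 2 FN give sensitivity 0.8, specificity 0.9, MCC 0.7, and Youden's J 0.7.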
class veupath_chatbot.services.experiment.types.FoldMetrics(fold_index, metrics, positive_control_ids=<factory>, negative_control_ids=<factory>)

Bases: object

Metrics for a single cross-validation fold.

fold_index: int
metrics: ExperimentMetrics
positive_control_ids: list[str]
negative_control_ids: list[str]
__init__(fold_index, metrics, positive_control_ids=<factory>, negative_control_ids=<factory>)
class veupath_chatbot.services.experiment.types.GeneInfo(id, name=None, organism=None, product=None)

Bases: object

Minimal gene metadata.

id: str
name: str | None
organism: str | None
product: str | None
__init__(id, name=None, organism=None, product=None)
class veupath_chatbot.services.experiment.types.EnrichmentResult(analysis_type, terms, total_genes_analyzed=0, background_size=0, error=None)

Bases: object

Results for a single enrichment analysis type.

analysis_type: Literal['go_function', 'go_component', 'go_process', 'pathway', 'word']
terms: list[EnrichmentTerm]
total_genes_analyzed: int
background_size: int
error: str | None
__init__(analysis_type, terms, total_genes_analyzed=0, background_size=0, error=None)
class veupath_chatbot.services.experiment.types.EnrichmentTerm(term_id, term_name, gene_count, background_count, fold_enrichment, odds_ratio, p_value, fdr, bonferroni, genes=<factory>)

Bases: object

Single enriched term from WDK analysis.

term_id: str
term_name: str
gene_count: int
background_count: int
fold_enrichment: float
odds_ratio: float
p_value: float
fdr: float
bonferroni: float
genes: list[str]
__init__(term_id, term_name, gene_count, background_count, fold_enrichment, odds_ratio, p_value, fdr, bonferroni, genes=<factory>)
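Fold enrichment is commonly defined as a term's frequency in the analyzed set divided by its frequency in the background. This sketch uses EnrichmentResult's total_genes_analyzed and background_size as the denominators; whether WDK computes fold_enrichment exactly this way is not confirmed here, and `fold_enrichment` is a hypothetical helper.

```python
from __future__ import annotations


def fold_enrichment(
    gene_count: int,
    total_genes_analyzed: int,
    background_count: int,
    background_size: int,
) -> float:
    """Hypothetical: (term share in analyzed genes) / (term share in background)."""
    if not total_genes_analyzed or not background_count or not background_size:
        return 0.0
    return (gene_count / total_genes_analyzed) / (background_count / background_size)
```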
class veupath_chatbot.services.experiment.types.BootstrapResult(n_iterations=0, metric_cis=<factory>, rank_metric_cis=<factory>, top_k_stability=0.0, negative_set_sensitivity=<factory>)

Bases: object

Robustness assessment via bootstrap resampling.

n_iterations: int
metric_cis: dict[str, ConfidenceInterval]
rank_metric_cis: dict[str, ConfidenceInterval]
top_k_stability: float
negative_set_sensitivity: list[NegativeSetVariant]
__init__(n_iterations=0, metric_cis=<factory>, rank_metric_cis=<factory>, top_k_stability=0.0, negative_set_sensitivity=<factory>)
class veupath_chatbot.services.experiment.types.ConfidenceInterval(lower=0.0, mean=0.0, upper=0.0, std=0.0)

Bases: object

Bootstrap confidence interval for a single metric.

lower: float
mean: float
upper: float
std: float
__init__(lower=0.0, mean=0.0, upper=0.0, std=0.0)
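A percentile-bootstrap sketch that fills the four ConfidenceInterval fields. `bootstrap_ci`, the resampling unit (here, control IDs), the iteration count, and the percentile method are assumptions, not the engine's documented procedure.

```python
from __future__ import annotations

import random
from collections.abc import Callable, Sequence
from statistics import fmean, pstdev


def bootstrap_ci(
    items: Sequence[str],
    metric: Callable[[Sequence[str]], float],
    n_iterations: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> dict[str, float]:
    """Hypothetical percentile bootstrap, shaped like ConfidenceInterval."""
    rng = random.Random(seed)  # seeded for reproducible intervals
    scores = sorted(
        # Resample with replacement, same size as the original sample.
        metric([rng.choice(items) for _ in items])
        for _ in range(n_iterations)
    )
    lo_idx = int((alpha / 2) * (n_iterations - 1))
    hi_idx = int((1 - alpha / 2) * (n_iterations - 1))
    return {
        "lower": scores[lo_idx],
        "mean": fmean(scores),
        "upper": scores[hi_idx],
        "std": pstdev(scores),
    }
```

Usage: to bound recall, resample the positive controls and score each resample by the fraction found in the result set.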
class veupath_chatbot.services.experiment.types.NegativeSetVariant(label, rank_metrics, negative_count=0)

Bases: object

Rank metrics evaluated with an alternative negative control set.

label: str
rank_metrics: RankMetrics
negative_count: int
__init__(label, rank_metrics, negative_count=0)
class veupath_chatbot.services.experiment.types.RankMetrics(precision_at_k=<factory>, recall_at_k=<factory>, enrichment_at_k=<factory>, pr_curve=<factory>, list_size_vs_recall=<factory>, total_results=0)

Bases: object

Rank-based evaluation metrics computed over an ordered result list.

precision_at_k: dict[int, float]
recall_at_k: dict[int, float]
enrichment_at_k: dict[int, float]
pr_curve: list[tuple[float, float]]
list_size_vs_recall: list[tuple[int, float]]
total_results: int
__init__(precision_at_k=<factory>, recall_at_k=<factory>, enrichment_at_k=<factory>, pr_curve=<factory>, list_size_vs_recall=<factory>, total_results=0)
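Precision, recall, and enrichment at k over a ranked list can be sketched as follows. Defining enrichment@k as precision@k divided by the base rate of positives, and dividing by the number of returned items when fewer than k exist, are assumptions; `rank_metrics_at_k` is hypothetical.

```python
from __future__ import annotations


def rank_metrics_at_k(
    ranked_ids: list[str],
    positives: set[str],
    ks: list[int],
    total_population: int,
) -> dict[str, dict[int, float]]:
    precision: dict[int, float] = {}
    recall: dict[int, float] = {}
    enrichment: dict[int, float] = {}
    # Assumed base rate: share of positives among all candidate genes.
    base_rate = len(positives) / total_population if total_population else 0.0
    for k in ks:
        top = ranked_ids[:k]
        hits = sum(1 for gene in top if gene in positives)
        # If fewer than k results exist, divide by the number actually returned.
        precision[k] = hits / len(top) if top else 0.0
        recall[k] = hits / len(positives) if positives else 0.0
        enrichment[k] = precision[k] / base_rate if base_rate else 0.0
    return {
        "precision_at_k": precision,
        "recall_at_k": recall,
        "enrichment_at_k": enrichment,
    }
```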
class veupath_chatbot.services.experiment.types.OperatorKnob(combine_node_id, options=<factory>)

Bases: object

A combine-node operator that can be switched during optimization.

combine_node_id: str
options: list[str]
__init__(combine_node_id, options=<factory>)
class veupath_chatbot.services.experiment.types.OptimizationSpec(name, type, min=None, max=None, step=None, choices=None)

Bases: object

Describes a single parameter to optimise.

name: str
type: Literal['numeric', 'integer', 'categorical']
min: float | None
max: float | None
step: float | None
choices: list[str] | None
__init__(name, type, min=None, max=None, step=None, choices=None)
class veupath_chatbot.services.experiment.types.ThresholdKnob(step_id, param_name, min_val, max_val, step_size=None)

Bases: object

A numeric parameter on a leaf step that can be tuned.

step_id: str
param_name: str
min_val: float
max_val: float
step_size: float | None
__init__(step_id, param_name, min_val, max_val, step_size=None)
class veupath_chatbot.services.experiment.types.TreeOptimizationResult(best_trial=None, all_trials=<factory>, total_time_seconds=0.0, objective='')

Bases: object

Result of multi-step tree-knob optimization.

best_trial: TreeOptimizationTrial | None
all_trials: list[TreeOptimizationTrial]
total_time_seconds: float
objective: str
__init__(best_trial=None, all_trials=<factory>, total_time_seconds=0.0, objective='')
class veupath_chatbot.services.experiment.types.TreeOptimizationTrial(trial_number, parameters=<factory>, score=0.0, rank_metrics=None, list_size=0)

Bases: object

One trial during tree-knob optimization.

trial_number: int
parameters: dict[str, float | str]
score: float
rank_metrics: RankMetrics | None
list_size: int
__init__(trial_number, parameters=<factory>, score=0.0, rank_metrics=None, list_size=0)
class veupath_chatbot.services.experiment.types.OperatorComparison(combine_node_id, current_operator, variants=<factory>, recommendation='', recommended_operator='', precision_at_k_delta=<factory>)

Bases: object

Comparison of operators at a single combine node.

combine_node_id: str
current_operator: str
variants: list[OperatorVariant]
recommendation: str
recommended_operator: str
precision_at_k_delta: dict[int, float]
__init__(combine_node_id, current_operator, variants=<factory>, recommendation='', recommended_operator='', precision_at_k_delta=<factory>)
class veupath_chatbot.services.experiment.types.OperatorVariant(operator, positive_hits, negative_hits, total_results, recall, false_positive_rate, f1_score)

Bases: object

Metrics for one boolean operator at a combine node.

operator: str
positive_hits: int
negative_hits: int
total_results: int
recall: float
false_positive_rate: float
f1_score: float
__init__(operator, positive_hits, negative_hits, total_results, recall, false_positive_rate, f1_score)
class veupath_chatbot.services.experiment.types.ParameterSensitivity(step_id, param_name, current_value, sweep_points=<factory>, recommended_value=0.0, recommendation='')

Bases: object

Sensitivity sweep for one numeric parameter on one leaf step.

step_id: str
param_name: str
current_value: float
sweep_points: list[ParameterSweepPoint]
recommended_value: float
recommendation: str
__init__(step_id, param_name, current_value, sweep_points=<factory>, recommended_value=0.0, recommendation='')
class veupath_chatbot.services.experiment.types.ParameterSweepPoint(value, positive_hits, negative_hits, total_results, recall, fpr, f1)

Bases: object

One data point in a parameter sensitivity sweep.

value: float
positive_hits: int
negative_hits: int
total_results: int
recall: float
fpr: float
f1: float
__init__(value, positive_hits, negative_hits, total_results, recall, fpr, f1)
class veupath_chatbot.services.experiment.types.StepAnalysisResult(step_evaluations=<factory>, operator_comparisons=<factory>, step_contributions=<factory>, parameter_sensitivities=<factory>)

Bases: object

Container for all deterministic step analysis results.

step_evaluations: list[StepEvaluation]
operator_comparisons: list[OperatorComparison]
step_contributions: list[StepContribution]
parameter_sensitivities: list[ParameterSensitivity]
__init__(step_evaluations=<factory>, operator_comparisons=<factory>, step_contributions=<factory>, parameter_sensitivities=<factory>)
class veupath_chatbot.services.experiment.types.StepContribution(step_id, search_name, baseline_recall, ablated_recall, recall_delta, baseline_fpr, ablated_fpr, fpr_delta, verdict, enrichment_delta=0.0, narrative='')

Bases: object

Ablation analysis for one leaf step.

step_id: str
search_name: str
baseline_recall: float
ablated_recall: float
recall_delta: float
baseline_fpr: float
ablated_fpr: float
fpr_delta: float
verdict: Literal['essential', 'helpful', 'neutral', 'harmful']
enrichment_delta: float
narrative: str
__init__(step_id, search_name, baseline_recall, ablated_recall, recall_delta, baseline_fpr, ablated_fpr, fpr_delta, verdict, enrichment_delta=0.0, narrative='')
class veupath_chatbot.services.experiment.types.StepEvaluation(step_id, search_name, display_name, result_count, positive_hits, positive_total, negative_hits, negative_total, recall, false_positive_rate, captured_positive_ids=<factory>, captured_negative_ids=<factory>, tp_movement=0, fp_movement=0, fn_movement=0)

Bases: object

Per-leaf-step evaluation against controls.

step_id: str
search_name: str
display_name: str
result_count: int
positive_hits: int
positive_total: int
negative_hits: int
negative_total: int
recall: float
false_positive_rate: float
captured_positive_ids: list[str]
captured_negative_ids: list[str]
tp_movement: int
fp_movement: int
fn_movement: int
__init__(step_id, search_name, display_name, result_count, positive_hits, positive_total, negative_hits, negative_total, recall, false_positive_rate, captured_positive_ids=<factory>, captured_negative_ids=<factory>, tp_movement=0, fp_movement=0, fn_movement=0)
class veupath_chatbot.services.experiment.types.BatchExperimentConfig(base_config, organism_param_name, target_organisms=<factory>)

Bases: object

Configuration for running the same search across multiple organisms.

base_config: ExperimentConfig
organism_param_name: str
target_organisms: list[BatchOrganismTarget]
__init__(base_config, organism_param_name, target_organisms=<factory>)
class veupath_chatbot.services.experiment.types.BatchOrganismTarget(organism, positive_controls=None, negative_controls=None)

Bases: object

Per-organism overrides for a cross-organism batch experiment.

organism: str
positive_controls: list[str] | None
negative_controls: list[str] | None
__init__(organism, positive_controls=None, negative_controls=None)
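Expanding a batch into per-organism runs might look like this sketch: each target overrides the organism parameter and, when provided, the controls. `MiniConfig`, `OrganismTarget`, and `expand_batch` are hypothetical stand-ins, and real WDK organism parameters may need a different value encoding than a plain string.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass
class MiniConfig:
    """Hypothetical subset of ExperimentConfig fields."""

    search_name: str
    parameters: dict[str, str]
    positive_controls: list[str]
    negative_controls: list[str]


@dataclass
class OrganismTarget:
    """Hypothetical mirror of BatchOrganismTarget."""

    organism: str
    positive_controls: list[str] | None = None
    negative_controls: list[str] | None = None


def expand_batch(
    base: MiniConfig, organism_param_name: str, targets: list[OrganismTarget]
) -> list[MiniConfig]:
    configs: list[MiniConfig] = []
    for target in targets:
        # Copy the parameters so the base config is never mutated.
        params = {**base.parameters, organism_param_name: target.organism}
        configs.append(
            replace(
                base,
                parameters=params,
                # None means "inherit the base config's controls".
                positive_controls=(
                    target.positive_controls
                    if target.positive_controls is not None
                    else base.positive_controls
                ),
                negative_controls=(
                    target.negative_controls
                    if target.negative_controls is not None
                    else base.negative_controls
                ),
            )
        )
    return configs
```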
class veupath_chatbot.services.experiment.types.Experiment(id, config, user_id=None, status='pending', metrics=None, cross_validation=None, enrichment_results=<factory>, true_positive_genes=<factory>, false_negative_genes=<factory>, false_positive_genes=<factory>, true_negative_genes=<factory>, error=None, total_time_seconds=None, created_at='', completed_at=None, batch_id=None, benchmark_id=None, control_set_label=None, is_primary_benchmark=False, optimization_result=None, wdk_strategy_id=None, wdk_step_id=None, notes=None, step_analysis=None, rank_metrics=None, robustness=None, tree_optimization=None)

Bases: object

Full experiment with config and results.

id: str
config: ExperimentConfig
user_id: str | None
status: Literal['pending', 'running', 'completed', 'error', 'cancelled']
metrics: ExperimentMetrics | None
cross_validation: CrossValidationResult | None
enrichment_results: list[EnrichmentResult]
true_positive_genes: list[GeneInfo]
false_negative_genes: list[GeneInfo]
false_positive_genes: list[GeneInfo]
true_negative_genes: list[GeneInfo]
error: str | None
total_time_seconds: float | None
created_at: str
completed_at: str | None
batch_id: str | None
benchmark_id: str | None
control_set_label: str | None
is_primary_benchmark: bool
optimization_result: JSONObject | None
wdk_strategy_id: int | None
wdk_step_id: int | None
notes: str | None
step_analysis: StepAnalysisResult | None
rank_metrics: RankMetrics | None
robustness: BootstrapResult | None
tree_optimization: TreeOptimizationResult | None
classification_id_sets()

Build classification ID sets from the gene lists.

Returns:

(tp_ids, fp_ids, fn_ids, tn_ids)

Return type:

tuple[set[str], set[str], set[str], set[str]]

result_gene_ids()

Return the set of all result gene IDs (TP + FP).

Return type:

set[str]

__init__(id, config, user_id=None, status='pending', metrics=None, cross_validation=None, enrichment_results=<factory>, true_positive_genes=<factory>, false_negative_genes=<factory>, false_positive_genes=<factory>, true_negative_genes=<factory>, error=None, total_time_seconds=None, created_at='', completed_at=None, batch_id=None, benchmark_id=None, control_set_label=None, is_primary_benchmark=False, optimization_result=None, wdk_strategy_id=None, wdk_step_id=None, notes=None, step_analysis=None, rank_metrics=None, robustness=None, tree_optimization=None)
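The two helpers amount to collecting IDs from the four GeneInfo lists. `MiniExperiment` and `Gene` are hypothetical, trimmed-down stand-ins for Experiment and GeneInfo; the return order (tp, fp, fn, tn) follows the documented tuple.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Gene:
    id: str


@dataclass
class MiniExperiment:
    true_positive_genes: list[Gene] = field(default_factory=list)
    false_positive_genes: list[Gene] = field(default_factory=list)
    false_negative_genes: list[Gene] = field(default_factory=list)
    true_negative_genes: list[Gene] = field(default_factory=list)

    def classification_id_sets(self) -> tuple[set[str], set[str], set[str], set[str]]:
        def ids(genes: list[Gene]) -> set[str]:
            return {gene.id for gene in genes}

        return (
            ids(self.true_positive_genes),
            ids(self.false_positive_genes),
            ids(self.false_negative_genes),
            ids(self.true_negative_genes),
        )

    def result_gene_ids(self) -> set[str]:
        # Mirrors the documented TP + FP union.
        tp, fp, _, _ = self.classification_id_sets()
        return tp | fp
```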
class veupath_chatbot.services.experiment.types.ExperimentConfig(site_id, record_type, search_name, parameters, positive_controls, negative_controls, controls_search_name, controls_param_name, controls_value_format='newline', enable_cross_validation=False, k_folds=5, enrichment_types=<factory>, name='', description='', optimization_specs=None, optimization_budget=30, optimization_objective='balanced_accuracy', parameter_display_values=None, mode='single', step_tree=None, source_strategy_id=None, optimization_target_step=None, enable_step_analysis=False, step_analysis_phases=<factory>, control_set_id=None, threshold_knobs=None, operator_knobs=None, tree_optimization_objective='precision_at_50', tree_optimization_budget=50, max_list_size=None, sort_attribute=None, sort_direction='ASC', parent_experiment_id=None, target_gene_ids=None)

Bases: object

Full configuration for an experiment run.

Supports three modes:

  • single (default): one search + parameters.

  • multi-step: a recursive step_tree of search/combine/transform nodes.

  • import: import an existing Pathfinder strategy by source_strategy_id.

site_id: str
record_type: str
search_name: str
parameters: JSONObject
positive_controls: list[str]
negative_controls: list[str]
controls_search_name: str
controls_param_name: str
controls_value_format: Literal['newline', 'json_list', 'comma']
enable_cross_validation: bool
k_folds: int
enrichment_types: list[Literal['go_function', 'go_component', 'go_process', 'pathway', 'word']]
name: str
description: str
optimization_specs: list[OptimizationSpec] | None
optimization_budget: int
optimization_objective: Literal['f1', 'f_beta', 'recall', 'precision', 'specificity', 'balanced_accuracy', 'mcc', 'youdens_j', 'custom']
parameter_display_values: dict[str, str] | None
mode: Literal['single', 'multi-step', 'import']
step_tree: JSONValue
source_strategy_id: str | None
optimization_target_step: str | None
enable_step_analysis: bool
step_analysis_phases: list[str]
control_set_id: str | None
threshold_knobs: list[ThresholdKnob] | None
operator_knobs: list[OperatorKnob] | None
tree_optimization_objective: str
tree_optimization_budget: int
max_list_size: int | None
sort_attribute: str | None
sort_direction: str
parent_experiment_id: str | None
target_gene_ids: list[str] | None
property is_tree_mode: bool

Whether this config uses a multi-step strategy tree.

__init__(site_id, record_type, search_name, parameters, positive_controls, negative_controls, controls_search_name, controls_param_name, controls_value_format='newline', enable_cross_validation=False, k_folds=5, enrichment_types=<factory>, name='', description='', optimization_specs=None, optimization_budget=30, optimization_objective='balanced_accuracy', parameter_display_values=None, mode='single', step_tree=None, source_strategy_id=None, optimization_target_step=None, enable_step_analysis=False, step_analysis_phases=<factory>, control_set_id=None, threshold_knobs=None, operator_knobs=None, tree_optimization_objective='precision_at_50', tree_optimization_budget=50, max_list_size=None, sort_attribute=None, sort_direction='ASC', parent_experiment_id=None, target_gene_ids=None)
veupath_chatbot.services.experiment.types.experiment_summary_to_json(exp)

Serialize an experiment to a lightweight summary dict.

Return type:

JSONObject

veupath_chatbot.services.experiment.types.experiment_to_json(exp)

Serialize a full Experiment to a JSON-compatible dict.

Return type:

JSONObject

veupath_chatbot.services.experiment.types.from_json(data, cls)

Construct a cls dataclass from a camelCase JSON dict.

Nested dataclasses, lists, dicts, and tuples are coerced using type-hint introspection. Missing keys fall back to field defaults.

Return type:

T

veupath_chatbot.services.experiment.types.to_json(obj, *, _round=4)

Serialize a dataclass (or scalar) to a JSON-compatible value.

  • Dataclass fields are emitted with camelCase keys.

  • Floats are rounded to _round decimal places (default 4). Override per-field via field(metadata={"round": N}).

  • Lists, tuples, and dicts are handled recursively.

Return type:

Any

Experiment and ExperimentConfig dataclasses.

class veupath_chatbot.services.experiment.types.experiment.ExperimentConfig

Documented identically to veupath_chatbot.services.experiment.types.ExperimentConfig above (same constructor signature, fields, modes, and is_tree_mode property); the types package re-exports its public symbols.
class veupath_chatbot.services.experiment.types.experiment.BatchOrganismTarget(organism, positive_controls=None, negative_controls=None)

Bases: object

Per-organism overrides for a cross-organism batch experiment.

organism: str
positive_controls: list[str] | None
negative_controls: list[str] | None
__init__(organism, positive_controls=None, negative_controls=None)
class veupath_chatbot.services.experiment.types.experiment.BatchExperimentConfig(base_config, organism_param_name, target_organisms=<factory>)

Bases: object

Configuration for running the same search across multiple organisms.

base_config: ExperimentConfig
organism_param_name: str
target_organisms: list[BatchOrganismTarget]
__init__(base_config, organism_param_name, target_organisms=<factory>)
class veupath_chatbot.services.experiment.types.experiment.Experiment(id, config, user_id=None, status='pending', metrics=None, cross_validation=None, enrichment_results=<factory>, true_positive_genes=<factory>, false_negative_genes=<factory>, false_positive_genes=<factory>, true_negative_genes=<factory>, error=None, total_time_seconds=None, created_at='', completed_at=None, batch_id=None, benchmark_id=None, control_set_label=None, is_primary_benchmark=False, optimization_result=None, wdk_strategy_id=None, wdk_step_id=None, notes=None, step_analysis=None, rank_metrics=None, robustness=None, tree_optimization=None)

Bases: object

Full experiment with config and results.

id: str
config: ExperimentConfig
user_id: str | None
status: Literal['pending', 'running', 'completed', 'error', 'cancelled']
metrics: ExperimentMetrics | None
cross_validation: CrossValidationResult | None
enrichment_results: list[EnrichmentResult]
true_positive_genes: list[GeneInfo]
false_negative_genes: list[GeneInfo]
false_positive_genes: list[GeneInfo]
true_negative_genes: list[GeneInfo]
error: str | None
total_time_seconds: float | None
created_at: str
completed_at: str | None
batch_id: str | None
benchmark_id: str | None
control_set_label: str | None
is_primary_benchmark: bool
optimization_result: JSONObject | None
wdk_strategy_id: int | None
wdk_step_id: int | None
notes: str | None
step_analysis: StepAnalysisResult | None
rank_metrics: RankMetrics | None
robustness: BootstrapResult | None
tree_optimization: TreeOptimizationResult | None
classification_id_sets()

Build classification ID sets from the gene lists.

Returns:

(tp_ids, fp_ids, fn_ids, tn_ids)

Return type:

tuple[set[str], set[str], set[str], set[str]]

result_gene_ids()

Return the set of all result gene IDs (TP + FP).

Return type:

set[str]

__init__(id, config, user_id=None, status='pending', metrics=None, cross_validation=None, enrichment_results=<factory>, true_positive_genes=<factory>, false_negative_genes=<factory>, false_positive_genes=<factory>, true_negative_genes=<factory>, error=None, total_time_seconds=None, created_at='', completed_at=None, batch_id=None, benchmark_id=None, control_set_label=None, is_primary_benchmark=False, optimization_result=None, wdk_strategy_id=None, wdk_step_id=None, notes=None, step_analysis=None, rank_metrics=None, robustness=None, tree_optimization=None)

Core type aliases and Literal types for the Experiment Lab.

Classification metrics dataclasses for the Experiment Lab.

class veupath_chatbot.services.experiment.types.metrics.ConfusionMatrix(true_positives, false_positives, true_negatives, false_negatives)

Bases: object

2x2 confusion matrix counts.

true_positives: int
false_positives: int
true_negatives: int
false_negatives: int
__init__(true_positives, false_positives, true_negatives, false_negatives)
class veupath_chatbot.services.experiment.types.metrics.ExperimentMetrics(confusion_matrix, sensitivity, specificity, precision, f1_score, mcc, balanced_accuracy, negative_predictive_value=0.0, false_positive_rate=0.0, false_negative_rate=0.0, youdens_j=0.0, total_results=0, total_positives=0, total_negatives=0)

Bases: object

Full classification metrics derived from a confusion matrix.

confusion_matrix: ConfusionMatrix
sensitivity: float
specificity: float
precision: float
f1_score: float
mcc: float
balanced_accuracy: float
negative_predictive_value: float
false_positive_rate: float
false_negative_rate: float
youdens_j: float
total_results: int
total_positives: int
total_negatives: int
__init__(confusion_matrix, sensitivity, specificity, precision, f1_score, mcc, balanced_accuracy, negative_predictive_value=0.0, false_positive_rate=0.0, false_negative_rate=0.0, youdens_j=0.0, total_results=0, total_positives=0, total_negatives=0)
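ExperimentMetrics stores the usual summaries of a 2x2 confusion matrix. The standard definitions are sketched below. Reporting an undefined ratio (zero denominator) as 0.0 is an assumption suggested by the 0.0 field defaults, not confirmed behavior; Confusion and derive_metrics are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical mirror of ConfusionMatrix."""

    true_positives: int
    false_positives: int
    true_negatives: int
    false_negatives: int


def _ratio(num: float, den: float) -> float:
    # Assumption: undefined ratios are reported as 0.0.
    return num / den if den else 0.0


def derive_metrics(cm: Confusion) -> dict[str, float]:
    tp, fp = cm.true_positives, cm.false_positives
    tn, fn = cm.true_negatives, cm.false_negatives
    sensitivity = _ratio(tp, tp + fn)  # recall / true positive rate
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    # Matthews correlation coefficient.
    mcc = _ratio(tp * tn - fp * fn, math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1_score": f1,
        "mcc": mcc,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "negative_predictive_value": _ratio(tn, tn + fn),
        "false_positive_rate": _ratio(fp, fp + tn),
        "false_negative_rate": _ratio(fn, fn + tp),
        "youdens_j": sensitivity + specificity - 1,
    }
```

For example, 8 TP, 2 FP, 18 TN, and 2 FN give sensitivity 0.8, specificity 0.9, balanced accuracy 0.85, and MCC 140 / 200 = 0.7.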
class veupath_chatbot.services.experiment.types.metrics.GeneInfo(id, name=None, organism=None, product=None)

Bases: object

Minimal gene metadata.

id: str
name: str | None
organism: str | None
product: str | None
__init__(id, name=None, organism=None, product=None)
class veupath_chatbot.services.experiment.types.metrics.FoldMetrics(fold_index, metrics, positive_control_ids=<factory>, negative_control_ids=<factory>)

Bases: object

Metrics for a single cross-validation fold.

fold_index: int
metrics: ExperimentMetrics
positive_control_ids: list[str]
negative_control_ids: list[str]
__init__(fold_index, metrics, positive_control_ids=<factory>, negative_control_ids=<factory>)
class veupath_chatbot.services.experiment.types.metrics.CrossValidationResult(k, folds, mean_metrics, std_metrics=<factory>, overfitting_score=0.0, overfitting_level='low')

Bases: object

Aggregated cross-validation result.

k: int
folds: list[FoldMetrics]
mean_metrics: ExperimentMetrics
std_metrics: dict[str, float]
overfitting_score: float
overfitting_level: str
__init__(k, folds, mean_metrics, std_metrics=<factory>, overfitting_score=0.0, overfitting_level='low')
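The fold construction behind CrossValidationResult is not documented here. A common approach, assumed for this sketch, is to shuffle each control list with a fixed seed, deal it round-robin into k folds, score each fold, and summarize with per-metric means and standard deviations (the shape of mean_metrics and std_metrics). split_folds and aggregate are hypothetical helpers, population standard deviation is an assumption, and overfitting_score is not modeled because its formula is not given.

```python
import random
import statistics


def split_folds(ids: list[str], k: int, seed: int = 0) -> list[list[str]]:
    """Shuffle once with a fixed seed, then deal IDs round-robin into k folds."""
    shuffled = ids[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def aggregate(fold_scores: list[dict[str, float]]) -> tuple[dict[str, float], dict[str, float]]:
    """Per-metric mean and population standard deviation across folds."""
    keys = fold_scores[0].keys()
    mean = {key: statistics.fmean(f[key] for f in fold_scores) for key in keys}
    std = {key: statistics.pstdev([f[key] for f in fold_scores]) for key in keys}
    return mean, std
```

Calling split_folds separately on the positive and negative controls keeps each fold's class balance close to that of the full control set.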

Enrichment analysis dataclasses for the Experiment Lab.

class veupath_chatbot.services.experiment.types.enrichment.EnrichmentTerm(term_id, term_name, gene_count, background_count, fold_enrichment, odds_ratio, p_value, fdr, bonferroni, genes=<factory>)

Bases: object

Single enriched term from WDK analysis.

term_id: str
term_name: str
gene_count: int
background_count: int
fold_enrichment: float
odds_ratio: float
p_value: float
fdr: float
bonferroni: float
genes: list[str]
__init__(term_id, term_name, gene_count, background_count, fold_enrichment, odds_ratio, p_value, fdr, bonferroni, genes=<factory>)
class veupath_chatbot.services.experiment.types.enrichment.EnrichmentResult(analysis_type, terms, total_genes_analyzed=0, background_size=0, error=None)

Bases: object

Results for a single enrichment analysis type.

analysis_type: Literal['go_function', 'go_component', 'go_process', 'pathway', 'word']
terms: list[EnrichmentTerm]
total_genes_analyzed: int
background_size: int
error: str | None
__init__(analysis_type, terms, total_genes_analyzed=0, background_size=0, error=None)
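EnrichmentTerm values come from WDK's enrichment analysis, so the engine does not necessarily compute them itself. As an aid to reading the fields, the standard definitions of fold enrichment, Bonferroni correction, and Benjamini-Hochberg FDR are sketched below. Treating fdr as a BH-adjusted p-value, and using total_genes_analyzed and background_size as the fold-enrichment denominators, are assumptions rather than confirmed WDK behavior.

```python
def fold_enrichment(gene_count: int, total_genes: int, background_count: int, background_size: int) -> float:
    """Term frequency in the analyzed list divided by its frequency in the background."""
    return (gene_count / total_genes) / (background_count / background_size)


def bonferroni(p_values: list[float]) -> list[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """BH-adjusted p-values (FDR), returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For instance, a term with 10 of 100 analyzed genes and 50 of 5,000 background genes has a fold enrichment of 0.1 / 0.01 = 10.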

Optimization-related dataclasses for the Experiment Lab.

class veupath_chatbot.services.experiment.types.optimization.OptimizationSpec(name, type, min=None, max=None, step=None, choices=None)

Bases: object

Describes a single parameter to optimize.

name: str
type: Literal['numeric', 'integer', 'categorical']
min: float | None
max: float | None
step: float | None
choices: list[str] | None
__init__(name, type, min=None, max=None, step=None, choices=None)
class veupath_chatbot.services.experiment.types.optimization.ThresholdKnob(step_id, param_name, min_val, max_val, step_size=None)

Bases: object

A numeric parameter on a leaf step that can be tuned.

step_id: str
param_name: str
min_val: float
max_val: float
step_size: float | None
__init__(step_id, param_name, min_val, max_val, step_size=None)
class veupath_chatbot.services.experiment.types.optimization.OperatorKnob(combine_node_id, options=<factory>)

Bases: object

A combine-node operator that can be switched during optimization.

combine_node_id: str
options: list[str]
__init__(combine_node_id, options=<factory>)
class veupath_chatbot.services.experiment.types.optimization.TreeOptimizationTrial(trial_number, parameters=<factory>, score=0.0, rank_metrics=None, list_size=0)

Bases: object

One trial during tree-knob optimization.

trial_number: int
parameters: dict[str, float | str]
score: float
rank_metrics: RankMetrics | None
list_size: int
__init__(trial_number, parameters=<factory>, score=0.0, rank_metrics=None, list_size=0)
class veupath_chatbot.services.experiment.types.optimization.TreeOptimizationResult(best_trial=None, all_trials=<factory>, total_time_seconds=0.0, objective='')

Bases: object

Result of multi-step tree-knob optimization.

best_trial: TreeOptimizationTrial | None
all_trials: list[TreeOptimizationTrial]
total_time_seconds: float
objective: str
__init__(best_trial=None, all_trials=<factory>, total_time_seconds=0.0, objective='')

Rank-based evaluation dataclasses for the Experiment Lab.

class veupath_chatbot.services.experiment.types.rank.RankMetrics(precision_at_k=<factory>, recall_at_k=<factory>, enrichment_at_k=<factory>, pr_curve=<factory>, list_size_vs_recall=<factory>, total_results=0)

Bases: object

Rank-based evaluation metrics computed over an ordered result list.

precision_at_k: dict[int, float]
recall_at_k: dict[int, float]
enrichment_at_k: dict[int, float]
pr_curve: list[tuple[float, float]]
list_size_vs_recall: list[tuple[int, float]]
total_results: int
__init__(precision_at_k=<factory>, recall_at_k=<factory>, enrichment_at_k=<factory>, pr_curve=<factory>, list_size_vs_recall=<factory>, total_results=0)
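RankMetrics is computed over an ordered result list. The sketch below uses one common set of definitions, all of them assumptions: precision@k is the number of positive controls in the top k divided by k, recall@k is relative to all positive controls, and enrichment@k is precision@k divided by the positive-control rate across the whole list (the expected precision under a random ordering). pr_curve and list_size_vs_recall are omitted; rank_metrics and its default ks are hypothetical.

```python
def rank_metrics(
    ranked_ids: list[str], positives: set[str], ks: tuple[int, ...] = (10, 25, 50)
) -> dict[str, dict[int, float]]:
    """Hypothetical precision@k, recall@k and enrichment@k over an ordered result list."""
    total = len(ranked_ids)
    in_list = sum(1 for g in ranked_ids if g in positives)
    # Positive-control rate across the whole list: the random-ordering baseline.
    base_rate = in_list / total if total else 0.0
    precision: dict[int, float] = {}
    recall: dict[int, float] = {}
    enrichment: dict[int, float] = {}
    for k in ks:
        hits = sum(1 for g in ranked_ids[:k] if g in positives)
        precision[k] = hits / k
        recall[k] = hits / len(positives) if positives else 0.0
        enrichment[k] = precision[k] / base_rate if base_rate else 0.0
    return {"precision_at_k": precision, "recall_at_k": recall, "enrichment_at_k": enrichment}
```

An enrichment@k above 1 means positive controls are concentrated near the top of the list rather than spread evenly through it.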
class veupath_chatbot.services.experiment.types.rank.ConfidenceInterval(lower=0.0, mean=0.0, upper=0.0, std=0.0)

Bases: object

Bootstrap confidence interval for a single metric.

lower: float
mean: float
upper: float
std: float
__init__(lower=0.0, mean=0.0, upper=0.0, std=0.0)
class veupath_chatbot.services.experiment.types.rank.NegativeSetVariant(label, rank_metrics, negative_count=0)

Bases: object

Rank metrics evaluated with an alternative negative control set.

label: str
rank_metrics: RankMetrics
negative_count: int
__init__(label, rank_metrics, negative_count=0)
class veupath_chatbot.services.experiment.types.rank.BootstrapResult(n_iterations=0, metric_cis=<factory>, rank_metric_cis=<factory>, top_k_stability=0.0, negative_set_sensitivity=<factory>)

Bases: object

Robustness assessment via bootstrap resampling.

n_iterations: int
metric_cis: dict[str, ConfidenceInterval]
rank_metric_cis: dict[str, ConfidenceInterval]
top_k_stability: float
negative_set_sensitivity: list[NegativeSetVariant]
__init__(n_iterations=0, metric_cis=<factory>, rank_metric_cis=<factory>, top_k_stability=0.0, negative_set_sensitivity=<factory>)
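BootstrapResult reports a ConfidenceInterval per metric. A percentile bootstrap is one standard way to fill those four fields, sketched below under the assumptions that mean is the mean of the bootstrap replicates and that the interval is the central 95%. Interval and bootstrap_ci are hypothetical names; the real resampling scheme is not documented here.

```python
import random
import statistics
from collections.abc import Callable, Sequence
from dataclasses import dataclass


@dataclass
class Interval:
    """Hypothetical mirror of ConfidenceInterval's fields."""

    lower: float = 0.0
    mean: float = 0.0
    upper: float = 0.0
    std: float = 0.0


def bootstrap_ci(
    sample: Sequence[int],
    metric: Callable[[list[int]], float],
    n_iterations: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Interval:
    """Percentile bootstrap: resample with replacement, recompute, take quantiles."""
    rng = random.Random(seed)
    replicates = sorted(metric(rng.choices(sample, k=len(sample))) for _ in range(n_iterations))
    lo = int(alpha / 2 * n_iterations)
    hi = max(lo, int((1 - alpha / 2) * n_iterations) - 1)
    return Interval(
        lower=replicates[lo],
        mean=statistics.fmean(replicates),
        upper=replicates[hi],
        std=statistics.pstdev(replicates),
    )
```

For recall, a natural sample is one 0/1 flag per positive control (1 if the search recovered it), with the mean as the metric.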

Step analysis dataclasses for multi-step experiment decomposition.

class veupath_chatbot.services.experiment.types.step_analysis.StepEvaluation(step_id, search_name, display_name, result_count, positive_hits, positive_total, negative_hits, negative_total, recall, false_positive_rate, captured_positive_ids=<factory>, captured_negative_ids=<factory>, tp_movement=0, fp_movement=0, fn_movement=0)

Bases: object

Per-leaf-step evaluation against controls.

step_id: str
search_name: str
display_name: str
result_count: int
positive_hits: int
positive_total: int
negative_hits: int
negative_total: int
recall: float
false_positive_rate: float
captured_positive_ids: list[str]
captured_negative_ids: list[str]
tp_movement: int
fp_movement: int
fn_movement: int
__init__(step_id, search_name, display_name, result_count, positive_hits, positive_total, negative_hits, negative_total, recall, false_positive_rate, captured_positive_ids=<factory>, captured_negative_ids=<factory>, tp_movement=0, fp_movement=0, fn_movement=0)
class veupath_chatbot.services.experiment.types.step_analysis.OperatorVariant(operator, positive_hits, negative_hits, total_results, recall, false_positive_rate, f1_score)

Bases: object

Metrics for one boolean operator at a combine node.

operator: str
positive_hits: int
negative_hits: int
total_results: int
recall: float
false_positive_rate: float
f1_score: float
__init__(operator, positive_hits, negative_hits, total_results, recall, false_positive_rate, f1_score)
class veupath_chatbot.services.experiment.types.step_analysis.OperatorComparison(combine_node_id, current_operator, variants=<factory>, recommendation='', recommended_operator='', precision_at_k_delta=<factory>)

Bases: object

Comparison of operators at a single combine node.

combine_node_id: str
current_operator: str
variants: list[OperatorVariant]
recommendation: str
recommended_operator: str
precision_at_k_delta: dict[int, float]
__init__(combine_node_id, current_operator, variants=<factory>, recommendation='', recommended_operator='', precision_at_k_delta=<factory>)
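OperatorComparison scores alternative boolean operators at a combine node. The sketch below applies each operator to the two child result sets, scores the combined set against the controls, and recommends the variant with the best F1. The operator names, the control-only precision used for F1, and the F1-based recommendation are assumptions; compare_operators, Variant, and OPERATORS are hypothetical.

```python
from collections.abc import Callable
from dataclasses import dataclass

# Assumed WDK-style boolean operator names, mapped to Python set operations.
OPERATORS: dict[str, Callable[[set[str], set[str]], set[str]]] = {
    "INTERSECT": lambda a, b: a & b,
    "UNION": lambda a, b: a | b,
    "MINUS": lambda a, b: a - b,
}


@dataclass
class Variant:
    """Hypothetical mirror of OperatorVariant."""

    operator: str
    positive_hits: int
    negative_hits: int
    total_results: int
    recall: float
    false_positive_rate: float
    f1_score: float


def compare_operators(
    left: set[str], right: set[str], positives: set[str], negatives: set[str]
) -> tuple[list[Variant], str]:
    variants = []
    for name, combine in OPERATORS.items():
        result = combine(left, right)
        tp, fp = len(result & positives), len(result & negatives)
        recall = tp / len(positives) if positives else 0.0
        fpr = fp / len(negatives) if negatives else 0.0
        # Assumption: precision is measured over control genes only.
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        variants.append(Variant(name, tp, fp, len(result), recall, fpr, f1))
    # Assumption: recommend the highest-F1 operator (ties keep the first listed).
    best = max(variants, key=lambda v: v.f1_score)
    return variants, best.operator
```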
class veupath_chatbot.services.experiment.types.step_analysis.StepContribution(step_id, search_name, baseline_recall, ablated_recall, recall_delta, baseline_fpr, ablated_fpr, fpr_delta, verdict, enrichment_delta=0.0, narrative='')

Bases: object

Ablation analysis for one leaf step.

step_id: str
search_name: str
baseline_recall: float
ablated_recall: float
recall_delta: float
baseline_fpr: float
ablated_fpr: float
fpr_delta: float
verdict: Literal['essential', 'helpful', 'neutral', 'harmful']
enrichment_delta: float
narrative: str
__init__(step_id, search_name, baseline_recall, ablated_recall, recall_delta, baseline_fpr, ablated_fpr, fpr_delta, verdict, enrichment_delta=0.0, narrative='')
class veupath_chatbot.services.experiment.types.step_analysis.ParameterSweepPoint(value, positive_hits, negative_hits, total_results, recall, fpr, f1)

Bases: object

One data point in a parameter sensitivity sweep.

value: float
positive_hits: int
negative_hits: int
total_results: int
recall: float
fpr: float
f1: float
__init__(value, positive_hits, negative_hits, total_results, recall, fpr, f1)
class veupath_chatbot.services.experiment.types.step_analysis.ParameterSensitivity(step_id, param_name, current_value, sweep_points=<factory>, recommended_value=0.0, recommendation='')

Bases: object

Sensitivity sweep for one numeric parameter on one leaf step.

step_id: str
param_name: str
current_value: float
sweep_points: list[ParameterSweepPoint]
recommended_value: float
recommendation: str
__init__(step_id, param_name, current_value, sweep_points=<factory>, recommended_value=0.0, recommendation='')
class veupath_chatbot.services.experiment.types.step_analysis.StepAnalysisResult(step_evaluations=<factory>, operator_comparisons=<factory>, step_contributions=<factory>, parameter_sensitivities=<factory>)

Bases: object

Container for all deterministic step analysis results.

step_evaluations: list[StepEvaluation]
operator_comparisons: list[OperatorComparison]
step_contributions: list[StepContribution]
parameter_sensitivities: list[ParameterSensitivity]
__init__(step_evaluations=<factory>, operator_comparisons=<factory>, step_contributions=<factory>, parameter_sensitivities=<factory>)

JSON serialization for experiment dataclasses.

Simple sub-types (metrics, enrichment, rank, step analysis, etc.) are serialized via the generic to_json converter. Only Experiment and ExperimentConfig require hand-written logic due to conditional field inclusion and summary projections.

veupath_chatbot.services.experiment.types.serialization.experiment_to_json(exp)

Serialize a full Experiment to a JSON-compatible dict.

Return type:

JSONObject

veupath_chatbot.services.experiment.types.serialization.experiment_summary_to_json(exp)

Serialize an experiment to a lightweight summary dict.

Return type:

JSONObject

Generic dataclass <-> camelCase JSON conversion.

Replaces the hand-written per-type serialization boilerplate with two generic functions: to_json (serialize) and from_json (deserialize).

Float rounding (default 4 decimal places) can be overridden per-field:

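from dataclasses import dataclass, field

@dataclass
class Example:  # hypothetical dataclass; fields without defaults must come first
    p_value: float = field(metadata={"round": None})  # skip rounding
    total_time_seconds: float = field(default=0.0, metadata={"round": 2})
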
veupath_chatbot.services.experiment.types.json_codec.to_json(obj, *, _round=4)

Serialize a dataclass (or scalar) to a JSON-compatible value.

  • Dataclass fields are emitted with camelCase keys.

  • Floats are rounded to _round decimal places (default 4). Override per-field via field(metadata={"round": N}).

  • Lists, tuples, and dicts are handled recursively.

Return type:

Any

veupath_chatbot.services.experiment.types.json_codec.from_json(data, cls)

Construct a cls dataclass from a camelCase JSON dict.

Nested dataclasses, lists, dicts, and tuples are coerced using type-hint introspection. Missing keys fall back to field defaults.

Return type:

T

Seed Data

Generates demo experiments with curated multi-step strategies and control sets across 13 VEuPathDB databases. This is the only place where the backend's multi-step mode is used. See Services for the full seed module reference.