Services¶

Core business logic for gene lookup, parameter optimization, control tests, and catalog access. Services are stateless and orchestrated by the chat layer.

Overview¶

Gene lookup — Resolve gene names/symbols to VEuPathDB IDs via site-search or the WDK stateless reporter. Used when the user mentions genes from literature.
Parameter optimization — Optimize search parameters against positive/negative control lists using Bayesian optimization (TPE), grid, or random search.
Control tests — Run temporary WDK strategies with known gene lists and compute precision, recall, F1. Used by optimization and validation.
Catalog — Get parameter specs, validate values, list sites/searches.
Strategy session — Load and merge strategy state with conversation messages.

Gene Lookup¶

Purpose: Resolve gene names and IDs via VEuPathDB site-search and the WDK stateless reporter. Used by the agent to validate gene references from literature or user input.

Key functions:

lookup_genes_by_text() — Search by free text (name, symbol, description)
resolve_gene_ids() — Resolve a list of known IDs to full records via WDK

Gene record lookup service.

Provides two complementary lookup strategies:

Text search – uses VEuPathDB site-search (Solr) to find genes by name, symbol, product description, or any free text. Results are filtered to the gene document type so only gene records are returned.
ID resolution – uses the WDK stateless standard reporter endpoint (POST /record-types/{rt}/searches/{search}/reports/standard) to fetch metadata for a list of known gene IDs. Useful for validating IDs or retrieving product names / organisms for IDs obtained from literature.

Both approaches are read-only and do not create steps or strategies.

async veupath_chatbot.services.gene_lookup.lookup_genes_by_text(site_id, query, *, organism=None, offset=0, limit=20)[source]¶

Search for gene records using multiple concurrent strategies.

Parameters:

site_id (str) – VEuPathDB site identifier (e.g. "plasmodb").
query (str) – Free-text query – gene name, symbol, locus tag, or description.
organism (str | None) – Optional organism filter.
offset (int) – Number of results to skip (for pagination).
limit (int) – Maximum number of results to return.

Returns:

Dict with results, totalCount, and optional suggestedOrganisms.

Return type:

JSONObject

async veupath_chatbot.services.gene_lookup.resolve_gene_ids(site_id, gene_ids, *, record_type='transcript', search_name='GeneByLocusTag', param_name='ds_gene_ids', attributes=None)[source]¶

Resolve a list of gene IDs to full records via the WDK standard reporter.

Uses a dedicated short-lived WDK client to guarantee session affinity between dataset creation and the subsequent search. The shared singleton client’s cookie jar is modified by concurrent requests, which can cause the dataset to “not belong” to the search session (WDK tracks anonymous users via session cookies).

Return type:

JSONObject

Main gene text lookup orchestration.

Uses four concurrent strategies to maximise recall, then scores, deduplicates, and ranks results by relevance:

Unrestricted site-search (Solr) – always fires.
Organism-restricted site-search – fires when the query implies an organism.
WDK GenesByText wildcard – fires when query looks like a gene ID prefix.
WDK GenesByText broad – fires when an explicit organism filter is given.
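The fan-out described above can be sketched with asyncio.gather. This is a minimal illustration rather than the service's implementation: the two strategy coroutines, the gene-ID heuristic, and the GENE_ identifiers are hypothetical stand-ins, and only two of the four strategies are shown.

```python
import asyncio

# Hypothetical stand-ins for real strategies; each returns (gene_id, score)
# pairs instead of calling site-search or WDK.
async def unrestricted_search(query: str) -> list[tuple[str, float]]:
    return [("GENE_0001", 0.6), ("GENE_0002", 0.4)]

async def wildcard_id_search(query: str) -> list[tuple[str, float]]:
    return [("GENE_0001", 0.9)]

def looks_like_gene_id_prefix(query: str) -> bool:
    # Illustrative heuristic only: an underscore plus at least one digit.
    return "_" in query and any(ch.isdigit() for ch in query)

async def lookup(query: str) -> list[tuple[str, float]]:
    tasks = [unrestricted_search(query)]           # always fires
    if looks_like_gene_id_prefix(query):
        tasks.append(wildcard_id_search(query))    # conditional strategy
    batches = await asyncio.gather(*tasks)         # run concurrently

    # Deduplicate by gene ID, keeping the best score, then rank.
    best: dict[str, float] = {}
    for batch in batches:
        for gene_id, score in batch:
            if score > best.get(gene_id, float("-inf")):
                best[gene_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

Calling asyncio.run(lookup("GENE_00")) fires both stand-in strategies, while a free-text query such as "kinase" fires only the unrestricted one.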

async veupath_chatbot.services.gene_lookup.lookup.lookup_genes_by_text(site_id, query, *, organism=None, offset=0, limit=20)[source]¶

Search for gene records using multiple concurrent strategies.

Parameters:

site_id (str) – VEuPathDB site identifier (e.g. "plasmodb").
query (str) – Free-text query – gene name, symbol, locus tag, or description.
organism (str | None) – Optional organism filter.
offset (int) – Number of results to skip (for pagination).
limit (int) – Maximum number of results to return.

Returns:

Dict with results, totalCount, and optional suggestedOrganisms.

Return type:

JSONObject

Enrich sparse gene results with WDK metadata.

async veupath_chatbot.services.gene_lookup.enrich.enrich_sparse_gene_results(site_id, results, limit)[source]¶

Enrich results that lack organism/product via WDK standard reporter.

Site-search only returns summaryFieldData for fields where the query matched. When a gene matches in literature (e.g. MULTIgene_PubMed), organism/product are absent. We fetch full metadata from the WDK to fill the gaps.

Return type:

list[JSONObject]

Organism fuzzy matching for gene lookup.

veupath_chatbot.services.gene_lookup.organism.score_organism_match(query, organism)[source]¶

Score how well query matches organism (0.0 = no match, 1.0 = exact).

Handles exact match, substring, genus abbreviation (P. falciparum), organism codes (pf3d7), and token-level overlap.

Return type:

float
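A matching cascade of this kind (exact, substring, genus abbreviation, token overlap) might look like the sketch below. The function name score_organism and every numeric score are illustrative assumptions, not values taken from the library, and organism codes such as pf3d7 are not handled here.

```python
def score_organism(query: str, organism: str) -> float:
    """Toy organism matcher; all scores below are illustrative."""
    q = query.strip().lower()
    o = organism.strip().lower()
    if not q or not o:
        return 0.0
    if q == o:
        return 1.0                      # exact match
    if q in o:
        return 0.8                      # substring match
    q_tokens = q.replace(".", " ").split()
    o_tokens = o.split()
    if not q_tokens:
        return 0.0
    # Genus abbreviation: "p. falciparum" vs "plasmodium falciparum ..."
    if (
        len(q_tokens) >= 2
        and len(o_tokens) >= 2
        and len(q_tokens[0]) == 1
        and o_tokens[0].startswith(q_tokens[0])
        and q_tokens[1] == o_tokens[1]
    ):
        return 0.7
    # Token-level overlap, scaled so it never outranks the cases above.
    overlap = len(set(q_tokens) & set(o_tokens))
    return 0.5 * overlap / len(q_tokens)
```

Ordering the checks from strictest to loosest keeps an exact name above an abbreviation, and an abbreviation above incidental token overlap.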

veupath_chatbot.services.gene_lookup.organism.suggest_organisms(query, available, *, max_suggestions=5, min_score=0.4)[source]¶

Return organism names from available that fuzzy-match query.

Parameters:

query (str) – User’s organism input.
available (list[str]) – List of canonical organism names (from site-search).
max_suggestions (int) – Maximum suggestions to return.
min_score (float) – Minimum match score to include.

Returns:

Suggested organism names, best match first.

Return type:

list[str]

veupath_chatbot.services.gene_lookup.organism.normalize_organism(raw)[source]¶

Clean organism string; handle JSON array format from site-search.

Return type:

str

Gene result building for lookup responses.

veupath_chatbot.services.gene_lookup.result.build_gene_result(*, gene_id, display_name='', organism='', product='', gene_name='', gene_type='', location='', previous_ids='', matched_fields=None)[source]¶

Build a standardised gene result dict.

All gene results – whether from site-search or WDK – are funnelled through this builder so the shape is always consistent.

Return type:

JSONObject

Gene-specific relevance scoring for text search results.

veupath_chatbot.services.gene_lookup.scoring.score_gene_relevance(query, result)[source]¶

Score a gene result’s relevance to query.

Higher is better. The score is an additive combination of how well the query matches the gene ID, gene name, organism, and product, plus a bonus/penalty based on which site-search fields matched.

An extra bonus is awarded when the query exactly matches a descriptive field (product, displayName) so that exact hits always rank above incidental fuzzy overlap from shared tokens like “alpha” or “2”.

Return type:

float

Site-search gene fetching and document parsing.

veupath_chatbot.services.gene_lookup.site_search.parse_site_search_docs(docs)[source]¶

Convert raw site-search documents into standardised gene dicts.

Return type:

list[JSONObject]

async veupath_chatbot.services.gene_lookup.site_search.fetch_site_search_genes(site_id, search_text, *, organisms=None, limit=50)[source]¶

Run a single site-search query and return parsed results.

Returns:

(gene_results, available_organisms, total_count)

Return type:

tuple[list[JSONObject], list[str], int]

WDK-based gene search and ID resolution.

class veupath_chatbot.services.gene_lookup.wdk.WdkTextResult(records, total_count)[source]¶

Bases: object

Results from a WDK GenesByText query.

records: list[JSONObject]¶

total_count: int¶

__init__(records, total_count)¶

async veupath_chatbot.services.gene_lookup.wdk.fetch_wdk_text_genes(site_id, expressions, *, organism=None, text_fields=None, record_type='transcript', limit=50)[source]¶

Search genes via WDK GenesByText.

Return type:

WdkTextResult

async veupath_chatbot.services.gene_lookup.wdk.resolve_gene_ids(site_id, gene_ids, *, record_type='transcript', search_name='GeneByLocusTag', param_name='ds_gene_ids', attributes=None)[source]¶

Resolve a list of gene IDs to full records via the WDK standard reporter.

Return type:

JSONObject

Parameter Optimization¶

Purpose: Optimize search parameters against positive/negative control gene lists using Bayesian optimization (TPE), grid search, or random search. Each trial runs a temporary WDK strategy and scores the result.

Key types: ParameterSpec, OptimizationConfig, OptimizationResult

Parameter optimization for VEuPathDB searches.

Re-exports the public API so that existing "from veupath_chatbot.services.parameter_optimization import ..." statements continue to work unchanged.

class veupath_chatbot.services.parameter_optimization.OptimizationConfig(budget: int = 30, objective: Literal['f1', 'f_beta', 'recall', 'precision', 'specificity', 'balanced_accuracy', 'mcc', 'youdens_j', 'custom'] = 'f1', beta: float = 1.0, recall_weight: float = 1.0, precision_weight: float = 1.0, method: Literal['bayesian', 'grid', 'random'] = 'bayesian', result_count_penalty: float = 0.0)[source]¶

Bases: object

budget: int¶

objective: Literal['f1', 'f_beta', 'recall', 'precision', 'specificity', 'balanced_accuracy', 'mcc', 'youdens_j', 'custom']¶

beta: float¶

recall_weight: float¶

precision_weight: float¶

method: Literal['bayesian', 'grid', 'random']¶

result_count_penalty: float¶

Weight for penalising large result sets. The penalty is result_count_penalty * (result_count / total_genes), where total_genes is the denominator (defaults to 20 000 if unknown). A small value (e.g. 0.1) acts as a tiebreaker; higher values make the optimiser strongly prefer tighter results.
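As a worked example of the penalty formula, the sketch below computes the penalty term for a given result size. The function names are hypothetical, and subtracting the penalty from the raw objective is an assumption about how the term is applied; only the formula itself comes from the description above.

```python
def result_count_penalty_term(
    result_count: int, weight: float, total_genes: int = 20_000
) -> float:
    """weight * (result_count / total_genes), as described above."""
    return weight * (result_count / total_genes)

def penalised_score(
    raw_score: float, result_count: int, weight: float, total_genes: int = 20_000
) -> float:
    # Assumption: the penalty is subtracted from the raw objective value.
    return raw_score - result_count_penalty_term(result_count, weight, total_genes)
```

With weight 0.1 and 2 000 results out of 20 000 genes, the penalty is about 0.01, which is small enough to act only as a tiebreaker between otherwise similar trials.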

__init__(budget=30, objective='f1', beta=1.0, recall_weight=1.0, precision_weight=1.0, method='bayesian', result_count_penalty=0.0)¶

class veupath_chatbot.services.parameter_optimization.OptimizationResult(optimization_id: str, best_trial: TrialResult | None, all_trials: list[TrialResult], pareto_frontier: list[TrialResult], sensitivity: dict[str, float], total_time_seconds: float, status: str, error_message: str | None = None)[source]¶

Bases: object

optimization_id: str¶

best_trial: TrialResult | None¶

all_trials: list[TrialResult]¶

pareto_frontier: list[TrialResult]¶

sensitivity: dict[str, float]¶

total_time_seconds: float¶

status: str¶

__init__(optimization_id, best_trial, all_trials, pareto_frontier, sensitivity, total_time_seconds, status, error_message=None)¶

error_message: str | None¶

class veupath_chatbot.services.parameter_optimization.ParameterSpec(name, param_type, min_value=None, max_value=None, log_scale=False, step=None, choices=None)[source]¶

Bases: object

Describes a single parameter to optimise.

name: str¶

param_type: Literal['numeric', 'integer', 'categorical']¶

min_value: float | None¶

max_value: float | None¶

log_scale: bool¶

step: float | None¶

choices: list[str] | None¶

__init__(name, param_type, min_value=None, max_value=None, log_scale=False, step=None, choices=None)¶

class veupath_chatbot.services.parameter_optimization.TrialResult(trial_number: int, parameters: dict[str, JSONValue], score: float, recall: float | None, false_positive_rate: float | None, result_count: int | None, positive_hits: int | None = None, negative_hits: int | None = None, total_positives: int | None = None, total_negatives: int | None = None)[source]¶

Bases: object

trial_number: int¶

parameters: dict[str, JSONValue]¶

score: float¶

recall: float | None¶

false_positive_rate: float | None¶

result_count: int | None¶

positive_hits: int | None¶

negative_hits: int | None¶

total_positives: int | None¶

total_negatives: int | None¶

__init__(trial_number, parameters, score, recall, false_positive_rate, result_count, positive_hits=None, negative_hits=None, total_positives=None, total_negatives=None)¶

async veupath_chatbot.services.parameter_optimization.optimize_search_parameters(*, site_id, record_type, search_name, fixed_parameters, parameter_space, controls_search_name, controls_param_name, positive_controls=None, negative_controls=None, controls_value_format='newline', controls_extra_parameters=None, id_field=None, config=None, progress_callback=None, check_cancelled=None)[source]¶

Run parameter optimisation against positive/negative controls.

Returns an OptimizationResult with the best configuration, all trials, Pareto frontier, and sensitivity analysis.

Return type:

OptimizationResult
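To illustrate what a Pareto frontier over trials means, here is a minimal sketch. It assumes the two competing objectives are recall (higher is better) and false-positive rate (lower is better); the source does not state which objectives the service actually uses, and the Trial class is a simplified stand-in for TrialResult.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Trial:
    """Simplified stand-in for TrialResult (hypothetical)."""
    trial_number: int
    recall: float
    false_positive_rate: float

def dominates(a: Trial, b: Trial) -> bool:
    """True if a is at least as good as b on both axes and better on one."""
    no_worse = a.recall >= b.recall and a.false_positive_rate <= b.false_positive_rate
    better = a.recall > b.recall or a.false_positive_rate < b.false_positive_rate
    return no_worse and better

def pareto_frontier(trials: list[Trial]) -> list[Trial]:
    """Keep every trial that no other trial dominates."""
    return [t for t in trials if not any(dominates(o, t) for o in trials)]
```

A trial stays on the frontier when improving one of its metrics would require giving up some of the other.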

veupath_chatbot.services.parameter_optimization.result_to_json(result)[source]¶

Return type:

JSONObject

Configuration types for parameter optimization.

Defines the parameter specification, optimization config, trial result, and optimization result dataclasses, as well as type aliases for callbacks.

Parameter optimization for VEuPathDB searches.

Optimizes search parameters against positive/negative control gene lists using Bayesian optimization (TPE sampler via optuna), grid search, or random search. Each “trial” runs a temporary WDK strategy via run_positive_negative_controls() and scores the result.
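The simplest of the three methods, random search under a fixed trial budget, can be sketched with the standard library alone. This is not the TPE sampler and does not use optuna; the function name, the (low, high) parameter-space format, and the toy objective are assumptions made for illustration.

```python
import random
from typing import Callable, Optional

def random_search(
    space: dict[str, tuple[float, float]],
    objective: Callable[[dict[str, float]], float],
    budget: int = 30,
    seed: int = 0,
) -> tuple[Optional[dict[str, float]], float]:
    """Sample `budget` random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params: Optional[dict[str, float]] = None
    best_score = float("-inf")
    for _ in range(budget):
        # One "trial": draw every parameter uniformly from its range.
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In the real service each trial would run a temporary WDK strategy instead of calling a local objective function, which is why the budget matters.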

async veupath_chatbot.services.parameter_optimization.core.optimize_search_parameters(*, site_id, record_type, search_name, fixed_parameters, parameter_space, controls_search_name, controls_param_name, positive_controls=None, negative_controls=None, controls_value_format='newline', controls_extra_parameters=None, id_field=None, config=None, progress_callback=None, check_cancelled=None)[source]¶

Run parameter optimisation against positive/negative controls.

Returns an OptimizationResult with the best configuration, all trials, Pareto frontier, and sensitivity analysis.

Return type:

OptimizationResult

Scoring, analysis, and serialization helpers for parameter optimization.

veupath_chatbot.services.parameter_optimization.scoring.result_to_json(result)[source]¶

Return type:

JSONObject

Trial execution loop for parameter optimization.

class veupath_chatbot.services.parameter_optimization.trials.TrialMetrics(recall, fpr, result_count, positive_hits, negative_hits)[source]¶

Bases: object

Intermediate metrics extracted from a WDK result.

recall: float | None¶

fpr: float | None¶

result_count: int | None¶

positive_hits: int | None¶

negative_hits: int | None¶

__init__(recall, fpr, result_count, positive_hits, negative_hits)¶

class veupath_chatbot.services.parameter_optimization.trials.EarlyStopReason(*values)[source]¶

Bases: Enum

Why the optimisation loop stopped early.

PERFECT_SCORE = 'perfect_score'¶

PLATEAU = 'plateau'¶
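How the two early-stop reasons might be detected is sketched below. The StopReason enum mirrors the two values listed above, but the check itself, the patience window, and the perfect-score threshold are assumptions, since the source does not describe the stopping criteria.

```python
from enum import Enum
from typing import Optional

class StopReason(Enum):
    PERFECT_SCORE = "perfect_score"
    PLATEAU = "plateau"

def check_early_stop(
    scores: list[float], patience: int = 5, perfect: float = 1.0
) -> Optional[StopReason]:
    """Return a stop reason, or None to keep running trials."""
    if not scores:
        return None
    if scores[-1] >= perfect:
        return StopReason.PERFECT_SCORE
    if len(scores) > patience:
        best_before = max(scores[:-patience])
        # Plateau: none of the last `patience` trials beat the earlier best.
        if max(scores[-patience:]) <= best_before:
            return StopReason.PLATEAU
    return None
```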

async veupath_chatbot.services.parameter_optimization.trials.run_trial_loop(ctx)[source]¶

Execute the full trial loop and return an OptimizationResult.

Return type:

OptimizationResult

Catalog (Parameter Validation)¶

Purpose: Validation of search parameter values. Normalizes, canonicalizes, and validates parameter values against WDK search specs before step creation or strategy execution.

Validation of search parameter values.

async veupath_chatbot.services.catalog.param_validation.validate_search_params(*, site_id, record_type, search_name, context_values)[source]¶

Validate and canonicalize search parameters for UI consumption.

Returns a stable payload:

    {
      "validation": {
        "isValid": bool,
        "normalizedContextValues": {…},
        "errors": {…}
      }
    }

The goal is to keep the frontend a consumer of backend normalization + validation, without requiring the UI to interpret raw WDK payloads.

Return type:

JSONObject

async veupath_chatbot.services.catalog.param_validation.validate_parameters(*, site_id, record_type, search_name, parameters, resolve_record_type_for_search, find_record_type_hint, extract_vocab_options)[source]¶

Validate parameters against WDK search specs.

Normalizes parameters in-place and raises ValidationError when the search is unknown, extra/unknown parameters are provided, or required parameters are missing.

Export Service¶

Purpose: CSV/TSV/TXT/JSON generation and Redis temporary storage for data exports. Generates downloadable files from strategy results, gene sets, enrichment results, and experiment results, storing them in Redis with a TTL for client retrieval.

Export service — CSV/TSV/TXT generation + Redis temp storage.
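The generate-then-store pattern can be illustrated without a Redis server. In the sketch below, TtlStore is an in-memory stand-in for a Redis client with key expiry, and rows_to_csv, store_export, the "export:" key prefix, and the 600-second TTL are all hypothetical choices, not the service's actual API or settings.

```python
import csv
import io
import time
import uuid
from typing import Optional

class TtlStore:
    """In-memory stand-in for a Redis client with per-key expiry."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[bytes, float]] = {}

    def set(self, key: str, value: bytes, ttl_seconds: int) -> None:
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._data.get(key)
        if entry is None or time.monotonic() >= entry[1]:
            self._data.pop(key, None)   # drop expired or missing keys
            return None
        return entry[0]

def rows_to_csv(rows: list[dict[str, str]]) -> bytes:
    """Serialise dict rows to CSV, using the first row's keys as the header."""
    if not rows:
        return b""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")

def store_export(store: TtlStore, rows: list[dict[str, str]], ttl_seconds: int = 600) -> str:
    """Generate the file, store it under a fresh ID, and return that ID."""
    export_id = uuid.uuid4().hex
    store.set(f"export:{export_id}", rows_to_csv(rows), ttl_seconds)
    return export_id
```

Because the content expires on its own, the client only needs the export ID to fetch the file within the TTL window, and no explicit cleanup step is required.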

class veupath_chatbot.services.export.service.ExportResult(export_id, filename, content_type, url, size_bytes, expires_in_seconds)[source]¶

Bases: object

Metadata returned after generating an export file.

export_id: str¶

filename: str¶

content_type: str¶

url: str¶

size_bytes: int¶

expires_in_seconds: int¶

__init__(export_id, filename, content_type, url, size_bytes, expires_in_seconds)¶

class veupath_chatbot.services.export.service.ExportService(redis)[source]¶

Bases: object

Generates downloadable files and stores them in Redis with TTL.

__init__(redis)[source]¶

async get_export(export_id)[source]¶

Retrieve stored export. Returns (content, filename, content_type) or None.

Return type:

tuple[bytes, str, str] | None

async export_gene_set(gene_set, format)[source]¶

Export a gene set as CSV or TXT.

Return type:

ExportResult

async export_enrichment(results, name)[source]¶

Export enrichment results as CSV.

Return type:

ExportResult

async export_enrichment_tsv(results, name)[source]¶

Export enrichment results as TSV.

Return type:

ExportResult

async export_enrichment_json(results, name)[source]¶

Export enrichment results as JSON.

Return type:

ExportResult

async export_json(data, name)[source]¶

Export arbitrary data as JSON.

Return type:

ExportResult

async export_experiment_results(experiment, format)[source]¶

Export experiment gene classifications as CSV or TSV.

Return type:

ExportResult

SSE progress callbacks for parameter optimization.

async veupath_chatbot.services.parameter_optimization.callbacks.emit_started(callback, *, optimization_id, search_name, record_type, budget, objective, positive_controls_count, negative_controls_count, param_space_json)[source]¶

async veupath_chatbot.services.parameter_optimization.callbacks.emit_trial_progress(callback, *, optimization_id, trial_num, budget, trial_json, best_trial, recent_trials)[source]¶

async veupath_chatbot.services.parameter_optimization.callbacks.emit_error(callback, *, optimization_id, error)[source]¶

async veupath_chatbot.services.parameter_optimization.callbacks.emit_completed(callback, *, optimization_id, status, budget, trials, best_trial, pareto, sensitivity, elapsed)[source]¶

Control Tests¶

Purpose: Run positive/negative control gene lists against a WDK strategy and compute precision, recall, F1, and related metrics. Used by parameter optimization and for validation.

Key function: run_positive_negative_controls()

Positive/negative control test helpers for planning mode.

These helpers run temporary WDK steps/strategies to evaluate whether known positive controls are returned and known negative controls are excluded.
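The metrics involved can be computed from three ID sets, as in the sketch below. The function name and return shape are hypothetical, and it assumes that false positives are counted only among the supplied negative controls; the service's exact definitions are not given in this section.

```python
def control_metrics(
    result_ids: set[str], positives: set[str], negatives: set[str]
) -> dict[str, float]:
    """Precision, recall, and F1 relative to known control lists."""
    tp = len(result_ids & positives)      # positive controls returned
    fp = len(result_ids & negatives)      # negative controls returned
    fn = len(positives - result_ids)      # positive controls missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, a result containing two of four positive controls and one of two negative controls gives precision 2/3, recall 1/2, and F1 4/7.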

async veupath_chatbot.services.control_tests.resolve_controls_param_type(api, record_type, controls_search_name, controls_param_name)[source]¶

Return the WDK param type for a controls parameter.

Parameters:

api (StrategyAPI) – Strategy API instance.
record_type (str) – WDK record type.
controls_search_name (str) – Name of the controls search.
controls_param_name (str) – Parameter name within the controls search.

Returns:

Parameter type string (e.g. "input-dataset") or None.

Return type:

str | None

async veupath_chatbot.services.control_tests.run_positive_negative_controls(*, site_id, record_type, target_search_name, target_parameters, controls_search_name, controls_param_name, positive_controls=None, negative_controls=None, controls_value_format='newline', controls_extra_parameters=None, id_field=None, skip_cleanup=False)[source]¶

Run positive + negative controls against a single WDK question configuration.

Each control set (positive / negative) creates its own target step internally. WDK cascade-deletes all steps inside a strategy when the strategy is deleted, so a shared target step would be invalidated after the first control run’s cleanup.

Parameters:

skip_cleanup (bool) – When True, skip the upfront strategy cleanup. Useful when the caller already performed cleanup (e.g. batch sweeps).

Return type:

JSONObject

Control Helpers¶

Purpose: Formatting and parsing utilities for control test evaluation. Encodes gene ID lists in various formats (newline, comma, JSON) and handles temporary strategy cleanup.

Formatting and parsing utilities for control-test evaluation.
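Encoding a gene ID list in the three formats mentioned above is straightforward. In this sketch the function name is hypothetical, and the format labels "comma" and "json" are assumed spellings; only "newline" appears in this section as a default value of controls_value_format.

```python
import json

def encode_ids(ids: list[str], fmt: str = "newline") -> str:
    """Encode a list of gene IDs as a single parameter value."""
    if fmt == "newline":
        return "\n".join(ids)
    if fmt == "comma":
        return ",".join(ids)
    if fmt == "json":
        return json.dumps(ids)
    raise ValueError(f"unsupported format: {fmt!r}")
```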

async veupath_chatbot.services.control_helpers.delete_temp_strategy(api, strategy_id)[source]¶

Best-effort deletion of a temporary WDK strategy.

Silently logs and swallows errors — callers should use this in finally blocks to avoid masking the original exception.
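The reason for swallowing errors is easiest to see in a finally block. In the sketch below, FakeApi, delete_quietly, run_trial, and demo are hypothetical stand-ins; the point is that a failed cleanup is logged and never replaces the exception raised by the actual work.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

class FakeApi:
    """Stand-in for a strategy API; deleting a negative ID fails."""

    def __init__(self) -> None:
        self.deleted: list[int] = []

    async def delete_strategy(self, strategy_id: int) -> None:
        if strategy_id < 0:
            raise RuntimeError("unknown strategy")
        self.deleted.append(strategy_id)

async def delete_quietly(api: FakeApi, strategy_id: int) -> None:
    """Best-effort delete: log failures instead of raising them."""
    try:
        await api.delete_strategy(strategy_id)
    except Exception as exc:
        logger.warning("Could not delete strategy %s: %s", strategy_id, exc)

async def run_trial(api: FakeApi, strategy_id: int) -> None:
    try:
        raise ValueError("trial failed")  # stands in for the real work failing
    finally:
        # Even if this delete fails, the ValueError above still propagates.
        await delete_quietly(api, strategy_id)

async def demo() -> str:
    try:
        await run_trial(FakeApi(), -1)
    except ValueError as exc:
        return str(exc)
    return "no error"
```

Running asyncio.run(demo()) surfaces the original "trial failed" error even though the cleanup itself also failed.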

async veupath_chatbot.services.control_helpers.cleanup_internal_control_test_strategies(api, wdk_items, *, site_id='')[source]¶

Delete leaked internal control-test strategies from a WDK item list.

Callers fetch the item list themselves (via api.list_strategies()), then pass it here for cleanup.

Search Reranking¶

Purpose: Reusable “fetch wide, rerank narrow” pattern for search results. Provides fuzzy text matching with bonuses for exact and prefix matches, which are critical for gene ID lookups. Used to improve the relevance of WDK search results.

Reusable search result reranking utilities.

Implements a “fetch wide, rerank narrow” pattern for VEuPathDB search:

Analyse the query to detect intent (gene ID prefix, organism abbreviation, free text, etc.)
Fetch broadly from one or more sources (site-search, WDK).
Score each result on multiple relevance signals.
Deduplicate by primary key, keeping the highest-scored entry.
Return the top-N results sorted by combined score.

veupath_chatbot.services.search_rerank.score_text_match(query, value)[source]¶

Score how well query matches value (0.0–1.0).

Uses rapidfuzz for robust fuzzy matching, with bonuses for exact and prefix matches that are critical for gene ID lookups.

Return type:

float
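The idea of layering exact and prefix bonuses over a fuzzy ratio can be sketched with the standard library. Note the substitution: this example uses difflib.SequenceMatcher in place of rapidfuzz, and the function name and the 0.9 and 0.8 constants are illustrative assumptions rather than the library's values.

```python
from difflib import SequenceMatcher

def text_match(query: str, value: str) -> float:
    """Score query against value in the range 0.0 to 1.0."""
    q = query.strip().lower()
    v = value.strip().lower()
    if not q or not v:
        return 0.0
    if q == v:
        return 1.0                       # exact match
    ratio = SequenceMatcher(None, q, v).ratio()
    if v.startswith(q):
        return max(0.9, ratio)           # prefix match gets a floor of 0.9
    return 0.8 * ratio                   # plain fuzzy match is capped at 0.8
```

Capping the plain fuzzy score below the prefix floor means a partial gene ID typed by the user always outranks a loosely similar string.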

veupath_chatbot.services.search_rerank.score_field_quality(matched_fields)[source]¶

Score based on which fields the query matched in.

Return type:

float

class veupath_chatbot.services.search_rerank.ScoredResult(result, score, source='')[source]¶

Bases: object

A search result with an attached relevance score.

result: JSONObject¶

score: float¶

source: str = ''¶

__init__(result, score, source='')¶

veupath_chatbot.services.search_rerank.dedup_and_sort(results, key_fn)[source]¶

Deduplicate results by key, keeping the highest-scoring entry.

Return type:

list[ScoredResult]
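Deduplicating by key while keeping the highest-scoring entry takes a single pass over the results. The sketch below uses a simplified Scored dataclass and a hypothetical "id" field as the primary key; it illustrates the technique and is not the module's code.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scored:
    """Simplified stand-in for ScoredResult."""
    result: dict
    score: float
    source: str = ""

def dedup_keep_best(results: list[Scored], key_fn: Callable[[Scored], str]) -> list[Scored]:
    """Keep the best-scoring entry per key, sorted by descending score."""
    best: dict[str, Scored] = {}
    for item in results:
        key = key_fn(item)
        if key not in best or item.score > best[key].score:
            best[key] = item
    return sorted(best.values(), key=lambda s: s.score, reverse=True)
```

This is what lets several sources return the same gene without it appearing twice in the final ranking.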

class veupath_chatbot.services.search_rerank.QueryIntent(raw, is_gene_id_like=False, implied_organism=None, implied_organism_score=0.0, wildcard_ids=())[source]¶

Bases: object

What we think the user is looking for.

raw: str¶

is_gene_id_like: bool = False¶

implied_organism: str | None = None¶

implied_organism_score: float = 0.0¶

wildcard_ids: tuple[str, ...] = ()¶

__init__(raw, is_gene_id_like=False, implied_organism=None, implied_organism_score=0.0, wildcard_ids=())¶

veupath_chatbot.services.search_rerank.analyse_query(query, available_organisms, organism_scorer=None)[source]¶

Analyse a query string to detect search intent.

Parameters:

query (str) – User’s raw search text.
available_organisms (list[str]) – Canonical organism names from the site.
organism_scorer (Callable[[str, str], float] | None) – A (query, organism) -> float scorer.

Returns:

A QueryIntent describing what the user likely wants.

Return type:

QueryIntent

Catalog (Parameters & Searches)¶

Purpose: Retrieve and validate search parameters from VEuPathDB. Handles parameter specs, dependent vocabularies, and search details. Used by tools when the agent needs to discover or validate parameters.

Key functions:

get_search_parameters() — Full parameter specs for a search
validate_search_params() — Validate parameter values
get_refreshed_dependent_params() — Refresh dependent parameter options

Search parameter retrieval, validation, and expansion functions.

async veupath_chatbot.services.catalog.parameters.expand_search_details_with_params(site_id, record_type, search_name, context_values)[source]¶

Return WDK search details after applying (WDK-wire) context values.

NOTE: despite the historical name, this is not a pure validation API; it returns WDK search details payload. Keep it separate from the public validation endpoint.

Return type:

JSONObject

async veupath_chatbot.services.catalog.parameters.get_refreshed_dependent_params(*, site_id, record_type, search_name, parameter_name, context_values)[source]¶

Get refreshed dependent parameter vocabulary, falling back to the portal.

Tries the site-specific WDK client first. If that fails with a WDKError and the site is not already veupathdb, retries against the portal client (veupathdb).

Parameters:

site_id (str) – Site identifier.
record_type (str) – WDK record type.
search_name (str) – WDK search name.
parameter_name (str) – The dependent parameter to refresh.
context_values (JSONObject) – Current context parameter values.

Returns:

Refreshed dependent param payload from WDK.

Return type:

JSONObject
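The site-then-portal fallback can be sketched as a guarded retry. Here FakeWDKError and fetch_vocab are local stand-ins (the stand-in client fails for every site except the portal), and refresh_with_fallback is a hypothetical name; only the control flow follows the description above.

```python
import asyncio

PORTAL_SITE = "veupathdb"

class FakeWDKError(Exception):
    """Local stand-in for the WDK client's error type."""

async def fetch_vocab(site_id: str, parameter_name: str) -> dict:
    # Stand-in client: every site except the portal fails.
    if site_id != PORTAL_SITE:
        raise FakeWDKError(f"{site_id} could not refresh {parameter_name}")
    return {"site": site_id, "parameter": parameter_name}

async def refresh_with_fallback(site_id: str, parameter_name: str) -> dict:
    try:
        return await fetch_vocab(site_id, parameter_name)
    except FakeWDKError:
        if site_id == PORTAL_SITE:
            raise                        # already on the portal: nothing to retry
        return await fetch_vocab(PORTAL_SITE, parameter_name)
```

Checking the site ID before retrying avoids calling the portal twice when the portal itself is the failing site.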

async veupath_chatbot.services.catalog.parameters.get_search_parameters(site_id, record_type, search_name)[source]¶

Get detailed parameter info for a specific search.

This is intentionally defensive: WDK responses can vary by site/endpoint.

Return type:

JSONObject

async veupath_chatbot.services.catalog.parameters.get_search_parameters_tool(site_id, record_type, search_name)[source]¶

Tool-friendly wrapper that returns standardized tool_error payloads.

Return type:

JSONObject

async veupath_chatbot.services.catalog.parameters.lookup_phyletic_codes(site_id, record_type, query)[source]¶

Search phyletic species codes by name for the GenesByOrthologPattern search.

Returns matching {code, label} pairs from the phyletic_term_map vocabulary. The model uses codes to build profile_pattern values.

Parameters:

site_id (str) – Site ID.
record_type (str) – Record type (usually “transcript”).
query (str) – Species/clade name search term (case-insensitive substring).

Returns:

Dict with matches list and query echo.

Return type:

JSONObject

async veupath_chatbot.services.catalog.parameters.validate_search_params(*, site_id, record_type, search_name, context_values)[source]¶

Validate and canonicalize search parameters for UI consumption.

Returns a stable payload:

    {
      "validation": {
        "isValid": bool,
        "normalizedContextValues": {…},
        "errors": {…}
      }
    }

The goal is to keep the frontend a consumer of backend normalization + validation, without requiring the UI to interpret raw WDK payloads.

Return type:

JSONObject

Catalog (Sites & Record Types)¶

Purpose: Sites, record types, and search listing. Entry point for discovery.

Sites and record types catalog functions.

async veupath_chatbot.services.catalog.sites.list_sites()[source]¶

List all available VEuPathDB sites.

Return type:

JSONArray

async veupath_chatbot.services.catalog.sites.get_record_types(site_id)[source]¶

Get record types for a specific site.

Return type:

JSONArray

Search listing and searching functions.

veupath_chatbot.services.catalog.searches.score_search(*, query_terms, keywords, search_name, display_name, description, corpus_doc_count=1, corpus_term_counts=None)[source]¶

Score a search against query terms and keywords.

Keywords matched against searchName via substring → +KEYWORD_BOOST each.
Query terms matched per field with field weight × IDF.
Short terms (< _MIN_TERM_LEN chars) ignored in query matching.

Return type:

float
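The three scoring rules listed above can be combined as in the following sketch. The constant values, the field weights, the smoothed IDF formula, and the function names are all assumptions chosen for illustration; the real KEYWORD_BOOST and _MIN_TERM_LEN values are not given in this section.

```python
import math
from typing import Optional

KEYWORD_BOOST = 5.0        # hypothetical value
MIN_TERM_LEN = 3           # hypothetical value
FIELD_WEIGHTS = {"display_name": 3.0, "description": 1.0}  # hypothetical weights

def idf(term: str, doc_count: int, term_counts: dict[str, int]) -> float:
    """Smoothed inverse document frequency: rarer terms weigh more."""
    return math.log((1 + doc_count) / (1 + term_counts.get(term, 0))) + 1.0

def score_fields(
    query_terms: list[str],
    keywords: list[str],
    search_name: str,
    fields: dict[str, str],
    doc_count: int = 1,
    term_counts: Optional[dict[str, int]] = None,
) -> float:
    counts = term_counts or {}
    # Rule 1: each keyword found in the search name adds a fixed boost.
    total = sum(KEYWORD_BOOST for kw in keywords if kw.lower() in search_name.lower())
    for term in query_terms:
        if len(term) < MIN_TERM_LEN:
            continue                     # Rule 3: ignore very short terms
        for name, text in fields.items():
            if term.lower() in text.lower():
                # Rule 2: field weight times IDF for each matching field.
                total += FIELD_WEIGHTS.get(name, 1.0) * idf(term.lower(), doc_count, counts)
    return total
```

Weighting by IDF keeps ubiquitous words from dominating the ranking, while the keyword boost lets an explicit hint about the search name override text similarity.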

veupath_chatbot.services.catalog.searches.is_chooser_search(search)[source]¶

Return True if this is a routing/chooser search (no real params).

Chooser searches have websiteProperties: ["hideOperation"] and/or empty paramNames. The search list endpoint returns paramNames (list of strings), not full parameters objects.

Return type:

bool
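A check along these lines could be written as below. The function name is hypothetical, and treating either condition alone as sufficient is one reading of the "and/or" in the description, not a confirmed rule.

```python
def looks_like_chooser(search: dict) -> bool:
    """True when a search entry looks like a routing/chooser search."""
    props = search.get("websiteProperties") or []
    param_names = search.get("paramNames") or []
    # Assumption: either signal alone marks the search as a chooser.
    return "hideOperation" in props or not param_names
```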

veupath_chatbot.services.catalog.searches.annotate_search(search)[source]¶

Add category and returns fields to a search result dict.

Return type:

dict[str, str]

async veupath_chatbot.services.catalog.searches.get_raw_record_types(site_id)[source]¶

Return raw WDK record type objects for a site.

Unlike services.catalog.sites.get_record_types(), this preserves the full WDK payloads (urlSegment, name, displayName, etc.) so that callers needing the original structure don’t have to go through the integrations layer directly.

Return type:

JSONArray

async veupath_chatbot.services.catalog.searches.get_raw_searches(site_id, record_type)[source]¶

Return raw WDK search objects for a record type.

Thin service-level wrapper over the discovery integration so that AI tools and other service consumers never import from integrations/ directly.

Return type:

JSONArray

async veupath_chatbot.services.catalog.searches.list_searches(site_id, record_type)[source]¶

List searches for a specific record type.

Returns name + displayName only to keep the payload small (VEuPathDB has 2000+ searches; descriptions alone add ~3 MB). The model should use search_for_searches for targeted discovery with descriptions, or get_search_parameters for full details on a specific search.

Return type:

list[dict[str, str]]

async veupath_chatbot.services.catalog.searches.list_transforms(site_id, record_type)[source]¶

List transform/combine searches (with descriptions).

Returns only searches that accept an input step — these are used to chain steps together (ortholog transform, weight filter, span logic, boolean combine, etc.). Typically 5-7 per site, so descriptions are included.

Return type:

list[dict[str, str]]

async veupath_chatbot.services.catalog.searches.search_for_searches(site_id, record_type, query, *, keywords=None, limit=20)[source]¶

Find searches matching a query and/or keywords.

Uses field-weighted scoring with IDF, keyword boosting against search names, chooser filtering, and result annotation. Site-search results are merged in when available.

Return type:

list[dict[str, str]]

async veupath_chatbot.services.catalog.searches.find_record_type_for_search(site_id, record_type, search_name)[source]¶

Resolve which record type actually contains a search name.

Uses the pre-cached SearchCatalog (mirrors WDK’s global getQuestionByName() lookup) — no HTTP calls at resolve time. Falls back to record_type when the search isn’t found.

Return type:

str

async veupath_chatbot.services.catalog.searches.make_record_type_resolver(site_id)[source]¶

Create a record type resolver backed by the pre-cached SearchCatalog.

Mirrors WDK’s WdkModel.getQuestionByName() — a global lookup that finds which record type owns a given search name, using the already-cached catalog data (no HTTP calls at resolve time).

Return type:

Callable[[str], Awaitable[str | None]]

async veupath_chatbot.services.catalog.searches.resolve_record_type_from_steps(root_step, resolver)[source]¶

Resolve record type from the first resolvable leaf search in a step tree.

Uses collect_plan_leaves() to find leaf (search) nodes, then calls the resolver to find the owning record type for the first one that resolves.

Return type:

str | None

Catalog (Parameter Resolution)¶

Purpose: WDK parameter fetching, caching, and vocabulary expansion. Resolves search parameter specs with allowed values, handles dependent vocabularies, and flattens nested parameter structures for agent consumption.

WDK parameter fetching, caching, and expansion.

async veupath_chatbot.services.catalog.param_resolution.get_search_parameters(site_id, record_type, search_name)[source]¶

Get detailed parameter info for a specific search.

This is intentionally defensive: WDK responses can vary by site/endpoint.

Return type:

JSONObject

async veupath_chatbot.services.catalog.param_resolution.get_search_parameters_tool(site_id, record_type, search_name)[source]¶

Tool-friendly wrapper that returns standardized tool_error payloads.

Return type:

JSONObject

async veupath_chatbot.services.catalog.param_resolution.lookup_phyletic_codes(site_id, record_type, query)[source]¶

Search phyletic species codes by name for the GenesByOrthologPattern search.

Returns matching {code, label} pairs from the phyletic_term_map vocabulary. The model uses codes to build profile_pattern values.

Parameters:

site_id (str) – Site ID.
record_type (str) – Record type (usually “transcript”).
query (str) – Species/clade name search term (case-insensitive substring).

Returns:

Dict with matches list and query echo.

Return type:

JSONObject
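A case-insensitive substring lookup of this kind might look like the sketch below. The vocabulary entries are placeholders (real codes and labels come from the phyletic_term_map vocabulary), the function name is hypothetical, and matching on the label only is an assumption.

```python
from typing import Any

# Placeholder vocabulary for illustration only; not real phyletic codes.
SAMPLE_VOCAB = [
    {"code": "aaaa", "label": "Example species alpha"},
    {"code": "bbbb", "label": "Example species beta"},
    {"code": "cccc", "label": "Other clade gamma"},
]


def search_codes(vocab: list[dict[str, str]], query: str) -> dict[str, Any]:
    """Return {code, label} pairs whose label contains the query, ignoring case."""
    needle = query.strip().lower()
    matches = [
        {"code": entry["code"], "label": entry["label"]}
        for entry in vocab
        if needle in entry["label"].lower()
    ]
    # Echo the query back alongside the matches, as the return value describes.
    return {"query": query, "matches": matches}
```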

async veupath_chatbot.services.catalog.param_resolution.expand_search_details_with_params(site_id, record_type, search_name, context_values)[source]¶

Return WDK search details after applying (WDK-wire) context values.

NOTE: despite the historical name, this is not a pure validation API; it returns the WDK search details payload. Keep it separate from the public validation endpoint.

Return type: JSONObject

async veupath_chatbot.services.catalog.param_resolution.get_refreshed_dependent_params(*, site_id, record_type, search_name, parameter_name, context_values)[source]¶

Get refreshed dependent parameter vocabulary, falling back to the portal.

Tries the site-specific WDK client first. If that fails with a WDKError and the site is not already veupathdb, retries against the portal client (veupathdb).

Parameters:

site_id (str) – Site identifier.
record_type (str) – WDK record type.
search_name (str) – WDK search name.
parameter_name (str) – The dependent parameter to refresh.
context_values (JSONObject) – Current context parameter values.

Returns:

Refreshed dependent param payload from WDK.

Return type:

JSONObject
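The site-then-portal fallback can be sketched as below. WDKError here is a local stand-in for the project's exception class, and fetch is a hypothetical callable that performs the dependent-parameter refresh for a given site. Re-raising when the failing site is already the portal is an assumption.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

PORTAL_SITE_ID = "veupathdb"


class WDKError(Exception):
    """Local stand-in for the project's WDK error type."""


Fetch = Callable[[str], Awaitable[dict[str, Any]]]


async def fetch_with_portal_fallback(site_id: str, fetch: Fetch) -> dict[str, Any]:
    """Try the site-specific client first, then retry once against the portal."""
    try:
        return await fetch(site_id)
    except WDKError:
        if site_id == PORTAL_SITE_ID:
            # Already on the portal: there is nothing left to fall back to.
            raise
        return await fetch(PORTAL_SITE_ID)
```

Only WDKError triggers the retry; other exceptions propagate unchanged, so unrelated bugs are not masked by the fallback.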

Catalog (RAG Search)¶

Purpose: RAG search orchestration — embed query → search Qdrant → threshold → prune results. Centralizes the shared RAG pipeline used by catalog and example-plan tools.

RAG search service: embed -> query Qdrant -> threshold -> prune.

Centralises the shared pattern used by catalog and example-plan RAG tools so that the AI tool layer never touches integrations directly.
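The last two stages of that pipeline (threshold and prune) can be illustrated without a vector store. In this sketch the hits are plain dicts with hypothetical score and payload keys, and "prune" is assumed to mean capping the result count and keeping only a few payload fields. The embedding and Qdrant query stages are omitted.

```python
from typing import Any


def threshold_and_prune(
    hits: list[dict[str, Any]],
    *,
    min_score: float = 0.4,
    limit: int = 20,
    keep_fields: tuple[str, ...] = ("name", "description"),
) -> list[dict[str, Any]]:
    """Drop low-scoring hits, keep the best `limit`, and trim each payload."""
    kept = [hit for hit in hits if hit["score"] >= min_score]
    kept.sort(key=lambda hit: hit["score"], reverse=True)
    pruned = []
    for hit in kept[:limit]:
        payload = hit.get("payload", {})
        item = {key: payload[key] for key in keep_fields if key in payload}
        item["score"] = hit["score"]
        pruned.append(item)
    return pruned
```

The defaults mirror the min_score=0.4 and limit=20 defaults that appear in the service's search method signatures.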

class veupath_chatbot.services.catalog.rag_search.RagSearchService(*, site_id, store=None)[source]¶

Bases: object

Stateless service encapsulating all Qdrant-backed lookups.

Constructed with a site_id; owns its own QdrantStore instance.

__init__(*, site_id, store=None)[source]¶

async search_record_types(query=None, limit=20, min_score=0.4)[source]¶

Semantic search over WDK record types.

Return type: JSONArray

async get_record_type_details(record_type_id)[source]¶

Retrieve one record-type payload from Qdrant by id.

Return type: JSONObject | None

async search_for_searches(query, record_type=None, limit=20, min_score=0.4)[source]¶

Semantic search over WDK searches.

Return type: JSONArray

async get_search_metadata(record_type, search_name)[source]¶

Retrieve one search payload from Qdrant by composite key.

Return type: JSONObject | None

async get_dependent_vocab(record_type, search_name, param_name, context_values=None)[source]¶

Fetch dependent vocab (Qdrant-cached, WDK fallback on miss).

Return type: JSONObject

async search_example_plans(query, limit=5)[source]¶

Semantic search over ingested public strategies.

Return type: JSONArray

async get_search_details(record_type, search_name, *, expand_params=True)[source]¶

Proxy to DiscoveryService.get_search_details for dependent-vocab fallbacks.

Return type: JSONObject

Strategy Session¶

Purpose: Load and merge strategy state with conversation messages. Used when switching strategies or restoring sessions.

Stateful strategy session types (in-memory).

These types model the working state while a user (or an AI agent) is building a VEuPathDB strategy during a chat session.

class veupath_chatbot.domain.strategy.session.StrategyGraph(graph_id, name, site_id)[source]

Bases: object

State for a single strategy graph.

__init__(graph_id, name, site_id)[source]

invalidate_build()[source]

Clear WDK build state so stale counts are not shown.

Call after any mutation that changes step semantics (parameters, search_name, operator, delete). The next build_strategy call will re-populate step_counts and wdk_step_ids.

add_step(step)[source]

Add a step and maintain the subtree-root set.

The new step becomes a root. If it consumes existing roots as primary_input or secondary_input, those are removed from the root set (they are now internal nodes of the new step’s subtree).

Parameters: step (PlanStepNode) – Step to add.
Returns: Step ID.
Return type: str

get_step(step_id)[source]

Get a step by ID.

Parameters: step_id (str) – Step ID.
Returns: Step or None.
Return type: PlanStepNode | None

recompute_roots()[source]

Recompute roots from the current steps dict.

A root is any step that is not referenced as the primary_input or secondary_input of another step. Call this after bulk mutations (delete, hydration) where incremental root tracking is impractical.

save_history(description)[source]

Save current state to history.

Parameters: description (str) – Description of the state.

undo()[source]

Undo to previous state.

Restores current_strategy and the derived graph state (steps, roots, last_step_id) so that tools that inspect the step graph see a consistent picture after undo.

Return type: bool
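A snapshot-based history with a boolean undo() can be sketched as follows. This is an assumption about the general pattern, not the class's actual storage: UndoableState and HistoryEntry are hypothetical names, and the state is modelled as a plain dict.

```python
import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class HistoryEntry:
    description: str
    snapshot: dict[str, Any]


@dataclass
class UndoableState:
    current: dict[str, Any] = field(default_factory=dict)
    history: list[HistoryEntry] = field(default_factory=list)

    def save_history(self, description: str) -> None:
        """Store a deep copy so later mutations cannot alter the snapshot."""
        self.history.append(HistoryEntry(description, copy.deepcopy(self.current)))

    def undo(self) -> bool:
        """Restore the most recent snapshot; return False if there is none."""
        if not self.history:
            return False
        self.current = self.history.pop().snapshot
        return True
```

In the real class, the derived graph state (steps, roots, last_step_id) would also need rebuilding after the snapshot is restored, as described above.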

class veupath_chatbot.domain.strategy.session.StrategySession(site_id)[source]

Bases: object

Session context for the active strategy (graph + chat).

__init__(site_id)[source]

add_graph(graph)[source]

Parameters: graph (StrategyGraph) – Strategy graph to register.

create_graph(name, graph_id=None)[source]

Create a new empty graph and register it.

Parameters:

name (str) – Graph name.
graph_id (str | None) – Optional graph ID (default: None).

Returns:

The graph.

Return type:

StrategyGraph

get_graph(graph_id)[source]

Get graph by ID (or active graph if None).

Parameters: graph_id (str | None) – Graph ID, or None for active graph.
Returns: Graph or None.
Return type: StrategyGraph | None

veupath_chatbot.domain.strategy.session.hydrate_graph_from_steps_data(graph, steps_data, *, root_step_id=None, record_type=None)[source]

Hydrate an in-memory graph from persisted flat steps.

This is used when a persisted steps list is available (possibly with a root_step_id) but there is no canonical plan to parse into an AST. It enables tools like list_current_steps to reflect existing UI-visible nodes.

Accepts arbitrary input; non-list values are silently ignored.

Parameters:

graph (StrategyGraph) – Strategy graph to hydrate.
steps_data (JSONArray | object) – Flat steps list from persistence (or any value).
root_step_id (str | None) – Root step ID (default: None).
record_type (str | None) – Record type (default: None).

Experiment Seed Data¶

Purpose: Generate demo experiments with pre-built multi-step strategies and control sets across 13 VEuPathDB databases. Seeds use multi-step mode internally to create strategy trees (the only place multi-step mode is used). Triggered via POST /api/v1/experiments/seed or the Settings > Seeding UI.

Each database has curated seed definitions with organism-specific searches, known positive/negative gene controls, and step trees that demonstrate real research workflows (e.g. drug resistance genes in PlasmoDB, virulence factors in TriTrypDB).

Seed strategy and control-set definitions.

Re-exports the public API so that existing import statements of the form from veupath_chatbot.services.experiment.seed import ... continue to work unchanged.

class veupath_chatbot.services.experiment.seed.ControlSetDef(name: str, positive_ids: list[str], negative_ids: list[str], provenance_notes: str, tags: list[str] = <factory>)[source]¶

Bases: object

name: str¶

positive_ids: list[str]¶

negative_ids: list[str]¶

provenance_notes: str¶

tags: list[str]¶

__init__(name, positive_ids, negative_ids, provenance_notes, tags=<factory>)¶

class veupath_chatbot.services.experiment.seed.SeedDef(name: str, description: str, site_id: str, step_tree: dict[str, Any], control_set: ControlSetDef, record_type: str = 'transcript')[source]¶

Bases: object

name: str¶

description: str¶

site_id: str¶

step_tree: dict[str, Any]¶

control_set: ControlSetDef¶

record_type: str = 'transcript'¶

__init__(name, description, site_id, step_tree, control_set, record_type='transcript')¶
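To show how the two definitions fit together, the sketch below re-declares them locally as dataclasses mirroring the documented signatures and builds one seed. The step_tree keys and the gene IDs are placeholders, not values from the real seed files.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ControlSetDef:
    name: str
    positive_ids: list[str]
    negative_ids: list[str]
    provenance_notes: str
    tags: list[str] = field(default_factory=list)


@dataclass
class SeedDef:
    name: str
    description: str
    site_id: str
    step_tree: dict[str, Any]
    control_set: ControlSetDef
    record_type: str = "transcript"


example_seed = SeedDef(
    name="Example seed",
    description="Illustrative single-step strategy.",
    site_id="plasmodb",
    # Hypothetical tree shape: one leaf search with no inputs.
    step_tree={"search_name": "GenesByTaxon", "parameters": {}},
    control_set=ControlSetDef(
        name="Example controls",
        positive_ids=["GENE_POS_1", "GENE_POS_2"],  # placeholder IDs
        negative_ids=["GENE_NEG_1"],  # placeholder ID
        provenance_notes="Placeholder IDs for illustration only.",
    ),
)
```

Using a factory for tags gives each control set its own list, and record_type falls back to 'transcript' when omitted.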

veupath_chatbot.services.experiment.seed.get_all_seeds()[source]¶

Get seeds for all available sites.

Return type: list[Any]

veupath_chatbot.services.experiment.seed.get_seeds_for_site(site_id)[source]¶

Import and return SEEDS for a specific site.

Return type: list[Any]

async veupath_chatbot.services.experiment.seed.run_seed(*, user_id, stream_repo, control_set_repo, site_id=None)[source]¶

Create seed strategies and control sets, yielding SSE progress events.

Seeds run concurrently (up to _MAX_CONCURRENT_SEEDS at a time). Progress events are streamed back via an asyncio.Queue.

If site_id is provided, only seeds for that database are created. Otherwise all available seeds are used.

Imports are deferred to avoid circular dependencies.

Return type: AsyncIterator[JSONObject]
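The pattern of bounded concurrency plus queue-backed progress streaming can be sketched as below. All names are hypothetical: MAX_CONCURRENT stands in for _MAX_CONCURRENT_SEEDS, create_seed is a placeholder for the real WDK work, and the event shapes are invented. asyncio.gather is used here so the sketch also runs on Python versions without asyncio.TaskGroup.

```python
import asyncio
from collections.abc import AsyncIterator
from typing import Any

MAX_CONCURRENT = 2  # hypothetical stand-in for _MAX_CONCURRENT_SEEDS


async def create_seed(name: str) -> None:
    """Placeholder for creating one strategy and its control set."""
    await asyncio.sleep(0)


async def run_all(names: list[str]) -> AsyncIterator[dict[str, Any]]:
    queue: asyncio.Queue[dict[str, Any] | None] = asyncio.Queue()
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)

    async def worker(name: str) -> None:
        # The semaphore caps how many seeds are in flight at once.
        async with semaphore:
            await queue.put({"type": "seed_started", "name": name})
            await create_seed(name)
            await queue.put({"type": "seed_done", "name": name})

    async def producer() -> None:
        try:
            await asyncio.gather(*(worker(name) for name in names))
        finally:
            await queue.put(None)  # sentinel: no more events will arrive

    producer_task = asyncio.create_task(producer())
    # Yield events as workers publish them, until the sentinel is seen.
    while (event := await queue.get()) is not None:
        yield event
    await producer_task  # surface any worker exception to the caller


async def collect(names: list[str]) -> list[dict[str, Any]]:
    return [event async for event in run_all(names)]
```

Routing events through a queue lets the generator yield progress while workers are still running, instead of waiting for the whole batch to finish.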

Seed strategy runner.

Creates real WDK strategies (visible in the sidebar) and curated control sets (available in the Experiments tab) across multiple VEuPathDB sites.

Seeds are processed concurrently across sites using asyncio.TaskGroup, with a semaphore to cap the number of parallel WDK requests.

async veupath_chatbot.services.experiment.seed.runner.run_seed(*, user_id, stream_repo, control_set_repo, site_id=None)[source]¶

Create seed strategies and control sets, yielding SSE progress events.

Seeds run concurrently (up to _MAX_CONCURRENT_SEEDS at a time). Progress events are streamed back via an asyncio.Queue.

If site_id is provided, only seeds for that database are created. Otherwise all available seeds are used.

Imports are deferred to avoid circular dependencies.

Return type: AsyncIterator[JSONObject]

Shared parameter-building helpers for seed definitions.

Every VEuPathDB component site seed file needs to build WDK search parameter dicts. These helpers encode the common patterns — organism JSON encoding, GO term searches, text searches, signal peptide, transmembrane domains, etc.

Each seed file may still define site-specific helpers locally.

veupath_chatbot.services.experiment.seed.helpers.org(names)[source]¶

Encode an organism name list as a WDK JSON-array string.

Return type: str
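Encoding a name list as a JSON-array string is a one-liner with the standard library. The sketch below uses a hypothetical function name and assumes plain json.dumps output is what the helper produces; the real helper may differ in details such as separators.

```python
import json


def encode_organisms(names: list[str]) -> str:
    """Serialize organism names into a single JSON-array string."""
    return json.dumps(names)


encoded = encode_organisms(["Plasmodium falciparum 3D7"])
print(encoded)  # ["Plasmodium falciparum 3D7"]
```

The result is a str rather than a list, which fits the dict[str, str] return type of the parameter builders in this module.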

veupath_chatbot.services.experiment.seed.helpers.go_search_params(organism, go_id, *, evidence=None, go_term_value=None)[source]¶

Build GenesByGoTerm search parameters.

Parameters:

organism (str) – Organism full name (e.g. “Plasmodium falciparum 3D7”).
go_id (str) – GO term identifier (e.g. “GO:0004672”).
evidence (list[str] | None) – Evidence code filter. Defaults to ["Curated", "Computed"].
go_term_value (str | None) – Value for the go_term field. Defaults to go_id. GiardiaDB uses "N/A" here.

Return type:

dict[str, str]

veupath_chatbot.services.experiment.seed.helpers.text_search_params(organism, expression, *, fields=None)[source]¶

Build GenesByText search parameters.

Parameters:

organism – Organism full name.
expression – Free-text query (e.g. “kinase”, “rhoptry”).
fields – Fields to search. Defaults to ["product"].

Return type:

dict[str, str]

veupath_chatbot.services.experiment.seed.helpers.signal_peptide_params(organism)[source]¶

Build GenesWithSignalPeptide search parameters.

Return type: dict[str, str]

veupath_chatbot.services.experiment.seed.helpers.transmembrane_params(organism, min_tm, max_tm)[source]¶

Build GenesByTransmembraneDomains search parameters.

Callers pass default min/max values appropriate to their site context.

Return type: dict[str, str]

veupath_chatbot.services.experiment.seed.helpers.mol_weight_params(organism, min_mw, max_mw)[source]¶

Build GenesByMolecularWeight search parameters.

Return type: dict[str, str]

veupath_chatbot.services.experiment.seed.helpers.ec_search_params(organism, *, ec_number, ec_sources, ec_wildcard='No')[source]¶

Build GenesByEcNumber search parameters.

Parameters:

organism – Organism full name.
ec_number – EC number pattern (e.g. “2.7.11.1”).
ec_sources – Evidence sources list (e.g. ["KEGG_Enzyme"]).
ec_wildcard – Wildcard flag. Defaults to "No".

Return type:

dict[str, str]

veupath_chatbot.services.experiment.seed.helpers.gene_type_params(organism, gene_type='protein coding')[source]¶

Build GenesByGeneType search parameters.

Return type: dict[str, str]

veupath_chatbot.services.experiment.seed.helpers.interpro_params(organism, database, typeahead)[source]¶

Build GenesByInterproDomain search parameters.

Return type: dict[str, str]

veupath_chatbot.services.experiment.seed.helpers.location_params(organism, chromosome, start, end)[source]¶

Build GenesByLocation search parameters.

Return type: dict[str, str]

veupath_chatbot.services.experiment.seed.helpers.exon_count_params(organism, min_exons, max_exons)[source]¶

Build GenesByExonCount search parameters.

Return type: dict[str, str]

veupath_chatbot.services.experiment.seed.helpers.taxon_params(organism)[source]¶

Build GenesByTaxon search parameters.

Return type: dict[str, str]

veupath_chatbot.services.experiment.seed.helpers.rnaseq_fc_params(*, dataset_url, profileset, direction, ref_samples, comp_samples, fold_change='2', hard_floor, protein_coding='yes', ref_op='average1', comp_op='average1')[source]¶

Build RNA-Seq fold-change search parameters.

Return type: dict[str, str]

veupath_chatbot.services.experiment.seed.helpers.paralog_count_params(organism, min_p, max_p)[source]¶

Build GenesByParalogCount search parameters.

Return type: dict[str, str]

Shared types for seed definitions.

class veupath_chatbot.services.experiment.seed.types.ControlSetDef(name: str, positive_ids: list[str], negative_ids: list[str], provenance_notes: str, tags: list[str] = <factory>)[source]¶

Bases: object

name: str¶

positive_ids: list[str]¶

negative_ids: list[str]¶

provenance_notes: str¶

tags: list[str]¶

__init__(name, positive_ids, negative_ids, provenance_notes, tags=<factory>)¶

class veupath_chatbot.services.experiment.seed.types.SeedDef(name: str, description: str, site_id: str, step_tree: dict[str, Any], control_set: ControlSetDef, record_type: str = 'transcript')[source]¶

Bases: object

name: str¶

description: str¶

site_id: str¶

step_tree: dict[str, Any]¶

control_set: ControlSetDef¶

record_type: str = 'transcript'¶

__init__(name, description, site_id, step_tree, control_set, record_type='transcript')¶