Services
Core business logic for gene lookup, parameter optimization, control tests, and catalog access. Services are stateless and orchestrated by the chat layer.
Overview
Gene lookup — Resolve gene names/symbols to VEuPathDB IDs via site-search or the WDK stateless reporter. Used when the user mentions genes from literature.
Parameter optimization — Optimize search parameters against positive/negative control lists using Bayesian optimization (TPE), grid, or random search.
Control tests — Run temporary WDK strategies with known gene lists and compute precision, recall, F1. Used by optimization and validation.
Catalog — Get parameter specs, validate values, list sites/searches.
Strategy session — Load and merge strategy state with conversation messages.
Gene Lookup
Purpose: Resolve gene names and IDs via VEuPathDB site-search and the WDK stateless reporter. Used by the agent to validate gene references from literature or user input.
Key functions:
lookup_genes_by_text() – Search by free text (name, symbol, description)
resolve_gene_ids() – Resolve a list of known IDs to full records via WDK
Gene record lookup service.
Provides two complementary lookup strategies:
Text search – uses VEuPathDB site-search (Solr) to find genes by name, symbol, product description, or any free text. Results are filtered to the gene document type so only gene records are returned.
ID resolution – uses the WDK stateless standard reporter endpoint (POST /record-types/{rt}/searches/{search}/reports/standard) to fetch metadata for a list of known gene IDs. Useful for validating IDs or retrieving product names / organisms for IDs obtained from literature.
Both approaches are read-only and do not create steps or strategies.
- async veupath_chatbot.services.gene_lookup.lookup_genes_by_text(site_id, query, *, organism=None, offset=0, limit=20)[source]¶
Search for gene records using multiple concurrent strategies.
- Parameters:
- Returns:
Dict with results, totalCount, and optional suggestedOrganisms.
- Return type:
- async veupath_chatbot.services.gene_lookup.resolve_gene_ids(site_id, gene_ids, *, record_type='transcript', search_name='GeneByLocusTag', param_name='ds_gene_ids', attributes=None)[source]¶
Resolve a list of gene IDs to full records via the WDK standard reporter.
Uses a dedicated short-lived WDK client to guarantee session affinity between dataset creation and the subsequent search. The shared singleton client’s cookie jar is modified by concurrent requests, which can cause the dataset to “not belong” to the search session (WDK tracks anonymous users via session cookies).
- Return type:
Main gene text lookup orchestration.
Uses four concurrent strategies to maximise recall, then scores, deduplicates, and ranks results by relevance:
Unrestricted site-search (Solr) – always fires.
Organism-restricted site-search – fires when the query implies an organism.
WDK GenesByText wildcard – fires when query looks like a gene ID prefix.
WDK GenesByText broad – fires when an explicit organism filter is given.
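The firing rules above could be sketched roughly as follows. This is only an illustration of the "decide which strategies fire, then run them concurrently" idea: `run_lookup_strategies`, `fake_fetch`, and the string labels are hypothetical stand-ins, and the real fetchers call site-search and WDK rather than returning a label.

```python
import asyncio
from typing import Optional


async def fake_fetch(label: str) -> str:
    # Hypothetical stand-in for a real site-search or WDK request.
    await asyncio.sleep(0)
    return label


async def run_lookup_strategies(
    *,
    implied_organism: Optional[str] = None,
    is_gene_id_like: bool = False,
    organism: Optional[str] = None,
) -> list:
    # The unrestricted site-search always fires.
    tasks = [fake_fetch("site-search")]
    if implied_organism:
        # The query itself implies an organism.
        tasks.append(fake_fetch("site-search:organism"))
    if is_gene_id_like:
        # The query looks like a gene ID prefix.
        tasks.append(fake_fetch("wdk:wildcard"))
    if organism:
        # The caller passed an explicit organism filter.
        tasks.append(fake_fetch("wdk:broad"))
    # gather() runs the selected strategies concurrently and keeps their order.
    return list(await asyncio.gather(*tasks))
```

In this sketch, `asyncio.run(run_lookup_strategies(is_gene_id_like=True))` runs the unrestricted site-search and the wildcard strategy only; scoring, deduplication, and ranking would follow as a separate step.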
- async veupath_chatbot.services.gene_lookup.lookup.lookup_genes_by_text(site_id, query, *, organism=None, offset=0, limit=20)[source]¶
Search for gene records using multiple concurrent strategies.
- Parameters:
- Returns:
Dict with results, totalCount, and optional suggestedOrganisms.
- Return type:
Enrich sparse gene results with WDK metadata.
- async veupath_chatbot.services.gene_lookup.enrich.enrich_sparse_gene_results(site_id, results, limit)[source]¶
Enrich results that lack organism/product via WDK standard reporter.
Site-search only returns summaryFieldData for fields where the query matched. When a gene matches in literature (e.g. MULTIgene_PubMed), organism/product are absent. We fetch full metadata from the WDK to fill the gaps.
- Return type:
Organism fuzzy matching for gene lookup.
- veupath_chatbot.services.gene_lookup.organism.score_organism_match(query, organism)[source]¶
Score how well query matches organism (0.0 = no match, 1.0 = exact).
Handles exact match, substring, genus abbreviation (P. falciparum), organism codes (pf3d7), and token-level overlap.
- Return type:
- veupath_chatbot.services.gene_lookup.organism.suggest_organisms(query, available, *, max_suggestions=5, min_score=0.4)[source]¶
Return organism names from available that fuzzy-match query.
- Parameters:
- Returns:
Suggested organism names, best match first.
- Return type:
- veupath_chatbot.services.gene_lookup.organism.normalize_organism(raw)[source]¶
Clean organism string; handle JSON array format from site-search.
- Return type:
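To make the organism matching described for score_organism_match() and suggest_organisms() more concrete, here is a minimal sketch. It is not the module's implementation: the function names carry a `_sketch` suffix, the weights 0.8, 0.7, and 0.5 are assumed values chosen for illustration, and organism codes such as pf3d7 are not handled.

```python
import re


def score_organism_match_sketch(query: str, organism: str) -> float:
    """Toy scorer: 1.0 for an exact match, 0.0 for no match."""
    q = query.strip().lower()
    o = organism.strip().lower()
    if not q or not o:
        return 0.0
    if q == o:
        return 1.0
    if q in o:
        return 0.8  # assumed weight for a substring hit
    q_tokens = re.findall(r"[a-z0-9]+", q)
    o_tokens = re.findall(r"[a-z0-9]+", o)
    # Genus abbreviation: "p. falciparum" vs "plasmodium falciparum 3d7".
    if (
        len(q_tokens) >= 2
        and len(o_tokens) >= 2
        and len(q_tokens[0]) == 1
        and o_tokens[0].startswith(q_tokens[0])
        and q_tokens[1] == o_tokens[1]
    ):
        return 0.7  # assumed weight for a genus-abbreviation hit
    # Token-level overlap, scaled by the share of query tokens that matched.
    overlap = set(q_tokens) & set(o_tokens)
    if not overlap:
        return 0.0
    return 0.5 * len(overlap) / len(set(q_tokens))


def suggest_organisms_sketch(query, available, *, max_suggestions=5, min_score=0.4):
    """Return names from `available` scoring at least min_score, best first."""
    scored = [(score_organism_match_sketch(query, name), name) for name in available]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in kept[:max_suggestions]]
```

The defaults `max_suggestions=5` and `min_score=0.4` mirror the documented signature; everything else is an assumption.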
Gene result building for lookup responses.
- veupath_chatbot.services.gene_lookup.result.build_gene_result(*, gene_id, display_name='', organism='', product='', gene_name='', gene_type='', location='', previous_ids='', matched_fields=None)[source]¶
Build a standardised gene result dict.
All gene results – whether from site-search or WDK – are funnelled through this builder so the shape is always consistent.
- Return type:
Gene-specific relevance scoring for text search results.
- veupath_chatbot.services.gene_lookup.scoring.score_gene_relevance(query, result)[source]¶
Score a gene result’s relevance to query.
Higher is better. The score is an additive combination of how well the query matches the gene ID, gene name, organism, and product, plus a bonus/penalty based on which site-search fields matched.
An extra bonus is awarded when the query exactly matches a descriptive field (product, displayName) so that exact hits always rank above incidental fuzzy overlap from shared tokens like “alpha” or “2”.
- Return type:
Site-search gene fetching and document parsing.
- veupath_chatbot.services.gene_lookup.site_search.parse_site_search_docs(docs)[source]¶
Convert raw site-search documents into standardised gene dicts.
- Return type:
- async veupath_chatbot.services.gene_lookup.site_search.fetch_site_search_genes(site_id, search_text, *, organisms=None, limit=50)[source]¶
Run a single site-search query and return parsed results.
WDK-based gene search and ID resolution.
- class veupath_chatbot.services.gene_lookup.wdk.WdkTextResult(records, total_count)[source]¶
Bases: object
Results from a WDK GenesByText query.
- records: list[JSONObject]¶
- __init__(records, total_count)¶
- async veupath_chatbot.services.gene_lookup.wdk.fetch_wdk_text_genes(site_id, expressions, *, organism=None, text_fields=None, record_type='transcript', limit=50)[source]¶
Search genes via WDK GenesByText.
- Return type:
- async veupath_chatbot.services.gene_lookup.wdk.resolve_gene_ids(site_id, gene_ids, *, record_type='transcript', search_name='GeneByLocusTag', param_name='ds_gene_ids', attributes=None)[source]¶
Resolve a list of gene IDs to full records via the WDK standard reporter.
Uses a dedicated short-lived WDK client to guarantee session affinity between dataset creation and the subsequent search. The shared singleton client’s cookie jar is modified by concurrent requests, which can cause the dataset to “not belong” to the search session (WDK tracks anonymous users via session cookies).
- Return type:
Parameter Optimization
Purpose: Optimize search parameters against positive/negative control gene lists using Bayesian optimization (TPE), grid search, or random search. Each trial runs a temporary WDK strategy and scores the result.
Key types: ParameterSpec, OptimizationConfig, OptimizationResult
Parameter optimization for VEuPathDB searches.
Re-exports the public API so existing from veupath_chatbot.services.parameter_optimization import ... statements continue to work unchanged.
- class veupath_chatbot.services.parameter_optimization.OptimizationConfig(budget: int = 30, objective: Literal['f1', 'f_beta', 'recall', 'precision', 'specificity', 'balanced_accuracy', 'mcc', 'youdens_j', 'custom'] = 'f1', beta: float = 1.0, recall_weight: float = 1.0, precision_weight: float = 1.0, method: Literal['bayesian', 'grid', 'random'] = 'bayesian', result_count_penalty: float = 0.0)[source]¶
Bases: object
- objective: Literal['f1', 'f_beta', 'recall', 'precision', 'specificity', 'balanced_accuracy', 'mcc', 'youdens_j', 'custom']¶
- result_count_penalty: float¶
Weight for penalising large result sets. The penalty is result_count_penalty * (result_count / total_genes), where total_genes is the denominator (defaults to 20 000 if unknown). A small value (e.g. 0.1) acts as a tiebreaker; higher values make the optimiser strongly prefer tighter results.
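As a worked example of the penalty formula, the sketch below computes the penalty and subtracts it from a base objective score. The subtraction step and the helper name `penalised_score` are assumptions for illustration; the documentation only defines the penalty term itself.

```python
def penalised_score(
    base_score: float,
    result_count: int,
    result_count_penalty: float,
    total_genes: int = 20_000,
) -> float:
    # Penalty term as documented: weight * fraction of the genome returned.
    penalty = result_count_penalty * (result_count / total_genes)
    # Assumption: the penalty is subtracted from the objective score.
    return base_score - penalty
```

With a base score of 0.8, 2 000 results, and a weight of 0.1, the penalty is 0.1 × (2 000 / 20 000) = 0.01, so the sketch yields roughly 0.79. With the default weight of 0.0 the score is unchanged.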
- __init__(budget=30, objective='f1', beta=1.0, recall_weight=1.0, precision_weight=1.0, method='bayesian', result_count_penalty=0.0)¶
- class veupath_chatbot.services.parameter_optimization.OptimizationResult(optimization_id: str, best_trial: TrialResult | None, all_trials: list[TrialResult], pareto_frontier: list[TrialResult], sensitivity: dict[str, float], total_time_seconds: float, status: str, error_message: str | None = None)[source]¶
Bases: object
- best_trial: TrialResult | None¶
- all_trials: list[TrialResult]¶
- pareto_frontier: list[TrialResult]¶
- __init__(optimization_id, best_trial, all_trials, pareto_frontier, sensitivity, total_time_seconds, status, error_message=None)¶
- class veupath_chatbot.services.parameter_optimization.ParameterSpec(name, param_type, min_value=None, max_value=None, log_scale=False, step=None, choices=None)[source]¶
Bases: object
Describes a single parameter to optimise.
- __init__(name, param_type, min_value=None, max_value=None, log_scale=False, step=None, choices=None)¶
- class veupath_chatbot.services.parameter_optimization.TrialResult(trial_number: int, parameters: dict[str, JSONValue], score: float, recall: float | None, false_positive_rate: float | None, result_count: int | None, positive_hits: int | None = None, negative_hits: int | None = None, total_positives: int | None = None, total_negatives: int | None = None)[source]¶
Bases: object
- __init__(trial_number, parameters, score, recall, false_positive_rate, result_count, positive_hits=None, negative_hits=None, total_positives=None, total_negatives=None)¶
- async veupath_chatbot.services.parameter_optimization.optimize_search_parameters(*, site_id, record_type, search_name, fixed_parameters, parameter_space, controls_search_name, controls_param_name, positive_controls=None, negative_controls=None, controls_value_format='newline', controls_extra_parameters=None, id_field=None, config=None, progress_callback=None, check_cancelled=None)[source]¶
Run parameter optimisation against positive/negative controls.
Returns an OptimizationResult with the best configuration, all trials, Pareto frontier, and sensitivity analysis.
- Return type:
Configuration types for parameter optimization.
Defines the parameter specification, optimization config, trial result, and optimization result dataclasses, as well as type aliases for callbacks.
Parameter optimization for VEuPathDB searches.
Optimizes search parameters against positive/negative control gene lists
using Bayesian optimization (TPE sampler via optuna), grid search, or random
search. Each “trial” runs a temporary WDK strategy via
run_positive_negative_controls() and scores the result.
- async veupath_chatbot.services.parameter_optimization.core.optimize_search_parameters(*, site_id, record_type, search_name, fixed_parameters, parameter_space, controls_search_name, controls_param_name, positive_controls=None, negative_controls=None, controls_value_format='newline', controls_extra_parameters=None, id_field=None, config=None, progress_callback=None, check_cancelled=None)[source]¶
Run parameter optimisation against positive/negative controls.
Returns an OptimizationResult with the best configuration, all trials, Pareto frontier, and sensitivity analysis.
- Return type:
Scoring, analysis, and serialization helpers for parameter optimization.
- veupath_chatbot.services.parameter_optimization.scoring.result_to_json(result)[source]¶
- Return type:
Trial execution loop for parameter optimization.
- class veupath_chatbot.services.parameter_optimization.trials.TrialMetrics(recall, fpr, result_count, positive_hits, negative_hits)[source]¶
Bases: object
Intermediate metrics extracted from a WDK result.
- __init__(recall, fpr, result_count, positive_hits, negative_hits)¶
- class veupath_chatbot.services.parameter_optimization.trials.EarlyStopReason(*values)[source]¶
Bases: Enum
Why the optimisation loop stopped early.
- PERFECT_SCORE = 'perfect_score'¶
- PLATEAU = 'plateau'¶
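The two early-stop reasons suggest a trial loop along the following lines. This is a minimal random-search sketch over a single numeric parameter, not the package's trial loop: the local `EarlyStopReason` enum only mirrors the documented member names, and `random_search_sketch`, `patience`, and the "score of 1.0 is perfect" rule are assumptions.

```python
import random
from enum import Enum


class EarlyStopReason(Enum):
    # Local stand-in mirroring the documented member names.
    PERFECT_SCORE = "perfect_score"
    PLATEAU = "plateau"


def random_search_sketch(score_fn, low, high, *, budget=30, patience=5, seed=0):
    """Return (best_value, best_score, trials_run, stop_reason)."""
    rng = random.Random(seed)
    best_value, best_score = None, float("-inf")
    stale = 0  # consecutive trials without improvement
    for trial in range(1, budget + 1):
        value = rng.uniform(low, high)
        score = score_fn(value)
        if score > best_score:
            best_value, best_score, stale = value, score, 0
        else:
            stale += 1
        if best_score >= 1.0:
            # Assumed rule: nothing can beat a perfect score.
            return best_value, best_score, trial, EarlyStopReason.PERFECT_SCORE
        if stale >= patience:
            # Assumed rule: stop after `patience` trials without improvement.
            return best_value, best_score, trial, EarlyStopReason.PLATEAU
    return best_value, best_score, budget, None
```

In the real service each call to `score_fn` would correspond to running a temporary WDK strategy and scoring it against the controls, which is why stopping early matters.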
Catalog (Parameter Validation)
Purpose: Validation of search parameter values. Normalizes, canonicalizes, and validates parameter values against WDK search specs before step creation or strategy execution.
Validation of search parameter values.
- async veupath_chatbot.services.catalog.param_validation.validate_search_params(*, site_id, record_type, search_name, context_values)[source]¶
Validate and canonicalize search parameters for UI consumption.
- Returns a stable payload:
{ "validation": { "isValid": bool, "normalizedContextValues": {…}, "errors": {…} } }
The goal is to keep the frontend a consumer of backend normalization + validation, without requiring the UI to interpret raw WDK payloads.
- Return type:
- async veupath_chatbot.services.catalog.param_validation.validate_parameters(*, site_id, record_type, search_name, parameters, resolve_record_type_for_search, find_record_type_hint, extract_vocab_options)[source]¶
Validate parameters against WDK search specs.
Normalizes parameters in-place and raises ValidationError when the search is unknown, extra/unknown parameters are provided, or required parameters are missing.
Export Service
Purpose: CSV/TSV/TXT generation and Redis temporary storage for data exports. Generates downloadable files from strategy results, gene sets, and enrichment results, storing them briefly in Redis for client retrieval.
Export service — CSV/TSV/TXT generation + Redis temp storage.
- class veupath_chatbot.services.export.service.ExportResult(export_id, filename, content_type, url, size_bytes, expires_in_seconds)[source]¶
Bases: object
Metadata returned after generating an export file.
- __init__(export_id, filename, content_type, url, size_bytes, expires_in_seconds)¶
- class veupath_chatbot.services.export.service.ExportService(redis)[source]¶
Bases: object
Generates downloadable files and stores them in Redis with TTL.
- async get_export(export_id)[source]¶
Retrieve stored export. Returns (content, filename, content_type) or None.
- async export_enrichment_json(results, name)[source]¶
Export enrichment results as JSON.
- Return type:
SSE progress callbacks for parameter optimization.
- async veupath_chatbot.services.parameter_optimization.callbacks.emit_started(callback, *, optimization_id, search_name, record_type, budget, objective, positive_controls_count, negative_controls_count, param_space_json)[source]¶
- async veupath_chatbot.services.parameter_optimization.callbacks.emit_trial_progress(callback, *, optimization_id, trial_num, budget, trial_json, best_trial, recent_trials)[source]¶
Control Tests
Purpose: Run positive/negative control gene lists against a WDK strategy and compute precision, recall, F1, and related metrics. Used by parameter optimization and for validation.
Key function: run_positive_negative_controls()
Positive/negative control test helpers for planning mode.
These helpers run temporary WDK steps/strategies to evaluate whether known positive controls are returned and known negative controls are excluded.
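The metrics mentioned above can be illustrated with plain set arithmetic. This is a sketch under stated assumptions rather than the service's code: `control_metrics` is a hypothetical helper, and it treats returned negative controls as the only false positives, since genes outside both control lists have no known label.

```python
def control_metrics(result_ids, positive_controls, negative_controls):
    """Compute precision, recall, F1, and false positive rate from ID lists."""
    results = set(result_ids)
    positives = set(positive_controls)
    negatives = set(negative_controls)
    tp = len(results & positives)   # positive controls that were returned
    fp = len(results & negatives)   # negative controls that were returned
    fn = len(positives - results)   # positive controls that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    fpr = fp / len(negatives) if negatives else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": fpr,
    }
```

For example, a result set containing two of three positive controls and one of two negative controls gives precision 2/3, recall 2/3, F1 2/3, and a false positive rate of 0.5.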
- async veupath_chatbot.services.control_tests.resolve_controls_param_type(api, record_type, controls_search_name, controls_param_name)[source]¶
Return the WDK param type for a controls parameter.
- Parameters:
api (StrategyAPI) – Strategy API instance.
record_type (str) – WDK record type.
controls_search_name (str) – Name of the controls search.
controls_param_name (str) – Parameter name within the controls search.
- Returns:
Parameter type string (e.g. "input-dataset") or None.
- Return type:
str | None
- async veupath_chatbot.services.control_tests.run_positive_negative_controls(*, site_id, record_type, target_search_name, target_parameters, controls_search_name, controls_param_name, positive_controls=None, negative_controls=None, controls_value_format='newline', controls_extra_parameters=None, id_field=None, skip_cleanup=False)[source]¶
Run positive + negative controls against a single WDK question configuration.
Each control set (positive / negative) creates its own target step internally. WDK cascade-deletes all steps inside a strategy when the strategy is deleted, so a shared target step would be invalidated after the first control run’s cleanup.
- Parameters:
skip_cleanup (bool) – When True, skip the upfront strategy cleanup. Useful when the caller already performed cleanup (e.g. batch sweeps).
- Return type:
Control Helpers
Purpose: Formatting and parsing utilities for control test evaluation. Encodes gene ID lists in various formats (newline, comma, JSON) and handles temporary strategy cleanup.
Formatting and parsing utilities for control-test evaluation.
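A minimal sketch of encoding a gene ID list in the three formats mentioned above. The helper name `encode_controls` is hypothetical; "newline" matches the documented default of `controls_value_format`, while the format names "comma" and "json" are assumptions.

```python
import json


def encode_controls(ids, value_format="newline"):
    """Encode gene IDs as a single parameter value string."""
    # Drop blanks and surrounding whitespace before encoding.
    cleaned = [i.strip() for i in ids if i and i.strip()]
    if value_format == "newline":
        return "\n".join(cleaned)
    if value_format == "comma":
        return ",".join(cleaned)
    if value_format == "json":
        return json.dumps(cleaned)
    raise ValueError(f"unknown controls value format: {value_format!r}")
```

Which format a given controls search expects depends on its parameter type, so the caller chooses the format per search.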
Search Reranking
Purpose: Reusable “fetch wide, rerank narrow” pattern for search results. Robust fuzzy matching with exactness bonuses for gene ID lookups. Used to improve relevance of WDK search results.
Reusable search result reranking utilities.
Implements a “fetch wide, rerank narrow” pattern for VEuPathDB search:
Analyse the query to detect intent (gene ID prefix, organism abbreviation, free text, etc.)
Fetch broadly from one or more sources (site-search, WDK).
Score each result on multiple relevance signals.
Deduplicate by primary key, keeping the highest-scored entry.
Return the top-N results sorted by combined score.
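The last two steps of the pattern (deduplicate by key, then return the top-N by score) could look roughly like this. The `Scored` dataclass and `dedup_and_sort_sketch` are illustrative stand-ins modelled on the documented ScoredResult and dedup_and_sort(); the `top_n` argument and the assumption that `key_fn` receives the raw result dict are not taken from the source.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Scored:
    # Stand-in for ScoredResult(result, score, source='').
    result: Dict[str, Any]
    score: float
    source: str = ""


def dedup_and_sort_sketch(
    results: List[Scored],
    key_fn: Callable[[Dict[str, Any]], str],
    top_n: Optional[int] = None,
) -> List[Scored]:
    best: Dict[str, Scored] = {}
    for item in results:
        key = key_fn(item.result)
        # Keep only the highest-scoring entry per primary key.
        if key not in best or item.score > best[key].score:
            best[key] = item
    ranked = sorted(best.values(), key=lambda s: s.score, reverse=True)
    return ranked if top_n is None else ranked[:top_n]
```

Fetching broadly from several sources tends to return the same gene more than once, so deduplicating on the gene ID before truncating keeps the top-N list free of repeats.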
- veupath_chatbot.services.search_rerank.score_text_match(query, value)[source]¶
Score how well query matches value (0.0–1.0).
Uses rapidfuzz for robust fuzzy matching, with bonuses for exact and prefix matches that are critical for gene ID lookups.
- Return type:
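The idea of fuzzy matching with exact and prefix bonuses can be sketched with the standard library. Note the substitution: this example uses `difflib.SequenceMatcher` in place of rapidfuzz so that it runs without extra dependencies, and the 0.9 prefix floor and 0.89 cap are assumed values, not the module's.

```python
from difflib import SequenceMatcher


def score_text_match_sketch(query: str, value: str) -> float:
    """Score how well query matches value, in the range 0.0 to 1.0."""
    q = query.strip().lower()
    v = value.strip().lower()
    if not q or not v:
        return 0.0
    if q == v:
        return 1.0  # exact matches always score highest
    fuzzy = SequenceMatcher(None, q, v).ratio()
    if v.startswith(q):
        # Assumed bonus: a prefix hit never scores below 0.9.
        return max(fuzzy, 0.9)
    # Assumed cap: a purely fuzzy hit never outranks a prefix hit.
    return min(fuzzy, 0.89)
```

The prefix floor matters for gene IDs: a partial ID is short relative to the full ID, so a plain similarity ratio would rank it lower than an unrelated string of similar length.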
- veupath_chatbot.services.search_rerank.score_field_quality(matched_fields)[source]¶
Score based on which fields the query matched in.
- Return type:
- class veupath_chatbot.services.search_rerank.ScoredResult(result, score, source='')[source]¶
Bases: object
A search result with an attached relevance score.
- result: JSONObject¶
- __init__(result, score, source='')¶
- veupath_chatbot.services.search_rerank.dedup_and_sort(results, key_fn)[source]¶
Deduplicate results by key, keeping the highest-scoring entry.
- Return type:
- class veupath_chatbot.services.search_rerank.QueryIntent(raw, is_gene_id_like=False, implied_organism=None, implied_organism_score=0.0, wildcard_ids=())[source]¶
Bases: object
What we think the user is looking for.
- __init__(raw, is_gene_id_like=False, implied_organism=None, implied_organism_score=0.0, wildcard_ids=())¶
- veupath_chatbot.services.search_rerank.analyse_query(query, available_organisms, organism_scorer=None)[source]¶
Analyse a query string to detect search intent.
- Parameters:
- Returns:
A QueryIntent describing what the user likely wants.
- Return type:
Catalog (Parameters & Searches)
Purpose: Retrieve and validate search parameters from VEuPathDB. Handles parameter specs, dependent vocabularies, and search details. Used by tools when the agent needs to discover or validate parameters.
Key functions:
get_search_parameters() – Full parameter specs for a search
validate_search_params() – Validate parameter values
get_refreshed_dependent_params() – Refresh dependent parameter options
Search parameter retrieval, validation, and expansion functions.
- async veupath_chatbot.services.catalog.parameters.expand_search_details_with_params(site_id, record_type, search_name, context_values)[source]¶
Return WDK search details after applying (WDK-wire) context values.
NOTE: despite the historical name, this is not a pure validation API; it returns WDK search details payload. Keep it separate from the public validation endpoint.
- Return type:
- async veupath_chatbot.services.catalog.parameters.get_refreshed_dependent_params(*, site_id, record_type, search_name, parameter_name, context_values)[source]¶
Get refreshed dependent parameter vocabulary, falling back to the portal.
Tries the site-specific WDK client first. If that fails with a WDKError and the site is not already veupathdb, retries against the portal client (veupathdb).
- Parameters:
site_id (str) – Site identifier.
record_type (str) – WDK record type.
search_name (str) – WDK search name.
parameter_name (str) – The dependent parameter to refresh.
context_values (JSONObject) – Current context parameter values.
- Returns:
Refreshed dependent param payload from WDK.
- Return type:
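The portal fallback described for get_refreshed_dependent_params() follows a simple "try the site, then retry on the portal" shape. The sketch below shows that shape only: the `WDKError` class is a local stand-in, and `with_portal_fallback` and the `call` argument are hypothetical names, not part of the documented API.

```python
import asyncio


class WDKError(Exception):
    """Local stand-in for the project's WDKError."""


async def with_portal_fallback(site_id, call, portal_id="veupathdb"):
    """Await call(site_id); on WDKError, retry once against the portal."""
    try:
        return await call(site_id)
    except WDKError:
        if site_id == portal_id:
            # Already on the portal, so there is nothing left to fall back to.
            raise
        return await call(portal_id)
```

A usage sketch: if `call` raises `WDKError` for "plasmodb" but succeeds for "veupathdb", then `asyncio.run(with_portal_fallback("plasmodb", call))` returns the portal's payload.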
- async veupath_chatbot.services.catalog.parameters.get_search_parameters(site_id, record_type, search_name)[source]¶
Get detailed parameter info for a specific search.
This is intentionally defensive: WDK responses can vary by site/endpoint.
- Return type:
- async veupath_chatbot.services.catalog.parameters.get_search_parameters_tool(site_id, record_type, search_name)[source]¶
Tool-friendly wrapper that returns standardized tool_error payloads.
- Return type:
- async veupath_chatbot.services.catalog.parameters.lookup_phyletic_codes(site_id, record_type, query)[source]¶
Search phyletic species codes by name for the GenesByOrthologPattern search.
Returns matching {code, label} pairs from the phyletic_term_map vocabulary. The model uses codes to build profile_pattern values.
- Parameters:
- Returns:
Dict with matches list and query echo.
- Return type:
- async veupath_chatbot.services.catalog.parameters.validate_search_params(*, site_id, record_type, search_name, context_values)[source]¶
Validate and canonicalize search parameters for UI consumption.
- Returns a stable payload:
{ "validation": { "isValid": bool, "normalizedContextValues": {…}, "errors": {…} } }
The goal is to keep the frontend a consumer of backend normalization + validation, without requiring the UI to interpret raw WDK payloads.
- Return type:
Catalog (Sites & Record Types)
Purpose: Sites, record types, and search listing. Entry point for discovery.
Sites and record types catalog functions.
- async veupath_chatbot.services.catalog.sites.list_sites()[source]¶
List all available VEuPathDB sites.
- Return type:
- async veupath_chatbot.services.catalog.sites.get_record_types(site_id)[source]¶
Get record types for a specific site.
- Return type:
Search listing and searching functions.
- veupath_chatbot.services.catalog.searches.score_search(*, query_terms, keywords, search_name, display_name, description, corpus_doc_count=1, corpus_term_counts=None)[source]¶
Score a search against query terms and keywords.
Keywords matched against searchName via substring → +KEYWORD_BOOST each.
Query terms matched per field with field weight × IDF.
Short terms (< _MIN_TERM_LEN chars) ignored in query matching.
- Return type:
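The three scoring rules for score_search() could be sketched as below. The constant values, the per-field weights, and the IDF formula are all assumptions made for illustration; the source names `KEYWORD_BOOST` and `_MIN_TERM_LEN` but does not state their values or how IDF is computed.

```python
import math

KEYWORD_BOOST = 5.0  # assumed value
MIN_TERM_LEN = 3     # assumed value for _MIN_TERM_LEN
FIELD_WEIGHTS = {    # assumed per-field weights
    "search_name": 3.0,
    "display_name": 2.0,
    "description": 1.0,
}


def score_search_sketch(
    *,
    query_terms,
    keywords,
    search_name,
    display_name,
    description,
    corpus_doc_count=1,
    corpus_term_counts=None,
):
    counts = corpus_term_counts or {}
    fields = {
        "search_name": search_name.lower(),
        "display_name": display_name.lower(),
        "description": description.lower(),
    }
    score = 0.0
    # Rule 1: each keyword found in the search name adds a flat boost.
    for keyword in keywords:
        if keyword.lower() in fields["search_name"]:
            score += KEYWORD_BOOST
    for term in query_terms:
        t = term.lower()
        # Rule 3: very short terms are ignored.
        if len(t) < MIN_TERM_LEN:
            continue
        # Assumed smoothed IDF: rarer terms in the corpus weigh more.
        idf = math.log(1 + corpus_doc_count / (1 + counts.get(t, 0)))
        # Rule 2: field weight x IDF for every field containing the term.
        for name, text in fields.items():
            if t in text:
                score += FIELD_WEIGHTS[name] * idf
    return score
```

The IDF factor keeps common words such as "gene" from dominating the ranking when nearly every search description contains them.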
- veupath_chatbot.services.catalog.searches.is_chooser_search(search)[source]¶
Return True if this is a routing/chooser search (no real params).
Chooser searches have websiteProperties: ["hideOperation"] and/or empty paramNames. The search list endpoint returns paramNames (list of strings), not full parameters objects.
- Return type:
- veupath_chatbot.services.catalog.searches.annotate_search(search)[source]¶
Add category and returns fields to a search result dict.
- async veupath_chatbot.services.catalog.searches.get_raw_record_types(site_id)[source]¶
Return raw WDK record type objects for a site.
Unlike services.catalog.sites.get_record_types(), this preserves the full WDK payloads (urlSegment, name, displayName, etc.) so that callers needing the original structure don’t have to go through the integrations layer directly.
- Return type:
- async veupath_chatbot.services.catalog.searches.get_raw_searches(site_id, record_type)[source]¶
Return raw WDK search objects for a record type.
Thin service-level wrapper over the discovery integration so that AI tools and other service consumers never import from integrations/ directly.
- Return type:
- async veupath_chatbot.services.catalog.searches.list_searches(site_id, record_type)[source]¶
List searches for a specific record type.
Returns name + displayName only to keep the payload small (VEuPathDB has 2000+ searches; descriptions alone add ~3 MB). The model should use search_for_searches for targeted discovery with descriptions, or get_search_parameters for full details on a specific search.
- async veupath_chatbot.services.catalog.searches.list_transforms(site_id, record_type)[source]¶
List transform/combine searches (with descriptions).
Returns only searches that accept an input step — these are used to chain steps together (ortholog transform, weight filter, span logic, boolean combine, etc.). Typically 5-7 per site, so descriptions are included.
- async veupath_chatbot.services.catalog.searches.search_for_searches(site_id, record_type, query, *, keywords=None, limit=20)[source]¶
Find searches matching a query and/or keywords.
Uses field-weighted scoring with IDF, keyword boosting against search names, chooser filtering, and result annotation. Site-search results are merged in when available.
- async veupath_chatbot.services.catalog.searches.find_record_type_for_search(site_id, record_type, search_name)[source]¶
Resolve which record type actually contains a search name.
Uses the pre-cached SearchCatalog (mirrors WDK’s global getQuestionByName() lookup), so no HTTP calls are made at resolve time. Falls back to record_type when the search isn’t found.
- Return type:
- async veupath_chatbot.services.catalog.searches.make_record_type_resolver(site_id)[source]¶
Create a record type resolver backed by the pre-cached SearchCatalog.
Mirrors WDK’s WdkModel.getQuestionByName(): a global lookup that finds which record type owns a given search name, using the already-cached catalog data (no HTTP calls at resolve time).
- async veupath_chatbot.services.catalog.searches.resolve_record_type_from_steps(root_step, resolver)[source]¶
Resolve record type from the first resolvable leaf search in a step tree.
Uses collect_plan_leaves() to find leaf (search) nodes, then calls the resolver to find the owning record type for the first one that resolves.
- Return type:
str | None
Catalog (Parameter Resolution)
Purpose: WDK parameter fetching, caching, and vocabulary expansion. Resolves search parameter specs with allowed values, handles dependent vocabularies, and flattens nested parameter structures for agent consumption.
WDK parameter fetching, caching, and expansion.
- async veupath_chatbot.services.catalog.param_resolution.get_search_parameters(site_id, record_type, search_name)[source]¶
Get detailed parameter info for a specific search.
This is intentionally defensive: WDK responses can vary by site/endpoint.
- Return type:
- async veupath_chatbot.services.catalog.param_resolution.get_search_parameters_tool(site_id, record_type, search_name)[source]¶
Tool-friendly wrapper that returns standardized tool_error payloads.
- Return type:
- async veupath_chatbot.services.catalog.param_resolution.lookup_phyletic_codes(site_id, record_type, query)[source]¶
Search phyletic species codes by name for the GenesByOrthologPattern search.
Returns matching {code, label} pairs from the phyletic_term_map vocabulary. The model uses codes to build profile_pattern values.
- Parameters:
- Returns:
Dict with matches list and query echo.
- Return type:
- async veupath_chatbot.services.catalog.param_resolution.expand_search_details_with_params(site_id, record_type, search_name, context_values)[source]¶
Return WDK search details after applying (WDK-wire) context values.
NOTE: despite the historical name, this is not a pure validation API; it returns WDK search details payload. Keep it separate from the public validation endpoint.
- Return type:
- async veupath_chatbot.services.catalog.param_resolution.get_refreshed_dependent_params(*, site_id, record_type, search_name, parameter_name, context_values)[source]¶
Get refreshed dependent parameter vocabulary, falling back to the portal.
Tries the site-specific WDK client first. If that fails with a WDKError and the site is not already veupathdb, retries against the portal client (veupathdb).
- Parameters:
site_id (str) – Site identifier.
record_type (str) – WDK record type.
search_name (str) – WDK search name.
parameter_name (str) – The dependent parameter to refresh.
context_values (JSONObject) – Current context parameter values.
- Returns:
Refreshed dependent param payload from WDK.
- Return type:
Catalog (RAG Search)
Purpose: RAG search orchestration — embed query → search Qdrant → threshold → prune results. Centralizes the shared RAG pipeline used by catalog and example-plan tools.
RAG search service: embed -> query Qdrant -> threshold -> prune.
Centralises the shared pattern used by catalog and example-plan RAG tools so that the AI tool layer never touches integrations directly.
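The embed → query → threshold → prune pipeline can be illustrated end to end with a toy in-memory version. Everything here is a stand-in: the bag-of-words `embed` replaces a real embedding model, a plain list replaces Qdrant, and "prune" is assumed to mean trimming each hit down to a few payload keys. The `limit=20` and `min_score=0.4` defaults mirror the documented method signatures.

```python
import math
from collections import Counter


def embed(text):
    # Toy embedding: word counts instead of a dense vector.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rag_search_sketch(query, documents, *, limit=20, min_score=0.4,
                      keep_keys=("name", "displayName")):
    q = embed(query)                                             # embed
    hits = [(cosine(q, embed(d["text"])), d) for d in documents] # query
    hits = [(s, d) for s, d in hits if s >= min_score]           # threshold
    hits.sort(key=lambda h: h[0], reverse=True)
    return [                                                     # prune
        {"score": s, **{k: d[k] for k in keep_keys if k in d}}
        for s, d in hits[:limit]
    ]
```

Keeping the four stages in one function mirrors the stated goal of the service: tool code calls a single entry point and never handles vectors or store clients itself.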
- class veupath_chatbot.services.catalog.rag_search.RagSearchService(*, site_id, store=None)[source]¶
Bases:
objectStateless service encapsulating all Qdrant-backed lookups.
Constructed with a site_id; owns its own
QdrantStoreinstance.- async search_record_types(query=None, limit=20, min_score=0.4)[source]¶
Semantic search over WDK record types.
- Return type:
- async get_record_type_details(record_type_id)[source]¶
Retrieve one record-type payload from Qdrant by id.
- Return type:
JSONObject | None
- async search_for_searches(query, record_type=None, limit=20, min_score=0.4)[source]¶
Semantic search over WDK searches.
- Return type:
- async get_search_metadata(record_type, search_name)[source]¶
Retrieve one search payload from Qdrant by composite key.
- Return type:
JSONObject | None
- async get_dependent_vocab(record_type, search_name, param_name, context_values=None)[source]¶
Fetch dependent vocab (Qdrant-cached, WDK fallback on miss).
- Return type:
- async search_example_plans(query, limit=5)[source]¶
Semantic search over ingested public strategies.
- Return type:
Strategy Session
Purpose: Load and merge strategy state with conversation messages. Used when switching strategies or restoring sessions.
Stateful strategy session types (in-memory).
These types model the working state while a user (or an AI agent) is building a VEuPathDB strategy during a chat session.
- class veupath_chatbot.domain.strategy.session.StrategyGraph(graph_id, name, site_id)[source]
Bases: object
State for a single strategy graph.
- __init__(graph_id, name, site_id)[source]
- invalidate_build()[source]
Clear WDK build state so stale counts are not shown.
Call after any mutation that changes step semantics (parameters, search_name, operator, delete). The next build_strategy call will re-populate step_counts and wdk_step_ids.
- add_step(step)[source]
Add a step and maintain the subtree-root set.
The new step becomes a root. If it consumes existing roots as primary_input or secondary_input, those are removed from the root set (they are now internal nodes of the new step’s subtree).
- Parameters:
step (PlanStepNode) – Step to add.
- Returns:
Step ID.
- Return type:
- get_step(step_id)[source]
Get a step by ID.
- Parameters:
step_id (str) – Step ID.
- Returns:
Step or None.
- Return type:
PlanStepNode | None
- recompute_roots()[source]
Recompute roots from the current steps dict.
A root is any step that is not referenced as the primary_input or secondary_input of another step. Call this after bulk mutations (delete, hydration) where incremental root tracking is impractical.
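The root bookkeeping described for add_step() and recompute_roots() can be sketched with two small dataclasses. This is an illustration, not the real StrategyGraph: `Step` and `Graph` are hypothetical names, and inputs are assumed to be referenced by step ID, which may differ from how PlanStepNode links its inputs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Step:
    id: str
    primary_input: Optional[str] = None    # assumed: input referenced by ID
    secondary_input: Optional[str] = None


@dataclass
class Graph:
    steps: Dict[str, Step] = field(default_factory=dict)
    roots: Set[str] = field(default_factory=set)

    def add_step(self, step: Step) -> str:
        self.steps[step.id] = step
        # Inputs consumed by the new step stop being roots.
        for input_id in (step.primary_input, step.secondary_input):
            self.roots.discard(input_id)
        # The new step is the root of its own subtree.
        self.roots.add(step.id)
        return step.id

    def recompute_roots(self) -> None:
        # A root is any step that no other step references as an input.
        referenced = {
            input_id
            for step in self.steps.values()
            for input_id in (step.primary_input, step.secondary_input)
            if input_id is not None
        }
        self.roots = set(self.steps) - referenced
```

Incremental tracking in `add_step` is cheap for single additions, while `recompute_roots` rebuilds the set from scratch after changes such as deleting a step directly from `steps`.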
- class veupath_chatbot.domain.strategy.session.StrategySession(site_id)[source]
Bases: object
Session context for the active strategy (graph + chat).
- __init__(site_id)[source]
- add_graph(graph)[source]
Register an existing graph in the session.
- Parameters:
graph (StrategyGraph) – Strategy graph to register.
- create_graph(name, graph_id=None)[source]
Create a new empty graph and register it.
- Parameters:
- Returns:
The graph.
- Return type:
- get_graph(graph_id)[source]
Get graph by ID (or active graph if None).
- Parameters:
graph_id (str | None) – Graph ID, or None for active graph.
- Returns:
Graph or None.
- Return type:
StrategyGraph | None
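A minimal sketch of the "active graph if None" lookup follows. Graph and Session are hypothetical stand-ins, and the rule that the most recently registered graph becomes active is an assumption made for illustration; the documentation above does not say how the active graph is chosen. The site ID "plasmodb" is likewise only an example value.

```python
import uuid


class Graph:
    # Hypothetical stand-in for StrategyGraph.
    def __init__(self, graph_id, name, site_id):
        self.id, self.name, self.site_id = graph_id, name, site_id


class Session:
    # Hypothetical stand-in for StrategySession.
    def __init__(self, site_id):
        self.site_id = site_id
        self.graphs = {}
        self.active_id = None  # assumption: last registered graph is active

    def add_graph(self, graph):
        self.graphs[graph.id] = graph
        self.active_id = graph.id

    def create_graph(self, name, graph_id=None):
        # Generate an ID when the caller does not supply one.
        graph = Graph(graph_id or uuid.uuid4().hex, name, self.site_id)
        self.add_graph(graph)
        return graph

    def get_graph(self, graph_id=None):
        # None means "use the active graph"; unknown IDs yield None.
        key = graph_id if graph_id is not None else self.active_id
        return self.graphs.get(key)


session = Session("plasmodb")
first = session.create_graph("kinases", graph_id="g1")
second = session.create_graph("secreted proteins")
```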
- veupath_chatbot.domain.strategy.session.hydrate_graph_from_steps_data(graph, steps_data, *, root_step_id=None, record_type=None)[source]
Hydrate an in-memory graph from persisted flat steps.
This is used when we have a persisted steps list (and maybe root_step_id) but no canonical plan to parse into an AST. It enables tools like list_current_steps to reflect existing UI-visible nodes.
Accepts arbitrary input; non-list values are silently ignored.
- Parameters:
graph (StrategyGraph) – Strategy graph to hydrate.
steps_data (JSONArray | object) – Flat steps list from persistence (or any value).
root_step_id (str | None) – Root step ID (default: None).
record_type (str | None) – Record type (default: None).
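The tolerant-input behaviour ("non-list values are silently ignored") can be illustrated with a small sketch. hydrate_steps is a hypothetical helper, not the real function. It assumes each persisted step is a dict with a string "id" key, and skipping malformed entries inside the list is an extra assumption not stated above.

```python
from typing import Any


def hydrate_steps(steps: dict, steps_data: Any) -> None:
    # Non-list values (None, dicts, strings, ...) are ignored without raising.
    if not isinstance(steps_data, list):
        return
    for item in steps_data:
        # Assumption: a valid persisted step is a dict with a string "id".
        if isinstance(item, dict) and isinstance(item.get("id"), str):
            steps[item["id"]] = item


steps = {}
hydrate_steps(steps, None)                       # ignored
hydrate_steps(steps, {"id": "not-a-list"})       # ignored
hydrate_steps(steps, [{"id": "s1"}, "junk", {"id": "s2"}])
```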
Experiment Seed Data¶
Purpose: Generate demo experiments with pre-built multi-step strategies and
control sets across 13 VEuPathDB databases. Seeds use multi-step mode
internally to create strategy trees (the only place multi-step mode is used).
Triggered via POST /api/v1/experiments/seed or the Settings > Seeding UI.
Each database has curated seed definitions with organism-specific searches, known positive/negative gene controls, and step trees that demonstrate real research workflows (e.g. drug resistance genes in PlasmoDB, virulence factors in TriTrypDB).
Seed strategy and control-set definitions.
Re-exports the public API so existing from veupath_chatbot.services.experiment.seed import ... statements continue to work unchanged.
- class veupath_chatbot.services.experiment.seed.ControlSetDef(name: str, positive_ids: list[str], negative_ids: list[str], provenance_notes: str, tags: list[str] = <factory>)[source]¶
Bases: object
- __init__(name, positive_ids, negative_ids, provenance_notes, tags=<factory>)¶
- class veupath_chatbot.services.experiment.seed.SeedDef(name: str, description: str, site_id: str, step_tree: dict[str, Any], control_set: ControlSetDef, record_type: str = 'transcript')[source]¶
Bases: object
- control_set: ControlSetDef¶
- __init__(name, description, site_id, step_tree, control_set, record_type='transcript')¶
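To show how the two definitions fit together, here is a sketch that mirrors the documented signatures with local dataclasses. Treating them as dataclasses and using list as the tags factory are assumptions inferred from the `<factory>` default. The site ID, step-tree shape, and gene IDs are placeholders, not real seed data.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ControlSetDef:
    name: str
    positive_ids: list[str]
    negative_ids: list[str]
    provenance_notes: str
    tags: list[str] = field(default_factory=list)  # assumed factory


@dataclass
class SeedDef:
    name: str
    description: str
    site_id: str
    step_tree: dict[str, Any]
    control_set: ControlSetDef
    record_type: str = "transcript"


seed = SeedDef(
    name="Example seed",
    description="Placeholder seed for illustration only.",
    site_id="plasmodb",                       # assumed site ID format
    step_tree={"search": "GenesByText"},      # hypothetical tree shape
    control_set=ControlSetDef(
        name="Example controls",
        positive_ids=["GENE_POS_1", "GENE_POS_2"],  # placeholder IDs
        negative_ids=["GENE_NEG_1"],
        provenance_notes="Placeholder note.",
    ),
)
```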
- veupath_chatbot.services.experiment.seed.get_all_seeds()[source]¶
Get seeds for all available sites.
- veupath_chatbot.services.experiment.seed.get_seeds_for_site(site_id)[source]¶
Import and return SEEDS for a specific site.
- async veupath_chatbot.services.experiment.seed.run_seed(*, user_id, stream_repo, control_set_repo, site_id=None)[source]¶
Create seed strategies and control sets, yielding SSE progress events.
Seeds run concurrently (up to _MAX_CONCURRENT_SEEDS at a time). Progress events are streamed back via an asyncio.Queue.
If site_id is provided, only seeds for that database are created. Otherwise all available seeds are used.
Imports are deferred to avoid circular dependencies.
- Return type:
Seed strategy runner.
Creates real WDK strategies (visible in the sidebar) and curated control sets (available in the Experiments tab) across multiple VEuPathDB sites.
Seeds are processed concurrently across sites using asyncio.TaskGroup,
with a semaphore to cap the number of parallel WDK requests.
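The pattern described here (bounded concurrency plus a queue that feeds a progress stream) can be sketched as below. This is not the runner's code. MAX_CONCURRENT stands in for _MAX_CONCURRENT_SEEDS, whose real value is not documented here. The event dictionaries are hypothetical, and the sketch uses asyncio.gather instead of asyncio.TaskGroup so that it also runs on Python versions before 3.11.

```python
import asyncio

MAX_CONCURRENT = 2  # stand-in for _MAX_CONCURRENT_SEEDS (value assumed)


async def create_seed(name, semaphore, queue, stats):
    async with semaphore:  # caps how many seeds run at once
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await queue.put({"type": "seed_started", "seed": name})
        await asyncio.sleep(0.01)  # placeholder for the WDK requests
        await queue.put({"type": "seed_done", "seed": name})
        stats["running"] -= 1


async def run_seeds(names, stats):
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    queue = asyncio.Queue()

    async def producer():
        try:
            await asyncio.gather(
                *(create_seed(n, semaphore, queue, stats) for n in names)
            )
        finally:
            await queue.put(None)  # sentinel: no more events

    task = asyncio.create_task(producer())
    # Yield events as workers emit them, until the sentinel arrives.
    while (event := await queue.get()) is not None:
        yield event
    await task


async def main():
    stats = {"running": 0, "peak": 0}
    names = ["seed-a", "seed-b", "seed-c", "seed-d"]
    events = [e async for e in run_seeds(names, stats)]
    return events, stats["peak"]


events, peak = asyncio.run(main())
```

The queue decouples the workers from the consumer, so progress can be streamed while seeds are still being created.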
- async veupath_chatbot.services.experiment.seed.runner.run_seed(*, user_id, stream_repo, control_set_repo, site_id=None)[source]¶
Create seed strategies and control sets, yielding SSE progress events.
Seeds run concurrently (up to _MAX_CONCURRENT_SEEDS at a time). Progress events are streamed back via an asyncio.Queue.
If site_id is provided, only seeds for that database are created. Otherwise all available seeds are used.
Imports are deferred to avoid circular dependencies.
- Return type:
Shared parameter-building helpers for seed definitions.
Every VEuPathDB component site seed file needs to build WDK search parameter dicts. These helpers encode the common patterns — organism JSON encoding, GO term searches, text searches, signal peptide, transmembrane domains, etc.
Each seed file may still define site-specific helpers locally.
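As an illustration of the organism JSON encoding these helpers share, the sketch below serializes a name list with json.dumps and uses it in a text-search parameter dict. Both function names and all parameter keys are hypothetical placeholders rather than confirmed WDK names, and JSON-encoding the fields list is an assumption.

```python
import json


def encode_organisms(names):
    # A JSON array serialized as a string, e.g. '["A", "B"]'.
    return json.dumps(names)


def build_text_params(organism, expression, fields=None):
    # Keys are hypothetical placeholders, not confirmed WDK parameter names.
    return {
        "organism": encode_organisms([organism]),
        "expression": expression,
        "fields": json.dumps(fields or ["product"]),  # encoding assumed
    }


params = build_text_params("Plasmodium falciparum 3D7", "kinase")
```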
- veupath_chatbot.services.experiment.seed.helpers.org(names)[source]¶
Encode an organism name list as a WDK JSON-array string.
- Return type:
- veupath_chatbot.services.experiment.seed.helpers.go_search_params(organism, go_id, *, evidence=None, go_term_value=None)[source]¶
Build GenesByGoTerm search parameters.
- Parameters:
organism (str) – Organism full name (e.g. “Plasmodium falciparum 3D7”).
go_id (str) – GO term identifier (e.g. “GO:0004672”).
evidence (list[str] | None) – Evidence code filter. Defaults to ["Curated", "Computed"].
go_term_value (str | None) – Value for the go_term field. Defaults to go_id. GiardiaDB uses "N/A" here.
- Return type:
- veupath_chatbot.services.experiment.seed.helpers.text_search_params(organism, expression, *, fields=None)[source]¶
Build GenesByText search parameters.
- Parameters:
organism – Organism full name.
expression – Free-text query (e.g. “kinase”, “rhoptry”).
fields – Fields to search. Defaults to ["product"].
- veupath_chatbot.services.experiment.seed.helpers.signal_peptide_params(organism)[source]¶
Build GenesWithSignalPeptide search parameters.
- veupath_chatbot.services.experiment.seed.helpers.transmembrane_params(organism, min_tm, max_tm)[source]¶
Build GenesByTransmembraneDomains search parameters.
Callers pass default min/max values appropriate to their site context.
- veupath_chatbot.services.experiment.seed.helpers.mol_weight_params(organism, min_mw, max_mw)[source]¶
Build GenesByMolecularWeight search parameters.
- veupath_chatbot.services.experiment.seed.helpers.ec_search_params(organism, *, ec_number, ec_sources, ec_wildcard='No')[source]¶
Build GenesByEcNumber search parameters.
- Parameters:
organism – Organism full name.
ec_number – EC number pattern (e.g. “2.7.11.1”).
ec_sources – Evidence sources list (e.g. ["KEGG_Enzyme"]).
ec_wildcard – Wildcard flag. Defaults to "No".
- veupath_chatbot.services.experiment.seed.helpers.gene_type_params(organism, gene_type='protein coding')[source]¶
Build GenesByGeneType search parameters.
- veupath_chatbot.services.experiment.seed.helpers.interpro_params(organism, database, typeahead)[source]¶
Build GenesByInterproDomain search parameters.
- veupath_chatbot.services.experiment.seed.helpers.location_params(organism, chromosome, start, end)[source]¶
Build GenesByLocation search parameters.
- veupath_chatbot.services.experiment.seed.helpers.exon_count_params(organism, min_exons, max_exons)[source]¶
Build GenesByExonCount search parameters.
- veupath_chatbot.services.experiment.seed.helpers.taxon_params(organism)[source]¶
Build GenesByTaxon search parameters.
- veupath_chatbot.services.experiment.seed.helpers.rnaseq_fc_params(*, dataset_url, profileset, direction, ref_samples, comp_samples, fold_change='2', hard_floor, protein_coding='yes', ref_op='average1', comp_op='average1')[source]¶
Build RNA-Seq fold-change search parameters.
- veupath_chatbot.services.experiment.seed.helpers.paralog_count_params(organism, min_p, max_p)[source]¶
Build GenesByParalogCount search parameters.
Shared types for seed definitions.
- class veupath_chatbot.services.experiment.seed.types.ControlSetDef(name: str, positive_ids: list[str], negative_ids: list[str], provenance_notes: str, tags: list[str] = <factory>)[source]¶
Bases: object
- __init__(name, positive_ids, negative_ids, provenance_notes, tags=<factory>)¶
- class veupath_chatbot.services.experiment.seed.types.SeedDef(name: str, description: str, site_id: str, step_tree: dict[str, Any], control_set: ControlSetDef, record_type: str = 'transcript')[source]¶
Bases: object
- control_set: ControlSetDef¶
- __init__(name, description, site_id, step_tree, control_set, record_type='transcript')¶