WDK Services

WDK integration services bridge PathFinder’s domain model with VEuPathDB’s WDK REST API. They handle enrichment orchestration, record type resolution, step result browsing, and shared WDK helpers.

Overview

The WDK services layer sits between the transport/tool layers and the raw integrations.veupathdb HTTP client. Where the integration layer handles HTTP communication, these services add business logic: enrichment orchestration, fuzzy record-type matching, result pagination, and attribute inspection.

Design Decisions

Why a separate WDK service layer? The integration layer (integrations.veupathdb) is a thin HTTP client. WDK services add domain logic that multiple consumers need: experiments, gene sets, workbench, and export all need step result browsing with consistent attribute handling. Sharing this logic avoids duplication.

Fuzzy record-type matching: WDK record type names vary between sites (GeneRecordClasses.GeneRecordClass vs gene). The resolver uses three-stage matching (exact → name → display) with disambiguation to handle this reliably.

Enrichment Service

Purpose: Unified enrichment orchestration. Runs GO/pathway enrichment via WDK, handles multiple enrichment types, and formats results for the experiment analysis pipeline.

Unified enrichment service.

Single entry point for running enrichment analyses regardless of whether the caller is an experiment endpoint, gene set endpoint, or AI tool.

Rate limiting

A process-level semaphore (_WDK_ENRICHMENT_SEMAPHORE) limits how many run_batch calls can execute concurrently across the entire application. Within a single batch, analyses run in parallel via asyncio.gather to keep total wall-clock time within proxy timeouts.
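
The two levels of concurrency described above can be sketched as follows. This is a minimal illustration, not the service's code: the cap of 2, the `run_one` stand-in, and the `stats` counter are all hypothetical, and the real value behind _WDK_ENRICHMENT_SEMAPHORE is not stated here.

```python
import asyncio


async def run_one(analysis_type: str) -> str:
    # Hypothetical stand-in for a single WDK enrichment call.
    await asyncio.sleep(0.01)
    return f"{analysis_type}:done"


async def run_batch(sem: asyncio.Semaphore, analysis_types: list[str], stats: dict) -> list[str]:
    # The semaphore caps how many batches run at once across the process.
    async with sem:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            # Inside one batch, every analysis runs in parallel.
            return await asyncio.gather(*(run_one(t) for t in analysis_types))
        finally:
            stats["active"] -= 1


async def demo() -> tuple[list[list[str]], int]:
    sem = asyncio.Semaphore(2)  # assumed cap, for illustration only
    stats = {"active": 0, "peak": 0}
    batches = await asyncio.gather(
        *(run_batch(sem, ["go_process", "pathway"], stats) for _ in range(5))
    )
    return batches, stats["peak"]
```

Five batches are submitted in `demo()`, but the counter never sees more than two inside the guarded section at the same time.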

class veupath_chatbot.services.wdk.enrichment_service.EnrichmentService[source]

Bases: object

Unified enrichment dispatcher.

async run(*, site_id, analysis_type, step_id=None, search_name=None, record_type=None, parameters=None)[source]

Run a single enrichment analysis.

If step_id is provided, runs on the existing step. Otherwise creates a temporary strategy from search_name/parameters.

Return type:

EnrichmentResult

async run_batch(*, site_id, analysis_types, step_id=None, search_name=None, record_type=None, parameters=None)[source]

Run multiple enrichment analyses concurrently on a shared step.

When no step_id is provided (pasted gene sets), this creates ONE temporary WDK step/strategy and runs all analysis types against it, instead of creating N separate temporary strategies. This reduces WDK API calls from ~5N to ~N+3 (for N = 4, roughly 7 calls instead of 20) and avoids rate-limit 500s.

Return type:

tuple[list[EnrichmentResult], list[str]]
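
The tuple return type suggests that a batch can partially succeed. The sketch below reads the second element as a list of error messages, which is an assumption: the reference does not say what the strings contain. `fake_analysis` and the plain-dict results are hypothetical stand-ins for the real WDK calls and EnrichmentResult.

```python
import asyncio


async def fake_analysis(analysis_type: str) -> dict:
    # Hypothetical stand-in for one enrichment run on the shared step.
    if analysis_type == "bogus":
        raise ValueError("unknown analysis type")
    return {"analysis_type": analysis_type, "terms": []}


async def run_batch_sketch(analysis_types: list[str]) -> tuple[list[dict], list[str]]:
    # return_exceptions=True keeps one failure from cancelling the rest.
    outcomes = await asyncio.gather(
        *(fake_analysis(t) for t in analysis_types), return_exceptions=True
    )
    results: list[dict] = []
    errors: list[str] = []
    for analysis_type, outcome in zip(analysis_types, outcomes):
        if isinstance(outcome, BaseException):
            errors.append(f"{analysis_type}: {outcome}")
        else:
            results.append(outcome)
    return results, errors
```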

WDK Helpers

Purpose: Shared WDK helpers for record parsing, attribute inspection, and parameter merging. Used across experiments, gene sets, and workbench.

These functions are used by experiment results, gene set, and workbench endpoints to work with WDK record types, primary keys, and analysis parameters. They were previously duplicated across multiple router modules.

veupath_chatbot.services.wdk.helpers.DETAIL_ATTRIBUTE_LIMIT = 50

Max attributes to request when fetching a single record detail.

WDK record types can have thousands of attributes (e.g. 3000+ expression columns on transcript). Requesting all of them would time out. The first ~50 isInReport attributes cover core gene/record fields.

veupath_chatbot.services.wdk.helpers.is_sortable(attr_type)[source]

Return True if a WDK attribute type supports numeric sorting.

Return type:

bool

veupath_chatbot.services.wdk.helpers.is_suggested_score(name)[source]

Heuristic: flag well-known score attributes as suggested for ranking.

Return type:

bool

veupath_chatbot.services.wdk.helpers.extract_pk(record)[source]

Extract primary key string from a WDK record.

WDK records use "id": [{name, value}, ...] for the composite primary key. Returns the first part’s value, stripped.

Return type:

str | None

veupath_chatbot.services.wdk.helpers.extract_record_ids(records, *, preferred_key=None)[source]

Extract gene/record IDs from WDK standard report records.

If preferred_key is given, looks it up in each record’s attributes dict first; falls back to the primary-key array.

Accepts object so callers do not need to narrow the type before calling (e.g. answer.get("records") may return None).

Parameters:
  • records (object) – WDK answer records (expected list[dict]).

  • preferred_key (str | None) – Attribute name to prefer over primary key.

Returns:

List of non-empty record IDs.

Return type:

list[str]

veupath_chatbot.services.wdk.helpers.order_primary_key(pk_parts, pk_refs, pk_defaults)[source]

Reorder and fill primary key parts to match WDK record class definition.

WDK requires PK columns in the exact order defined by primaryKeyColumnRefs. Step reports may omit columns like project_id and may return them in a different order.

Parameters:
  • pk_parts (list[JSONObject]) – Client-provided PK parts ([{name, value}, ...]).

  • pk_refs (list[str]) – Column names in record-class order.

  • pk_defaults (dict[str, str]) – Default values for missing columns (e.g. project_id).

Returns:

Ordered PK parts matching pk_refs.

Return type:

list[JSONObject]
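
The reordering can be sketched like this. The function name is hypothetical, and filling a column that has neither a client value nor a default with an empty string is an assumption; the real helper may treat that case differently.

```python
def order_primary_key_sketch(
    pk_parts: list[dict],
    pk_refs: list[str],
    pk_defaults: dict[str, str],
) -> list[dict]:
    # Index client-provided parts by column name.
    by_name = {
        part["name"]: part["value"]
        for part in pk_parts
        if "name" in part and "value" in part
    }
    ordered = []
    for column in pk_refs:
        # Client value first, then the default, then "" (assumed fallback).
        value = by_name.get(column, pk_defaults.get(column, ""))
        ordered.append({"name": column, "value": value})
    return ordered
```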

veupath_chatbot.services.wdk.helpers.build_attribute_list(attrs_raw)[source]

Build a normalized attribute list from WDK record type info.

Handles both dict (attributesMap) and list (expanded) formats. Each entry includes: name, displayName, help, type, isDisplayable, isSortable, isSuggested.

This consolidates the 40+ line if/elif blocks previously copy-pasted in both get_experiment_attributes and get_gene_set_attributes.

Parameters:

attrs_raw (object) – Raw attributes value from the record type info.

Returns:

Normalized attribute list.

Return type:

list[JSONObject]
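
Handling both input formats might look like the sketch below. It covers only the dict/list normalization: isSortable and isSuggested are left out because the rules behind is_sortable and is_suggested_score are not given here. The function name and the fallback values are assumptions.

```python
from typing import Any


def normalize_attributes_sketch(attrs_raw: object) -> list[dict[str, Any]]:
    if isinstance(attrs_raw, dict):
        # attributesMap format: {name: {...metadata...}}
        pairs = list(attrs_raw.items())
    elif isinstance(attrs_raw, list):
        # Expanded format: [{"name": ..., ...metadata...}]
        pairs = [(m.get("name"), m) for m in attrs_raw if isinstance(m, dict)]
    else:
        return []
    normalized = []
    for name, meta in pairs:
        if not isinstance(name, str) or not isinstance(meta, dict):
            continue
        normalized.append(
            {
                "name": name,
                "displayName": meta.get("displayName", name),
                "help": meta.get("help"),
                "type": meta.get("type"),
                "isDisplayable": bool(meta.get("isDisplayable", True)),
            }
        )
    return normalized
```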

veupath_chatbot.services.wdk.helpers.extract_detail_attributes(attrs_raw)[source]

Extract attribute names and display names for the record detail view.

Filters to attributes with isInReport=True (skipping composite overview fields) and caps the list at DETAIL_ATTRIBUTE_LIMIT so that record types with thousands of attributes do not cause WDK requests to time out.

Handles both dict (attributesMap) and list (expanded) formats.

Returns:

(attribute_names, display_name_map)

Return type:

tuple[list[str], dict[str, str]]
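
A sketch of the filter-and-cap step, assuming each attribute entry carries `name`, `displayName`, and `isInReport` keys. It does not model the skipping of composite overview fields, and the function name and `limit` parameter are hypothetical.

```python
DETAIL_ATTRIBUTE_LIMIT = 50  # mirrors the documented constant


def extract_detail_attributes_sketch(
    attrs_raw: object, limit: int = DETAIL_ATTRIBUTE_LIMIT
) -> tuple[list[str], dict[str, str]]:
    if isinstance(attrs_raw, dict):
        entries = [
            {"name": name, **meta}
            for name, meta in attrs_raw.items()
            if isinstance(meta, dict)
        ]
    elif isinstance(attrs_raw, list):
        entries = [e for e in attrs_raw if isinstance(e, dict)]
    else:
        entries = []
    names: list[str] = []
    display: dict[str, str] = {}
    for entry in entries:
        name = entry.get("name")
        if not isinstance(name, str) or not entry.get("isInReport"):
            continue
        names.append(name)
        display[name] = str(entry.get("displayName") or name)
        if len(names) >= limit:
            break  # stop once the cap is reached
    return names, display
```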

veupath_chatbot.services.wdk.helpers.merge_analysis_params(form_meta, user_params)[source]

Merge WDK form defaults with user-supplied parameters.

Always extracts defaults from the WDK form metadata and layers user-supplied parameters on top so that required fields are never missing (which would cause WDK 422 errors).

After merging, vocabulary params (single-pick-vocabulary, multi-pick-vocabulary) are re-encoded as JSON arrays using the form metadata. This ensures that user-supplied plain strings don’t bypass the encoding required by AbstractEnumParam.convertToTerms().

Return type:

JSONObject
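
The merge-then-re-encode sequence can be sketched as follows. The form metadata shape (a `parameters` list whose entries have `name`, `type`, and `initialDisplayValue`) is an assumption made for this example, as are the function names and the sample parameter names.

```python
import json

VOCAB_TYPES = {"single-pick-vocabulary", "multi-pick-vocabulary"}


def encode_vocab(value: object) -> str:
    # Vocabulary params must be sent as a JSON array string.
    if isinstance(value, list):
        return json.dumps(value)
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except ValueError:
            parsed = None
        if isinstance(parsed, list):
            return value  # already a JSON array
        return json.dumps([value])
    return json.dumps([str(value)])


def merge_analysis_params_sketch(form_meta: dict, user_params: dict) -> dict:
    params_meta = form_meta.get("parameters", [])  # assumed shape
    # Start from the form defaults so required fields are always present.
    merged = {p["name"]: p.get("initialDisplayValue", "") for p in params_meta}
    merged.update(user_params)  # user values win
    for p in params_meta:
        if p.get("type") in VOCAB_TYPES:
            merged[p["name"]] = encode_vocab(merged[p["name"]])
    return merged
```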

Record Type Resolution

Purpose: Resolve record type names with fuzzy matching. Three-stage matching (exact, name, display) handles WDK’s inconsistent naming across sites.

Shared record-type resolution utility.

Normalizes a user-supplied record type string and matches it against the available WDK record type objects. Three matching strategies are tried in order:

  1. Exact (case-insensitive) match on canonical name (urlSegment then name via wdk_entity_name()).

  2. Exact (case-insensitive) match on the name field of dict entries.

  3. Display name match, accepted only when exactly one record type has a matching displayName, so that an ambiguous input never resolves to an arbitrary type.

If none of the strategies succeeds, the function returns None.

veupath_chatbot.services.wdk.record_types.resolve_record_type(available_types, user_input)[source]

Match user_input against WDK record-type objects.

Parameters:
  • available_types (list[JSONValue]) – Raw record type list from WDK (may contain plain strings or dicts with urlSegment/name/displayName).

  • user_input (str) – User-supplied record type string.

Returns:

The canonical (urlSegment / name) string for the matched record type, or None if no match is found.

Return type:

str | None

Step Results Service

Purpose: Shared service for browsing WDK step results. Encapsulates attribute listing, record retrieval, distribution computation, and analysis endpoint logic. Used by experiments, gene sets, and workbench endpoints.

Used by experiment, gene set, and workbench endpoints to avoid duplicating attribute listing, record browsing, distribution, and analysis logic.

class veupath_chatbot.services.wdk.step_results.StepResultsService(api, *, step_id, record_type)[source]

Bases: object

Provides read-only access to WDK step results.

Encapsulates the shared logic for attributes, records, distributions, and analyses that both experiments and gene sets need.

__init__(api, *, step_id, record_type)[source]

async get_attributes()[source]

Get available attributes for the record type.

Return type:

JSONObject

async get_records(*, offset=0, limit=50, sort=None, direction='ASC', attributes=None)[source]

Get paginated result records.

Return type:

JSONObject

async get_distribution(attribute_name)[source]

Get distribution data for an attribute.

Return type:

JSONObject

async list_analysis_types()[source]

List available WDK step analysis types.

Return type:

JSONObject

async get_strategy(strategy_id)[source]

Get the WDK strategy tree.

Return type:

JSONObject

async run_analysis_raw(analysis_name, parameters)[source]

Run a WDK step analysis with merged defaults.

Returns (raw_result, merged_params) so callers can handle enrichment parsing and persistence as needed.

Return type:

tuple[JSONObject, JSONObject]

async run_analysis(analysis_name, parameters)[source]

Run a WDK step analysis, auto-parsing enrichment results.

Return type:

JSONObject

async get_record_detail(primary_key, site_id)[source]

Get a single record’s full details by primary key.

Fetches record type info to reorder PK parts and to extract a capped set of isInReport attribute names. WDK interprets "attributes": [] as “return zero attributes”, so we must always pass explicit names.

Return type:

JSONObject
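
The empty-list pitfall can be guarded against when the request body is built. The sketch below assumes a body with `primaryKey`, `attributes`, and `tables` keys; that shape and the function name are assumptions for illustration, not the service's confirmed payload.

```python
def build_record_request_sketch(primary_key: list[dict], attribute_names: list[str]) -> dict:
    # An empty list would be read by WDK as "return zero attributes",
    # so refuse to build a request without explicit names.
    if not attribute_names:
        raise ValueError("attribute_names must list at least one attribute")
    return {
        "primaryKey": primary_key,            # already ordered to match the record class
        "attributes": list(attribute_names),  # explicit, capped list of names
        "tables": [],                         # assumed: no tables requested
    }
```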