Consolidate Prompt Schemas For Data Consistency
Consolidate Prompt Schemas to Achieve a Single Source of Truth
Prompt configuration schemas are currently defined in many places, which causes schema drift, manual duplication, inconsistent naming conventions, and discrepancies between the SDK and API schemas. This article analyzes those problems, proposes a layered architecture built around a single source of truth, and outlines a migration plan. It also covers the domain schema, the derived API schemas, and the integration with the Optimization Studio DSL (domain-specific language).
The Problem: A Scattered Landscape of Prompt Schemas
The current state of the prompt configuration schemas is fragmented, with definitions spread across numerous locations. This lack of a single source of truth leads to the following issues:
- Schema Drift: Inconsistencies arise when schemas are updated in one location but not others, leading to bugs and unexpected behavior. An example is the recent bug involving inputs using .min(1) versus .default([]).
- Manual Duplication: Redundant definitions and the need to manually copy and paste schemas increase the risk of errors and make maintenance more complex.
- Inconsistent Naming Conventions: The use of both snake case and camel case across different parts of the codebase requires ad-hoc transformations, increasing complexity.
- SDK Divergence: SDK schemas often lag behind or diverge from the API schemas, leading to compatibility issues and frustration for developers.
This fragmented approach creates a maintenance nightmare and hinders the system's overall reliability. The key to fixing these issues is to consolidate the prompt schemas and designate a single source of truth.
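To make schema drift concrete, the sketch below defines the same inputs field twice, once with .min(1) and once with .default([]), in the spirit of the mismatch mentioned above. The schema names (apiInputsSchema, formInputsSchema) and the identifier field are illustrative assumptions rather than the project's actual definitions, and the example assumes zod is installed.

```typescript
import { z } from "zod";

// Hypothetical stand-in for the real input field schema.
const inputSchema = z.object({ identifier: z.string() });

// Copy 1 (say, an API layer): at least one input is required.
const apiInputsSchema = z.object({
  inputs: z.array(inputSchema).min(1),
});

// Copy 2 (say, a form layer): inputs may be omitted and default to [].
const formInputsSchema = z.object({
  inputs: z.array(inputSchema).default([]),
});

// The same payload is rejected by one copy and accepted by the other.
const payload = {};
const apiResult = apiInputsSchema.safeParse(payload);
const formResult = formInputsSchema.safeParse(payload);
```

Data that passes one layer can therefore fail in the next, which is the kind of bug a single shared definition is meant to rule out.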
Current State: A Detailed Look at Schema Locations and Specific Issues
The existing architecture involves several locations where prompt schemas are defined. These locations, each serving a different purpose, contribute to the overall complexity. Here's a breakdown:
- prompts/schemas/field-schemas.ts: Defines atomic field schemas. This is a good example of reusability.
- server/prompt-config/repositories/llm-config-version-schema.ts: Defines the storage schema, which uses a mixed approach with snake case and versioning.
- app/api/prompts/[[...route]]/schemas/outputs.ts: Defines the API response. It now derives from the storage schema.
- app/api/prompts/[[...route]]/schemas/inputs.ts: Defines the API input. It is currently written manually and uses the field schemas.
- prompts/schemas/form-schema.ts: Defines the UI form, which derives from storage.
- server/api/routers/prompts/prompts.trpc-router.ts: Contains the tRPC input, which currently uses manual inline schemas.
- optimization_studio/types/dsl.ts: Defines the DSL types, creating a parallel type system (Field, LlmPromptConfigComponent).
- prompts/utils/llmPromptConfigUtils.ts: Contains converters that perform manual mapping between form and DSL.
- typescript-sdk/.../schema/prompt.schema.ts: Defines the SDK types, which duplicate everything.
Specific Issues:
- Naming Inconsistency: Storage uses snake case (e.g., max_tokens) while the API uses camel case (e.g., maxTokens). This requires ad-hoc transformations throughout the codebase.
- tRPC Router Duplication: prompts.trpc-router.ts manually defines schemas that should be derived from a shared input schema.
- TypeScript SDK Duplication: The SDK duplicates schemas defined elsewhere, making it prone to inconsistencies.
- Service Type Manually Defined: prompt.service.ts manually defines a VersionedPrompt type with many fields, which should be inferred from a schema.
- Form Schema Different Structure: form-schema.ts has an extra wrapper (configData) not present in storage or the API.
- Optimization Studio DSL - Parallel Type System: The DSL has a separate type system that must be kept in sync with prompt schemas via manual converters.
- Manual Converters Between Systems: llmPromptConfigUtils.ts contains manual mapping code (e.g., promptConfigFormValuesToOptimizationStudioNodeData), which can easily break if either side changes.
These issues highlight the need for a more structured and unified approach. The goal is to move towards a system where a single source of truth drives all schema definitions.
Proposed Architecture: A Layered Approach to Schema Management
The proposed architecture introduces a layered approach to schema management, with the domain schema at its core. This design promotes a single source of truth, reduces duplication, and improves consistency.
Layer 1: Domain Schema (Single Source of Truth)
This layer defines the core prompt configuration schema. It is the single source of truth for all prompt-related data. The schema uses camel case for consistency. The key files in this layer include:
- langwatch/src/prompts/schemas/domain/config-data.schema.ts: Defines the core prompt configuration data.
- langwatch/src/prompts/schemas/domain/version-metadata.schema.ts: Contains version tracking fields.
- langwatch/src/prompts/schemas/domain/prompt-config.schema.ts: Defines the full versioned prompt.
- langwatch/src/prompts/schemas/domain/index.ts: Exports the schemas.
// config-data.schema.ts
import { z } from "zod";
// messageSchema, inputSchema, outputSchema, nodeDatasetSchema, promptingTechniqueSchema
// and responseFormatSchema are assumed to be imported from the existing field schemas.
export const configDataSchema = z.object({
  prompt: z.string(),
  messages: z.array(messageSchema).default([]),
  inputs: z.array(inputSchema).default([]),
  outputs: z.array(outputSchema).min(1),
  model: z.string().min(1),
  temperature: z.number().optional(),
  maxTokens: z.number().optional(),
  demonstrations: nodeDatasetSchema.optional(),
  promptingTechnique: promptingTechniqueSchema.optional(),
  responseFormat: responseFormatSchema.optional(),
});
export type ConfigData = z.infer<typeof configDataSchema>;
Layer 2: Boundary Schemas (Derived)
This layer derives schemas from the domain schema, tailoring them for specific use cases such as storage, API input/output, and UI forms. Key files include:
- langwatch/src/prompts/schemas/boundaries/storage.schema.ts: Transforms the domain schema to snake case for database storage.
- langwatch/src/prompts/schemas/boundaries/api-input.schema.ts: Defines the API input schema, using .partial() to make fields optional for create/update operations.
- langwatch/src/prompts/schemas/boundaries/api-output.schema.ts: Defines the API response schema, including metadata.
- langwatch/src/prompts/schemas/boundaries/form.schema.ts: Defines the form schema, adding UI-specific concerns.
// storage.schema.ts
import { configDataSchema } from "../domain";
// Uses the camelToSnake helper declared in transformers.ts
import { camelToSnake } from "./transformers";
export const storageConfigDataSchema = configDataSchema.transform(camelToSnake);
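// api-input.schema.ts
import { configDataSchema } from "../domain";
// handleSchema and scopeSchema are assumed to be defined elsewhere in the schemas package.
export const createPromptInputSchema = configDataSchema.partial().extend({
  handle: handleSchema,
  scope: scopeSchema.optional(),
});
// api-output.schema.ts
import { z } from "zod";
import { configDataSchema } from "../domain";
export const apiResponseSchema = z
  .object({
    id: z.string(),
    handle: z.string().nullable(),
    // ...metadata
  })
  .merge(configDataSchema);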
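The derivation pattern can be tried in isolation with a reduced schema. The following is a minimal sketch, assuming zod is installed; the three-field configDataSchema and the z.string().min(1) handle are simplified stand-ins for illustration, not the real definitions.

```typescript
import { z } from "zod";

// Reduced stand-in for the domain schema (the real one has more fields).
const configDataSchema = z.object({
  prompt: z.string(),
  model: z.string().min(1),
  maxTokens: z.number().optional(),
});

// API input: every domain field becomes optional, plus a required handle.
const createPromptInputSchema = configDataSchema.partial().extend({
  handle: z.string().min(1),
});

// API output: metadata fields merged with the full domain schema.
const apiResponseSchema = z
  .object({ id: z.string(), handle: z.string().nullable() })
  .merge(configDataSchema);

type CreatePromptInput = z.infer<typeof createPromptInputSchema>;
type ApiResponse = z.infer<typeof apiResponseSchema>;

// Adding a field to configDataSchema changes both derived types at once.
const input: CreatePromptInput = { handle: "my-prompt" };
const response: ApiResponse = {
  id: "p1",
  handle: null,
  prompt: "Hello",
  model: "some-model",
};
```

One ordering detail worth keeping in mind: in zod, .transform() returns a schema that no longer exposes object methods such as .partial() or .extend(), so a snake-case transform has to be the last step in any derivation chain.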
Layer 3: Transform Utilities
This layer provides utilities for transforming data between different formats, such as snake case and camel case. This reduces the need for ad-hoc transformations throughout the codebase. The key file is:
langwatch/src/prompts/schemas/boundaries/transformers.ts: Contains functions for converting between snake case and camel case.
// transformers.ts
export function snakeToCamel<T extends Record<string, unknown>>(obj: T) {
// max_tokens → maxTokens
}
export function camelToSnake<T extends Record<string, unknown>>(obj: T) {
// maxTokens → max_tokens
}
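The two stubs above leave the conversion itself open. One possible shape, shown here as a sketch rather than the project's implementation, renames top-level keys only. The helper names snakeToCamelKey, camelToSnakeKey, and renameKeys are assumptions introduced for this example.

```typescript
// Convert a single key: max_tokens -> maxTokens
function snakeToCamelKey(key: string): string {
  return key.replace(/_([a-z])/g, (_match, ch: string) => ch.toUpperCase());
}

// Convert a single key: maxTokens -> max_tokens
function camelToSnakeKey(key: string): string {
  return key.replace(/[A-Z]/g, (ch) => `_${ch.toLowerCase()}`);
}

// Rename the top-level keys of an object; values are passed through untouched.
function renameKeys(
  obj: Record<string, unknown>,
  rename: (key: string) => string,
): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(obj).map(([key, value]) => [rename(key), value]),
  );
}

function snakeToCamel<T extends Record<string, unknown>>(obj: T): Record<string, unknown> {
  return renameKeys(obj, snakeToCamelKey);
}

function camelToSnake<T extends Record<string, unknown>>(obj: T): Record<string, unknown> {
  return renameKeys(obj, camelToSnakeKey);
}
```

Keeping the conversion shallow is a deliberate choice in this sketch: a recursive version would also rename keys inside user-authored payloads (for example, property names in a response format definition), which is rarely what storage code wants. Whether nested structures need conversion depends on how they are stored.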
Layer 4: DSL Integration
This layer focuses on integrating the domain schema with the DSL used in the optimization studio. The goal is to derive DSL types from the domain schema, eliminating the need for a separate parallel type system. The key file is:
optimization_studio/types/dsl.ts: Updated to derive DSL types from the domain schema.
// optimization_studio/types/dsl.ts
import { configDataSchema, type ConfigData } from "~/prompts/schemas/domain";
// Derive DSL types from domain schema instead of parallel definitions
export type LlmPromptConfigComponent = Signature & {
  configId?: string;
  handle?: string | null;
  versionMetadata?: VersionMetadata;
  // Derive from domain - no more manual Field[] definitions
  inputs: ConfigData["inputs"];
  outputs: ConfigData["outputs"];
  configData: ConfigData; // Or flatten parameters into configData
};
Layer 5: SDK Types (Generated)
This layer generates SDK types from the OpenAPI specification, eliminating the need for manually maintained SDK schemas. Key files include:
typescript-sdk: Deletes manual schemas and uses only generated OpenAPI types.
// typescript-sdk - DELETE manual schemas
// Import from generated OpenAPI types only
import type { paths } from "@/internal/generated/openapi/api-client";
export type PromptResponse = paths["/api/prompts/{id}"]["get"]["responses"]["200"]["content"]["application/json"];
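The indexed-access chain above works because the generated paths type nests route, method, status, and content type. The sketch below uses a small hand-written paths interface as a hypothetical stand-in for the generated file, and GetJson is an illustrative helper name, not part of the SDK.

```typescript
// Hypothetical stand-in for the generated file; the real `paths` type is
// produced from the OpenAPI spec and contains more routes and fields.
interface paths {
  "/api/prompts/{id}": {
    get: {
      responses: {
        "200": {
          content: {
            "application/json": { id: string; handle: string | null; model: string };
          };
        };
      };
    };
  };
}

// Illustrative helper: extracts the JSON body of a 200 GET response for a route.
type GetJson<P extends keyof paths> =
  paths[P]["get"]["responses"]["200"]["content"]["application/json"];

type PromptResponse = GetJson<"/api/prompts/{id}">;

// The compiler rejects objects that drift from the spec-derived shape.
const example: PromptResponse = {
  id: "prompt_123",
  handle: null,
  model: "some-model",
};
```

Because PromptResponse is computed from paths, regenerating the types after an API change updates the SDK type without any hand-written schema to keep in sync.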
This layered architecture promotes a single source of truth, reduces duplication, and improves consistency across the entire system.
Migration Plan: A Step-by-Step Guide
The migration to the new architecture involves several phases. This section outlines the key steps involved in each phase.
Phase 1: Extract Domain Schema
This phase focuses on creating the domain schema and establishing it as the single source of truth.
- Create prompts/schemas/domain/config-data.schema.ts.
- Export configDataSchema as the camel case single source of truth.
- Add transform utilities for snake case conversion.
Phase 2: Derive Boundary Schemas
This phase involves deriving the boundary schemas from the domain schema.
- Update api-output.schema.ts to use the domain schema.
- Update api-input.schema.ts to derive from the domain schema.
- Update llm-config-version-schema.ts to use the domain schema and snake case transform.
- Update form-schema.ts to derive from the domain schema, keeping the UI structure.
Phase 3: Fix tRPC Router
This phase addresses the manual schema definitions in the tRPC router.
- Create a shared input schema in prompts/schemas/boundaries/trpc-input.schema.ts.
- Update prompts.trpc-router.ts to use the shared schema.
Phase 4: Clean Up Service Types
This phase simplifies service types by inferring them from the API response schema.
- Delete the manual VersionedPrompt type.
- Infer it from apiResponseSchema: type VersionedPrompt = z.infer<typeof apiResponseSchema>.
Phase 5: DSL Integration
This phase integrates the domain schema with the DSL in the optimization studio.
- Update LlmPromptConfigComponent to derive inputs/outputs from the domain schema.
- Simplify llmPromptConfigUtils.ts converters.
- Consider flattening DSL parameters into the configData structure.
- Update SignaturePropertiesPanelForm to use shared types.
Phase 6: SDK Consolidation
This phase cleans up the SDK by using generated OpenAPI types.
- Delete typescript-sdk/schema/prompt.schema.ts manual schemas.
- Use only generated OpenAPI types.
- Ensure Python SDK generation stays in sync.
Phase 7: Add Compatibility Tests
This phase adds tests to ensure schema compatibility.
- Test: storage schema ⊆ output schema.
- Test: domain schema → storage → domain schema roundtrip.
- Test: OpenAPI spec matches actual responses.
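The first two checks can be sketched at the level of key names. This is a simplified illustration under stated assumptions: toSnake and toCamel are stand-ins for the project's transformers, and outputKeys is a hypothetical list of response fields; a real test would read the keys from the schemas themselves and run inside the project's test runner.

```typescript
// Illustrative key converters (assumed, not the project's transformers.ts).
const toSnake = (key: string): string =>
  key.replace(/[A-Z]/g, (ch) => `_${ch.toLowerCase()}`);
const toCamel = (key: string): string =>
  key.replace(/_([a-z])/g, (_match, ch: string) => ch.toUpperCase());

// Field names taken from the domain schema shown earlier.
const domainKeys = [
  "prompt", "messages", "inputs", "outputs", "model", "temperature",
  "maxTokens", "demonstrations", "promptingTechnique", "responseFormat",
];

// Roundtrip check: domain -> storage -> domain must return the original key.
function roundtripFailures(keys: string[]): string[] {
  return keys.filter((key) => toCamel(toSnake(key)) !== key);
}

// Subset check: every key of `inner` must also appear in `outer`.
function missingKeys(inner: string[], outer: string[]): string[] {
  const outerSet = new Set(outer);
  return inner.filter((key) => !outerSet.has(key));
}

const storageKeys = domainKeys.map(toSnake);
// Hypothetical output keys: domain fields plus response metadata.
const outputKeys = [...domainKeys, "id", "handle"];
```

A roundtrip test like this also surfaces naming edge cases early: a domain key that already contains an underscore does not survive the snake case and back conversion, so such keys would need to be avoided or handled explicitly.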
Key Architectural Decision: DSL Relationship
The relationship between the domain schema and the DSL is a critical architectural decision. Two options exist:
Option A: DSL derives from Domain (Recommended)
LlmPromptConfigComponent.inputsderives from `ConfigData[