Prompt Optimization
Prompt Optimization helps your application achieve higher accuracy than any single prompt could deliver alone.
What you'll learn ⏱️ 7 minutes
How Prompt Families work and why they're more effective
The four optimization interfaces: Event Log, Prompt Family, Manual Optimization, and Automatic Optimization
When to use automatic vs manual optimization
How to read and interpret optimization results
Best practices for building effective Prompt Families
Understanding Prompt Families
Traditional AI applications use a single prompt for all scenarios. Empromptu creates Prompt Families - multiple specialized prompts that handle different input types and edge cases.
How It Works
Instead of one prompt trying to handle everything:
Traditional: Single Prompt → All Inputs → Inconsistent Results
Empromptu builds a family of prompts that work together:
Empromptu: Input Analysis → Best Prompt Selected → Optimized Result
Example: Review Summarizer Prompt Family
Your Review Summarizer might develop prompts specialized for:
Prompt-1: Engaging summaries with emotional reactions and personal experiences
Prompt-2: Technical summaries focused on factual specifications and measurable data points
Each prompt in the family becomes an expert at handling specific types of reviews, leading to higher overall accuracy.
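To make the routing idea concrete, here is a minimal Python sketch of how an input analyzer might select a family member. Everything in it (the classifier rules, prompt texts, and function names) is a hypothetical illustration, not Empromptu's actual implementation.

```python
# Minimal sketch of routing inside a Prompt Family. Illustrative only:
# the classifier rules and prompt texts are hypothetical, not Empromptu's.

PROMPT_FAMILY = {
    "experiential": "Create an engaging summary of the experiential aspects...",
    "technical": "Create a concise technical summary of factual specifications...",
}

def classify_input(review: str) -> str:
    """Toy classifier: route spec-heavy reviews to the technical prompt."""
    technical_markers = ("spec", "battery", "dimension", "benchmark")
    if any(marker in review.lower() for marker in technical_markers):
        return "technical"
    return "experiential"

def select_prompt(review: str) -> str:
    """Pick the family member best suited to this input."""
    return PROMPT_FAMILY[classify_input(review)]

print(select_prompt("Battery life exceeded the listed specs."))  # technical prompt
```

The point is the shape of the mechanism: analyze the input first, then dispatch to the specialist prompt rather than forcing one prompt to handle everything.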
Accessing Prompt Optimization
From your project dashboard, click the Actions button on the task you want to optimize, then select "Prompt Optimization". This opens the optimization interface with four main tabs:
Interface Overview
Event Log: All API calls with detailed scoring
Prompt Family: Your collection of prompts and their management
Manual Optimization: Step-by-step optimization wizard
Automatic Optimization: Hands-off optimization process
Event Log: Understanding Your Optimization History
The Event Log shows every API call with comprehensive details:
Event Log Columns
Timestamp: When each optimization occurred
Messages: The input text that was processed
Model: Which AI model was used (e.g., gpt-4o-mini)
Temperature: Model creativity setting (typically 0.7)
Response: The generated output
Delivered Prompt: Which prompt from your family was used
Score Reasoning: Detailed explanation of why this score was assigned
Score: Performance rating (0-10 scale)
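Put together, a single Event Log row might look like the record below. The field names paraphrase the columns above and the values are illustrative; this is not an official export format.

```python
# One Event Log row expressed as a dict, for illustration only.
# Field names paraphrase the columns above; not an official export format.
event = {
    "timestamp": "21/04/2025, 04:58:14",
    "messages": "Create an engaging summary of the experiential aspects...",
    "model": "gpt-4o-mini",
    "temperature": 0.7,                      # typical creativity setting
    "response": "Frogs burst into my life like playful whispers...",
    "delivered_prompt": "Prompt-1",          # assumed family member name
    "score_reasoning": "extracted_completeness - summary captures...",
    "score": 6.0,                            # 0-10 scale
}
```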
Reading Event Results
Each row represents one API call. Look for patterns:
High-performing events (7.0+):
Note which prompts work well
Identify successful input patterns
Understand what generates good scores
Low-performing events (below 5.0):
See which inputs cause problems
Identify prompt weaknesses
Find opportunities for new family members
Example Event Analysis
Timestamp: 21/04/2025, 04:58:14
Input: "Create an engaging summary of the experiential aspects..."
Model: gpt-4o-mini
Response: "Frogs burst into my life like playful whispers..."
Score: 6.000
Reasoning: "extracted_completeness - summary captures..."
This shows the system tested an experiential review and achieved moderate success with a creative prompt approach.
Prompt Family: Managing Your Prompt Collection
The Prompt Family tab shows your collection of prompts and how they work together:
Family Structure
Each prompt in your family has:
Name: Descriptive identifier (e.g., "Prompt-1", "Prompt-2")
Status: Active or Inactive
Model: Preferred AI model (e.g., gpt-4o-mini)
Temperature: Creativity setting
Performance: How well this prompt performs overall
Specialization: What types of inputs this prompt handles best
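As a mental model, each family member carries the attributes listed above. The dataclass below is an assumed illustration of that shape, not Empromptu's schema; all field names are paraphrases.

```python
from dataclasses import dataclass

# Hypothetical data model mirroring the attributes above. Field names are
# illustrative assumptions, not Empromptu's actual schema.
@dataclass
class FamilyPrompt:
    name: str            # descriptive identifier, e.g., "Prompt-1"
    status: str          # "active" or "inactive"
    model: str           # preferred AI model, e.g., "gpt-4o-mini"
    temperature: float   # creativity setting, e.g., 0.7
    performance: float   # overall score on the 0-10 scale
    specialization: str  # input types this prompt handles best

prompt_1 = FamilyPrompt(
    name="Prompt-1",
    status="active",
    model="gpt-4o-mini",
    temperature=0.7,
    performance=7.8,
    specialization="experiential reviews with emotional reactions",
)
```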
Creating New Family Members
Click "Create Prompt" to add specialized prompts:
Define purpose: What specific scenario should this prompt handle?
Write prompt text: Create instructions tailored to that scenario
Set parameters: Choose model and temperature settings
Test performance: Run against sample inputs
Activate: Add to your active family
Example Prompt Family Growth
Starting point: Single general prompt
"Create an engaging summary of the experiential aspects from the provided text..."After optimization: Specialized family
Prompt-1: "Create an engaging summary of experiential aspects - highlighting emotional reactions and personal experiences..."Prompt-2: "Analyze the provided text and create a concise technical summary that extracts factual specifications..."Each family member becomes an expert at their specific type of content.
Manual Optimization: Step-by-Step Improvement
The Manual Optimization tab provides a guided workflow for targeted improvements.
In Manual Optimization, you, as the developer, tinker with your prompts directly in the UI and use Empromptu's models to generate more robust prompts. You can test your prompts to see how they perform across various LLMs, from OpenAI models to your own local model.
Step 1: Set Up Your Experiment
Select a Prompt: Choose which family member to optimize
Insert Inputs: Pick test data that represents real use cases
Choose Evaluations: Select success criteria to focus on
Step 2: Review Current Performance
The system shows:
Current scores for selected inputs
Which evaluations are struggling
Specific areas for improvement
Step 3: Run Targeted Optimization
Add Input: Include new test cases if needed
Add Evaluation: Create specific success criteria
Generate Variations: Create new prompt versions
Test Results: See immediate performance changes
Step 4: Analyze and Iterate
Compare scores across prompt variations
Identify the best-performing family members
Add successful variations to your active family
Remove or modify underperforming prompts
Manual Optimization Benefits
Targeted improvement: Focus on specific problem areas
Complete control: Guide the optimization process
Deep understanding: Learn what works and why
Custom solutions: Address unique use case requirements
Automatic Optimization: Hands-Off Improvement
The Automatic Optimization tab lets Empromptu's system improve your prompts without manual intervention.
Instead of statically defining your inputs, prompts, and models in your code, you set an evaluation (in other words, you define what good looks like) and Empromptu automatically creates better, more accurate prompts for you in real time. This process is low latency.
How Automatic Optimization Works
Set an eval: Define the success criteria that describe what a good output looks like
Generation: Creates new prompt variations automatically
Testing: Runs new prompts against your inputs and evaluations
Selection: Adds best-performing prompts to your family
Refinement: Continues improving over multiple iterations
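Conceptually, this is a generate-test-select loop. The sketch below illustrates that cycle with toy stand-ins for the generation and evaluation steps; it is a simplified assumption about the mechanism, not the real optimizer.

```python
import random

def generate_variations(family: list[str]) -> list[str]:
    """Toy Generation step: derive new prompts by appending emphasis hints."""
    hints = ["Focus on emotional tone.", "Extract factual specifications."]
    return [f"{prompt} {hint}" for prompt in family for hint in hints]

def run_eval(prompt: str, inputs: list[str]) -> float:
    """Stand-in for the Testing step: in practice the 0-10 score comes from
    your evaluations; here it is random purely for illustration."""
    return sum(random.uniform(0, 10) for _ in inputs) / len(inputs)

def optimize(family: list[str], inputs: list[str],
             iterations: int = 3, keep: int = 3) -> list[str]:
    for _ in range(iterations):                            # Refinement
        candidates = family + generate_variations(family)  # Generation
        candidates.sort(key=lambda p: run_eval(p, inputs),
                        reverse=True)                      # Testing
        family = candidates[:keep]                         # Selection
    return family

best_family = optimize(["Summarize the review."],
                       ["Great battery life, felt magical!"])
```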
When to Use Automatic
Best for:
Getting started quickly
Establishing baseline performance
Handling common optimization patterns
Continuous background improvement
Process:
Click "Start Automatic Optimization"
System analyzes current performance
New prompts are generated and tested
Results appear in Event Log in real-time
Prompt Family grows automatically
Automatic Optimization Results
You'll see improvements like:
Initial accuracy: 4.5 → Current accuracy: 7.8
New family members: 1 prompt → 3 specialized prompts
Better coverage: Handles more input types effectively
Reading Optimization Scores
Score Interpretation
9.0-10.0: Excellent performance, optimal results
7.0-8.9: Good performance, production-ready
5.0-6.9: Acceptable but improvable
3.0-4.9: Needs attention, moderate issues
0.0-2.9: Poor performance, requires optimization
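If you script against exported scores, the bands above map naturally onto a small helper like the one below. This is a hypothetical convenience for your own tooling, not part of the product.

```python
def interpret_score(score: float) -> str:
    """Map a 0-10 score to the interpretation bands above (hypothetical helper)."""
    if score >= 9.0:
        return "Excellent performance, optimal results"
    if score >= 7.0:
        return "Good performance, production-ready"
    if score >= 5.0:
        return "Acceptable but improvable"
    if score >= 3.0:
        return "Needs attention, moderate issues"
    return "Poor performance, requires optimization"

print(interpret_score(6.0))  # -> Acceptable but improvable
```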
Score Reasoning
Each score includes detailed reasoning that explains:
Which evaluations passed/failed
Specific issues identified
What contributed to the score
Areas for potential improvement
Example reasoning:
"extracted_completeness - AI response captures the essence and key emotional elements. Summary captures key experiential aspects while maintaining vivid language."This tells you exactly why a score was assigned and what the system valued.
Best Practices for Prompt Optimization
Start with Automatic, Refine with Manual
Run automatic optimization to establish a baseline Prompt Family
Review Event Log to understand performance patterns
Use manual optimization to address specific weaknesses
Iterate based on real-world performance
Build Diverse Families
Include prompts for different scenarios:
Different input types (positive/negative reviews, technical/emotional content)
Different output requirements (brief/detailed, formal/casual)
Different edge cases (unusual inputs, specific formatting needs)
Use Quality Test Data
Representative inputs: Test data should match real use cases
Edge cases included: Test unusual or challenging scenarios
Sufficient volume: Use enough inputs to identify patterns
Regular updates: Add new test cases as you discover them
Monitor Performance Continuously
Check Event Log regularly for new patterns
Update families as use cases evolve
Remove underperforming prompts that aren't contributing
Add new specializations when needed
Advanced Optimization Strategies
Specialized Family Members
Create prompts for specific scenarios:
Input-type specialists:
Long-form content vs short snippets
Technical documentation vs marketing copy
Positive sentiment vs negative feedback
Output-format specialists:
Bullet points vs paragraphs
Brief summaries vs detailed analysis
Structured data vs natural language
Performance-Based Selection
Monitor which family members perform best:
High-frequency prompts: Used often, optimize carefully
High-accuracy prompts: Excellent results, study and replicate
Low-performing prompts: Consider removal or major revision
Underused prompts: May need better specialization
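One way to surface these categories is to aggregate Event Log rows per prompt, as sketched below. The rows are assumed to be (delivered prompt, score) pairs; both the data and the code are illustrative, not a built-in export.

```python
from collections import defaultdict

# Hypothetical Event Log rows as (delivered_prompt, score) pairs.
# Aggregating them surfaces high-frequency, high-accuracy, and underused prompts.
events = [
    ("Prompt-1", 7.5), ("Prompt-1", 8.0), ("Prompt-1", 6.5),
    ("Prompt-2", 9.0), ("Prompt-3", 3.5),
]

stats = defaultdict(lambda: {"count": 0, "total": 0.0})
for prompt, score in events:
    stats[prompt]["count"] += 1
    stats[prompt]["total"] += score

for prompt, s in stats.items():
    mean = s["total"] / s["count"]
    print(f"{prompt}: used {s['count']}x, mean score {mean:.1f}")
```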
Continuous Improvement Workflow
Weekly Event Log review: Identify new patterns and issues
Monthly family audit: Remove underperforming prompts, add new specialists
Quarterly strategy review: Assess overall optimization approach
Ongoing testing: Add new inputs based on real user behavior
Troubleshooting Common Issues
Scores Not Improving
Symptoms: Accuracy plateauing despite optimization attempts
Solutions:
Click Actions → Input Optimization to add more diverse test inputs
Click Actions → Evaluations to create more specific criteria
Try manual optimization for targeted improvement
Click Actions → Model Optimization to test different models
Inconsistent Results
Symptoms: Large score variations for similar inputs
Solutions:
Review the Prompt Family tab for family members with conflicting approaches
Click Actions → Prompt Optimization → Manual Optimization to add more specialized family members
Click Actions → Evaluations to improve criteria clarity
Click Actions → Input Optimization to increase test data volume
Good Test Scores, Poor Real Performance
Symptoms: High scores in testing, user complaints in production
Solutions:
Click Actions → Input Optimization → End User Inputs to review real user data for new patterns
Add real user data to manual test inputs
Click Actions → Evaluations to create criteria based on actual user needs
Monitor deployed application performance and add problematic inputs as test cases
Next Steps
Now that you understand Prompt Optimization:
Learn Task Actions: Understand how to access all optimization tools through the Actions button
Explore Edge Case Detection: Find and fix problematic inputs using visual analysis
Understand Evaluations: Set up success criteria that guide prompt optimization
Monitor End User Inputs: Use real user data to improve your optimization