Prompt Optimization
Prompt Optimization helps your application achieve higher accuracy than any single prompt could deliver alone.
What you'll learn ⏱️ 7 minutes
How Prompt Families work and why they're more effective
The four optimization interfaces: Event Log, Prompt Family, Manual Optimization, and Automatic Optimization
When to use automatic vs manual optimization
How to read and interpret optimization results
Best practices for building effective Prompt Families
Understanding Prompt Families
Traditional AI applications use a single prompt for all scenarios. Empromptu creates Prompt Families - multiple specialized prompts that handle different input types and edge cases.
How It Works
Instead of one prompt trying to handle everything:
Traditional: Single Prompt → All Inputs → Inconsistent Results
Empromptu builds a family of prompts that work together:
Empromptu: Input Analysis → Best Prompt Selected → Optimized Result
Example: Review Summarizer Prompt Family
Your Review Summarizer might develop prompts specialized for:
Prompt-1: Engaging summaries with emotional reactions and personal experiences
Prompt-2: Technical summaries focused on factual specifications and measurable data points
Each prompt in the family becomes an expert at handling specific types of reviews, leading to higher overall accuracy.
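To make the routing idea concrete, here is a minimal Python sketch of how an input analyzer might select a family member. Everything in it (the classifier rules, prompt texts, and function names) is a hypothetical illustration, not Empromptu's actual implementation.

```python
# Minimal sketch of routing inside a Prompt Family. Illustrative only:
# the classifier rules and prompt texts are hypothetical, not Empromptu's.

PROMPT_FAMILY = {
    "experiential": "Create an engaging summary of the experiential aspects...",
    "technical": "Create a concise technical summary of factual specifications...",
}

def classify_input(review: str) -> str:
    """Toy classifier: route spec-heavy reviews to the technical prompt."""
    technical_markers = ("spec", "battery", "dimension", "benchmark")
    if any(marker in review.lower() for marker in technical_markers):
        return "technical"
    return "experiential"

def select_prompt(review: str) -> str:
    """Pick the family member best suited to this input."""
    return PROMPT_FAMILY[classify_input(review)]

print(select_prompt("Battery life exceeded the listed specs."))  # technical prompt
```

The point is the shape of the mechanism: analyze the input first, then dispatch to the specialist prompt rather than forcing one prompt to handle everything.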
Accessing Prompt Optimization
From your project dashboard, click the Actions button on the task you want to optimize, then select "Prompt Optimization". This opens the optimization interface with four main tabs:
Interface Overview
Event Log: All API calls with detailed scoring
Prompt Family: Your collection of prompts and their management
Manual Optimization: Step-by-step optimization wizard
Automatic Optimization: Hands-off optimization process
Event Log: Understanding Your Optimization History
The Event Log shows every API call with comprehensive details:
Event Log Columns
Timestamp: When each optimization occurred
Messages: The input text that was processed
Model: Which AI model was used (e.g., gpt-4o-mini)
Temperature: Model creativity setting (typically 0.7)
Response: The generated output
Delivered Prompt: Which prompt from your family was used
Score Reasoning: Detailed explanation of why this score was assigned
Score: Performance rating (0-10 scale)
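Put together, a single Event Log row might look like the record below. The field names paraphrase the columns above and the values are illustrative; this is not an official export format.

```python
# One Event Log row expressed as a dict, for illustration only.
# Field names paraphrase the columns above; not an official export format.
event = {
    "timestamp": "21/04/2025, 04:58:14",
    "messages": "Create an engaging summary of the experiential aspects...",
    "model": "gpt-4o-mini",
    "temperature": 0.7,                      # typical creativity setting
    "response": "Frogs burst into my life like playful whispers...",
    "delivered_prompt": "Prompt-1",          # assumed family member name
    "score_reasoning": "extracted_completeness - summary captures...",
    "score": 6.0,                            # 0-10 scale
}
```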
Reading Event Results
Each row represents one API call. Look for patterns:
High-performing events (7.0+):
Note which prompts work well
Identify successful input patterns
Understand what generates good scores
Low-performing events (below 5.0):
See which inputs cause problems
Identify prompt weaknesses
Find opportunities for new family members
Example Event Analysis
Timestamp: 21/04/2025, 04:58:14
Input: "Create an engaging summary of the experiential aspects..."
Model: gpt-4o-mini
Response: "Frogs burst into my life like playful whispers..."
Score: 6.000
Reasoning: "extracted_completeness - summary captures..."
This shows the system tested an experiential review and achieved moderate success with a creative prompt approach.
Prompt Family: Managing Your Prompt Collection
The Prompt Family tab shows your collection of prompts and how they work together:
Family Structure
Each prompt in your family has:
Name: Descriptive identifier (e.g., "Prompt-1", "Prompt-2")
Status: Active or Inactive
Model: Preferred AI model (e.g., gpt-4o-mini)
Temperature: Creativity setting
Performance: How well this prompt performs overall
Specialization: What types of inputs this prompt handles best
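As a mental model, each family member carries the attributes listed above. The dataclass below is an assumed illustration of that shape, not Empromptu's schema; all field names are paraphrases.

```python
from dataclasses import dataclass

# Hypothetical data model mirroring the attributes above. Field names are
# illustrative assumptions, not Empromptu's actual schema.
@dataclass
class FamilyPrompt:
    name: str            # descriptive identifier, e.g., "Prompt-1"
    status: str          # "active" or "inactive"
    model: str           # preferred AI model, e.g., "gpt-4o-mini"
    temperature: float   # creativity setting, e.g., 0.7
    performance: float   # overall score on the 0-10 scale
    specialization: str  # input types this prompt handles best

prompt_1 = FamilyPrompt(
    name="Prompt-1",
    status="active",
    model="gpt-4o-mini",
    temperature=0.7,
    performance=7.8,
    specialization="experiential reviews with emotional reactions",
)
```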
Creating New Family Members
Click "Create Prompt" to add specialized prompts:
Define purpose: What specific scenario should this prompt handle?
Write prompt text: Create instructions tailored to that scenario
Set parameters: Choose model and temperature settings
Test performance: Run against sample inputs
Activate: Add to your active family
Example Prompt Family Growth
Starting point: Single general prompt
"Create an engaging summary of the experiential aspects from the provided text..."After optimization: Specialized family
Prompt-1: "Create an engaging summary of experiential aspects - highlighting emotional reactions and personal experiences..."Prompt-2: "Analyze the provided text and create a concise technical summary that extracts factual specifications..."Each family member becomes an expert at their specific type of content.
Manual Optimization: Step-by-Step Improvement
The Manual Optimization tab provides a guided workflow for targeted improvements.
In Manual Optimization, you, as the developer, tinker with your prompts directly in the UI and use Empromptu's models to generate more robust prompts. You can test your prompts to see how they perform across various LLMs, from OpenAI models to your own local model.
Step 1: Set Up Your Experiment
Select a Prompt: Choose which family member to optimize
Insert Inputs: Pick test data that represents real use cases
Choose Evaluations: Select success criteria to focus on
Step 2: Review Current Performance
The system shows:
Current scores for selected inputs
Which evaluations are struggling
Specific areas for improvement
Step 3: Run Targeted Optimization
Add Input: Include new test cases if needed
Add Evaluation: Create specific success criteria
Generate Variations: Create new prompt versions
Test Results: See immediate performance changes
Step 4: Analyze and Iterate
Compare scores across prompt variations
Identify the best-performing family members
Add successful variations to your active family
Remove or modify underperforming prompts
Manual Optimization Benefits
Targeted improvement: Focus on specific problem areas
Complete control: Guide the optimization process
Deep understanding: Learn what works and why
Custom solutions: Address unique use case requirements
Automatic Optimization: Hands-Off Improvement
The Automatic Optimization tab lets Empromptu's system improve your prompts without manual intervention.
Instead of statically defining your inputs, prompts, and models in your code, you set an evaluation (in other words, you define what good looks like) and Empromptu automatically creates better, more accurate prompts for you in real time. This process is low latency.
How Automatic Optimization Works
Set an eval: Define the success criteria that describe what a good output looks like
Generation: Creates new prompt variations automatically
Testing: Runs new prompts against your inputs and evaluations
Selection: Adds best-performing prompts to your family
Refinement: Continues improving over multiple iterations
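Conceptually, this is a generate-test-select loop. The sketch below illustrates that cycle with toy stand-ins for the generation and evaluation steps; it is a simplified assumption about the mechanism, not the real optimizer.

```python
import random

def generate_variations(family: list[str]) -> list[str]:
    """Toy Generation step: derive new prompts by appending emphasis hints."""
    hints = ["Focus on emotional tone.", "Extract factual specifications."]
    return [f"{prompt} {hint}" for prompt in family for hint in hints]

def run_eval(prompt: str, inputs: list[str]) -> float:
    """Stand-in for the Testing step: in practice the 0-10 score comes from
    your evaluations; here it is random purely for illustration."""
    return sum(random.uniform(0, 10) for _ in inputs) / len(inputs)

def optimize(family: list[str], inputs: list[str],
             iterations: int = 3, keep: int = 3) -> list[str]:
    for _ in range(iterations):                            # Refinement
        candidates = family + generate_variations(family)  # Generation
        candidates.sort(key=lambda p: run_eval(p, inputs),
                        reverse=True)                      # Testing
        family = candidates[:keep]                         # Selection
    return family

best_family = optimize(["Summarize the review."],
                       ["Great battery life, felt magical!"])
```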
When to Use Automatic
Best for:
Getting started quickly
Establishing baseline performance
Handling common optimization patterns
Continuous background improvement
Process:
Click "Start Automatic Optimization"
System analyzes current performance
New prompts are generated and tested
Results appear in Event Log in real-time
Prompt Family grows automatically
Automatic Optimization Results
You'll see improvements like:
Initial accuracy: 4.5 → Current accuracy: 7.8
New family members: 1 prompt → 3 specialized prompts
Better coverage: Handles more input types effectively
Reading Optimization Scores
Score Interpretation
9.0-10.0: Excellent performance, optimal results
7.0-8.9: Good performance, production-ready
5.0-6.9: Acceptable but improvable
3.0-4.9: Needs attention, moderate issues
0.0-2.9: Poor performance, requires optimization
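If you script against exported scores, the bands above map naturally onto a small helper like the one below. This is a hypothetical convenience for your own tooling, not part of the product.

```python
def interpret_score(score: float) -> str:
    """Map a 0-10 score to the interpretation bands above (hypothetical helper)."""
    if score >= 9.0:
        return "Excellent performance, optimal results"
    if score >= 7.0:
        return "Good performance, production-ready"
    if score >= 5.0:
        return "Acceptable but improvable"
    if score >= 3.0:
        return "Needs attention, moderate issues"
    return "Poor performance, requires optimization"

print(interpret_score(6.0))  # -> Acceptable but improvable
```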
Score Reasoning
Each score includes detailed reasoning that explains:
Which evaluations passed/failed
Specific issues identified
What contributed to the score
Areas for potential improvement
Example reasoning:
"extracted_completeness - AI response captures the essence and key emotional elements. Summary captures key experiential aspects while maintaining vivid language."This tells you exactly why a score was assigned and what the system valued.
Best Practices for Prompt Optimization
Start with Automatic, Refine with Manual
Run automatic optimization to establish a baseline Prompt Family
Review Event Log to understand performance patterns
Use manual optimization to address specific weaknesses
Iterate based on real-world performance
Build Diverse Families
Include prompts for different scenarios:
Different input types (positive/negative reviews, technical/emotional content)
Different output requirements (brief/detailed, formal/casual)
Different edge cases (unusual inputs, specific formatting needs)
Use Quality Test Data
Representative inputs: Test data should match real use cases
Edge cases included: Test unusual or challenging scenarios
Sufficient volume: Use enough inputs to identify patterns
Regular updates: Add new test cases as you discover them
Monitor Performance Continuously
Check Event Log regularly for new patterns
Update families as use cases evolve
Remove underperforming prompts that aren't contributing
Add new specializations when needed
Advanced Optimization Strategies
Specialized Family Members
Create prompts for specific scenarios:
Input-type specialists:
Long-form content vs short snippets
Technical documentation vs marketing copy
Positive sentiment vs negative feedback
Output-format specialists:
Bullet points vs paragraphs
Brief summaries vs detailed analysis
Structured data vs natural language
Performance-Based Selection
Monitor which family members perform best:
High-frequency prompts: Used often, optimize carefully
High-accuracy prompts: Excellent results, study and replicate
Low-performing prompts: Consider removal or major revision
Underused prompts: May need better specialization
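One way to surface these categories is to aggregate Event Log rows per prompt, as sketched below. The rows are assumed to be (delivered prompt, score) pairs; both the data and the code are illustrative, not a built-in export.

```python
from collections import defaultdict

# Hypothetical Event Log rows as (delivered_prompt, score) pairs.
# Aggregating them surfaces high-frequency, high-accuracy, and underused prompts.
events = [
    ("Prompt-1", 7.5), ("Prompt-1", 8.0), ("Prompt-1", 6.5),
    ("Prompt-2", 9.0), ("Prompt-3", 3.5),
]

stats = defaultdict(lambda: {"count": 0, "total": 0.0})
for prompt, score in events:
    stats[prompt]["count"] += 1
    stats[prompt]["total"] += score

for prompt, s in stats.items():
    mean = s["total"] / s["count"]
    print(f"{prompt}: used {s['count']}x, mean score {mean:.1f}")
```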
Continuous Improvement Workflow
Weekly Event Log review: Identify new patterns and issues
Monthly family audit: Remove underperforming prompts, add new specialists
Quarterly strategy review: Assess overall optimization approach
Ongoing testing: Add new inputs based on real user behavior
Troubleshooting Common Issues
Scores Not Improving
Symptoms: Accuracy plateauing despite optimization attempts
Solutions:
Click Actions → Input Optimization to add more diverse test inputs
Click Actions → Evaluations to create more specific criteria
Try manual optimization for targeted improvement
Click Actions → Model Optimization to test different models
Inconsistent Results
Symptoms: Large score variations for similar inputs
Solutions:
Review the Prompt Family tab for family members with conflicting approaches
Click Actions → Prompt Optimization → Manual Optimization to add more specialized family members
Click Actions → Evaluations to improve criteria clarity
Click Actions → Input Optimization to increase test data volume
Good Test Scores, Poor Real Performance
Symptoms: High scores in testing, user complaints in production
Solutions:
Click Actions → Input Optimization → End User Inputs to review real user data for new patterns
Add real user data to manual test inputs
Click Actions → Evaluations to create criteria based on actual user needs
Monitor deployed application performance and add problematic inputs as test cases
Next Steps
Now that you understand Prompt Optimization:
Learn Task Actions: Understand how to access all optimization tools through the Actions button
Explore Edge Case Detection: Find and fix problematic inputs using visual analysis
Understand Evaluations: Set up success criteria that guide prompt optimization
Monitor End User Inputs: Use real user data to improve your optimization