Prompt Optimization

Prompt Optimization helps you achieve higher accuracy than any single prompt could deliver on its own.

What you'll learn ⏱️ 7 minutes

  • How Prompt Families work and why they outperform a single prompt

  • The four optimization interfaces: Event Log, Prompt Family, Manual Optimization, and Automatic Optimization

  • When to use automatic vs manual optimization

  • How to read and interpret optimization results

  • Best practices for building effective Prompt Families

Understanding Prompt Families

Traditional AI applications use a single prompt for all scenarios. Empromptu creates Prompt Families - multiple specialized prompts that handle different input types and edge cases.

How It Works

Instead of one prompt trying to handle everything:

Traditional: Single Prompt → All Inputs → Inconsistent Results

Empromptu builds a family of prompts that work together:

Empromptu: Input Analysis → Best Prompt Selected → Optimized Result
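
Conceptually, the selection step is a router: classify the incoming input, then dispatch to the family member built for that case. Below is a minimal sketch of the idea; PROMPT_FAMILY, analyze_input, and select_prompt are hypothetical names for illustration, not Empromptu's API.

```python
# Illustrative sketch only -- these names are hypothetical, not Empromptu's API.
PROMPT_FAMILY = {
    "experiential": "Create an engaging summary of the experiential aspects...",
    "technical": "Create a concise technical summary of factual specifications...",
}

def analyze_input(text: str) -> str:
    """Toy classifier: route spec-heavy text to the technical prompt."""
    technical_markers = ("mm", "ghz", "battery", "spec", "dimension")
    hits = sum(marker in text.lower() for marker in technical_markers)
    return "technical" if hits >= 2 else "experiential"

def select_prompt(text: str) -> str:
    return PROMPT_FAMILY[analyze_input(text)]
```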

Example: Review Summarizer Prompt Family

Your Review Summarizer might develop prompts specialized for:

  • Prompt-1: Engaging summaries with emotional reactions and personal experiences

  • Prompt-2: Technical summaries focused on factual specifications and measurable data points

Each prompt in the family becomes an expert at handling specific types of reviews, leading to higher overall accuracy.

Accessing Prompt Optimization

From your project dashboard, click the Actions button on the task you want to optimize, then select "Prompt Optimization". This opens the optimization interface with four main tabs:

Interface Overview

  • Event Log: All API calls with detailed scoring

  • Prompt Family: Your collection of prompts and the tools to manage them

  • Manual Optimization: Step-by-step optimization wizard

  • Automatic Optimization: Hands-off optimization process

Event Log: Understanding Your Optimization History

The Event Log shows every API call with comprehensive details:

Event Log Columns

  • Timestamp: When each API call occurred

  • Messages: The input text that was processed

  • Model: Which AI model was used (e.g., gpt-4o-mini)

  • Temperature: Model creativity setting (typically 0.7)

  • Response: The generated output

  • Delivered Prompt: Which prompt from your family was used

  • Score Reasoning: Detailed explanation of why this score was assigned

  • Score: Performance rating (0-10 scale)

Reading Event Results

Each row represents one API call. Look for patterns (a filtering sketch follows these lists):

High-performing events (7.0+):

  • Note which prompts work well

  • Identify successful input patterns

  • Understand what generates good scores

Low-performing events (below 5.0):

  • See which inputs cause problems

  • Identify prompt weaknesses

  • Find opportunities for new family members
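
If you export the Event Log for offline analysis, surfacing both groups takes a few lines. The export shape below (a list of dicts keyed like the columns above) is an assumption for illustration:

```python
# Assumed export shape: one dict per row, keyed like the Event Log columns.
events = [
    {"delivered_prompt": "Prompt-1", "score": 8.2, "messages": "The hike felt..."},
    {"delivered_prompt": "Prompt-2", "score": 4.1, "messages": "Battery specs..."},
]

high_performers = [e for e in events if e["score"] >= 7.0]  # prompts that work well
low_performers = [e for e in events if e["score"] < 5.0]    # inputs that cause problems

for event in low_performers:
    print(f"{event['delivered_prompt']}: {event['score']:.1f} -> {event['messages'][:50]}")
```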

Example Event Analysis

Timestamp: 21/04/2025, 04:58:14
Input: "Create an engaging summary of the experiential aspects..."
Model: gpt-4o-mini
Response: "Frogs burst into my life like playful whispers..."
Score: 6.000
Reasoning: "extracted_completeness - summary captures..."

This event shows the system tested an experiential review with a creative prompt and achieved moderate success.

Prompt Family: Managing Your Prompt Collection

The Prompt Family tab shows your collection of prompts and how they work together:

Family Structure

Each prompt in your family has:

  • Name: Descriptive identifier (e.g., "Prompt-1", "Prompt-2")

  • Status: Active or Inactive

  • Model: Preferred AI model (e.g., gpt-4o-mini)

  • Temperature: Creativity setting

  • Performance: How well this prompt performs overall

  • Specialization: What types of inputs this prompt handles best

Creating New Family Members

Click "Create Prompt" to add specialized prompts (a sample definition follows these steps):

  1. Define purpose: What specific scenario should this prompt handle?

  2. Write prompt text: Create instructions tailored to that scenario

  3. Set parameters: Choose model and temperature settings

  4. Test performance: Run against sample inputs

  5. Activate: Add to your active family
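
As a data structure, a family member is just these fields bundled together. A hypothetical example (the dict layout is an illustration, not an Empromptu export format):

```python
# Hypothetical representation of one family member; the field names mirror
# the Family Structure list above, but the layout itself is an assumption.
new_prompt = {
    "name": "Prompt-3",
    "status": "inactive",  # activate only after testing against sample inputs
    "model": "gpt-4o-mini",
    "temperature": 0.7,
    "prompt_text": "Analyze the provided text and create a concise technical "
                   "summary that extracts factual specifications...",
    "specialization": "spec-heavy technical reviews",
}
```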

Example Prompt Family Growth

Starting point: Single general prompt

"Create an engaging summary of the experiential aspects from the provided text..."

After optimization: Specialized family

Prompt-1: "Create an engaging summary of experiential aspects - highlighting emotional reactions and personal experiences..."

Prompt-2: "Analyze the provided text and create a concise technical summary that extracts factual specifications..."

Each family member becomes an expert at its specific type of content.

Manual Optimization: Step-by-Step Improvement

The Manual Optimization tab provides a guided workflow for targeted improvements.

In Manual Optimization, you tinker with your prompts directly in our UI and use our models to generate more robust versions. You can test your prompts against a range of LLMs, from OpenAI's hosted models to your own local model.
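
Outside the UI, the same comparison can be scripted against any OpenAI-compatible endpoint. A minimal sketch using the OpenAI Python SDK; the local base_url, its api_key, and the model names are placeholder assumptions to adapt to your setup:

```python
from openai import OpenAI

# Two OpenAI-compatible endpoints: the hosted API (reads OPENAI_API_KEY from
# the environment) and a local server exposing the same /v1 protocol.
clients = {
    "gpt-4o-mini": OpenAI(),
    "my-local-model": OpenAI(base_url="http://localhost:8000/v1", api_key="unused"),
}

prompt = "Create an engaging summary of the experiential aspects from the provided text..."
review = "Frogs burst into my life like playful whispers..."

for model, client in clients.items():
    response = client.chat.completions.create(
        model=model,
        temperature=0.7,
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": review},
        ],
    )
    print(model, "->", response.choices[0].message.content[:80])
```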

Step 1: Set Up Your Experiment

  • Select a Prompt: Choose which family member to optimize

  • Insert Inputs: Pick test data that represents real use cases

  • Choose Evaluations: Select success criteria to focus on

Step 2: Review Current Performance

The system shows:

  • Current scores for selected inputs

  • Which evaluations are struggling

  • Specific areas for improvement

Step 3: Run Targeted Optimization

  • Add Input: Include new test cases if needed

  • Add Evaluation: Create specific success criteria

  • Generate Variations: Create new prompt versions

  • Test Results: See immediate performance changes

Step 4: Analyze and Iterate

  • Compare scores across prompt variations

  • Identify the best-performing family members

  • Add successful variations to your active family

  • Remove or modify underperforming prompts

Manual Optimization Benefits

  • Targeted improvement: Focus on specific problem areas

  • Complete control: Guide the optimization process

  • Deep understanding: Learn what works and why

  • Custom solutions: Address unique use case requirements

Automatic Optimization: Hands-Off Improvement

The Automatic Optimization tab lets Empromptu's system improve your prompts without manual intervention.

Instead of statically defining your inputs, prompts, and models in your code, you set an evaluation (in other words, what good looks like) and Empromptu automatically creates better, more accurate prompts for you in real time. This process is low latency.

How Automatic Optimization Works

  1. Set an eval: Define what a good output looks like for your task

  2. Generation: Creates new prompt variations automatically

  3. Testing: Runs new prompts against your inputs and evaluations

  4. Selection: Adds best-performing prompts to your family

  5. Refinement: Continues improving over multiple iterations (the full loop is sketched below)
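
In miniature, that loop looks like the sketch below; the two helpers are toy stand-ins for LLM-backed generation and scoring, not Empromptu's internals.

```python
import random

def generate_variations(family: list[str]) -> list[str]:
    # Stand-in: a real system would ask an LLM to rewrite each prompt.
    return [p + " Focus on the most important details." for p in family]

def avg_score(prompt: str, inputs: list[str]) -> float:
    # Stand-in: a real system scores responses against your evaluations.
    return sum(random.uniform(0, 10) for _ in inputs) / len(inputs)

def optimize(family: list[str], inputs: list[str], iterations: int = 5) -> list[str]:
    for _ in range(iterations):
        candidates = family + generate_variations(family)   # 2. Generation
        ranked = sorted(candidates,                          # 3. Testing
                        key=lambda p: avg_score(p, inputs),
                        reverse=True)
        family = ranked[:3]                                  # 4. Selection
    return family                                            # 5. Refinement
```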

When to Use Automatic

Best for:

  • Getting started quickly

  • Establishing baseline performance

  • Handling common optimization patterns

  • Continuous background improvement

Process:

  1. Click "Start Automatic Optimization"

  2. System analyzes current performance

  3. New prompts are generated and tested

  4. Results appear in Event Log in real-time

  5. Prompt Family grows automatically

Automatic Optimization Results

You'll see improvements like:

  • Initial accuracy: 4.5 → Current accuracy: 7.8

  • New family members: 1 prompt → 3 specialized prompts

  • Better coverage: Handles more input types effectively

Reading Optimization Scores

Score Interpretation

  • 9.0-10.0: Excellent performance, optimal results

  • 7.0-8.9: Good performance, production-ready

  • 5.0-6.9: Acceptable but improvable

  • 3.0-4.9: Needs attention, moderate issues

  • 0.0-2.9: Poor performance, requires optimization
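
When post-processing exported scores, these bands reduce to a threshold check. A convenience helper (not part of any Empromptu SDK):

```python
def score_label(score: float) -> str:
    """Map a 0-10 score onto the bands listed above."""
    if score >= 9.0:
        return "excellent"
    if score >= 7.0:
        return "good, production-ready"
    if score >= 5.0:
        return "acceptable but improvable"
    if score >= 3.0:
        return "needs attention"
    return "poor, requires optimization"
```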

Score Reasoning

Each score includes detailed reasoning that explains:

  • Which evaluations passed/failed

  • Specific issues identified

  • What contributed to the score

  • Areas for potential improvement

Example reasoning:

"extracted_completeness - AI response captures the essence and key emotional elements. Summary captures key experiential aspects while maintaining vivid language."

This tells you exactly why a score was assigned and what the system valued.

Best Practices for Prompt Optimization

Start with Automatic, Refine with Manual

  1. Run automatic optimization to establish a baseline Prompt Family

  2. Review Event Log to understand performance patterns

  3. Use manual optimization to address specific weaknesses

  4. Iterate based on real-world performance

Build Diverse Families

Include prompts for different scenarios:

  • Different input types (positive/negative reviews, technical/emotional content)

  • Different output requirements (brief/detailed, formal/casual)

  • Different edge cases (unusual inputs, specific formatting needs)

Use Quality Test Data

  • Representative inputs: Test data should match real use cases

  • Edge cases included: Test unusual or challenging scenarios

  • Sufficient volume: Use enough inputs to identify patterns

  • Regular updates: Add new test cases as you discover them

Monitor Performance Continuously

  • Check Event Log regularly for new patterns

  • Update families as use cases evolve

  • Remove underperforming prompts that aren't contributing

  • Add new specializations when needed

Advanced Optimization Strategies

Specialized Family Members

Create prompts for specific scenarios:

Input-type specialists:

  • Long-form content vs short snippets

  • Technical documentation vs marketing copy

  • Positive sentiment vs negative feedback

Output-format specialists:

  • Bullet points vs paragraphs

  • Brief summaries vs detailed analysis

  • Structured data vs natural language

Performance-Based Selection

Monitor which family members perform best (the report sketch after this list automates the check):

  • High-frequency prompts: Used often, optimize carefully

  • High-accuracy prompts: Excellent results, study and replicate

  • Low-performing prompts: Consider removal or major revision

  • Underused prompts: May need better specialization
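
A minimal sketch for sorting family members into these buckets from an exported Event Log (same assumed shape as the earlier filtering example):

```python
from collections import defaultdict

def family_report(events: list[dict]) -> dict:
    # Group scores by the prompt that produced them.
    scores_by_prompt = defaultdict(list)
    for event in events:
        scores_by_prompt[event["delivered_prompt"]].append(event["score"])
    return {
        name: {
            "uses": len(scores),                     # high-frequency?
            "avg_score": sum(scores) / len(scores),  # high-accuracy?
        }
        for name, scores in scores_by_prompt.items()
    }
```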

Continuous Improvement Workflow

  1. Weekly Event Log review: Identify new patterns and issues

  2. Monthly family audit: Remove underperforming prompts, add new specialists

  3. Quarterly strategy review: Assess overall optimization approach

  4. Ongoing testing: Add new inputs based on real user behavior

Troubleshooting Common Issues

Scores Not Improving

Symptoms: Accuracy plateauing despite optimization attempts

Solutions:

  • Click Actions → Input Optimization to add more diverse test inputs

  • Click Actions → Evaluations to create more specific criteria

  • Try manual optimization for targeted improvement

  • Click Actions → Model Optimization to test different models

Inconsistent Results

Symptoms: Large score variations for similar inputs

Solutions:

  • Review the Prompt Family tab for prompts with conflicting approaches

  • Click Actions → Prompt Optimization → Manual Optimization to add more specialized family members

  • Click Actions → Evaluations to improve criteria clarity

  • Click Actions → Input Optimization to increase test data volume

Good Test Scores, Poor Real Performance

Symptoms: High scores in testing, user complaints in production

Solutions:

  • Click Actions → Input Optimization → End User Inputs to review real user data for new patterns

  • Add real user data to manual test inputs

  • Click Actions → Evaluations to create criteria based on actual user needs

  • Monitor deployed application performance and add problematic inputs as test cases

Next Steps

Now that you understand Prompt Optimization:

  • Learn Task Actions: Understand how to access all optimization tools through the Actions button

  • Explore Edge Case Detection: Find and fix problematic inputs using visual analysis

  • Understand Evaluations: Set up success criteria that guide prompt optimization

  • Monitor End User Inputs: Use real user data to improve your optimization
