Understanding Accuracy

Empromptu uses a 0-10 scoring system to measure how well your AI applications perform tasks. Understanding these accuracy metrics is essential for optimizing your applications for production readiness.

What you'll learn ⏱️ 5 minutes

  • How to define accuracy

  • How Empromptu's 0-10 accuracy scoring works

  • What different score ranges mean for your application

  • The difference between Initial and Current accuracy

  • How scores are calculated and updated

  • What accuracy levels you need for production deployment

  • How to interpret score improvements

How to define accuracy

To define task accuracy, you first set an evaluation: a goal the task should achieve. Accuracy then measures how successfully your inputs and prompts achieve that goal.

The 0-10 Accuracy Scale

Empromptu measures accuracy using a 0-10 point scale where 10 represents perfect performance. Every task, evaluation, and optimization attempt receives a score within this range.

Score Ranges and Meanings:

🔴 0-3: Low Score (Needs Improvement)

  • Significant issues with output quality

  • Frequent errors or irrelevant responses

  • Not suitable for production use

  • Requires immediate optimization attention

🟠 4-6: Medium Score (Getting Better)

  • Acceptable performance but inconsistent

  • Some correct outputs mixed with errors

  • May work for internal testing but risky for production

  • Good foundation for optimization improvements

🔵 7-8: Good Score (Production Ready)

  • Reliable performance for most inputs

  • Occasional edge case issues but generally solid

  • Suitable for production deployment with monitoring

  • Meets business requirements for most use cases

🟢 9-10: Excellent Score (Optimal Performance)

  • Consistently high-quality outputs

  • Handles edge cases well

  • Exceeds business requirements

  • Ideal for critical business applications

Types of Accuracy Measurements

Project-Level Accuracy

Displayed on your project dashboard:

Average Initial Accuracy: Mean of all task initial scores in the project

Average Current Accuracy: Mean of all task current scores after optimization

Improvement Tracking: Shows overall project optimization progress

Task-Level Accuracy

Shown in the tasks table:

Initial Accuracy: First score when the task first runs through optimization

Current Accuracy: Latest score after optimization attempts

Improvement: Change from initial to current (+/- value)

How Accuracy Scores Are Calculated

Evaluation-Based Scoring

Your accuracy score is calculated based on active evaluations:

  1. Each evaluation gets scored individually (0-10)

  2. Individual scores are averaged together

  3. Overall score represents combined evaluation performance

  4. Score reasoning explains which evaluations passed/failed

Example Calculation:

Task has 3 active evaluations:

  • "Correct Sequence": 8.0

  • "Accurate Details": 6.5

  • "Complete Summary": 7.5

Overall Score: (8.0 + 6.5 + 7.5) ÷ 3 = 7.3
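The same calculation in code, a minimal sketch using the evaluation names and scores from the example above:

```python
# Individual scores from each active evaluation (0-10 each).
evaluation_scores = {
    "Correct Sequence": 8.0,
    "Accurate Details": 6.5,
    "Complete Summary": 7.5,
}

# The overall score is the plain average of the individual scores.
overall = sum(evaluation_scores.values()) / len(evaluation_scores)
print(round(overall, 1))  # → 7.3
```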

Score Reasoning Example:

"extracted_completeness - AI response captures the essence and key emotional elements. Summary captures key experiential aspects while maintaining vivid language."

Score: 7.000

This explains exactly why the score was assigned and what criteria were evaluated.

Initial vs Current Accuracy

Initial Accuracy

When it's set: The first time your task runs through optimization

What it represents: Baseline performance before any improvements

Typical range: Often 3.0-6.0 for new tasks

Purpose: Establishes starting point for measuring improvement

Current Accuracy

When it updates: After each optimization attempt

What it represents: Latest performance level achieved

Expected progression: Should increase over time with optimization

Target range: 7.0+ for production readiness

Improvement Tracking

Calculation: Current Accuracy - Initial Accuracy = Improvement

Examples:

  • Initial: 4.5, Current: 7.8, Improvement: +3.3 ✅ Excellent progress

  • Initial: 6.0, Current: 5.8, Improvement: -0.2 ⚠️ Needs attention

  • Initial: 3.2, Current: 8.1, Improvement: +4.9 🎉 Outstanding optimization
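The improvement figures above reduce to a single subtraction. A sketch, with `improvement` as a hypothetical helper (not an Empromptu API) and scores rounded to one decimal, matching how the doc reports them:

```python
def improvement(initial: float, current: float) -> float:
    """Improvement is simply current accuracy minus initial accuracy."""
    return round(current - initial, 1)

print(improvement(4.5, 7.8))  # → 3.3 (excellent progress)
print(improvement(6.0, 5.8))  # → -0.2 (needs attention)
print(improvement(3.2, 8.1))  # → 4.9 (outstanding optimization)
```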

What Different Scores Mean for Business

Production Readiness Guidelines:

Score 9.0+: Deploy with Confidence

  • Excellent for customer-facing applications

  • Suitable for critical business processes

  • Minimal monitoring required

  • Can handle high-volume usage

Score 7.0-8.9: Production Ready

  • Good for most business applications

  • Recommended for customer-facing use

  • Monitor performance and optimize over time

  • Suitable for moderate to high-volume usage

Score 5.0-6.9: Internal Use Only

  • Acceptable for internal tools and testing

  • Not recommended for customer-facing applications

  • Requires active optimization and monitoring

  • Good for pilot programs and validation

Score Below 5.0: Development Only

  • Not suitable for production deployment

  • Focus on optimization before considering deployment

  • Use for testing and development purposes only

  • Indicates need for significant improvement
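The deployment guidelines above can be expressed as a threshold check. Again a sketch only: `deployment_tier` is a hypothetical helper whose tier names follow the headings in this section.

```python
def deployment_tier(score: float) -> str:
    """Map a current accuracy score to the deployment guidance above."""
    if score >= 9.0:
        return "Deploy with Confidence"
    if score >= 7.0:
        return "Production Ready"
    if score >= 5.0:
        return "Internal Use Only"
    return "Development Only"
```

For example, a task scoring 7.3 would land in the "Production Ready" tier, while a 4.8 stays in "Development Only".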

Score Improvement Strategies

For Low Scores (0-3):

Primary focus: Fix fundamental issues

  • Review and improve evaluation criteria

  • Add more representative test inputs

  • Use automatic optimization to establish baseline

  • Check if task requirements are too complex

For Medium Scores (4-6):

Primary focus: Systematic optimization

  • Build out Prompt Families with specialized prompts

  • Use Edge Case Detection to find problem areas

  • Test different AI models for better performance

  • Add more specific evaluation criteria

For Good Scores (7-8):

Primary focus: Fine-tuning and edge cases

  • Use manual optimization for specific improvements

  • Monitor end-user inputs for new edge cases

  • Optimize for consistency across input types

  • Focus on business-critical evaluation criteria

For Excellent Scores (9-10):

Primary focus: Maintain and monitor

  • Monitor for performance degradation over time

  • Add new evaluations as requirements evolve

  • Use as baseline for similar tasks

  • Focus optimization efforts on other tasks

Common Score Patterns

Typical Optimization Journey:

Initial Build → Score: N/A

First Optimization → Score: 4.5 (baseline established)

Automatic Optimization → Score: 6.8 (significant improvement)

Manual Refinement → Score: 7.9 (production ready)

Edge Case Fixes → Score: 8.4 (optimized)

Warning Patterns:

Declining scores over time: May indicate changing requirements or new edge cases

Plateauing scores: Optimization strategy may need adjustment

High variance: Inconsistent performance suggests need for better Prompt Families

Using Scores for Decision Making

Development Decisions:

  • Score below 5.0: Continue optimization before deployment

  • Score 5.0-6.9: Consider internal pilot testing

  • Score 7.0+: Proceed with production deployment planning

Optimization Priorities:

  • Focus on lowest-scoring tasks first for maximum impact

  • Address tasks with declining scores to prevent issues

  • Optimize high-volume tasks to improve overall metrics

Business Communication:

  • Use score improvements to demonstrate AI initiative success

  • Set score targets for business stakeholders (e.g., "achieve 7.5+ before launch")

  • Track score trends to show continuous improvement

Troubleshooting Accuracy Issues

Scores Not Updating:

Check: Task is active and optimization is running

Solution: Ensure evaluations are active and inputs are available

Inconsistent Score Ranges:

Check: Evaluation criteria clarity and representativeness

Solution: Review and refine evaluation definitions

Scores Lower Than Expected:

Check: Task complexity vs evaluation criteria alignment

Solution: Simplify task scope or adjust evaluation expectations

Cannot Achieve High Scores:

Check: Input quality and evaluation criteria realism

Solution: Add better test inputs and review evaluation criteria

Best Practices for Accuracy Management

Set Realistic Targets:

  • New tasks: Target 6.0+ for initial success

  • Production tasks: Aim for 7.5+ for reliability

  • Critical tasks: Strive for 8.5+ for excellence

Monitor Continuously:

  • Check scores weekly for production tasks

  • Review trends monthly for optimization planning

  • Investigate drops immediately to prevent issues

Document Learning:

  • Track what works for achieving high scores

  • Note optimization strategies that deliver results

  • Share successful approaches across tasks and projects

Next Steps

Now that you understand accuracy scoring:

  • Start Optimizing: Learn how to improve your accuracy scores systematically

  • Set Up Evaluations: Create criteria that drive meaningful accuracy measurements

  • Use Task Actions: Access the tools you need to improve performance

  • Learn Prompt Optimization: Master the core technology for accuracy improvement
