Building Effective Training Data
- Start with common queries: Focus on the questions users ask most frequently
- Include edge cases: Add examples for complex or unusual queries
- Maintain diversity: Cover different aspects of your data schema
- Regular updates: Add new examples as user needs evolve
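One way to check the diversity point above is to count how many examples touch each table. The following is a minimal sketch, not the product's own tooling: it assumes each training example is a dict with hypothetical `prompt` and `sql` keys and that the queries are SQL text. Matching table names by whole word is a rough heuristic and not a real SQL parser.

```python
import re
from collections import Counter

def table_coverage(examples, tables):
    """Count how many training examples mention each table name.

    `examples` is assumed to be a list of dicts with a "sql" key
    (a hypothetical shape; adapt it to your own export format).
    """
    counts = Counter({t: 0 for t in tables})
    for ex in examples:
        for t in tables:
            # Whole-word, case-insensitive match on the table name.
            if re.search(rf"\b{re.escape(t)}\b", ex["sql"], flags=re.IGNORECASE):
                counts[t] += 1
    return counts

examples = [
    {"prompt": "Show all orders", "sql": "SELECT * FROM orders"},
    {"prompt": "Orders per customer",
     "sql": "SELECT c.name, COUNT(*) FROM orders o "
            "JOIN customers c ON o.customer_id = c.id GROUP BY c.name"},
]
coverage = table_coverage(examples, ["orders", "customers", "products"])
uncovered = [t for t, n in coverage.items() if n == 0]
```

Tables that end up in `uncovered` are candidates for new examples.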
Quality Over Quantity
- 20 high-quality examples are more valuable than 200 poor ones
- Focus on accuracy and relevance to your specific use cases
- Regularly review and refine existing training data
Monitoring and Improvement
- Track Agent performance: Note when the Agent struggles with certain types of queries
- Add missing examples: Create training data for queries the Agent couldn't handle
- Review certainty ratings: Regularly check and fix low-certainty queries
- Iterate: Training is an ongoing process, so keep improving your dataset
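The certainty review above can be turned into a simple triage list. This sketch assumes each example carries a numeric `certainty` field between 0 and 1 and uses a cutoff of 0.7. Both the field name and the threshold are illustrative assumptions, not values defined by the product.

```python
def needs_review(examples, threshold=0.7):
    """Return examples whose certainty is below `threshold`, lowest first.

    The "certainty" key and the 0-1 scale are assumptions for illustration.
    """
    flagged = [ex for ex in examples if ex["certainty"] < threshold]
    return sorted(flagged, key=lambda ex: ex["certainty"])

examples = [
    {"prompt": "Total sales last month", "certainty": 0.95},
    {"prompt": "Churn by cohort", "certainty": 0.40},
    {"prompt": "Top products by margin", "certainty": 0.65},
]
review_queue = needs_review(examples)
review_prompts = [ex["prompt"] for ex in review_queue]
```

Working through the queue from the lowest rating upward puts the examples most likely to be wrong in front of a reviewer first.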
Common Pitfalls to Avoid
- Ambiguous prompts: Ensure prompts clearly indicate the desired outcome
- Outdated examples: Remove training data that references deprecated tables or columns
- Duplicate concepts: Avoid piling up near-identical examples that add no new information
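Two of the pitfalls above, outdated examples and duplicate concepts, can be flagged with a short script. This is a rough sketch under stated assumptions: examples are dicts with hypothetical `prompt` and `sql` keys, deprecated names are supplied by you, and prompt similarity is measured with the standard library's `difflib.SequenceMatcher` with an arbitrary 0.9 cutoff. It catches near-identical wording only, not paraphrases.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

def find_stale(examples, deprecated):
    """Return (prompt, name) pairs where the query mentions a deprecated name."""
    hits = []
    for ex in examples:
        for name in deprecated:
            # Whole-word match so "orders" does not match "old_orders".
            if re.search(rf"\b{re.escape(name)}\b", ex["sql"], flags=re.IGNORECASE):
                hits.append((ex["prompt"], name))
    return hits

def find_near_duplicates(examples, threshold=0.9):
    """Return pairs of prompts whose text similarity is at or above `threshold`."""
    pairs = []
    for a, b in combinations(examples, 2):
        ratio = SequenceMatcher(None, a["prompt"].lower(), b["prompt"].lower()).ratio()
        if ratio >= threshold:
            pairs.append((a["prompt"], b["prompt"]))
    return pairs

examples = [
    {"prompt": "Total sales last month", "sql": "SELECT SUM(amount) FROM old_orders"},
    {"prompt": "Total sales last month?", "sql": "SELECT SUM(amount) FROM orders"},
    {"prompt": "List customers by region", "sql": "SELECT region, name FROM customers"},
]
stale = find_stale(examples, deprecated=["old_orders"])
duplicates = find_near_duplicates(examples)
```

Flagged items are best reviewed by hand before removal, since two similar prompts can legitimately map to different queries.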