The Alkemi Agent acts as your personal data scientist: it understands natural language questions and translates them into SQL to analyze your connected data sources. The quality of the Agent's responses depends heavily on high-quality examples of user prompts paired with their corresponding SQL queries. This guide explains how to create, manage, and optimize that training data through the Text to SQL Training interface.
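To make the idea of a prompt paired with a SQL query concrete, here is a minimal sketch in Python. It is an illustration only: the `TrainingExample` class, the `sql_runs` helper, and the `orders` table are hypothetical names invented for this example and are not part of the product's interface. The sketch models one training example and checks that its SQL actually executes, which is one simple quality signal you might apply before adding an example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingExample:
    """One natural-language prompt paired with the SQL that answers it.

    Hypothetical structure for illustration, not the product's schema.
    """
    prompt: str
    sql: str


def sql_runs(example: TrainingExample, conn: sqlite3.Connection) -> bool:
    """Return True if the example's SQL executes without error on conn."""
    try:
        conn.execute(example.sql).fetchall()
        return True
    except sqlite3.Error:
        return False


# Hypothetical table, used only to make the example executable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EMEA", 120.0), (2, "APAC", 80.0), (3, "EMEA", 50.0)],
)

good = TrainingExample(
    prompt="What is the total order amount per region?",
    sql="SELECT region, SUM(amount) AS total FROM orders "
        "GROUP BY region ORDER BY region",
)
bad = TrainingExample(
    prompt="How many customers do we have?",
    sql="SELECT COUNT(*) FROM customers",  # this table does not exist
)

print(sql_runs(good, conn))  # the query executes against the sample table
print(sql_runs(bad, conn))   # the query fails because the table is missing
```

A prompt whose SQL does not run against your schema, like the second example, is a good candidate to fix or discard rather than add to the training dataset.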


Creating Training Data

The Text to SQL Training feature supports two approaches to creating training data:

<aside>

Manual Training Data Creation

Create training examples by hand, with direct control over each one

</aside>

<aside>

Automated Training Data Creation

Create synthetic training examples automatically using the Alkemi Agent

</aside>

Both methods contribute to the same training dataset that improves the Alkemi Agent's ability to understand and respond to queries about your specific data.


Reviewing and Managing Training Data

Quality control is essential for maintaining an effective Agent. The system provides multiple review workflows depending on the data source and confidence levels.

<aside>

Reviewing Pending Queries

Review synthetic training examples

</aside>

<aside>

Reviewing Low Certainty Queries

Review training examples flagged as low certainty

</aside>


Our Recommended Approach

  1. Start with manual creation of your most important queries
  2. Enable automated generation with manual review
  3. Regularly review low-certainty queries
  4. Monitor Agent performance and iterate on training data
  5. Gradually increase automation as quality improves

Remember: The Agent learns from your examples. Investing time in quality training data pays dividends in Agent performance and user satisfaction.

<aside>

Best Practices and Tips

Make it easy to succeed

</aside>

<aside>

Troubleshooting

Common issues and solutions

</aside>