Creating Automated Training Data

Create synthetic training examples automatically using the Alkemi Agent

The Automated tab leverages AI to generate synthetic training examples, accelerating the training process while maintaining quality through review mechanisms.

Ways to generate training data

1. Quick Generation (Synchronous)

Instantly creates up to 10 prompt/query pairs. Navigate to the Text to SQL Training tab in your Data Product and click the "Generate" button.

What synchronous generation is best for

Trade-offs

Asynchronous generation configures ongoing generation of synthetic queries. It has two settings:

  • Quantity: Specify how many synthetic queries to generate in total

    • This number includes existing rows. If you have 5 rows already and set the

  • Auto-approval: Choose whether queries should be:

    • Automatically approved and used immediately

    • Held in "pending" status for manual review
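Because the Quantity setting counts existing rows toward the total, it can help to work out how many new queries a given target would actually produce. The sketch below is illustrative arithmetic only, not part of the product: the function name `rows_to_generate` and the numbers used are hypothetical, and it assumes that new rows equal the configured total minus the rows already present.

```python
def rows_to_generate(target_total: int, existing_rows: int) -> int:
    """Estimate how many new synthetic queries are needed to reach the target.

    Assumption: the configured quantity is a total that includes existing
    rows, so only the difference is generated (never a negative number).
    """
    if target_total < 0 or existing_rows < 0:
        raise ValueError("counts must be non-negative")
    return max(0, target_total - existing_rows)


# Hypothetical example: 8 rows already exist and the total is set to 20.
new_rows = rows_to_generate(target_total=20, existing_rows=8)
print(new_rows)
```

Under this assumption, setting a total at or below the current row count would produce no new queries.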

Configuration Options

When setting up automated generation, consider:

  • Volume: Start with smaller batches (between 5 and 20) to assess quality

  • Review requirements: Initially, either enable manual review or set a high minimum certainty (between 80% and 95%) to ensure the quality of generated examples

  • Iteration: Adjust configuration based on the quality of generated examples
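The review guidance above can be pictured as a simple routing rule. This is a minimal sketch under stated assumptions, not the Alkemi implementation: `GenerationConfig`, `review_status`, and the field names are hypothetical, certainty is expressed as a fraction between 0 and 1, and it assumes that a query falling below the minimum certainty is held as "pending" rather than discarded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationConfig:
    """Hypothetical container for the settings described above."""
    quantity: int = 10            # small starting batch (5 to 20 suggested)
    auto_approve: bool = False    # False means every query waits for manual review
    min_certainty: float = 0.90   # suggested starting range: 0.80 to 0.95


def review_status(certainty: float, config: GenerationConfig) -> str:
    """Return "approved" or "pending" for one generated query.

    Assumption: auto-approval applies only when the query's certainty
    meets the configured minimum; everything else is held for review.
    """
    if not 0.0 <= certainty <= 1.0:
        raise ValueError("certainty must be between 0 and 1")
    if config.auto_approve and certainty >= config.min_certainty:
        return "approved"
    return "pending"


cautious = GenerationConfig(quantity=10, auto_approve=True, min_certainty=0.90)
print(review_status(0.95, cautious))  # meets the threshold
print(review_status(0.70, cautious))  # below the threshold, held for review
```

Starting with a high threshold and lowering it only after reviewing a few batches keeps the iteration step low-risk.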

Why asynchronous generation is preferred
