Benchmark
We provide a benchmarking framework that enables users to compare multiple synthetic data generators against each other. The evaluation metrics are documented within task_evaluator; please visit the Evaluation section to read more about them.
Process
We evaluate the performance of pipelines through a series of executions. At a high level, the process is:
1. Generate a list of tasks of interest.
2. Compute the scores on our test data using multiple metrics (e.g. accuracy and F1).
3. Output a results CSV with these metrics, as sketched below.
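The snippet below is a minimal sketch of steps 2 and 3, not the framework's actual API: it scores a few candidate pipelines on held-out test data with accuracy and F1, then writes the scores to a results CSV. The candidate models, the `results.csv` filename, and the column names are illustrative assumptions.

```python
# Hedged sketch: score candidate pipelines on test data and save a results CSV.
# In the real framework, each candidate would be trained on a generator's
# synthetic output; here we use stand-in scikit-learn classifiers.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy dataset standing in for a benchmark task.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Illustrative candidate pipelines to compare.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

rows = []
for name, model in candidates.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    # Step 2: compute multiple metrics on the test data.
    rows.append({
        "pipeline": name,
        "accuracy": accuracy_score(y_test, preds),
        "f1": f1_score(y_test, preds),
    })

# Step 3: persist the per-pipeline scores as a results CSV.
pd.DataFrame(rows).to_csv("results.csv", index=False)
```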