Testing SDK Reference
Run a test suite
run_test_suite / runTestSuite is the main entrypoint into the testing framework.
Below are the arguments you can pass to this function:
- id (string, required): A unique ID for the test suite. This will be displayed in the Autoblocks platform and should remain the same for the lifetime of the test suite.
- test_cases (list[BaseTestCase], required): A list of instances that subclass BaseTestCase. These are typically dataclasses and can be any schema that facilitates testing your application. They are passed directly to fn and are also made available to your evaluators. BaseTestCase is an abstract base class that requires you to implement the hash method. See Test case hashing for more information.
- evaluators (list[BaseTestEvaluator], required): A list of instances that subclass BaseTestEvaluator.
- fn (Callable[[BaseTestCase], Any], required): The function you are testing. Its only argument is an instance of a test case. This function can be synchronous or asynchronous and can return any type.
- max_test_case_concurrency (int, optional): The maximum number of test cases that can be running concurrently through fn. Useful to avoid rate limiting from external services, such as an LLM provider.
- grid_search_params (dict[str, Sequence[Any]], optional): Grid search enables you to test multiple combinations of parameters in your application. See grid search for more information.
from autoblocks.testing.run import run_test_suite

run_test_suite(
    id="my-test-suite",
    test_cases=gen_test_cases(),
    evaluators=[HasAllSubstrings(), IsFriendly()],
    fn=test_fn,
)
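As a hedged sketch, here is the same call with the optional parameters layered on; the grid keys temperature and model are illustrative assumptions, not part of the SDK:
run_test_suite(
    id="my-test-suite",
    test_cases=gen_test_cases(),
    evaluators=[HasAllSubstrings(), IsFriendly()],
    fn=test_fn,  # may be synchronous or asynchronous
    # Run at most 5 test cases through fn at a time, e.g. to stay
    # under an LLM provider's rate limits.
    max_test_case_concurrency=5,
    # Hypothetical grid: the suite runs once per combination (2 x 2 = 4).
    grid_search_params={
        "temperature": [0.3, 0.7],
        "model": ["model-a", "model-b"],
    },
)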
Test case hashing
All test cases must subclass BaseTestCase and implement the hash method, which returns a string that uniquely identifies the test case for its lifetime.
A hash needs to identify a test case uniquely while still allowing the test case to evolve over time, so it should generally be composed of the properties you consider "inputs" to your test function.
In the example below, the test cases are identified by the combination of their x and y properties.
This lets you change or add expectation-related properties without losing the identity, and thus the history, of the test case:
import dataclasses

from autoblocks.testing.models import BaseTestCase
from autoblocks.testing.util import md5


@dataclasses.dataclass
class MyTestCase(BaseTestCase):
    # Input properties
    x: int
    y: int

    # Expectation properties
    expected_sum: int
    expected_product: int

    # More properties can be added here as the test case evolves
    # without losing its identity and history, e.g.:
    # expected_difference: int

    def hash(self) -> str:
        """My hash is composed only of my input properties."""
        return md5(f"{self.x}-{self.y}")
Hashes only need to be unique within a single test suite.
Hashes should be no more than 100 characters.
BaseTestEvaluator
An abstract base class that you can subclass to create your own evaluators.
- id (string, required): A unique identifier for the evaluator.
- max_concurrency (number, optional): The maximum number of concurrent calls to evaluate_test_case allowed for the evaluator. Useful to avoid rate limiting from external services, such as an LLM provider.
- evaluate_test_case (Callable[[BaseTestCase, Any], Optional[Evaluation]], required): Creates an evaluation on a test case and its output. This method can be synchronous or asynchronous.
from autoblocks.testing.models import BaseTestEvaluator
from autoblocks.testing.models import Evaluation


# SomeTestCase stands in for your own BaseTestCase subclass.
class MyEvaluator(BaseTestEvaluator):
    id = "my-evaluator"
    max_concurrency = 5

    def evaluate_test_case(self, test_case: SomeTestCase, output: str) -> Evaluation:
        return Evaluation(score=0.5)
Not all evaluators need to reference the test case. Some are "stateless" and evaluate the output in isolation.
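For example, here is a minimal sketch of a stateless, asynchronous evaluator; IsNonEmpty is an illustrative name, not part of the SDK:
from autoblocks.testing.models import BaseTestCase
from autoblocks.testing.models import BaseTestEvaluator
from autoblocks.testing.models import Evaluation


class IsNonEmpty(BaseTestEvaluator):
    id = "is-non-empty"

    # Stateless: the test case is ignored; only the output is scored.
    # evaluate_test_case may also be async, e.g. when calling an LLM judge.
    async def evaluate_test_case(self, test_case: BaseTestCase, output: str) -> Evaluation:
        return Evaluation(score=1 if output.strip() else 0)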
Evaluation
- score (number, required): A number between 0 and 1 that represents the score of the evaluation.
- threshold (Threshold, optional): An optional Threshold that describes the range the score must fall within to be considered passing. If no threshold is attached, the score is reported and the pass / fail status is undefined.
- metadata (object, optional): Key-value pairs that provide additional context about the evaluation. This is typically used to explain why an evaluation failed.
Attached metadata is surfaced in the test run comparison UI.
from autoblocks.testing.models import Evaluation
from autoblocks.testing.models import Threshold

# Evaluation with score and threshold
Evaluation(
    score=0.5,
    threshold=Threshold(lt=0.6),
)

# Evaluation with score, threshold, and metadata
Evaluation(
    score=0,
    threshold=Threshold(gte=1),
    metadata={
        "reason": "An explanation of why the evaluation failed",
    },
)

# Evaluation with score only
Evaluation(score=0.5)
Threshold
You can use any combination of these properties to define a range for the score.
- lt (number, optional): The score must be less than this number in order to be considered passing.
- lte (number, optional): The score must be less than or equal to this number in order to be considered passing.
- gt (number, optional): The score must be greater than this number in order to be considered passing.
- gte (number, optional): The score must be greater than or equal to this number in order to be considered passing.
from autoblocks.testing.models import Evaluation
from autoblocks.testing.models import Threshold

Evaluation(
    score=0.5,
    # Score must be greater than or equal to 1
    threshold=Threshold(gte=1),
)

Evaluation(
    score=0.5,
    # Score must be:
    # - greater than or equal to 0.4 AND
    # - less than 0.6
    threshold=Threshold(
        gte=0.4,
        lt=0.6,
    ),
)
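To tie the pieces together, here is a minimal end-to-end sketch; SumTestCase, add, and MatchesExpectedSum are illustrative names, and run_test_suite is assumed importable from autoblocks.testing.run as in the earlier example:
import dataclasses

from autoblocks.testing.models import BaseTestCase
from autoblocks.testing.models import BaseTestEvaluator
from autoblocks.testing.models import Evaluation
from autoblocks.testing.models import Threshold
from autoblocks.testing.run import run_test_suite
from autoblocks.testing.util import md5


@dataclasses.dataclass
class SumTestCase(BaseTestCase):
    x: int
    y: int
    expected_sum: int

    def hash(self) -> str:
        # Hash only the inputs so expectations can evolve.
        return md5(f"{self.x}-{self.y}")


def add(test_case: SumTestCase) -> int:
    # The "application" under test.
    return test_case.x + test_case.y


class MatchesExpectedSum(BaseTestEvaluator):
    id = "matches-expected-sum"

    def evaluate_test_case(self, test_case: SumTestCase, output: int) -> Evaluation:
        passed = output == test_case.expected_sum
        return Evaluation(
            score=1 if passed else 0,
            # Only a score of 1 counts as passing.
            threshold=Threshold(gte=1),
            metadata={"expected": test_case.expected_sum, "actual": output},
        )


run_test_suite(
    id="sum-suite",
    test_cases=[
        SumTestCase(x=1, y=2, expected_sum=3),
        SumTestCase(x=2, y=2, expected_sum=4),
    ],
    evaluators=[MatchesExpectedSum()],
    fn=add,
)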

