Add continuous evaluation + regression tests

#29
by jbakerx - opened

For every new adapter version:
- run the same 50–100 prompt suite
- track perplexity, repetition, toxicity/anachronism rate, and a human preference sample

Comparing these numbers against the previous version prevents silent quality regressions.
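
To make the idea concrete, here is a minimal sketch of what the per-version check could look like. It is not tied to this repo: `repetition_rate`, `find_regressions`, the metric names, and the 5% tolerance are all hypothetical choices for illustration. It also assumes every tracked metric is lower-is-better (perplexity, repetition, toxicity/anachronism rate); a human preference score would need the comparison flipped.

```python
from collections import Counter
from typing import Dict, Tuple


def repetition_rate(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram in the text."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0  # text shorter than n words: nothing can repeat
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.05,  # assumed relative tolerance, not from the thread
) -> Dict[str, Tuple[float, float]]:
    """Return metrics where the candidate is worse than baseline beyond tolerance.

    Assumes all metrics are lower-is-better.
    """
    regressions = {}
    for name, base in baseline.items():
        new = candidate.get(name)
        if new is None:
            continue  # metric not reported for the candidate; skip it
        if new > base * (1 + tolerance):
            regressions[name] = (base, new)
    return regressions


if __name__ == "__main__":
    # Hypothetical suite-level averages for two adapter versions.
    previous = {"perplexity": 10.0, "repetition": 0.10}
    current = {"perplexity": 10.2, "repetition": 0.20}
    print(find_regressions(previous, current))
```

Running the same fixed prompt suite for both versions is what makes the two dictionaries comparable; a CI job could fail the release whenever `find_regressions` returns a non-empty result.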

We will consider this enhancement for inclusion in version 2.0.0.
