What happens with my data?
We value your privacy and only store data that is relevant for our research. We act in accordance with the EU GDPR. For detailed information, see the data protection information sheet.
comparity.ai is an ongoing research project. The full dataset will be publicly released alongside the first publication from this project.
How much can I use comparity.ai?
Effectively, there is no limit on your usage. However, to protect against malicious use, each registered user account and IP address is limited to 500 requests per 24-hour window, which corresponds to approximately 1,775,000 tokens per user on average. We expect this to be sufficient, even for power users.
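For illustration, 1,775,000 tokens spread over 500 requests works out to roughly 3,550 tokens per request. The per-account and per-IP limit could be enforced with a sliding-window counter like the minimal sketch below. This is not comparity.ai's implementation. The class name, the `allow` method and the choice of a sliding window (rather than, say, a fixed daily reset) are all assumptions made for the example. In this sketch a request is accepted only if both the account and the IP address are still under the limit.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 24 * 60 * 60  # 24-hour window
MAX_REQUESTS = 500             # limit per account and per IP address


class SlidingWindowLimiter:
    """Hypothetical sliding-window limiter keyed by account and by IP."""

    def __init__(self, max_requests=MAX_REQUESTS, window=WINDOW_SECONDS):
        self.max_requests = max_requests
        self.window = window
        # key -> timestamps of accepted requests, oldest first
        self._hits = defaultdict(deque)

    def _prune(self, key, now):
        # Drop timestamps that have fallen out of the current window.
        hits = self._hits[key]
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        return hits

    def allow(self, user_id, ip, now=None):
        now = time.time() if now is None else now
        buckets = [self._prune(("user", user_id), now),
                   self._prune(("ip", ip), now)]
        # Reject if either the account or the IP has used up its quota.
        if any(len(b) >= self.max_requests for b in buckets):
            return False
        # Count the accepted request against both keys.
        for b in buckets:
            b.append(now)
        return True
```

Rejected requests are not recorded, so a blocked client regains access as soon as its oldest accepted request ages out of the window.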
How are the scores computed?
There are two different leaderboards: one based on vote Elo and one based on Cascading engagement. In the former, each pairwise vote nudges the two models' ratings via the standard Elo update (K = 32 for the overall rating, K = 64 for personal ratings). Votes of "both good" and "both bad" count as draws. A higher rating therefore means the model is preferred over its opponents more often.
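The standard Elo update referred to above can be sketched as follows. Only the K values and the treatment of "both good" / "both bad" as draws come from the description; the function names, the outcome labels and the 1500 starting rating used in the example are illustrative assumptions.

```python
def expected_score(r_a, r_b):
    """Probability that model A is preferred over model B under Elo."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a, r_b, outcome, k=32):
    """Return updated (r_a, r_b) after one pairwise vote.

    outcome is one of the hypothetical labels 'a', 'b', 'both_good',
    'both_bad'; the last two are scored as draws (0.5 each).
    """
    s_a = {"a": 1.0, "b": 0.0, "both_good": 0.5, "both_bad": 0.5}[outcome]
    delta = k * (s_a - expected_score(r_a, r_b))
    # Points gained by one model are lost by the other.
    return r_a + delta, r_b - delta
```

For two models that both start at 1500, a vote for A moves the ratings to 1516 and 1484 with K = 32, or to 1532 and 1468 with K = 64; a draw between equally rated models leaves both unchanged.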
Cascading engagement measures how long users dwell on each response before moving on, then strips out position bias (responses shown first get more attention regardless of model) by fitting $\log(\text{dwell}) = \alpha_{\text{position}} + \beta_{\text{model}}$ with alternating least squares. The score is $e^{\beta_{\text{model}}}$: 1.00× corresponds to an average response at any given slot, and 1.5× means users linger 50% longer than average once position is accounted for.
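A minimal sketch of such a fit is shown below, assuming each observation is a (position, model, dwell time) tuple with positive dwell. It alternates between solving for the position effects with the model effects held fixed and vice versa. The function name, the fixed iteration count and the normalization (centering the model effects over all observations so that an average response scores 1.00×) are assumptions for illustration; the leaderboard's actual normalization and stopping rule may differ.

```python
import math
from collections import defaultdict


def fit_cascading_scores(observations, n_iters=100):
    """Hypothetical ALS fit of log(dwell) = alpha[position] + beta[model].

    observations: iterable of (position, model, dwell) with dwell > 0.
    Returns {model: exp(beta)} with beta centered over all observations.
    """
    logs = [(p, m, math.log(d)) for p, m, d in observations]
    alpha, beta = {}, {}
    for _ in range(n_iters):
        # Position step: mean residual per position, model effects fixed.
        sums, counts = defaultdict(float), defaultdict(int)
        for p, m, y in logs:
            sums[p] += y - beta.get(m, 0.0)
            counts[p] += 1
        alpha = {p: sums[p] / counts[p] for p in sums}
        # Model step: mean residual per model, position effects fixed.
        sums, counts = defaultdict(float), defaultdict(int)
        for p, m, y in logs:
            sums[m] += y - alpha[p]
            counts[m] += 1
        beta = {m: sums[m] / counts[m] for m in sums}
    # alpha and beta are only identified up to a shared constant, so
    # center beta so that an average response maps to a score of 1.0.
    mean_beta = sum(beta[m] for _, m, _ in logs) / len(logs)
    return {m: math.exp(b - mean_beta) for m, b in beta.items()}
```

Because the model is additive in log space, the score compares dwell times multiplicatively: a model that is always read 1.5 times as long as the average at the same slot ends up with a score of about 1.5, whatever positions it was shown in.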
Why does it look different for different users?
comparity.ai has two distinct usage modes: the standard side-by-side view, where you see the outputs of two models, and the Cascading mode, where you see only one answer at a time but can "swipe" through all models.
Questions or interested in collaborating?
