Plot the is_correct column to compute LLM-as-a-judge accuracy - ox/MedQuAD@eef1a8fd46b4ce14dcce533ec93ec5c
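
The commit title describes deriving an accuracy figure from an is_correct column. A minimal sketch of that calculation follows, assuming each row carries a boolean-like is_correct value written by the judge model. The helper names (parse_bool, judge_accuracy) and the sample rows are hypothetical and are not taken from the repository; the actual commit may compute or plot this differently.

```python
from collections import Counter


def parse_bool(value):
    """Interpret common CSV spellings of a boolean (an assumption about the data)."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"true", "1", "yes"}


def judge_accuracy(rows, column="is_correct"):
    """Return (accuracy, counts), where accuracy is the share of rows judged correct."""
    flags = [parse_bool(row[column]) for row in rows]
    if not flags:
        raise ValueError("no rows to score")
    counts = Counter(flags)
    return counts[True] / len(flags), counts


# Hypothetical rows for illustration only; not data from the repository.
rows = [
    {"is_correct": "true"},
    {"is_correct": "false"},
    {"is_correct": "true"},
    {"is_correct": "true"},
]

accuracy, counts = judge_accuracy(rows)
print(f"accuracy = {accuracy:.2f}")
print(f"correct = {counts[True]}, incorrect = {counts[False]}")
```

The two counts are what a bar chart of the column would display, so the same values could be handed to any plotting library.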