Use the vitals package with ellmer to evaluate and compare the accuracy of LLMs, including writing evals to test local models ...
We introduce KorMedMCQA-V, a Korean medical licensing-exam-style multimodal multiple-choice question answering benchmark for evaluating vision-language models (VLMs). The dataset consists of 1,534 ...
A ready-to-use Python pipeline for running machine learning model inference using MNN (Mobile Neural Network). It handles the complete flow from loading an image, preprocessing it, executing the model ...