Use the vitals package with ellmer to evaluate and compare the accuracy of LLMs, including writing evals to test local models.
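A minimal sketch of what such an eval might look like, assuming vitals' Task/solver/scorer interface and ellmer's chat_ollama() connector for a local model; the dataset, model name, and grading setup below are illustrative, not a definitive recipe.

```r
library(ellmer)
library(vitals)
library(tibble)

# Tiny illustrative dataset: each row pairs an input prompt with a target answer.
arithmetic <- tibble(
  input  = c("What is 2 + 2?", "What is 17 * 3?"),
  target = c("4", "51")
)

# Solver: a local model served via Ollama (the model name is an assumption).
local_chat <- chat_ollama(model = "llama3.1")

# A Task wires the dataset to a solver and a scorer; model_graded_qa()
# uses an LLM judge to decide whether each response matches its target.
tsk <- Task$new(
  dataset = arithmetic,
  solver  = generate(local_chat),
  scorer  = model_graded_qa()
)

# Run the eval; the results can then be inspected or compared across models.
tsk$eval()
```

Swapping the solver chat for another provider (for example an ellmer chat_openai() connection) and rerunning the same Task is one way to compare accuracy across models.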
Users running a quantized 7B model on a laptop expect 40+ tokens per second. A 30B MoE model on a high-end mobile device ...
Some believe that firms building generic AI ought to be pushed toward customized LLMs that provide mental health support. Good idea or bad? An AI Insider analysis.