Hacker News

That's a significant rub with LLMs, particularly hosted ones: the variability. Add in quantization, speculative decoding, and dynamic adjustment of temperature, nucleus sampling, attention head count, & skipped layers at runtime, and you can get wildly different behaviors with even the same prompt and context sent to the same model endpoint a couple hours apart.
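Temperature and nucleus (top-p) sampling alone are enough to make the same prompt land on different tokens across runs. A minimal sketch of those two knobs, pure Python with illustrative names (`sample_token`, the example logits, and the parameter values are all assumptions, not any provider's actual implementation):

```python
import math
import random

def sample_token(logits, temperature=1.0, top_p=1.0, rng=None):
    """Sample a token index from raw logits with temperature scaling
    and nucleus (top-p) filtering."""
    rng = rng or random.Random()
    # Temperature scaling: values < 1 sharpen the distribution,
    # values > 1 flatten it.
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus filtering: keep the smallest set of highest-probability
    # tokens whose cumulative mass reaches top_p, then renormalize.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    norm = sum(probs[i] for i in kept)
    # Draw from the truncated, renormalized distribution.
    r = rng.random() * norm
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]

# Same logits, different runtime settings, different behavior:
# near-zero temperature is effectively greedy; high temperature
# plus top-p < 1 spreads mass across several candidates.
logits = [2.0, 1.5, 0.3, -1.0]
greedy = sample_token(logits, temperature=0.01, top_p=1.0, rng=random.Random(0))
varied = sample_token(logits, temperature=1.5, top_p=0.9, rng=random.Random(0))
```

If the endpoint silently changes `temperature` or `top_p` between your calls, you are effectively sampling from a different distribution even though the model weights and prompt are identical.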

That's all before you even get to the other quirks of LLMs.



