Discussion about this post


"Taalas hard-wired models provide ~10x faster inference at ~10x less power. This seems tricky to adapt to a mixture of experts architecture, like that used in current frontier models, although maybe one could hard-wire every model in the ensemble somehow."

This shows a misunderstanding of what the mixture-of-experts architecture is: it is not an ensemble of models.
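To make the distinction concrete, here is a minimal sketch of a single mixture-of-experts layer in plain Python. It is a toy under stated assumptions, not how any production model or any Taalas chip is built: the class name `ToyMoELayer`, the random weights, and the use of a single linear map per expert are all illustrative choices (real experts are usually small feed-forward networks, and real routers often add load-balancing terms). What it does show is that the experts are sub-blocks inside one layer of one model, and a router picks a few of them per token, rather than several complete models each producing an answer that gets averaged.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class ToyMoELayer:
    """Hypothetical toy MoE layer: one router plus several expert sub-networks."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: produces one score per expert for each token.
        self.router = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(num_experts)]
        # Assumption: each expert is a single dim x dim linear map.
        self.experts = [[[rng.gauss(0, 1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(num_experts)]

    def forward(self, token):
        # Score every expert for this token.
        scores = matvec(self.router, token)
        # Keep only the top_k highest-scoring experts.
        chosen = sorted(range(len(scores)),
                        key=lambda i: scores[i],
                        reverse=True)[:self.top_k]
        # Normalise the chosen scores into mixing weights.
        weights = softmax([scores[i] for i in chosen])
        # Only the chosen experts are evaluated; the rest are skipped.
        out = [0.0] * len(token)
        for w, i in zip(weights, chosen):
            y = matvec(self.experts[i], token)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out, chosen


layer = ToyMoELayer(dim=4, num_experts=8, top_k=2)
output, chosen = layer.forward([1.0, 0.5, -0.3, 2.0])
print("experts used for this token:", chosen)
```

In this sketch all eight experts' weights belong to the same layer, but only two are evaluated for a given token, and which two can change from token to token. An ensemble, by contrast, would run every member model end to end on the whole input and then combine their outputs.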
