Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging

Ryo Bertolissi, Jonas Hübotter, Ido Hakimi, Andreas Krause

Key contributions

  • Test-time model merging (training and merging many local expert models for a single task / domain) outperforms traditional model merging (training and merging a few models across multiple tasks)
  • Test-time model merging approaches the performance of test-time training with almost no test-time compute overhead

Inductive vs transductive learning

  • During traditional training, learn a model by inductively extracting general rules from data that can then be applied to downstream examples at test time.
  • Transductive learning directly uses test examples (with no labels) to make predictions. You predict specific examples rather than attempting to learn a more general function, which is in some sense a simpler problem. No generalization beyond the specific test examples is required or expected.

Test-time training (TTT)

  • Fine-tune the model separately for every individual task (prompt)
  • Significantly improves model performance, but at high test-time computational cost (a toy sketch follows this list)
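
A minimal, illustrative sketch of the TTT idea on a toy regression model: for each test input, retrieve its nearest training examples and take a few gradient steps on them before predicting. The model, data, retrieval rule, and hyperparameters here are hypothetical placeholders, not the paper's actual setup.

    # Toy test-time training: fine-tune a copy of the base model on the
    # test input's nearest training examples, then predict.
    import torch

    torch.manual_seed(0)
    d = 16
    X_train = torch.randn(1000, d)          # stand-ins for training inputs / embeddings
    y_train = X_train @ torch.randn(d, 1)   # toy targets

    base = torch.nn.Linear(d, 1)            # the "base model"

    def ttt_predict(x_test, k=32, steps=10, lr=1e-2):
        # Retrieve the k training examples closest to the test input.
        dists = ((X_train - x_test) ** 2).sum(dim=1)
        idx = dists.topk(k, largest=False).indices
        # Copy the base model and fine-tune it on this local neighborhood.
        local = torch.nn.Linear(d, 1)
        local.load_state_dict(base.state_dict())
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        for _ in range(steps):
            loss = torch.nn.functional.mse_loss(local(X_train[idx]), y_train[idx])
            opt.zero_grad()
            loss.backward()
            opt.step()
        with torch.no_grad():
            return local(x_test)

    print(ttt_predict(torch.randn(d)))

The point of the sketch is the cost profile: every prediction pays for retrieval plus several gradient steps, which is what TTMM tries to avoid.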

Test-time model merging (TTMM)

  • At train-time, cluster the training data into local neighborhoods and train a small expert LoRA adapter for each cluster
  • At test-time, dynamically select the subset of LoRA adapters most relevant to the prompt and merge their parameters to form a single task-specific model (see the sketch after this list)
  • Approaches the performance of TTT without significant test-time compute or memory cost
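
A minimal sketch of the TTMM pipeline with toy numpy "experts": cluster the training embeddings, keep one parameter vector per cluster (standing in for a flattened LoRA delta), and at test time merge the adapters of the clusters closest to the prompt, weighted by a softmax over similarities. The cluster count, top-k, temperature, and random "experts" are illustrative choices, not values or code from the paper.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    d_embed, d_params, n_clusters = 32, 1024, 8

    # Train-time: cluster training data into local neighborhoods.
    train_embeds = rng.normal(size=(5000, d_embed))   # embeddings of training documents
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(train_embeds)

    # One expert adapter per cluster (random placeholders here; in TTMM each
    # would be a LoRA adapter fine-tuned on its cluster's data).
    expert_params = rng.normal(size=(n_clusters, d_params))

    def merged_adapter(prompt_embed, top_k=3, temperature=0.1):
        # Cosine similarity between the prompt and each cluster centroid.
        centroids = km.cluster_centers_
        sims = centroids @ prompt_embed / (
            np.linalg.norm(centroids, axis=1) * np.linalg.norm(prompt_embed))
        # Keep only the top-k most relevant experts.
        idx = np.argsort(sims)[-top_k:]
        # Softmax weights over the selected experts.
        w = np.exp(sims[idx] / temperature)
        w /= w.sum()
        # Merge: weighted average of the selected experts' parameters.
        return w @ expert_params[idx]

    adapter = merged_adapter(rng.normal(size=d_embed))
    print(adapter.shape)   # (1024,): a single task-specific adapter

Because the per-prompt work is only a similarity computation and a weighted average of adapter parameters, the test-time overhead stays small compared to gradient-based TTT.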

Model merging

  • In a multitask setting, multiple expert models are merged, typically a small number of models / tasks; merging happens once at the end of training and the model is fixed thereafter
  • TTMM differs in that its expert models are local to the specific task in question: many related local models are merged for a single task at test-time (contrast sketch after this list)
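
For contrast, traditional multi-task merging in its simplest form: average the parameters of a handful of task experts once after training and use the single merged model for everything. The experts and dimensions below are illustrative placeholders.

    import numpy as np

    rng = np.random.default_rng(0)
    task_experts = rng.normal(size=(3, 1024))   # e.g. math, code, and chat experts
    merged_once = task_experts.mean(axis=0)     # merged a single time, fixed thereafter

    # TTMM instead recomputes a weighted merge of many *local* experts for
    # each prompt (see merged_adapter above), so the merged model adapts
    # to the task at hand.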
