Automating Foundation Model Adaptation Through Gradient-Based Meta-Optimization Strategies
Keywords:
Foundation Models, Meta-Learning, Gradient-Based Optimization, Parameter-Efficient Adaptation, Bilevel Optimization, Hypernetworks, Few-Shot Learning.
Abstract
Foundation models provide natural language processing and computer vision capabilities through generalized representations pre-trained on massive corpora. However, adapting such large models to downstream tasks with standard fine-tuning procedures is computationally prohibitive and inefficient. In this work, we introduce the Gradient-Based Meta-Optimization Architecture (GB-MOA), a method that automates the adaptation process by building meta-learning into a second-order gradient optimization loop. GB-MOA employs a hypernetwork conditioned on the task-specific adapter weights, together with a curriculum-driven bi-level optimization approach that jointly minimizes the inner-loop task losses and the outer-loop generalization loss. On GLUE, SuperGLUE, and few-shot classification benchmarks, the model reaches 91.8% accuracy on a held-out composite GLUE benchmark with just 80 inner-loop update steps, exceeding all baselines (including full fine-tuning, LoRA, and MAML variants) while updating less than 1.5% of the parameters. Ablation studies on the individual components of the architecture support the design choices.
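For orientation, the bi-level structure summarized above can be written in a generic form. This is only a sketch under assumed notation, not the exact GB-MOA objective: the symbols $\phi$ (hypernetwork parameters), $\alpha_\tau$ (adapter weights for task $\tau$), $\eta$ (inner-loop step size), $K$ (number of inner-loop steps, 80 in the reported setting), and the loss names $\mathcal{L}^{\mathrm{in}}_\tau$ and $\mathcal{L}^{\mathrm{out}}_\tau$ are all introduced here for illustration. How the hypernetwork enters the inner loop is deliberately left abstract.

```latex
% Outer loop: minimize the generalization loss over tasks with respect to
% the (assumed) hypernetwork parameters \phi.
\min_{\phi} \; \sum_{\tau} \mathcal{L}^{\mathrm{out}}_{\tau}\!\left(\alpha_{\tau}^{(K)}(\phi)\right)

% Inner loop: K gradient steps on the task loss, updating only the adapter weights.
\text{s.t.} \quad
\alpha_{\tau}^{(k+1)} = \alpha_{\tau}^{(k)}
  - \eta \, \nabla_{\alpha} \mathcal{L}^{\mathrm{in}}_{\tau}\!\left(\alpha_{\tau}^{(k)}; \phi\right),
\qquad k = 0, \dots, K-1 .
```

In this reading, "second-order" means that the outer gradient with respect to $\phi$ is taken through the $K$ inner updates, so it involves derivatives of $\nabla_{\alpha} \mathcal{L}^{\mathrm{in}}_{\tau}$ rather than treating $\alpha_{\tau}^{(K)}$ as a constant.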




