Day 4 Balancing Cost and Quality in FMware, Kirill Vasilevski



AI Summary

Balancing Cost and Quality in FM Deployments

Introduction

  • Speaker: Kirill Vasilevski, Applied AI Researcher
  • Focus: Real-world considerations for deploying Foundation Models (FM).

Choosing the Correct Foundation Model (FM)

  • Considerations in selecting FMs for software applications.
  • Key factors:
    • Cost vs. quality: hosted FMs from providers such as OpenAI and AWS can be expensive.
    • Model capabilities and licensing.
    • The large number of models to choose from (700,000+ on Hugging Face).

Deploying FMs in FMware

  • Techniques to reduce costs and improve decision-making.

Key Methods for FM Deployment

  1. Model Enhancement: Improving model performance through:
    • Parameter-level methods (fine-tuning, reinforcement learning from human feedback).
    • Architectural methods (e.g., mixture of experts, prompt engineering).
    • Effectiveness depends on tuning that is specific to each model and context.
  2. Synthesis: Combining outputs from multiple models to improve results.
    • Example: LLM-Blender ranks the outputs of multiple models and fuses the best ones.
    • Computationally intensive and potentially high latency, because several models run on each input.
  3. Routing: Selecting models based on input:
    • Predictive vs. Non-predictive routing.
    • Non-predictive: try models in sequence until one returns a satisfactory result.
    • Predictive: choose a model up front from input characteristics, avoiding the cost of trial inferences.
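
The synthesis and routing strategies above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the speaker's implementation: the `Model` class, the stub models `small` and `large`, their cost figures, the length-based scoring, and the word-count `classify` function are all hypothetical stand-ins for real FMs, real quality checks, and a trained router. Trying the cheapest model first in the non-predictive case is also an assumption; the talk summary only says models are tried in sequence.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Model:
    name: str
    cost: float                      # hypothetical cost per call, arbitrary units
    generate: Callable[[str], str]   # stand-in for a real FM call


def synthesize(models: List[Model], prompt: str,
               score: Callable[[str], float]) -> Tuple[str, float]:
    """Synthesis: query every model, rank the candidates, keep the best one."""
    candidates = [m.generate(prompt) for m in models]
    total_cost = sum(m.cost for m in models)  # every model is paid for
    return max(candidates, key=score), total_cost


def route_cascade(models: List[Model], prompt: str,
                  is_good_enough: Callable[[str], bool]) -> Tuple[str, float]:
    """Non-predictive routing: try models cheapest first (an assumption),
    stopping at the first answer that passes the quality check."""
    spent, answer = 0.0, ""
    for m in sorted(models, key=lambda m: m.cost):
        answer = m.generate(prompt)
        spent += m.cost
        if is_good_enough(answer):
            break
    return answer, spent


def route_predictive(models_by_label: Dict[str, Model], prompt: str,
                     classify: Callable[[str], str]) -> Tuple[str, float]:
    """Predictive routing: pick one model from the input alone, then call it."""
    m = models_by_label[classify(prompt)]
    return m.generate(prompt), m.cost


# Hypothetical stub models; a real system would call actual FMs here.
small = Model("small", 1.0, lambda p: "short")
large = Model("large", 10.0, lambda p: "a much longer, detailed answer")


def classify(prompt: str) -> str:
    # Toy router: a real one would be a classifier trained on labeled prompts.
    return "hard" if len(prompt.split()) > 5 else "easy"


if __name__ == "__main__":
    print(synthesize([small, large], "q", len))
    print(route_cascade([small, large], "q", lambda a: len(a) > 10))
    print(route_predictive({"easy": small, "hard": large}, "hi", classify))
```

In this toy setup, synthesis always pays for both models, the cascade pays for the small model plus the large one whenever the small answer fails the check, and predictive routing pays for exactly one model but depends entirely on how well `classify` judges the input.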

Challenges in Deployment

  • Predictive routing relies on the quality of the datasets used to train its classifiers.
  • Complexity of updating models and the need for retraining.
  • Achieving good generalization across different tasks.

Conclusion

  • Mix and match methods for optimal model deployment.
  • No one-size-fits-all solution; considerations vary based on application and environment.
  • Treat the FM as a black box and optimize around it while maintaining output quality.