* Enable vision models for Together and Fireworks * Works with ollama 0.4.0 pre-release with the vision model * localize media for meta_reference inference * Fix