Concurrency for Ollama Server
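The Ollama team released an awesome parallelisation update in v0.1.33 of the software. It adds two parameters, OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS, which set how many requests each loaded model processes at the same time and how many models can be held in memory at once. Tuned together, they let you serve more traffic and make fuller use of the GPU’s compute and VRAM. The settings can be either applied via...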
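
As a rough illustration of how the two parameters might be applied, here is a minimal Python sketch that sets them as environment variables for the server process. It assumes the `ollama` binary is on the PATH; the helper names `build_ollama_env` and `start_server` and the values 4 and 2 are hypothetical choices for this example, not recommendations from the Ollama team.

```python
import os
import subprocess


def build_ollama_env(num_parallel, max_loaded_models, base_env=None):
    """Return a copy of the environment with both concurrency variables set."""
    if num_parallel < 1 or max_loaded_models < 1:
        raise ValueError("both settings must be positive integers")
    # Copy so the caller's environment is never modified in place.
    env = dict(os.environ if base_env is None else base_env)
    env["OLLAMA_NUM_PARALLEL"] = str(num_parallel)            # parallel requests per model
    env["OLLAMA_MAX_LOADED_MODELS"] = str(max_loaded_models)  # models loaded at once
    return env


def start_server(num_parallel=4, max_loaded_models=2):
    """Launch `ollama serve` with the settings applied (assumes `ollama` is on PATH)."""
    env = build_ollama_env(num_parallel, max_loaded_models)
    return subprocess.Popen(["ollama", "serve"], env=env)
```

Calling `start_server()` would start the server with the illustrative values. Suitable numbers depend on how much VRAM the loaded models need, so it is worth starting low and raising them while watching GPU memory.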