LLMOS v0.2 - Simplify AI Management, Unlock GPU Potential
๐ Introducing LLMOS v0.2โ
LLMOS is a cloud-native tool designed to accelerate AI application development and simplify the management of large language models (LLMs). It supports deployment on both public clouds and private GPU servers, enabling you to easily deploy private AI models, scale machine learning workflows, and reduce the complexity of development and operations.
With the increasing demand for GPU virtualization (vGPU) and resource utilization, the v0.2 release prioritizes features like vGPU management, cluster and GPU resource monitoring, and alerting to maximize GPU management efficiency and utilization.
๐ Key Featuresโ
1. More Efficient GPU Managementโ
Introducing support for NVIDIA Virtual GPU (vGPU), allowing you to choose between virtual or full GPUs based on your needs. This accelerates resource allocation and optimizes the utilization of GPU VRAM and CUDA cores.
GPU Model | Support | Architecture |
---|---|---|
A100, A200 | โ | NVIDIA Ampere |
H100, H200 | โ | NVIDIA Hopper |
Tesla T4/T4G | โ | NVIDIA Turing |
30x/40x Series | โ | Ada Lovelace/Ampere |
โ Virtual GPU (vGPU): Optimize GPU utilization and scale workloads seamlessly.
โ GPU Management Interface: Intuitively view GPU details and monitor resources in real time.
๐ Learn More
2. Real-Time Monitoring and Alertsโ
Enable GPU and cluster monitoring with a single click using preconfigured Grafana dashboards and Prometheus alerts. Track performance metrics in real time to ensure stable workload operations.
โ Real-Time Monitoring: Stay informed about cluster and GPU status.
โ Intelligent Alerts: Predefined rules to reduce risks of failures and downtime.
โ Pause and Resume Workloads: Release idle resources to enhance efficiency.
๐ Learn More
โก Key Enhancementsโ
1. Faster Installation Experienceโ
-
For CN users, you can accelerate installation with
--mirror cn
.curl -sfL https://get-llmos.1block.ai | sh -s - --cluster-init --token mytoken --mirror cn
-
For restricted network and air-gap environments, use configurations like
globalSystemImageRegistry
orregistries
to integrate private image registries, accelerating and simplifying the installation process.
2. Expanded Model Service Sourcesโ
Support loading AI models from HuggingFace, ModelScope, or local paths, offering more flexibility in model deployment for your projects.
3. Optimized Workload Managementโ
- Automatic Volume Cleanup: Automatically release storage resources after workload deletion, simplifying management.
- Notebook Optimization: Added support for Jupyter Pipeline images and i18n localization(e.g., Chinese) patches.
- Node-Level GPU Metrics Optimization: Gain detailed overviews of GPU resources to enable fine-grained management.
- Enhanced Model Token Metrics: View real-time model token usage and response speeds to optimize resource planning and task execution of model services.
๐ Updates and Fixesโ
- Dependency Updates: System dependencies have been updated to improve performance, security, and compatibility:
- Rook Ceph and Ceph cluster upgraded to
v1.15.7
. - Snapshot Controller upgraded to
v8.2.0
. - Upgrade Controller upgraded to
v0.14.2
.
- Rook Ceph and Ceph cluster upgraded to
- Key Bug Fixes:
- Model Service Parameter Issues: Fixed to allow seamless parameter updates.
- Label Nil Exception: Custom addon will no longer experience label
nil
exception. - User Permission Optimization: Removed unnecessary node permissions for regular users, enhancing security.
๐ Ready to Experience?โ
Visit the documentation to learn more. Upgrade to LLMOS v0.2 today and experience the future of AI management!
๐ Upgrade Now!