Skip to main content

LLMOS v0.2 - Simplify AI Management, Unlock GPU Potential

ยท 3 min read
Guangbo Chen
Founder of 1BLOCK.AI

๐Ÿš€ Introducing LLMOS v0.2โ€‹

LLMOS is a cloud-native tool designed to accelerate AI application development and simplify the management of large language models (LLMs). It supports deployment on both public clouds and private GPU servers, enabling you to easily deploy private AI models, scale machine learning workflows, and reduce the complexity of development and operations.

With the increasing demand for GPU virtualization (vGPU) and resource utilization, the v0.2 release prioritizes features like vGPU management, cluster and GPU resource monitoring, and alerting to maximize GPU management efficiency and utilization.

๐ŸŒŸ Key Featuresโ€‹

1. More Efficient GPU Managementโ€‹

Introducing support for NVIDIA Virtual GPU (vGPU), allowing you to choose between virtual or full GPUs based on your needs. This accelerates resource allocation and optimizes the utilization of GPU VRAM and CUDA cores.

GPU ModelSupportArchitecture
A100, A200โœ…NVIDIA Ampere
H100, H200โœ…NVIDIA Hopper
Tesla T4/T4Gโœ…NVIDIA Turing
30x/40x Seriesโœ…Ada Lovelace/Ampere

โœ” Virtual GPU (vGPU): Optimize GPU utilization and scale workloads seamlessly.

model-service-create model-service-vram

โœ” GPU Management Interface: Intuitively view GPU details and monitor resources in real time.

gpu-device gpu-device-metrics

๐Ÿ‘‰ Learn More

2. Real-Time Monitoring and Alertsโ€‹

Enable GPU and cluster monitoring with a single click using preconfigured Grafana dashboards and Prometheus alerts. Track performance metrics in real time to ensure stable workload operations.

โœ” Real-Time Monitoring: Stay informed about cluster and GPU status.

cluster-gpu-metrics

โœ” Intelligent Alerts: Predefined rules to reduce risks of failures and downtime.

monitoring-rules

โœ” Pause and Resume Workloads: Release idle resources to enhance efficiency.

workload-actions

๐Ÿ‘‰ Learn More

โšก Key Enhancementsโ€‹

1. Faster Installation Experienceโ€‹

  • For CN users, you can accelerate installation with --mirror cn.

    curl -sfL https://get-llmos.1block.ai | sh -s - --cluster-init --token mytoken --mirror cn
  • For restricted network and air-gap environments, use configurations like globalSystemImageRegistry or registries to integrate private image registries, accelerating and simplifying the installation process.

2. Expanded Model Service Sourcesโ€‹

Support loading AI models from HuggingFace, ModelScope, or local paths, offering more flexibility in model deployment for your projects.

model-service-sources

3. Optimized Workload Managementโ€‹

  • Automatic Volume Cleanup: Automatically release storage resources after workload deletion, simplifying management.
  • Notebook Optimization: Added support for Jupyter Pipeline images and i18n localization(e.g., Chinese) patches. notebook-pipeline
  • Node-Level GPU Metrics Optimization: Gain detailed overviews of GPU resources to enable fine-grained management. node-metrics
  • Enhanced Model Token Metrics: View real-time model token usage and response speeds to optimize resource planning and task execution of model services. token-metrics

๐Ÿ›  Updates and Fixesโ€‹

  • Dependency Updates: System dependencies have been updated to improve performance, security, and compatibility:
    • Rook Ceph and Ceph cluster upgraded to v1.15.7.
    • Snapshot Controller upgraded to v8.2.0.
    • Upgrade Controller upgraded to v0.14.2.
  • Key Bug Fixes:
    • Model Service Parameter Issues: Fixed to allow seamless parameter updates.
    • Label Nil Exception: Custom addon will no longer experience label nil exception.
    • User Permission Optimization: Removed unnecessary node permissions for regular users, enhancing security.

๐ŸŒ Ready to Experience?โ€‹

Visit the documentation to learn more. Upgrade to LLMOS v0.2 today and experience the future of AI management!

๐Ÿš€ Upgrade Now!