LLMOS v0.2 - Simplify AI Management, Unlock GPU Potential

· 3 min read
Guangbo Chen
Founder of 1BLOCK.AI

🚀 Introducing LLMOS v0.2

LLMOS is a cloud-native tool designed to accelerate AI application development and simplify the management of large language models (LLMs). It supports deployment on both public clouds and private GPU servers, enabling you to easily deploy private AI models, scale machine learning workflows, and reduce the complexity of development and operations.

With growing demand for GPU virtualization (vGPU) and better resource utilization, the v0.2 release prioritizes vGPU management, cluster and GPU resource monitoring, and alerting to maximize GPU management efficiency and utilization.

🌟 Key Features

1. More Efficient GPU Management

Introducing support for NVIDIA Virtual GPU (vGPU), allowing you to choose between virtual or full GPUs based on your needs. This accelerates resource allocation and optimizes the utilization of GPU VRAM and CUDA cores.

Supported GPU models:

| GPU Model | Architecture |
| --- | --- |
| A100, A200 | NVIDIA Ampere |
| H100, H200 | NVIDIA Hopper |
| Tesla T4/T4G | NVIDIA Turing |
| 30x/40x Series | Ada Lovelace/Ampere |

Virtual GPU (vGPU): Optimize GPU utilization and scale workloads seamlessly.

(Screenshots: model-service-create, model-service-vram)
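In Kubernetes terms, a workload typically consumes a vGPU slice through resource requests on the pod. The sketch below is illustrative only: the `nvidia.com/gpumem` and `nvidia.com/gpucores` resource names and their units are assumptions that depend on the vGPU device plugin in use, so check the GPU management docs for the exact keys.

```yaml
# Hypothetical pod requesting a vGPU slice rather than a full GPU.
# Resource names below the "gpu" line are illustrative, not guaranteed.
apiVersion: v1
kind: Pod
metadata:
  name: vgpu-demo
spec:
  containers:
    - name: cuda-workload
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1        # one vGPU slice
          nvidia.com/gpumem: 8000  # VRAM cap in MiB (assumed key)
          nvidia.com/gpucores: 30  # share of CUDA cores in % (assumed key)
```

Several such pods can then be packed onto a single physical GPU, which is what drives the higher VRAM and CUDA-core utilization described above.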

GPU Management Interface: Intuitively view GPU details and monitor resources in real time.

(Screenshots: gpu-device, gpu-device-metrics)

👉 Learn More

2. Real-Time Monitoring and Alerts

Enable GPU and cluster monitoring with a single click using preconfigured Grafana dashboards and Prometheus alerts. Track performance metrics in real time to ensure stable workload operations.

Real-Time Monitoring: Stay informed about cluster and GPU status.

(Screenshot: cluster-gpu-metrics)

Intelligent Alerts: Predefined rules to reduce risks of failures and downtime.

(Screenshot: monitoring-rules)
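As a rough idea of what such a predefined rule looks like, here is a minimal Prometheus alerting-rule sketch. It assumes GPU metrics are scraped from the NVIDIA DCGM exporter (whose `DCGM_FI_DEV_GPU_UTIL` gauge reports utilization in percent); the rule names and thresholds shipped with LLMOS may differ.

```yaml
# Sketch only -- not the exact rules bundled with LLMOS.
groups:
  - name: gpu-alerts
    rules:
      - alert: GPUHighUtilization
        expr: avg by (gpu, Hostname) (DCGM_FI_DEV_GPU_UTIL) > 95
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "GPU {{ $labels.gpu }} on {{ $labels.Hostname }} above 95% for 10 minutes"
```

Rules like this feed the one-click Grafana/Prometheus stack mentioned above, so alerts fire before saturation turns into failures or downtime.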

Pause and Resume Workloads: Release idle resources to enhance efficiency.

(Screenshot: workload-actions)

👉 Learn More

⚡ Key Enhancements

1. Faster Installation Experience

  • For users in China, installation can be accelerated by adding the --mirror cn flag:

    curl -sfL https://get-llmos.1block.ai | sh -s - --cluster-init --token mytoken --mirror cn
  • For restricted-network and air-gapped environments, configuration options such as globalSystemImageRegistry or registries let you point the installer at private image registries, simplifying and speeding up installation.
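The two option names above come from the release notes, but the surrounding file layout in this sketch is an assumption; consult the installation docs for the exact configuration schema before using it.

```yaml
# Hypothetical install config for an air-gapped cluster.
# Pull LLMOS system images from a private registry:
globalSystemImageRegistry: registry.example.internal:5000
# Mirror public registries to the same private endpoint:
registries:
  mirrors:
    docker.io:
      endpoint:
        - "https://registry.example.internal:5000"
```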

2. Expanded Model Service Sources

Support loading AI models from HuggingFace, ModelScope, or local paths, offering more flexibility in model deployment for your projects.

(Screenshot: model-service-sources)
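To make the three source types concrete, the snippet below shows what specifying them might look like. The field names and repo IDs here are purely illustrative, not the actual LLMOS model-service schema; the point is that the same service definition can reference a Hugging Face repo, a ModelScope repo, or a path already present on the node.

```yaml
# Hypothetical model-source definitions -- field names are assumptions.
modelSources:
  - huggingface: "org/model-name"      # Hugging Face repo ID
  - modelscope: "org/model-name"       # ModelScope repo ID
  - localPath: "/models/model-name"    # local path on the node or volume
```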

3. Optimized Workload Management

  • Automatic Volume Cleanup: Automatically release storage resources after workload deletion, simplifying management.
  • Notebook Optimization: Added support for Jupyter Pipeline images and i18n localization (e.g., Chinese) patches. (Screenshot: notebook-pipeline)
  • Node-Level GPU Metrics Optimization: Gain a detailed overview of GPU resources on each node to enable fine-grained management. (Screenshot: node-metrics)
  • Enhanced Model Token Metrics: View real-time token usage and response speeds of model services to optimize resource planning and task execution. (Screenshot: token-metrics)
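For intuition on the token metrics above: a "response speed" is typically derived from two samples of a cumulative generated-token counter, the same way a monitoring stack computes a rate. This is a generic sketch, not LLMOS code.

```python
def tokens_per_second(count_t0: int, count_t1: int, t0: float, t1: float) -> float:
    """Token generation rate between two samples of a cumulative counter.

    count_t0/count_t1: total tokens generated at times t0/t1 (seconds).
    """
    if t1 <= t0:
        raise ValueError("t1 must be after t0")
    return (count_t1 - count_t0) / (t1 - t0)

# 600 tokens generated over a 10-second window:
print(tokens_per_second(1000, 1600, 0.0, 10.0))  # 60.0
```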

🛠 Updates and Fixes

  • Dependency Updates: System dependencies have been updated to improve performance, security, and compatibility:
    • Rook Ceph and Ceph cluster upgraded to v1.15.7.
    • Snapshot Controller upgraded to v8.2.0.
    • Upgrade Controller upgraded to v0.14.2.
  • Key Bug Fixes:
    • Model Service Parameter Issues: Fixed to allow seamless parameter updates.
    • Label Nil Exception: Fixed a nil-label exception that occurred when deploying custom add-ons.
    • User Permission Optimization: Removed unnecessary node permissions for regular users, enhancing security.

🌐 Ready to Experience?

Visit the documentation to learn more. Upgrade to LLMOS v0.2 today and experience the future of AI management!

🚀 Upgrade Now!