跳到主要内容

2 篇博文 含有标签「AI Infrastructure」

查看所有标签

LLMOS v0.2 - Simplify AI Management, Unlock GPU Potential

· 阅读需 3 分钟
Guangbo Chen
Founder of 1BLOCK.AI

🚀 Introducing LLMOS v0.2

LLMOS is a cloud-native tool designed to accelerate AI application development and simplify the management of large language models (LLMs). It supports deployment on both public clouds and private GPU servers, enabling you to easily deploy private AI models, scale machine learning workflows, and reduce the complexity of development and operations.

With the increasing demand for GPU virtualization (vGPU) and resource utilization, the v0.2 release prioritizes features like vGPU management, cluster and GPU resource monitoring, and alerting to maximize GPU management efficiency and utilization.

🌟 Key Features

1. More Efficient GPU Management

Introducing support for NVIDIA Virtual GPU (vGPU), allowing you to choose between virtual or full GPUs based on your needs. This accelerates resource allocation and optimizes the utilization of GPU VRAM and CUDA cores.

GPU ModelSupportArchitecture
A100, A200NVIDIA Ampere
H100, H200NVIDIA Hopper
Tesla T4/T4GNVIDIA Turing
30x/40x SeriesAda Lovelace/Ampere

Virtual GPU (vGPU): Optimize GPU utilization and scale workloads seamlessly.

model-service-create model-service-vram

GPU Management Interface: Intuitively view GPU details and monitor resources in real time.

gpu-device gpu-device-metrics

👉 Learn More

2. Real-Time Monitoring and Alerts

Enable GPU and cluster monitoring with a single click using preconfigured Grafana dashboards and Prometheus alerts. Track performance metrics in real time to ensure stable workload operations.

Real-Time Monitoring: Stay informed about cluster and GPU status.

cluster-gpu-metrics

Intelligent Alerts: Predefined rules to reduce risks of failures and downtime.

monitoring-rules

Pause and Resume Workloads: Release idle resources to enhance efficiency.

workload-actions

👉 Learn More

⚡ Key Enhancements

1. Faster Installation Experience

  • For CN users, you can accelerate installation with --mirror cn.

    curl -sfL https://get-llmos.1block.ai | sh -s - --cluster-init --token mytoken --mirror cn
  • For restricted network and air-gap environments, use configurations like globalSystemImageRegistry or registries to integrate private image registries, accelerating and simplifying the installation process.

2. Expanded Model Service Sources

Support loading AI models from HuggingFace, ModelScope, or local paths, offering more flexibility in model deployment for your projects.

model-service-sources

3. Optimized Workload Management

  • Automatic Volume Cleanup: Automatically release storage resources after workload deletion, simplifying management.
  • Notebook Optimization: Added support for Jupyter Pipeline images and i18n localization(e.g., Chinese) patches. notebook-pipeline
  • Node-Level GPU Metrics Optimization: Gain detailed overviews of GPU resources to enable fine-grained management. node-metrics
  • Enhanced Model Token Metrics: View real-time model token usage and response speeds to optimize resource planning and task execution of model services. token-metrics

🛠 Updates and Fixes

  • Dependency Updates: System dependencies have been updated to improve performance, security, and compatibility:
    • Rook Ceph and Ceph cluster upgraded to v1.15.7.
    • Snapshot Controller upgraded to v8.2.0.
    • Upgrade Controller upgraded to v0.14.2.
  • Key Bug Fixes:
    • Model Service Parameter Issues: Fixed to allow seamless parameter updates.
    • Label Nil Exception: Custom addon will no longer experience label nil exception.
    • User Permission Optimization: Removed unnecessary node permissions for regular users, enhancing security.

🌐 Ready to Experience?

Visit the documentation to learn more. Upgrade to LLMOS v0.2 today and experience the future of AI management!

🚀 Upgrade Now!

Introducing LLMOS

· 阅读需 5 分钟
Guangbo Chen
Founder of 1BLOCK.AI

An Open-source Cloud-native AI Infrastructure Platform, Not Just GPUs

What is LLMOS?

We are thrilled to announce the launch of LLMOS, an open-source cloud-native AI infrastructure platform designed to simplify the management of AI applications and Large Language Models (LLMs). With LLMOS, organizations can effortlessly deploy, scale, and operate machine learning workflows while reducing the complexity often associated with AI development and operations.

Why We Built LLMOS

AI and LLMs are transforming industries, but managing the infrastructure needed for AI at scale can be challenging. We built LLMOS to break down these barriers, providing a platform that makes it easier for developers, data scientists, and IT teams to focus on what really matters—building and deploying powerful AI solutions. With its cloud-native foundation, LLMOS integrates smoothly with existing infrastructure, offering a flexible, scalable, and user-friendly way to manage AI projects and tasks.

Key Features of LLMOS

1. Seamless Notebook Integration

LLMOS integrates with popular notebook environments such as Jupyter, VSCode, and RStudio, enabling data scientists and developers to work efficiently in familiar tools without complicated setup.

jupyter-notebook

2. ModelService for LLM Deployment

Deploying LLMs is now simpler with ModelService, which provides OpenAI-compatible APIs for serving large language models. This feature makes it easy to deploy, scale, and use LLMs in real-world applications.

model-service

3. Machine Learning Cluster

The Machine Learning Cluster supports distributed computing, offering parallel processing and access to leading AI libraries. This feature enhances the performance of machine learning workflows, especially for large-scale models and datasets.

machine-learning-cluster

4. Scalable Storage with Rook Ceph

Rook Ceph provides distributed and fault-tolerant storage system for LLMOS, offering robust, scalable block and filesystem storage that adapts to the needs of AI and LLM applications.

roo-ceph

5. Extensibility with Managed Addons

LLMOS introduces ManagedAddon support, allowing users to extend the platform with system and custom add-ons. This gives organizations more flexibility to tailor the platform to their specific needs.

6. Simplified User and API Key Management

The platform features an intuitive interface for managing users and API keys, making access control and resource allocation easier for administrators.

api-keys

7. Role-Based Access Control (RBAC) and Role Templates

LLMOS offers enhanced Role Templates and RBAC, helping administrators assign permissions and manage security across teams and projects with ease.

role-templates

8. Node Management

Node Management is available directly through the LLMOS dashboard, allowing for better visibility and control over system resources, enhancing operational efficiency.

nodes node-management

9. Bootstrap and Installation Support

Setting up LLMOS has been simplified through easy-to-use installation script and comprehensive bootstrap configurations, making it easy for users to get up and running.

10. Easy Upgrades

With streamlined upgrade capabilities, LLMOS ensures that you can quickly adopt new features and improvements with minimal disruption.

LLMOS Use Cases

  • AI Research & Development: Simplify the management of LLMs and AI infrastructure, allowing researchers to focus on innovation rather than operational overhead.
  • Enterprise AI Solutions: Streamline the deployment of AI applications with scalable infrastructure, making it easier to manage models, storage, and resources across multiple teams.
  • Data Science Workflows: With notebook integration and powerful cluster computing, LLMOS is ideal for data scientists looking to run complex experiments at scale.
  • AI-Driven Products: From chatbots to automated content generation, LLMOS simplifies the process of deploying LLM-based products that can serve millions of users and scale up horizontally.

Getting Started with LLMOS

Ready to get started with LLMOS? Our detailed documentation covers everything from installation to advanced features. Whether you’re a developer, data scientist, or system administrator, you’ll find LLMOS easy to set up and use, below is the quick-start guideline.

备注

Make sure your nodes meet the requirements before proceeding.

Installation Script

LLMOS can be installed to a bare-metal server or a virtual machine. To bootstrap a new cluster, follow the steps below:

curl -sfL https://get-llmos.1block.ai | sh -s - --cluster-init --token mytoken

To monitor installation logs, run journalctl -u llmos -f.

If your environment requires internet access through a proxy, set the HTTP_PROXY and HTTPS_PROXY environment variables before running the installation script:

export HTTP_PROXY=http://proxy.example.com:8080
export HTTPS_PROXY=http://proxy.example.com:8080
export NO_PROXY=127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16 # Replace the CIDRs with your own
Getting Started

After installing LLMOS, access the dashboard by navigating to https://<server-ip>:8443 in your web browser.

  1. LLMOS will create a default admin user with a randomly generated password. To retrieve the password, run the following command on the cluster-init node:

    kubectl get secret --namespace llmos-system llmos-bootstrap-passwd -o go-template='{{.data.password|base64decode}}{{"\n"}}'

    welcome-login

  2. Upon logging in, you will be redirected to the setup page. Configure the following:

  • Set a new password for the admin user (strong passwords are recommended).
  • Configure the server URL that all other nodes in your cluster will use to connect. welcome-config
  1. After setup, you will be redirected to the home page where you can start using LLMOS. home-page

More Examples

To learn more about using LLMOS, explore the following resources:

Join Us

We are excited to build a community around the project. If you're interested, please join us on Discord or participate in Github Discussions to discuss or contribute the project. If you need to contact us, please reach out to us via here. We look forward to collaborating with you, thanks!