Skip to main content

LLMOS v0.2 - Simplify AI Management, Unlock GPU Potential

ยท 3 min read
Guangbo Chen
Founder of 1BLOCK.AI

๐Ÿš€ Introducing LLMOS v0.2โ€‹

LLMOS is a cloud-native tool designed to accelerate AI application development and simplify the management of large language models (LLMs). It supports deployment on both public clouds and private GPU servers, enabling you to easily deploy private AI models, scale machine learning workflows, and reduce the complexity of development and operations.

With the increasing demand for GPU virtualization (vGPU) and resource utilization, the v0.2 release prioritizes features like vGPU management, cluster and GPU resource monitoring, and alerting to maximize GPU management efficiency and utilization.

๐ŸŒŸ Key Featuresโ€‹

1. More Efficient GPU Managementโ€‹

Introducing support for NVIDIA Virtual GPU (vGPU), allowing you to choose between virtual or full GPUs based on your needs. This accelerates resource allocation and optimizes the utilization of GPU VRAM and CUDA cores.

GPU ModelSupportArchitecture
A100, A200โœ…NVIDIA Ampere
H100, H200โœ…NVIDIA Hopper
Tesla T4/T4Gโœ…NVIDIA Turing
30x/40x Seriesโœ…Ada Lovelace/Ampere

โœ” Virtual GPU (vGPU): Optimize GPU utilization and scale workloads seamlessly.

model-service-create model-service-vram

โœ” GPU Management Interface: Intuitively view GPU details and monitor resources in real time.

gpu-device gpu-device-metrics

๐Ÿ‘‰ Learn More

2. Real-Time Monitoring and Alertsโ€‹

Enable GPU and cluster monitoring with a single click using preconfigured Grafana dashboards and Prometheus alerts. Track performance metrics in real time to ensure stable workload operations.

โœ” Real-Time Monitoring: Stay informed about cluster and GPU status.

cluster-gpu-metrics

โœ” Intelligent Alerts: Predefined rules to reduce risks of failures and downtime.

monitoring-rules

โœ” Pause and Resume Workloads: Release idle resources to enhance efficiency.

workload-actions

๐Ÿ‘‰ Learn More

โšก Key Enhancementsโ€‹

1. Faster Installation Experienceโ€‹

  • For CN users, you can accelerate installation with --mirror cn.

    curl -sfL https://get-llmos.1block.ai | sh -s - --cluster-init --token mytoken --mirror cn
  • For restricted network and air-gap environments, use configurations like globalSystemImageRegistry or registries to integrate private image registries, accelerating and simplifying the installation process.

2. Expanded Model Service Sourcesโ€‹

Support loading AI models from HuggingFace, ModelScope, or local paths, offering more flexibility in model deployment for your projects.

model-service-sources

3. Optimized Workload Managementโ€‹

  • Automatic Volume Cleanup: Automatically release storage resources after workload deletion, simplifying management.
  • Notebook Optimization: Added support for Jupyter Pipeline images and i18n localization(e.g., Chinese) patches. notebook-pipeline
  • Node-Level GPU Metrics Optimization: Gain detailed overviews of GPU resources to enable fine-grained management. node-metrics
  • Enhanced Model Token Metrics: View real-time model token usage and response speeds to optimize resource planning and task execution of model services. token-metrics

๐Ÿ›  Updates and Fixesโ€‹

  • Dependency Updates: System dependencies have been updated to improve performance, security, and compatibility:
    • Rook Ceph and Ceph cluster upgraded to v1.15.7.
    • Snapshot Controller upgraded to v8.2.0.
    • Upgrade Controller upgraded to v0.14.2.
  • Key Bug Fixes:
    • Model Service Parameter Issues: Fixed to allow seamless parameter updates.
    • Label Nil Exception: Custom addon will no longer experience label nil exception.
    • User Permission Optimization: Removed unnecessary node permissions for regular users, enhancing security.

๐ŸŒ Ready to Experience?โ€‹

Visit the documentation to learn more. Upgrade to LLMOS v0.2 today and experience the future of AI management!

๐Ÿš€ Upgrade Now!

Introducing LLMOS

ยท 5 min read
Guangbo Chen
Founder of 1BLOCK.AI

An Open-source Cloud-native AI Infrastructure Platform, Not Just GPUsโ€‹

What is LLMOS?โ€‹

We are thrilled to announce the launch of LLMOS, an open-source cloud-native AI infrastructure platform designed to simplify the management of AI applications and Large Language Models (LLMs). With LLMOS, organizations can effortlessly deploy, scale, and operate machine learning workflows while reducing the complexity often associated with AI development and operations.

Why We Built LLMOSโ€‹

AI and LLMs are transforming industries, but managing the infrastructure needed for AI at scale can be challenging. We built LLMOS to break down these barriers, providing a platform that makes it easier for developers, data scientists, and IT teams to focus on what really mattersโ€”building and deploying powerful AI solutions. With its cloud-native foundation, LLMOS integrates smoothly with existing infrastructure, offering a flexible, scalable, and user-friendly way to manage AI projects and tasks.

Key Features of LLMOSโ€‹

1. Seamless Notebook Integrationโ€‹

LLMOS integrates with popular notebook environments such as Jupyter, VSCode, and RStudio, enabling data scientists and developers to work efficiently in familiar tools without complicated setup.

jupyter-notebook

2. ModelService for LLM Deploymentโ€‹

Deploying LLMs is now simpler with ModelService, which provides OpenAI-compatible APIs for serving large language models. This feature makes it easy to deploy, scale, and use LLMs in real-world applications.

model-service

3. Machine Learning Clusterโ€‹

The Machine Learning Cluster supports distributed computing, offering parallel processing and access to leading AI libraries. This feature enhances the performance of machine learning workflows, especially for large-scale models and datasets.

machine-learning-cluster

4. Scalable Storage with Rook Cephโ€‹

Rook Ceph provides distributed and fault-tolerant storage system for LLMOS, offering robust, scalable block and filesystem storage that adapts to the needs of AI and LLM applications.

roo-ceph

5. Extensibility with Managed Addonsโ€‹

LLMOS introduces ManagedAddon support, allowing users to extend the platform with system and custom add-ons. This gives organizations more flexibility to tailor the platform to their specific needs.

6. Simplified User and API Key Managementโ€‹

The platform features an intuitive interface for managing users and API keys, making access control and resource allocation easier for administrators.

api-keys

7. Role-Based Access Control (RBAC) and Role Templatesโ€‹

LLMOS offers enhanced Role Templates and RBAC, helping administrators assign permissions and manage security across teams and projects with ease.

role-templates

8. Node Managementโ€‹

Node Management is available directly through the LLMOS dashboard, allowing for better visibility and control over system resources, enhancing operational efficiency.

nodes node-management

9. Bootstrap and Installation Supportโ€‹

Setting up LLMOS has been simplified through easy-to-use installation script and comprehensive bootstrap configurations, making it easy for users to get up and running.

10. Easy Upgradesโ€‹

With streamlined upgrade capabilities, LLMOS ensures that you can quickly adopt new features and improvements with minimal disruption.

LLMOS Use Casesโ€‹

  • AI Research & Development: Simplify the management of LLMs and AI infrastructure, allowing researchers to focus on innovation rather than operational overhead.
  • Enterprise AI Solutions: Streamline the deployment of AI applications with scalable infrastructure, making it easier to manage models, storage, and resources across multiple teams.
  • Data Science Workflows: With notebook integration and powerful cluster computing, LLMOS is ideal for data scientists looking to run complex experiments at scale.
  • AI-Driven Products: From chatbots to automated content generation, LLMOS simplifies the process of deploying LLM-based products that can serve millions of users and scale up horizontally.

Getting Started with LLMOSโ€‹

Ready to get started with LLMOS? Our detailed documentation covers everything from installation to advanced features. Whether youโ€™re a developer, data scientist, or system administrator, youโ€™ll find LLMOS easy to set up and use, below is the quick-start guideline.

note

Make sure your nodes meet the requirements before proceeding.

Installation Scriptโ€‹

LLMOS can be installed to a bare-metal server or a virtual machine. To bootstrap a new cluster, follow the steps below:

curl -sfL https://get-llmos.1block.ai | sh -s - --cluster-init --token mytoken

To monitor installation logs, run journalctl -u llmos -f.

If your environment requires internet access through a proxy, set the HTTP_PROXY and HTTPS_PROXY environment variables before running the installation script:

export HTTP_PROXY=http://proxy.example.com:8080
export HTTPS_PROXY=http://proxy.example.com:8080
export NO_PROXY=127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16 # Replace the CIDRs with your own
Getting Startedโ€‹

After installing LLMOS, access the dashboard by navigating to https://<server-ip>:8443 in your web browser.

  1. LLMOS will create a default admin user with a randomly generated password. To retrieve the password, run the following command on the cluster-init node:

    kubectl get secret --namespace llmos-system llmos-bootstrap-passwd -o go-template='{{.data.password|base64decode}}{{"\n"}}'

    welcome-login

  2. Upon logging in, you will be redirected to the setup page. Configure the following:

  • Set a new password for the admin user (strong passwords are recommended).
  • Configure the server URL that all other nodes in your cluster will use to connect. welcome-config
  1. After setup, you will be redirected to the home page where you can start using LLMOS. home-page

More Examplesโ€‹

To learn more about using LLMOS, explore the following resources:

Join Usโ€‹

We are excited to build a community around the project. If you're interested, please join us on Discord or participate in Github Discussions to discuss or contribute the project. If you need to contact us, please reach out to us via here. We look forward to collaborating with you, thanks!

Hello World @1Block.AI

ยท 3 min read
Guangbo Chen
Founder of 1BLOCK.AI
func main() {
fmt.Println("Hello World, @1Block.AI");
}

Why We Build 1Block.AIโ€‹

We believe that AGI (Artificial General Intelligence) is the next significant milestone in the human history. It will not only reshape how we live, work, and play but also revolutionize how we develop software. We are building 1Block.AI to assist both developers and non-developers in unlocking the power of LLMs, allowing them to construct their own generative AI applications using a single, unified management platform.

What is 1Block.AIโ€‹

1Block.AI is an open-source, cloud-native LLMOps platform that fosters innovation in LLMs and generative AI applications. It is built on top of cutting-edge technologies such as Kubernetes, Ray.io, vLLM, etc., and designed to be cloud-agnostic and ML framework agnostic.

Projects like Ray(also KubeRay), and LangChain are excellent open-source projects for ML lifecycle management and can be served as the core components of the LLMOps. For instance, Ray offers powerful distributed computing capabilities and a comprehensive ML computing framework, forming a powerful ML foundation for the platform. However, they lack a unified solution for cluster management, multi-tenancy, cost control, data privacy protection, resource versioning, etc. These aspects need addressing in other components of the LLMOps platform. In essence, we believe a user-friendly LLMOps platform should encompass:

  • Cost-effectiveness and Data Privacy: The platform must be open-source, feature a distributed architecture, and support private deployment.
  • Easy to Use: Provide a unified interface for developers and non-developers to implement complete life cycle management of Large Language Models(LLM) and generative AI applications.
    • Exploratory Data Analysis (EDA): Iteratively explore, share, and prepare data for the machine learning lifecycle by creating reproducible, editable, and shareable datasets, tables, and visual charts.
    • Model Registration and Management: Allow users to upload, track, and manage versions of models and associate them with specific datasets and hyperparameters.
    • Continuous Integration/Continuous Deployment (CI/CD): Ensure the continuous updating and deployment of models and AI agents, enabling them to respond promptly to new data and changes.
    • Performance Monitoring and Logging: Real-time monitoring of model performance, including metrics such as inference time, memory usage, and logging all interactions for auditing and fine-tuning.
    • Automated Tuning and Optimization: Automatically adjust and optimize models using tools to maintain their optimal performance in different environments.
  • No Vendor Lock-in: Compatible with different cloud infrastructures and different ML frameworks(cloud-agnostic & ML-agnostic).
  • Scalability and Portability: Supports serverless deployment, provides a unified solution for LLMs and generative AI applications to be deployed anywhere, from public cloud to on-premise servers.
  • Interoperability and maintainability: customizable and extendable with cloud-native and ML ecosystems.

1block-ai-architecture

Join Usโ€‹

We are excited to build a community around the project. If you're interested, please join us on Discord or participate in Github Discussions to discuss or contribute the project. We look forward to collaborating with you. Merci!