# AI Model Deployment: Kubernetes, Docker, and MLOps Strategies

AI model deployment is the process of moving trained models into a production environment in a reliable, scalable, and maintainable way. In this guide, we examine modern deployment strategies.
## Deployment Patterns

### 1. Batch Inference

Processing data in batches via scheduled jobs:

```
Data Lake → Batch Job → Model Inference → Results Storage
```
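The batch pattern can be sketched in a few lines of plain Python; `stub_model` here is a hypothetical placeholder for real inference, not any particular framework's API:

```python
from itertools import islice

def stub_model(batch):
    # Hypothetical model: scores each record by its text length
    return [len(text) for text in batch]

def batch_inference(records, batch_size=32):
    """Run inference over any iterable in fixed-size batches."""
    it = iter(records)
    results = []
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        results.extend(stub_model(batch))
    return results

scores = batch_inference(["a", "bb", "ccc", "dddd"], batch_size=2)
# scores == [1, 2, 3, 4]
```

In production the input iterable would stream from the data lake and results would be written back to storage, but the chunking loop stays the same.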
### 2. Real-time Inference

Immediate predictions served over an API:

```
Request → API Gateway → Model Server → Response
```
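The full request/response cycle can be demonstrated with only the standard library; `stub_model` and its keyword heuristic are placeholders for a real model call:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def stub_model(text):
    # Placeholder for real inference
    return {"label": "positive" if "good" in text else "negative", "score": 0.9}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(stub_model(payload["text"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence request logging
        pass

# Serve on an ephemeral port in a background thread
server = HTTPServer(("127.0.0.1", 0), PredictHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/predict",
    data=json.dumps({"text": "good product"}).encode(),
    headers={"Content-Type": "application/json"},
)
response = json.loads(urllib.request.urlopen(req).read())
server.shutdown()
# response == {"label": "positive", "score": 0.9}
```

A production server would add validation, batching, and health endpoints, as the FastAPI example later in this guide shows.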
### 3. Streaming Inference

Continuous processing of a data stream:

```
Kafka Stream → Stream Processor → Model → Output Stream
```
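A minimal sketch of the streaming pattern, with a plain generator standing in for the Kafka consumer (event format and threshold are invented for illustration):

```python
def event_stream():
    # Stand-in for a Kafka consumer: yields one event at a time
    for payload in ("sensor:21.5", "sensor:22.1", "sensor:40.3"):
        yield payload

def process_stream(stream, threshold=30.0):
    """Score each event as it arrives and emit to an output stream."""
    for event in stream:
        value = float(event.split(":")[1])
        yield {"value": value, "anomaly": value > threshold}

alerts = [e for e in process_stream(event_stream()) if e["anomaly"]]
# alerts == [{"value": 40.3, "anomaly": True}]
```

The key property is that events are scored one by one as they arrive, rather than accumulated into scheduled batches.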
### 4. Edge Deployment

On-device inference:

```
Mobile/IoT Device → Optimized Model → Local Inference
```
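Edge deployment usually requires shrinking the model first, for example through quantization. A toy illustration of symmetric int8 quantization in plain Python (not any specific framework's API):

```python
def quantize_int8(weights):
    """Map float weights to int8 values plus one scale factor."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Approximate reconstruction of the original weights
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.02]
q, scale = quantize_int8(weights)
# q == [50, -127, 2], so each weight now fits in one byte
```

Real toolchains (TFLite, ONNX Runtime, PyTorch quantization) use per-channel scales and calibration data, but the size/precision trade-off is the same idea.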
## Model Containerization with Docker

### Basic Dockerfile

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# System dependencies (curl is needed by the HEALTHCHECK below)
RUN apt-get update && apt-get install -y \
    libgomp1 \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Model and code
COPY model/ ./model/
COPY src/ ./src/

# Port
EXPOSE 8000

# Healthcheck
HEALTHCHECK --interval=30s --timeout=5s \
    CMD curl -f http://localhost:8000/health || exit 1

# Start command
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
### Multi-stage Build

```dockerfile
# Build stage
FROM python:3.11 AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# Production stage
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/*
COPY . .
CMD ["python", "main.py"]
```
### GPU Support

```dockerfile
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04

ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

# Python installation
RUN apt-get update && apt-get install -y python3 python3-pip

# PyTorch with CUDA 12.1 wheels
RUN pip3 install torch --index-url https://download.pytorch.org/whl/cu121

COPY . /app
WORKDIR /app
CMD ["python3", "inference.py"]
```
## Model Serving Frameworks

### FastAPI Server

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import torch

app = FastAPI()

# Load model (startup)
model = None

@app.on_event("startup")
async def load_model():
    global model
    model = torch.load("model.pt")
    model.eval()

class PredictionRequest(BaseModel):
    text: str

class PredictionResponse(BaseModel):
    prediction: str
    confidence: float

@app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
    if model is None:
        raise HTTPException(500, "Model not loaded")

    with torch.no_grad():
        output = model(request.text)

    return PredictionResponse(
        prediction=output["label"],
        confidence=output["score"]
    )

@app.get("/health")
async def health():
    return {"status": "healthy", "model_loaded": model is not None}
```
### TorchServe

```bash
# Create model archive
torch-model-archiver \
    --model-name mymodel \
    --version 1.0 \
    --model-file model.py \
    --serialized-file model.pt \
    --handler handler.py

# Start serving
torchserve --start \
    --model-store model_store \
    --models mymodel=mymodel.mar
```
### Triton Inference Server

```
# config.pbtxt
name: "text_classifier"
platform: "pytorch_libtorch"
max_batch_size: 32
input [
  {
    name: "INPUT__0"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ -1, 2 ]
  }
]
instance_group [
  { count: 2, kind: KIND_GPU }
]
```

## Deployment on Kubernetes

### Basic Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: model-server
  template:
    metadata:
      labels:
        app: model-server
    spec:
      containers:
      - name: model-server
        image: myregistry/model-server:v1.0
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "2Gi"
            cpu: "1"
          limits:
            memory: "4Gi"
            cpu: "2"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5
```
### GPU Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-model-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gpu-model-server
  template:
    metadata:
      labels:
        app: gpu-model-server
    spec:
      containers:
      - name: model
        image: myregistry/gpu-model:v1.0
        resources:
          limits:
            nvidia.com/gpu: 1
      nodeSelector:
        accelerator: nvidia-tesla-t4
      tolerations:
      - key: "nvidia.com/gpu"
        operator: "Exists"
        effect: "NoSchedule"
```
### Horizontal Pod Autoscaler

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: model-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: model-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
```
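The HPA's core scaling rule is `desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)`, clamped to the configured bounds. A sketch of that logic:

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=2, max_replicas=10):
    """The HPA scaling rule: scale proportionally to metric pressure."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# CPU at 90% with a 70% target on 3 replicas -> scale out
desired_replicas(3, 90, 70)  # 4
# CPU at 20% -> scale in, but never below minReplicas
desired_replicas(3, 20, 70)  # 2
```

With multiple metrics configured, as in the manifest above, Kubernetes evaluates each one and applies the largest desired replica count.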
### Service & Ingress

```yaml
apiVersion: v1
kind: Service
metadata:
  name: model-service
spec:
  selector:
    app: model-server
  ports:
  - port: 80
    targetPort: 8000
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: model-ingress
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "100"
spec:
  rules:
  - host: model.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: model-service
            port:
              number: 80
```
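The rate-limit annotation caps requests per second at the ingress. The underlying idea can be sketched as a token bucket (a simplification of what NGINX actually implements):

```python
class TokenBucket:
    """Allow up to `rate` requests/second, with bursts up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, then spend one token if available
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=100, capacity=10)
burst = [bucket.allow(now=0.0) for _ in range(12)]
# the first 10 simultaneous requests pass, the rest are rejected
```

Limiting at the ingress protects the model servers from overload without any application-level changes.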
## MLOps Pipeline

### CI/CD Pipeline

```yaml
# .github/workflows/mlops.yml
name: MLOps Pipeline

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Run tests
      run: pytest tests/

  train:
    needs: test
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Train model
      run: python train.py
    - name: Evaluate model
      run: python evaluate.py
    - name: Register model
      if: success()
      run: python register_model.py

  deploy:
    needs: train
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Build image
      run: docker build -t myregistry/model:${{ github.sha }} .
    - name: Push to registry
      run: docker push myregistry/model:${{ github.sha }}
    - name: Deploy to K8s
      run: kubectl set image deployment/model model=myregistry/model:${{ github.sha }}
```
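The "Register model" step typically also gates on quality thresholds, not just job success. A hypothetical gate such a step might call (function name and thresholds are invented for illustration):

```python
def should_register(metrics, thresholds):
    """Return (ok, failures): ok only if every metric meets its minimum."""
    failures = {name: metrics.get(name, 0.0)
                for name, minimum in thresholds.items()
                if metrics.get(name, 0.0) < minimum}
    return len(failures) == 0, failures

ok, failures = should_register(
    {"accuracy": 0.95, "f1": 0.93},
    {"accuracy": 0.90, "f1": 0.94},
)
# ok is False: f1 = 0.93 falls short of the 0.94 minimum
```

Failing the pipeline here prevents a regressed model from ever reaching the registry or the deploy job.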
### Model Registry

```python
import mlflow

# Registering a model
with mlflow.start_run():
    mlflow.log_params({"learning_rate": 0.001, "epochs": 10})
    mlflow.log_metrics({"accuracy": 0.95, "f1": 0.93})
    mlflow.pytorch.log_model(model, "model")

# Loading a model
model_uri = "models:/text-classifier/production"
model = mlflow.pytorch.load_model(model_uri)
```

## Canary Deployment

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: model-service
spec:
  hosts:
  - model-service
  http:
  - route:
    - destination:
        host: model-service-v1
      weight: 90
    - destination:
        host: model-service-v2
      weight: 10
```
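The 90/10 split above is enforced by the service mesh; conceptually it amounts to weighted routing, sketched here as a deterministic weighted round-robin:

```python
import itertools

def weighted_router(weights):
    """Cycle through versions in proportion to their weights (e.g. 90/10 canary)."""
    schedule = [version
                for version, weight in weights.items()
                for _ in range(weight)]
    return itertools.cycle(schedule)

router = weighted_router({"v1": 9, "v2": 1})
first_20 = [next(router) for _ in range(20)]
# v2 receives 2 of every 20 requests (10%)
```

If the canary's error rate or latency degrades, shifting the weights back to 100/0 rolls back instantly without redeploying anything.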
## Monitoring

### Prometheus Metrics

```python
from prometheus_client import Counter, Histogram, start_http_server

PREDICTIONS = Counter('predictions_total', 'Total predictions', ['model', 'status'])
LATENCY = Histogram('prediction_latency_seconds', 'Prediction latency')

@LATENCY.time()
def predict(input_data):
    result = model(input_data)
    PREDICTIONS.labels(model='v1', status='success').inc()
    return result

# Expose metrics on :9090/metrics for Prometheus to scrape
start_http_server(9090)
```
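Behind `Histogram` is a set of cumulative buckets, from which dashboards derive approximate quantiles. A simplified pure-Python model of that mechanism (bucket bounds are illustrative):

```python
import bisect

class LatencyHistogram:
    """What a Prometheus Histogram records: per-bucket counts plus a sum."""
    def __init__(self, buckets=(0.01, 0.05, 0.1, 0.5, 1.0)):
        self.buckets = list(buckets)
        self.counts = [0] * (len(self.buckets) + 1)  # last slot = +Inf
        self.total = 0.0

    def observe(self, seconds):
        self.counts[bisect.bisect_left(self.buckets, seconds)] += 1
        self.total += seconds

    def quantile(self, q):
        """Approximate quantile: upper bound of the bucket covering q of samples."""
        target = q * sum(self.counts)
        running = 0
        for bound, count in zip(self.buckets + [float("inf")], self.counts):
            running += count
            if running >= target:
                return bound

hist = LatencyHistogram()
for latency in (0.02, 0.03, 0.04, 0.2):
    hist.observe(latency)
# hist.quantile(0.95) resolves to the 0.5s bucket bound
```

This is why p95/p99 panels in Grafana are only as precise as the bucket boundaries chosen when the histogram was defined.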
### Grafana Dashboard

Key metrics to monitor:

- Request rate (RPS)
- Latency (p50, p95, p99)
- Error rate
- GPU utilization
- Memory usage
- Model drift indicators
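Model drift is often quantified with the Population Stability Index (PSI) between the training-time and live feature distributions; a minimal implementation over pre-binned frequencies:

```python
import math

def psi(expected, actual):
    """Population Stability Index between two binned distributions."""
    score = 0.0
    for e, a in zip(expected, actual):
        e = max(e, 1e-6)  # guard against log(0) on empty bins
        a = max(a, 1e-6)
        score += (a - e) * math.log(a / e)
    return score

baseline = [0.25, 0.25, 0.25, 0.25]   # feature distribution at training time
live = [0.10, 0.20, 0.30, 0.40]       # distribution observed in production
drift = psi(baseline, live)
# drift ≈ 0.23; a PSI above 0.2 is a common "significant drift" alert threshold
```

Exporting this score as a Prometheus gauge makes drift alertable alongside the operational metrics listed above.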
## Conclusion

AI model deployment can be made reliable and scalable with modern MLOps practices. Docker, Kubernetes, and CI/CD pipelines are foundational elements of this process.

At Veni AI, we offer enterprise AI deployment solutions. Contact us about your projects.
