AI/ML and DevOps Integration: MLOps Practices

MLOps (Machine Learning Operations) is the application of DevOps practices to the development, deployment, and management of machine learning models.

What Is MLOps?

MLOps applies continuous integration, continuous deployment, and continuous monitoring to machine learning projects so that ML models run reliably and scalably in production.

Benefits of MLOps

  • Fast Deployment: models are deployed quickly and safely
  • Model Monitoring: model performance is tracked in production
  • Reproducibility: experiments can be reproduced
  • Collaboration: data scientists and engineers work together effectively

MLOps Pipeline

1. Data Pipeline

# Data ingestion pipeline
import pandas as pd
from sklearn.model_selection import train_test_split
import mlflow

def load_and_preprocess_data():
    """Veri yükleme ve ön işleme"""
    # Veri yükleme
    df = pd.read_csv('data/raw/customer_data.csv')
    
    # Clean the data
    df = df.dropna()
    df = df.drop_duplicates()
    
    # Feature engineering
    df['age_group'] = pd.cut(df['age'], bins=[0, 25, 35, 50, 100],
                             labels=['Young', 'Middle', 'Adult', 'Senior'])
    
    return df

def split_data(df):
    """Veriyi train/test olarak böl"""
    X = df.drop('target', axis=1)
    y = df['target']
    
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    
    return X_train, X_test, y_train, y_test
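
A minimal driver that ties the two functions above together (the output paths are illustrative):

# Run the data pipeline end to end
if __name__ == '__main__':
    df = load_and_preprocess_data()
    X_train, X_test, y_train, y_test = split_data(df)
    
    # Persist the splits for the training pipeline
    X_train.to_csv('data/processed/X_train.csv', index=False)
    X_test.to_csv('data/processed/X_test.csv', index=False)
    y_train.to_csv('data/processed/y_train.csv', index=False)
    y_test.to_csv('data/processed/y_test.csv', index=False)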

2. Model Training Pipeline

# Model training with MLflow
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, classification_report

def train_model(X_train, y_train, X_test, y_test):
    """Model eğitimi ve değerlendirme"""
    
    # MLflow experiment başlat
    mlflow.set_experiment("customer-churn-prediction")
    
    with mlflow.start_run():
        # Model parametreleri
        params = {
            'n_estimators': 100,
            'max_depth': 10,
            'random_state': 42
        }
        
        # Train the model
        model = RandomForestClassifier(**params)
        model.fit(X_train, y_train)
        
        # Predictions on the test set
        y_pred = model.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        
        # Log parameters and metrics to MLflow
        mlflow.log_params(params)
        mlflow.log_metric("accuracy", accuracy)
        mlflow.log_metric("precision", precision_score(y_test, y_pred, average='weighted'))
        mlflow.log_metric("recall", recall_score(y_test, y_pred, average='weighted'))
        
        # Log the model artifact
        mlflow.sklearn.log_model(model, "model")
        
        return model, accuracy
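
The registry step below needs the MLflow run ID of the training run, which train_model() above does not return. One way to look it up afterwards (same experiment name as above; picking the most recent run is an assumption):

# Look up the run ID of the most recent training run
import mlflow

experiment = mlflow.get_experiment_by_name("customer-churn-prediction")
runs = mlflow.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["start_time DESC"],
    max_results=1,
)
run_id = runs.iloc[0]["run_id"]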

3. Model Registry

# Model registry management
import mlflow
from mlflow.tracking import MlflowClient

def register_model(run_id, model_name):
    """Model'i registry'ye kaydet"""
    client = MlflowClient()
    
    # Register the model version
    model_uri = f"runs:/{run_id}/model"
    model_details = mlflow.register_model(model_uri, model_name)
    
    # Promote the version to Production
    client.transition_model_version_stage(
        name=model_name,
        version=model_details.version,
        stage="Production"
    )
    
    return model_details
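
Once a version is promoted to Production, downstream services can load it straight from the registry instead of a local pickle file. A minimal sketch, assuming the model was registered under the illustrative name customer-churn-model:

# Load the current Production model from the registry
import mlflow.pyfunc

production_model = mlflow.pyfunc.load_model("models:/customer-churn-model/Production")
predictions = production_model.predict(X_test)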

Containerization

Dockerfile for ML Model

# Dockerfile for the ML model
FROM python:3.9-slim

WORKDIR /app

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy model artifacts and application code
COPY models/ ./models/
COPY app.py .

# Environment variables
ENV MODEL_PATH=/app/models/model.pkl
ENV PORT=8000

EXPOSE 8000

CMD ["python", "app.py"]

Model Serving Application

# Flask app for model serving
from flask import Flask, request, jsonify
import pickle
import pandas as pd
import os

app = Flask(__name__)

# Load the trained model at startup
model_path = os.getenv('MODEL_PATH', '/app/models/model.pkl')
with open(model_path, 'rb') as f:
    model = pickle.load(f)

@app.route('/predict', methods=['POST'])
def predict():
    """Tahmin endpoint'i"""
    try:
        data = request.get_json()
        # Expect a JSON object mapping feature names to lists of values
        features = pd.DataFrame(data)
        
        # Run inference
        prediction = model.predict(features)
        probability = model.predict_proba(features)
        
        return jsonify({
            'prediction': prediction.tolist(),
            'probability': probability.tolist(),
            'status': 'success'
        })
    except Exception as e:
        return jsonify({
            'error': str(e),
            'status': 'error'
        }), 400

@app.route('/health', methods=['GET'])
def health():
    """Health check endpoint"""
    return jsonify({'status': 'healthy'})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=int(os.getenv('PORT', 8000)))
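
A sample client call against the /predict endpoint (the feature names are illustrative; the payload maps column names to lists of values, which is what pd.DataFrame(data) above expects):

# Example request to the serving endpoint
import requests

payload = {'age': [42], 'tenure': [12], 'monthly_charges': [79.9]}
response = requests.post('http://localhost:8000/predict', json=payload)
print(response.json())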

Kubernetes Deployment

ML Model Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
      - name: ml-model
        image: ml-model:latest
        ports:
        - containerPort: 8000
        env:
        - name: MODEL_PATH
          value: "/app/models/model.pkl"
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5

Model Service

apiVersion: v1
kind: Service
metadata:
  name: ml-model-service
spec:
  selector:
    app: ml-model
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8000
  type: LoadBalancer

CI/CD Pipeline for ML

GitHub Actions for MLOps

name: MLOps Pipeline

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  data-validation:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    
    - name: Setup Python
      uses: actions/setup-python@v3
      with:
        python-version: '3.9'
    
    - name: Install dependencies
      run: |
        pip install -r requirements.txt
    
    - name: Run data validation
      run: |
        python scripts/validate_data.py
    
    - name: Run data tests
      run: |
        python -m pytest tests/test_data.py

  model-training:
    needs: data-validation
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    
    - name: Setup Python
      uses: actions/setup-python@v3
      with:
        python-version: '3.9'
    
    - name: Install dependencies
      run: |
        pip install -r requirements.txt
    
    - name: Set up MLflow
      run: |
        mlflow server --host 0.0.0.0 --port 5000 &
        sleep 10
    
    - name: Train model
      env:
        MLFLOW_TRACKING_URI: http://localhost:5000
      run: |
        python scripts/train_model.py
    
    - name: Evaluate model
      run: |
        python scripts/evaluate_model.py

  model-deployment:
    needs: model-training
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
    - uses: actions/checkout@v3
    
    # Assumes the image lives under the Docker Hub account in DOCKER_USERNAME
    - name: Build Docker image
      run: |
        docker build -t ${{ secrets.DOCKER_USERNAME }}/ml-model:${{ github.sha }} .
    
    - name: Push to registry
      run: |
        echo ${{ secrets.DOCKER_PASSWORD }} | docker login -u ${{ secrets.DOCKER_USERNAME }} --password-stdin
        docker push ${{ secrets.DOCKER_USERNAME }}/ml-model:${{ github.sha }}
    
    - name: Deploy to Kubernetes
      run: |
        kubectl set image deployment/ml-model-deployment ml-model=${{ secrets.DOCKER_USERNAME }}/ml-model:${{ github.sha }}

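The data-validation job above calls scripts/validate_data.py, which is not shown in this post. A hedged sketch of what such a script might check (column names and value ranges are illustrative):

# scripts/validate_data.py (illustrative sketch)
import sys
import pandas as pd

REQUIRED_COLUMNS = {'age', 'tenure', 'monthly_charges', 'target'}

def main():
    df = pd.read_csv('data/raw/customer_data.csv')
    
    # Schema check: all expected columns must be present
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        print(f"Missing columns: {missing}")
        sys.exit(1)
    
    # Basic sanity check on value ranges
    if (df['age'] < 0).any() or (df['age'] > 120).any():
        print("Age values out of range")
        sys.exit(1)
    
    print("Data validation passed")

if __name__ == '__main__':
    main()
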
Model Monitoring

Model Performance Monitoring

# Model monitoring script
import mlflow
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def monitor_model_performance():
    """Model performansını izle"""
    
    # Load production predictions
    production_data = pd.read_csv('data/production/predictions.csv')
    
    # Actual values vs. predictions
    y_true = production_data['actual']
    y_pred = production_data['predicted']
    
    # Compute metrics
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred, average='weighted')
    recall = recall_score(y_true, y_pred, average='weighted')
    
    # Log to MLflow
    with mlflow.start_run():
        mlflow.log_metric("production_accuracy", accuracy)
        mlflow.log_metric("production_precision", precision)
        mlflow.log_metric("production_recall", recall)
    
    # Alert thresholds
    if accuracy < 0.8:
        logger.warning(f"Model accuracy below threshold: {accuracy}")
        # Send alert (see the send_alert sketch below)
    
    return accuracy, precision, recall

def detect_data_drift():
    """Veri drift'ini tespit et"""
    from scipy import stats
    
    # Training data
    train_data = pd.read_csv('data/train/train_data.csv')
    
    # Production data
    prod_data = pd.read_csv('data/production/production_data.csv')
    
    # Kolmogorov-Smirnov test
    for column in train_data.select_dtypes(include=[np.number]).columns:
        ks_stat, p_value = stats.ks_2samp(
            train_data[column], 
            prod_data[column]
        )
        
        if p_value < 0.05:
            logger.warning(f"Data drift detected in column: {column}")

Prometheus Metrics

# Prometheus metrics for the ML model (extends the Flask serving app above)
from prometheus_client import Counter, Histogram, Gauge, generate_latest
import time

# Metrics
PREDICTION_COUNTER = Counter('ml_predictions_total', 'Total predictions made')
PREDICTION_DURATION = Histogram('ml_prediction_duration_seconds', 'Prediction duration')
MODEL_ACCURACY = Gauge('ml_model_accuracy', 'Current model accuracy')

@app.route('/predict', methods=['POST'])
def predict():
    """Tahmin endpoint'i with metrics"""
    start_time = time.time()
    
    try:
        data = request.get_json()
        features = pd.DataFrame(data)
        
        # Run inference
        prediction = model.predict(features)
        
        # Update metrics
        PREDICTION_COUNTER.inc()
        PREDICTION_DURATION.observe(time.time() - start_time)
        
        return jsonify({
            'prediction': prediction.tolist(),
            'status': 'success'
        })
    except Exception as e:
        return jsonify({
            'error': str(e),
            'status': 'error'
        }), 400

@app.route('/metrics', methods=['GET'])
def metrics():
    """Prometheus metrics endpoint"""
    return generate_latest()
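
The MODEL_ACCURACY gauge above is declared but never set, because accuracy is computed by the batch monitoring job rather than inside the serving process. A hedged option is to push it from the monitoring job to a Prometheus Pushgateway (the gateway address is illustrative):

# Push batch-computed accuracy to a Prometheus Pushgateway
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

def push_accuracy(accuracy):
    """Push the latest production accuracy to the Pushgateway."""
    registry = CollectorRegistry()
    gauge = Gauge('ml_model_accuracy', 'Current model accuracy', registry=registry)
    gauge.set(accuracy)
    push_to_gateway('pushgateway:9091', job='model_monitoring', registry=registry)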

A/B Testing

Model A/B Testing

# A/B testing for ML models
import hashlib

import pandas as pd
from flask import request, jsonify
from prometheus_client import Counter

# Counter tracking how traffic is split across model versions
AB_TEST_COUNTER = Counter('ml_ab_predictions_total', 'Predictions per model version', ['model_version'])

def get_model_version():
    """Select a model version for A/B testing."""
    user_id = request.headers.get('User-ID', 'anonymous')
    
    # Consistent hashing for user assignment
    # (hashlib is stable across processes, unlike the built-in hash())
    hash_value = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    
    if hash_value < 50:
        return 'model-v1'  # Control group
    else:
        return 'model-v2'  # Treatment group

@app.route('/predict', methods=['POST'])
def predict_ab():
    """Prediction with A/B testing (replaces the plain /predict handler above)."""
    model_version = get_model_version()
    features = pd.DataFrame(request.get_json())
    
    # Predict with the selected model version
    # (model_v1 and model_v2 are assumed to be loaded at startup)
    if model_version == 'model-v1':
        prediction = model_v1.predict(features)
    else:
        prediction = model_v2.predict(features)
    
    # A/B testing metrics
    AB_TEST_COUNTER.labels(model_version=model_version).inc()
    
    return jsonify({
        'prediction': prediction.tolist(),
        'model_version': model_version,
        'status': 'success'
    })
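
To decide whether the treatment model actually performs better, the outcomes of the two groups can be compared statistically. A hedged sketch using a chi-square test on correct/incorrect counts per version (the file path and column names are illustrative):

# Compare A/B test results with a chi-square test
import pandas as pd
from scipy.stats import chi2_contingency

# Expected columns: model_version ('model-v1'/'model-v2'), correct (0 or 1)
results = pd.read_csv('data/production/ab_results.csv')

# Contingency table: correct vs. incorrect predictions per model version
table = pd.crosstab(results['model_version'], results['correct'])
chi2, p_value, dof, expected = chi2_contingency(table)

if p_value < 0.05:
    print(f"Significant difference between model versions (p={p_value:.4f})")
else:
    print(f"No significant difference detected (p={p_value:.4f})")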

Conclusion

MLOps is critical to running machine learning projects successfully in production. With the right tools and practices, you can manage your ML models in a reliable, scalable, and sustainable way.

Recommended Tools

  • Experiment Tracking: MLflow, Weights & Biases, Neptune
  • Model Registry: MLflow Model Registry, AWS SageMaker Model Registry
  • Feature Store: Feast, Tecton, Hopsworks
  • Model Serving: TensorFlow Serving, TorchServe, Seldon Core
  • Monitoring: Prometheus, Grafana, Evidently AI
  • Orchestration: Apache Airflow, Kubeflow, Argo Workflows
