AI/ML and DevOps Integration: MLOps Practices
MLOps (Machine Learning Operations) is the application of DevOps practices to the development, deployment, and management of machine learning models.
What Is MLOps?
MLOps applies continuous integration, continuous deployment, and continuous monitoring to machine learning projects so that ML models run reliably and at scale in production.
Benefits of MLOps
- Fast Deployment: Models are deployed quickly and safely
- Model Monitoring: Model performance is tracked in production
- Reproducibility: Experiments can be repeated with the same results
- Collaboration: Data scientists and engineers work together
MLOps Pipeline
1. Data Pipeline
# Data ingestion pipeline
import pandas as pd
from sklearn.model_selection import train_test_split


def load_and_preprocess_data():
    """Load the raw data and preprocess it."""
    # Load the data
    df = pd.read_csv('data/raw/customer_data.csv')

    # Clean the data
    df = df.dropna()
    df = df.drop_duplicates()

    # Feature engineering: bucket ages into groups.
    # age_group is categorical and must be encoded before it is passed
    # to a scikit-learn estimator.
    df['age_group'] = pd.cut(df['age'], bins=[0, 25, 35, 50, 100],
                             labels=['Young', 'Middle', 'Adult', 'Senior'])
    return df


def split_data(df):
    """Split the data into train and test sets."""
    X = df.drop('target', axis=1)
    y = df['target']
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    return X_train, X_test, y_train, y_test
2. Model Training Pipeline
# Model training with MLflow
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score


def train_model(X_train, y_train, X_test, y_test):
    """Train the model and evaluate it."""
    # Select (or create) the MLflow experiment
    mlflow.set_experiment("customer-churn-prediction")

    with mlflow.start_run():
        # Model parameters
        params = {
            'n_estimators': 100,
            'max_depth': 10,
            'random_state': 42
        }

        # Train the model
        model = RandomForestClassifier(**params)
        model.fit(X_train, y_train)

        # Predict on the test set
        y_pred = model.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)

        # Log parameters and metrics to MLflow
        mlflow.log_params(params)
        mlflow.log_metric("accuracy", accuracy)
        mlflow.log_metric("precision", precision_score(y_test, y_pred, average='weighted'))
        mlflow.log_metric("recall", recall_score(y_test, y_pred, average='weighted'))

        # Save the model as a run artifact
        mlflow.sklearn.log_model(model, "model")

    return model, accuracy
3. Model Registry
# Model registry management
import mlflow
from mlflow.tracking import MlflowClient


def register_model(run_id, model_name):
    """Register the model in the registry."""
    client = MlflowClient()

    # Register the model logged by the given run
    model_uri = f"runs:/{run_id}/model"
    model_details = mlflow.register_model(model_uri, model_name)

    # Promote to Production
    # (recent MLflow releases deprecate stages in favor of aliases)
    client.transition_model_version_stage(
        name=model_name,
        version=model_details.version,
        stage="Production"
    )
    return model_details
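The function above promotes every newly registered version straight to Production. A team may prefer to check the candidate's metrics first. The following is a minimal sketch of such a gate in plain Python. The name `should_promote` and the parameters `min_accuracy` and `min_gain` are hypothetical and not part of MLflow; the 0.8 default simply mirrors the alert threshold used in the monitoring script later in this article.

```python
from typing import Optional


def should_promote(
    candidate_accuracy: float,
    production_accuracy: Optional[float],
    min_accuracy: float = 0.8,
    min_gain: float = 0.0,
) -> bool:
    """Decide whether a candidate model may replace the production model.

    All thresholds are illustrative assumptions, not MLflow defaults.
    """
    # Reject candidates that fail the absolute quality bar.
    if candidate_accuracy < min_accuracy:
        return False
    # With no production model yet, the absolute bar is the only check.
    if production_accuracy is None:
        return True
    # Otherwise require the candidate to beat production by at least min_gain.
    return candidate_accuracy - production_accuracy >= min_gain
```

One possible use is to call `should_promote` with the accuracy returned by `train_model` and run the stage transition only when it returns `True`.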
Containerization
Dockerfile for ML Model
# Dockerfile for the ML model
FROM python:3.9-slim

WORKDIR /app

# Copy and install the required packages
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the model files and the application
COPY models/ ./models/
COPY app.py .

# Environment variables
ENV MODEL_PATH=/app/models/model.pkl
ENV PORT=8000

EXPOSE 8000

CMD ["python", "app.py"]
Model Serving Application
# Flask app for model serving
from flask import Flask, request, jsonify
import pickle
import pandas as pd
import os

app = Flask(__name__)

# Load the model once at startup
model_path = os.getenv('MODEL_PATH', '/app/models/model.pkl')
with open(model_path, 'rb') as f:
    model = pickle.load(f)


@app.route('/predict', methods=['POST'])
def predict():
    """Prediction endpoint"""
    try:
        data = request.get_json()
        features = pd.DataFrame(data)

        # Run the prediction
        prediction = model.predict(features)
        probability = model.predict_proba(features)

        return jsonify({
            'prediction': prediction.tolist(),
            'probability': probability.tolist(),
            'status': 'success'
        })
    except Exception as e:
        return jsonify({
            'error': str(e),
            'status': 'error'
        }), 400


@app.route('/health', methods=['GET'])
def health():
    """Health check endpoint"""
    return jsonify({'status': 'healthy'})


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=int(os.getenv('PORT', 8000)))
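The `/predict` handler above passes the JSON body straight into `pd.DataFrame`, so a malformed request only surfaces as a generic 400 with whatever exception text pandas or the model produces. A small check before that call can give clearer errors. This is a sketch under assumptions: `validate_payload` is a hypothetical helper, and the list of required fields depends on the features your model was trained on.

```python
from typing import Any, Iterable, List


def validate_payload(payload: Any, required_fields: Iterable[str]) -> List[str]:
    """Return the problems found in a /predict payload (empty list if valid)."""
    # The endpoint expects a JSON array of records, one object per row.
    if not isinstance(payload, list) or not payload:
        return ["payload must be a non-empty list of records"]

    required = set(required_fields)
    problems = []
    for index, record in enumerate(payload):
        if not isinstance(record, dict):
            problems.append(f"record {index} is not an object")
            continue
        # Report every required field that this record lacks.
        missing = sorted(required.difference(record))
        if missing:
            problems.append(
                f"record {index} is missing fields: {', '.join(missing)}"
            )
    return problems
```

Inside the handler, a non-empty result could be returned to the client as the `error` field of the 400 response.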
Kubernetes Deployment
ML Model Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
      - name: ml-model
        image: ml-model:latest
        ports:
        - containerPort: 8000
        env:
        - name: MODEL_PATH
          value: "/app/models/model.pkl"
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5
Model Service
apiVersion: v1
kind: Service
metadata:
  name: ml-model-service
spec:
  selector:
    app: ml-model
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8000
  type: LoadBalancer
CI/CD Pipeline for ML
GitHub Actions for MLOps
name: MLOps Pipeline

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  data-validation:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Python
        uses: actions/setup-python@v3
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
      - name: Run data validation
        run: |
          python scripts/validate_data.py
      - name: Run data tests
        run: |
          python -m pytest tests/test_data.py

  model-training:
    needs: data-validation
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Python
        uses: actions/setup-python@v3
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
      - name: Set up MLflow
        run: |
          mlflow server --host 0.0.0.0 --port 5000 &
      - name: Train model
        run: |
          python scripts/train_model.py
      - name: Evaluate model
        run: |
          python scripts/evaluate_model.py

  model-deployment:
    needs: model-training
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v3
      - name: Build Docker image
        run: |
          docker build -t ml-model:${{ github.sha }} .
      - name: Push to registry
        run: |
          echo ${{ secrets.DOCKER_PASSWORD }} | docker login -u ${{ secrets.DOCKER_USERNAME }} --password-stdin
          docker push ml-model:${{ github.sha }}
      - name: Deploy to Kubernetes
        run: |
          kubectl set image deployment/ml-model-deployment ml-model=ml-model:${{ github.sha }}
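The workflow calls `scripts/validate_data.py`, but the article does not show that script. The sketch below is one hypothetical shape for its core logic, using only the standard library. The column names `age` and `target` and the age range (0, 100] come from the data pipeline example earlier; the function names `validate_rows` and `validate_csv` and the error messages are assumptions.

```python
import csv
import io
from typing import Dict, Iterable, List

# Columns used by the data pipeline example earlier in this article.
REQUIRED_COLUMNS = ("age", "target")


def validate_rows(rows: Iterable[Dict[str, str]]) -> List[str]:
    """Collect human-readable validation errors for parsed CSV rows."""
    errors = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for column in REQUIRED_COLUMNS:
            if row.get(column) in (None, ""):
                errors.append(f"line {line_no}: missing value for '{column}'")
        age = row.get("age")
        if age in (None, ""):
            continue  # already reported as missing
        try:
            age_value = float(age)
        except ValueError:
            errors.append(f"line {line_no}: age '{age}' is not numeric")
            continue
        # pd.cut with bins [0, 25, 35, 50, 100] only covers (0, 100].
        if not 0 < age_value <= 100:
            errors.append(f"line {line_no}: age {age_value:g} outside (0, 100]")
    return errors


def validate_csv(text: str) -> List[str]:
    """Validate CSV content given as a string and return all errors found."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    return validate_rows(reader)
```

In a CI step, a wrapper could read the file, print the returned errors, and exit with a non-zero status when the list is not empty, which fails the `data-validation` job and blocks the jobs that depend on it.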
Model Monitoring
Model Performance Monitoring
# Model monitoring script
import logging

import mlflow
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.metrics import accuracy_score, precision_score, recall_score

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def monitor_model_performance():
    """Monitor model performance in production."""
    # Load the production data
    production_data = pd.read_csv('data/production/predictions.csv')

    # Actual values vs. predictions
    y_true = production_data['actual']
    y_pred = production_data['predicted']

    # Compute the metrics
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred, average='weighted')
    recall = recall_score(y_true, y_pred, average='weighted')

    # Log to MLflow
    with mlflow.start_run():
        mlflow.log_metric("production_accuracy", accuracy)
        mlflow.log_metric("production_precision", precision)
        mlflow.log_metric("production_recall", recall)

    # Alert thresholds
    if accuracy < 0.8:
        logger.warning(f"Model accuracy below threshold: {accuracy}")
        # Send alert

    return accuracy, precision, recall


def detect_data_drift():
    """Detect data drift between training and production data."""
    # Training data
    train_data = pd.read_csv('data/train/train_data.csv')
    # Production data
    prod_data = pd.read_csv('data/production/production_data.csv')

    # Kolmogorov-Smirnov test on every numeric column
    for column in train_data.select_dtypes(include=[np.number]).columns:
        ks_stat, p_value = stats.ks_2samp(
            train_data[column],
            prod_data[column]
        )
        if p_value < 0.05:
            logger.warning(f"Data drift detected in column: {column}")
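To make the drift check less of a black box, here is a small pure-Python sketch of the statistic that the two-sample Kolmogorov-Smirnov test is built on: the largest vertical gap between the two empirical cumulative distribution functions. It is for illustration only and does not compute a p-value; `scipy.stats.ks_2samp`, used above, remains the practical choice. The function name `ks_statistic` is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample KS statistic: the maximum gap between the empirical CDFs."""
    if len(sample_a) == 0 or len(sample_b) == 0:
        raise ValueError("both samples must be non-empty")

    a = sorted(sample_a)
    b = sorted(sample_b)
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to compare them at every point that appears in either sample.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of b that is <= x
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap
```

A value near 0 means the training and production samples are distributed alike, while a value near 1 means they barely overlap. The test turns this gap and the sample sizes into the p-value that the script compares against 0.05.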
Prometheus Metrics
# Prometheus metrics for ML model
# This fragment extends the Flask serving app shown earlier: app, model,
# request, jsonify and pd come from that file, and this predict()
# replaces the earlier one.
import time

from flask import Response
from prometheus_client import (
    CONTENT_TYPE_LATEST, Counter, Gauge, Histogram, generate_latest
)

# Metrics
PREDICTION_COUNTER = Counter('ml_predictions_total', 'Total predictions made')
PREDICTION_DURATION = Histogram('ml_prediction_duration_seconds', 'Prediction duration')
MODEL_ACCURACY = Gauge('ml_model_accuracy', 'Current model accuracy')


@app.route('/predict', methods=['POST'])
def predict():
    """Prediction endpoint with metrics"""
    start_time = time.time()
    try:
        data = request.get_json()
        features = pd.DataFrame(data)

        # Run the prediction
        prediction = model.predict(features)

        # Update the metrics
        PREDICTION_COUNTER.inc()
        PREDICTION_DURATION.observe(time.time() - start_time)

        return jsonify({
            'prediction': prediction.tolist(),
            'status': 'success'
        })
    except Exception as e:
        return jsonify({
            'error': str(e),
            'status': 'error'
        }), 400


@app.route('/metrics', methods=['GET'])
def metrics():
    """Prometheus metrics endpoint"""
    return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)
A/B Testing
Model A/B Testing
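# A/B testing for ML models
# Fragment of the serving app: app, jsonify, pd, model_v1 and model_v2
# are assumed to be defined there.
import hashlib

from flask import request
from prometheus_client import Counter

# Metric name chosen for illustration
AB_TEST_COUNTER = Counter(
    'ml_ab_predictions_total', 'Predictions served per model version',
    ['model_version']
)


def get_model_version():
    """Pick the model version for A/B testing."""
    user_id = request.headers.get('User-ID', '')

    # Deterministic hashing for user assignment. The built-in hash() is
    # randomized per process for strings, so a stable digest is used.
    digest = hashlib.sha256(user_id.encode('utf-8')).hexdigest()
    hash_value = int(digest, 16) % 100

    if hash_value < 50:
        return 'model-v1'  # Control group
    else:
        return 'model-v2'  # Treatment group


@app.route('/predict', methods=['POST'])
def predict_ab():
    """Predict with A/B testing."""
    model_version = get_model_version()
    features = pd.DataFrame(request.get_json())

    # Predict with the selected model version
    if model_version == 'model-v1':
        prediction = model_v1.predict(features)
    else:
        prediction = model_v2.predict(features)

    # A/B testing metrics
    AB_TEST_COUNTER.labels(model_version=model_version).inc()

    return jsonify({
        'prediction': prediction.tolist(),
        'model_version': model_version,
        'status': 'success'
    })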
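User assignment in an A/B test has to be stable: the same user should see the same model on every request and on every replica. Python's built-in `hash()` is salted per process for strings unless `PYTHONHASHSEED` is fixed, so it cannot guarantee that on its own. The sketch below shows one way to get a stable assignment with an adjustable traffic split. The function name `assign_variant`, the `treatment_percent` parameter, and the salt value are illustrative assumptions.

```python
import hashlib


def assign_variant(
    user_id: str,
    treatment_percent: int = 50,
    salt: str = "churn-ab-1",  # hypothetical per-experiment salt
) -> str:
    """Map a user ID to 'model-v1' (control) or 'model-v2' (treatment)."""
    if not 0 <= treatment_percent <= 100:
        raise ValueError("treatment_percent must be between 0 and 100")

    # SHA-256 gives the same digest in every process and on every machine.
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)

    # The lowest buckets go to control, the rest to treatment.
    if bucket < 100 - treatment_percent:
        return "model-v1"
    return "model-v2"
```

Changing the salt for each experiment reshuffles users, so the same people are not always in the treatment group. Starting with a small `treatment_percent` and raising it gradually is a common way to limit exposure to a new model.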
Conclusion
MLOps is critical to running machine learning projects successfully in production. With the right tools and practices, you can manage your ML models in a reliable, scalable, and sustainable way.
Recommended Tools
- Experiment Tracking: MLflow, Weights & Biases, Neptune
- Model Registry: MLflow Model Registry, AWS SageMaker Model Registry
- Feature Store: Feast, Tecton, Hopsworks
- Model Serving: TensorFlow Serving, TorchServe, Seldon Core
- Monitoring: Prometheus, Grafana, Evidently AI
- Orchestration: Apache Airflow, Kubeflow, Argo Workflows