MLOps Notes

Project 1

Project Setup

src, config, and utils need to be treated as packages, so create a file named __init__.py in each of their folders.

setup.py

setup.py is a script that configures how a Python project is distributed. It uses the setuptools library to define the project's metadata and dependencies.

pip install -e .

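This runs setup.py and installs the project as a library in editable mode (-e), together with the dependencies listed in requirements.txt.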

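from setuptools import setup, find_packages

# Read the dependency list, one requirement per line
with open("requirements.txt") as f:
    requirements = f.read().splitlines()

setup(
    name="MLOPS-PROJECT-1",
    version="0.1",
    author="Sudhanshu",
    # Picks up every folder that contains an __init__.py (src, config, utils)
    packages=find_packages(),
    install_requires=requirements,
)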

Data Ingestion

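  1. Upload the CSV file to Google Cloud.
  2. Put the account and bucket settings in config.yaml.
  3. Define a read_yaml function that loads the configuration into a dictionary and returns it.
  4. Pass the configuration dictionary in to create a DataIngestion object, which downloads the CSV file locally and splits it into train/test data with sklearn.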
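
The ingestion steps above can be sketched roughly as follows. This is a minimal sketch, not the course's actual code: the config keys (data_ingestion, bucket_name, bucket_file_name, train_ratio, raw_dir) and the output file names are hypothetical, and the GCS client is imported lazily so the splitting logic can be tried without Google Cloud credentials.

```python
from pathlib import Path

import pandas as pd
import yaml
from sklearn.model_selection import train_test_split


def read_yaml(path):
    """Load a YAML config file and return its contents as a dict."""
    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f)


class DataIngestion:
    def __init__(self, config):
        # All key names below are hypothetical; match them to your config.yaml.
        cfg = config["data_ingestion"]
        self.bucket_name = cfg["bucket_name"]
        self.file_name = cfg["bucket_file_name"]
        self.train_ratio = cfg["train_ratio"]
        self.raw_dir = Path(cfg.get("raw_dir", "artifacts/raw"))

    def download_csv(self):
        """Download the CSV from the GCS bucket into the local raw directory."""
        # Imported lazily so split() works without google-cloud-storage installed.
        from google.cloud import storage

        self.raw_dir.mkdir(parents=True, exist_ok=True)
        target = self.raw_dir / "raw.csv"
        blob = storage.Client().bucket(self.bucket_name).blob(self.file_name)
        blob.download_to_filename(str(target))
        return target

    def split(self, csv_path):
        """Split the raw CSV into train.csv and test.csv; return both paths."""
        df = pd.read_csv(csv_path)
        train, test = train_test_split(
            df, train_size=self.train_ratio, random_state=42
        )
        self.raw_dir.mkdir(parents=True, exist_ok=True)
        train_path = self.raw_dir / "train.csv"
        test_path = self.raw_dir / "test.csv"
        train.to_csv(train_path, index=False)
        test.to_csv(test_path, index=False)
        return train_path, test_path
```

Typical usage would be ingestion = DataIngestion(read_yaml("config/config.yaml")), then ingestion.split(ingestion.download_csv()); the config path here is also an assumption.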

Data Preprocessing

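  1. Define a DataProcessor class that reads train_path, test_path, and processed_dir from config.yaml.
  2. Define a preprocess_data function that cleans the data: drop duplicate rows, read the categorical and numerical column names from the config, create a LabelEncoder to convert categorical data to numeric values, then read the skewness threshold from the config and apply a log transform to every column whose skewness exceeds it, to reduce the skew.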
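
A rough sketch of the cleaning steps just described, assuming the column lists and the threshold have already been read from the config. The function signature is hypothetical, a fresh LabelEncoder is fitted per column (the notes mention a single encoder object), and np.log1p is used as the log transform so that zero values stay valid; the original code may differ on all three points.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def preprocess_data(df, cat_cols, num_cols, skew_threshold):
    """Deduplicate, label-encode categorical columns, log-transform skewed ones."""
    df = df.drop_duplicates().copy()

    # Encode each categorical column as integers and keep the label -> code mapping.
    mappings = {}
    for col in cat_cols:
        encoder = LabelEncoder()
        df[col] = encoder.fit_transform(df[col])
        mappings[col] = {
            label: int(code) for code, label in enumerate(encoder.classes_)
        }

    # Apply log1p only to numeric columns whose skewness exceeds the threshold.
    skewness = df[num_cols].skew()
    for col in skewness[skewness > skew_threshold].index:
        df[col] = np.log1p(df[col])

    return df, mappings
```

Returning the mappings alongside the frame makes it possible to decode the integer codes later, for example when interpreting feature values at inference time.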

Training

  1. mlflow.start_run()
  2. mlflow.log_artifact(self.train_path, artifact_path="datasets")
  3. Tune hyperparameters with RandomizedSearchCV.
  4. Save the trained model to a file, then log it with mlflow.log_artifact(self.model_output_path)
  5. mlflow.log_params(best_lgbm_model.get_params())
  6. mlflow.log_metrics(metrics)
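
The six steps above might fit together like this. It is a sketch under assumptions: sklearn's GradientBoostingClassifier stands in for the LightGBM model from the notes (LGBMClassifier follows the same sklearn interface, so it should be a drop-in swap), the search space and metric choice are made up for illustration, the labels are assumed to be binary, and mlflow is imported lazily so the tuning function can be run on its own.

```python
import joblib
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import RandomizedSearchCV


def tune_model(X_train, y_train, X_test, y_test):
    """Run a randomized hyperparameter search and score the best model."""
    # Hypothetical search space; replace with the ranges from your config.
    param_dist = {
        "n_estimators": [20, 50],
        "max_depth": [2, 3],
        "learning_rate": [0.05, 0.1],
    }
    search = RandomizedSearchCV(
        GradientBoostingClassifier(random_state=42),
        param_distributions=param_dist,
        n_iter=4,
        cv=3,
        scoring="accuracy",
        random_state=42,
    )
    search.fit(X_train, y_train)
    best_model = search.best_estimator_
    preds = best_model.predict(X_test)
    metrics = {
        "accuracy": float(accuracy_score(y_test, preds)),
        "f1": float(f1_score(y_test, preds)),
    }
    return best_model, metrics


def train_and_log(train_path, model_output_path, X_train, y_train, X_test, y_test):
    """Wrap tuning in an MLflow run, logging the dataset, model, params, metrics."""
    import mlflow  # lazy import: only needed when tracking is enabled

    with mlflow.start_run():
        mlflow.log_artifact(train_path, artifact_path="datasets")
        best_model, metrics = tune_model(X_train, y_train, X_test, y_test)
        joblib.dump(best_model, model_output_path)
        mlflow.log_artifact(model_output_path)
        mlflow.log_params(best_model.get_params())
        mlflow.log_metrics(metrics)
    return best_model, metrics
```

Without a tracking server configured, MLflow writes runs to a local mlruns directory, which can be browsed with the mlflow ui command.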

CI/CD

# Use the Jenkins image as the base image
FROM jenkins/jenkins:lts

# Switch to root user to install dependencies
USER root

# Install prerequisites and Docker.
# Note: arch=arm64 below targets ARM hosts (e.g. Apple Silicon); use amd64 on x86_64 hosts.
RUN apt-get update -y && \
    apt-get install -y apt-transport-https ca-certificates curl gnupg software-properties-common && \
    curl -fsSL https://download.docker.com/linux/debian/gpg | apt-key add - && \
    echo "deb [arch=arm64] https://download.docker.com/linux/debian bullseye stable" > /etc/apt/sources.list.d/docker.list && \
    apt-get update -y && \
    apt-get install -y docker-ce docker-ce-cli containerd.io && \
    apt-get clean

# Add Jenkins user to the Docker group (create if it doesn't exist)
RUN groupadd -f docker && \
    usermod -aG docker jenkins

# Create the Docker directory and volume for DinD
RUN mkdir -p /var/lib/docker
VOLUME /var/lib/docker

# Switch back to the Jenkins user
USER jenkins