Project 1
Project Setup
src, config, and utils need to be treated as packages, so create a file named __init__.py in each of their folders.
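As a minimal sketch of this step, the snippet below creates the marker files with `pathlib`. The helper name `ensure_packages` and its `root` argument are hypothetical, introduced only for illustration; creating the files by hand works just as well. The files matter because setuptools' `find_packages()` only discovers directories that contain an `__init__.py`.

```python
from pathlib import Path


def ensure_packages(root, names=("src", "config", "utils")):
    """Create each folder (if missing) and an empty __init__.py inside it.

    Hypothetical helper: the folder names come from the project layout
    described in these notes; `root` is assumed to be the project root.
    """
    created = []
    for name in names:
        pkg_dir = Path(root) / name
        pkg_dir.mkdir(parents=True, exist_ok=True)
        init_file = pkg_dir / "__init__.py"
        init_file.touch(exist_ok=True)  # keeps the contents of an existing file
        created.append(init_file)
    return created
```

Calling `ensure_packages(".")` from the project root would return the three `__init__.py` paths.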
setup.py
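setup.py is a script that configures how a Python project is distributed. It uses the setuptools library to define the project's metadata and dependencies.
Run the setup.py file to install the library.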
```python
from setuptools import setup, find_packages

# Read the dependency list, one requirement per line
with open("requirements.txt") as f:
    requirements = f.read().splitlines()

setup(
    name="MLOPS-PROJECT-1",
    version="0.1",
    author="Sudhanshu",
    packages=find_packages(),  # every folder that contains an __init__.py
    install_requires=requirements,
)
```
Data Ingestion
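- Upload the CSV file to Google Cloud.
- Put the account and bucket settings in config.yaml.
- Define a read_yaml function that loads the configuration into a dictionary and returns it.
- Pass the configuration dictionary as an argument to create a DataIngestion object, which downloads the CSV file locally and splits it into train/test data with sklearn.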
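The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the config keys (`data_ingestion`, `bucket_name`, `bucket_file_name`, `train_ratio`), the `raw_dir` default, and the method names are all assumptions, and the download relies on the `google-cloud-storage` client with application default credentials.

```python
import os

import pandas as pd
import yaml
from sklearn.model_selection import train_test_split


def read_yaml(file_path):
    """Load a YAML config file and return its contents as a dict."""
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"Config file not found: {file_path}")
    with open(file_path) as f:
        return yaml.safe_load(f)


class DataIngestion:
    """Download a CSV from a GCS bucket and split it into train/test files."""

    def __init__(self, config, raw_dir="artifacts/raw"):
        # Key names below are assumptions about how config.yaml is laid out.
        section = config["data_ingestion"]
        self.bucket_name = section["bucket_name"]
        self.file_name = section["bucket_file_name"]
        self.train_ratio = section["train_ratio"]
        self.raw_path = os.path.join(raw_dir, "raw.csv")
        self.train_path = os.path.join(raw_dir, "train.csv")
        self.test_path = os.path.join(raw_dir, "test.csv")
        os.makedirs(raw_dir, exist_ok=True)

    def download_csv_from_gcp(self):
        # Imported lazily so the split step works without the GCP client.
        from google.cloud import storage

        client = storage.Client()  # uses application default credentials
        blob = client.bucket(self.bucket_name).blob(self.file_name)
        blob.download_to_filename(self.raw_path)

    def split_data(self):
        # Split the downloaded CSV and write both parts next to it.
        data = pd.read_csv(self.raw_path)
        train, test = train_test_split(
            data, train_size=self.train_ratio, random_state=42
        )
        train.to_csv(self.train_path, index=False)
        test.to_csv(self.test_path, index=False)
        return self.train_path, self.test_path
```

A typical call sequence under these assumptions would be `DataIngestion(read_yaml("config/config.yaml"))`, then `download_csv_from_gcp()`, then `split_data()`.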
Data Preprocessing
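- Define a DataProcessor class that reads train_path, test_path, and processed_dir from config.yaml.
- Define a preprocess_data function that cleans the data:
  - drop duplicate rows;
  - read the names of the categorical and numerical columns from the config;
  - create a LabelEncoder object to convert categorical data into numeric values;
  - read the skewness threshold from the config and apply a log transform to every column whose skewness exceeds it, to reduce the skew.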
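The cleaning steps can be sketched as a standalone function. This is only an illustration under a few assumptions: the column lists and threshold are passed in as arguments rather than read from the config, and `np.log1p` is used for the log transform, which presumes the skewed columns are non-negative.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def preprocess_data(df, cat_cols, num_cols, skew_threshold):
    """Deduplicate, label-encode categoricals, and log-transform skewed columns."""
    df = df.drop_duplicates().copy()

    # Encode each categorical column as integer codes.
    encoder = LabelEncoder()
    for col in cat_cols:
        df[col] = encoder.fit_transform(df[col])

    # Measure skewness of the numerical columns and keep those above the threshold.
    skewness = df[num_cols].skew()
    skewed_cols = skewness[skewness > skew_threshold].index

    # log1p(x) = log(1 + x) stays defined at x = 0; assumes non-negative values.
    for col in skewed_cols:
        df[col] = np.log1p(df[col])
    return df
```

Refitting one `LabelEncoder` per column keeps the sketch short; if the encodings need to be reused at inference time, storing a separate fitted encoder per column would be the safer design.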
Training
- mlflow.start_run()
- mlflow.log_artifact(self.train_path, artifact_path="datasets")
- Tune the hyperparameters with RandomizedSearchCV
- Save the trained model to a file, then log it with mlflow.log_artifact(self.model_output_path)
- mlflow.log_params(best_lgbm_model.get_params())
- mlflow.log_metrics(metrics)
CI/CD
```dockerfile
# Use the Jenkins image as the base image
FROM jenkins/jenkins:lts

# Switch to root user to install dependencies
USER root

# Install prerequisites and Docker
# Note: arch=arm64 and bullseye are hard-coded; adjust them to match the host
# architecture (e.g. amd64) and the Debian release of the base image.
RUN apt-get update -y && \
    apt-get install -y apt-transport-https ca-certificates curl gnupg software-properties-common && \
    curl -fsSL https://download.docker.com/linux/debian/gpg | apt-key add - && \
    echo "deb [arch=arm64] https://download.docker.com/linux/debian bullseye stable" > /etc/apt/sources.list.d/docker.list && \
    apt-get update -y && \
    apt-get install -y docker-ce docker-ce-cli containerd.io && \
    apt-get clean

# Add Jenkins user to the Docker group (create if it doesn't exist)
RUN groupadd -f docker && \
    usermod -aG docker jenkins

# Create the Docker directory and volume for DinD
RUN mkdir -p /var/lib/docker
VOLUME /var/lib/docker

# Switch back to the Jenkins user
USER jenkins
```