Image Classification using AWS SageMaker

In this project, I used AWS SageMaker to train a pretrained model for image classification, applying SageMaker profiling, debugging, hyperparameter tuning and other good ML engineering practices.

Project Schema

aws-ml-image-classification/
|----LICENSE.txt
|----README.md
|----hpo.py
|----inference.py
|----train_and_deploy.ipynb
|----train_model.py
|
|----test-photos-of-dogs/
|    |----dog1.png
|    |----dog2.jpg
|    |----dog3.jpg
|    |----dog4.jpg
|    |----dog5.jpeg
|    |----dog6.jpg
|    |----dog7.png
|
|----screenshots/
|    |----alljobs.png
|    |----besttrainingjob.png
|    |----endpoint_deployed.png
|    |----hyperparameter_jobs.png
|    |----infojob1.png
|    |----infojob2.png
|    |----infojob3.png
|    |----training_jobs.png
|    |----trial_screenshot.png
|
|----ProfilerReport/
|    |----profiler-output/
|        |----profiler-report.html
|        |----profiler-report.ipynb
|
|----results_inference/
     |----predicted_Alaskan_malamute_dog6.jpg
     |----predicted_American_eskimo_dog_dog2.jpg
     |----predicted_Bernese_mountain_dog_dog4.jpg
     |----predicted_Cane_corso_dog7.png
     |----predicted_German_shepherd_dog_dog5.jpeg
     |----predicted_Golden_retriever_dog1.png
     |----predicted_Golden_retriever_dog3.jpg

Project Set Up and Installation

Log in to AWS and open SageMaker Studio.
Create a notebook instance using the ml.t3.medium instance type and the PyTorch 2.0.0 Python CPU Optimized image.
Upload the files needed to run the project:
- hpo.py: script to train the model for hyperparameter tuning
- train_model.py: script to train the model using debugging and profiling hooks
- inference.py: script used by the deployed endpoint for inference
- train_and_deploy.ipynb: main notebook that downloads the data and runs the whole process for the project
- [OPTIONAL] test-photos-of-dogs: folder with random pictures of dogs for running inference against the deployed model
- [OPTIONAL] results_inference: folder that will contain the results of the predictions

Dataset

The provided dataset is the dog breed classification dataset, which can be downloaded by clicking here. It comprises images of 133 distinct dog breeds, ranging from widely recognized breeds such as the Labrador Retriever and the German Shepherd to less common ones like the Norwegian Buhund and the Plott.

The downloaded data is already split into three subfolders (train, test and valid), which are used to train and evaluate the model.
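As a quick sanity check after the download, the three splits can be summarized by counting class folders and images. The sketch below assumes each split contains one subfolder per breed; the function name `summarize_splits` and the image extensions are illustrative choices, not part of the project scripts.

```python
from pathlib import Path

# File extensions counted as images (an assumption for this sketch)
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def summarize_splits(root):
    """Return {split: (num_classes, num_images)} for train/valid/test under root."""
    summary = {}
    for split in ("train", "valid", "test"):
        split_dir = Path(root) / split
        # Each immediate subfolder is treated as one class (one breed)
        class_dirs = [d for d in split_dir.iterdir() if d.is_dir()]
        num_images = sum(
            1
            for d in class_dirs
            for f in d.iterdir()
            if f.suffix.lower() in IMAGE_SUFFIXES
        )
        summary[split] = (len(class_dirs), num_images)
    return summary
```

Calling `summarize_splits` on the folder where the dataset was extracted should report 133 classes for each split if the download is complete.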

Hyperparameter Tuning

For our project we used ResNet50 as the base model, a powerful convolutional neural network sourced from the torchvision library.

Chosen Hyperparameters

Learning rate (lr): Determines step size during optimization. High values can overshoot; low values can slow training.
Batch size: Number of samples per weight update. Bigger batches are more stable but may generalize less effectively.
Epochs: Times the model sees the entire dataset. More epochs can improve learning but risk overfitting.

Values for experimenting

Hyperparameter	Values
lr	0.001 to 0.1
batch size	32, 64, 128
epochs	1, 2

Our tuning job consisted of 3 training jobs on an ml.c5.2xlarge instance, with the average test loss (AVG Test loss) as the objective metric.
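To make the search space concrete, here is a minimal sketch that draws three candidate configurations from the ranges in the table above. It uses plain random sampling for illustration only; SageMaker's tuner applies its own search strategy, and the names `SEARCH_SPACE` and `sample_config` are hypothetical.

```python
import math
import random

# Search space copied from the table above
SEARCH_SPACE = {
    "lr": (0.001, 0.1),           # continuous range
    "batch_size": [32, 64, 128],  # categorical choices
    "epochs": [1, 2],             # categorical choices
}


def sample_config(rng):
    """Draw one hyperparameter configuration from SEARCH_SPACE."""
    low, high = SEARCH_SPACE["lr"]
    # Sample lr log-uniformly so each order of magnitude is equally likely
    lr = math.exp(rng.uniform(math.log(low), math.log(high)))
    return {
        "lr": lr,
        "batch_size": rng.choice(SEARCH_SPACE["batch_size"]),
        "epochs": rng.choice(SEARCH_SPACE["epochs"]),
    }


rng = random.Random(0)  # fixed seed so the draw is repeatable
candidates = [sample_config(rng) for _ in range(3)]  # 3 jobs, as in the tuning job
for config in candidates:
    print(config)
```

Sampling the learning rate on a log scale is a common choice when the range spans two orders of magnitude, as it does here.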

Best Hyperparameters after tuning

Hyperparameter	Tuned Value
lr	0.0010152583471663874
batch size	64
epochs	2

Screenshots

Hyperparameter tuning job

Completed training jobs (3)

Metric logs during training process (3)

1. First Training Job: pytorch-training-230917-0040-001-4120a907

2. Second Training Job: pytorch-training-230917-0040-002-3ee3433d

3. Third Training Job: pytorch-training-230917-0040-003-f603a2f7

Best Hyperparameters - Training Jobs Summary

Debugging and Profiling

Debugger and profiler rules were set up to monitor training: a debugger rule flags when the loss stops decreasing, and a profiler rule generates the profiler report.

from sagemaker.debugger import Rule, ProfilerRule, rule_configs

rules = [
    Rule.sagemaker(rule_configs.loss_not_decreasing()),
    ProfilerRule.sagemaker(rule_configs.ProfilerReport()),
]
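To give an intuition for what a "loss not decreasing" check looks for, the sketch below compares the mean loss of the most recent steps with the mean of the steps before them. This is a simplified illustration, not the logic of the built-in SageMaker rule; the function name, `window` and `min_rel_improvement` are assumptions made for the example.

```python
def loss_not_decreasing(losses, window=5, min_rel_improvement=0.01):
    """Return True if the latest window did not improve enough on the previous one.

    losses: list of non-negative loss values, oldest first.
    """
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    # The recent mean must fall below this value to count as an improvement
    required = previous * (1 - min_rel_improvement)
    return recent > required


decreasing = [5.0, 4.5, 4.0, 3.5, 3.0, 2.8, 2.5, 2.2, 2.0, 1.9]
plateau = [2.0] * 10
print(loss_not_decreasing(decreasing))  # False: the loss is still falling
print(loss_not_decreasing(plateau))     # True: no improvement between windows
```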

Debugger Collections captured key metrics like weights, gradients, biases, and the CrossEntropyLoss output during training and evaluation.

debugging_collections = [
    CollectionConfig(
        name="model_weights",
        parameters={
            "train.save_interval": "5",
            "eval.save_interval": "1"
        }
    ),
    CollectionConfig(
        name="model_gradients",
        parameters={
            "train.save_interval": "5",
            "eval.save_interval": "1"
        }
    ),
    CollectionConfig(
        name="biases_values",
        parameters={
            "train.save_interval": "5",
            "eval.save_interval": "1"
        }
    ),
    CollectionConfig(
        name="LossOutput", 
        parameters={
            "include_regex": "CrossEntropyLoss_output_0",
            "train.save_interval": "1",
            "eval.save_interval": "1"
        }
    )
]

A Profiler Configuration collected system and framework metrics at regular intervals.

from sagemaker.debugger import ProfilerConfig, FrameworkProfile

profiling_configuration = ProfilerConfig(system_monitor_interval_millis=500,
                                         framework_profile_params=FrameworkProfile())

The report in HTML format is included in ProfilerReport.

Results

Insights

Training loss declined significantly, from 5 to 1.9, indicating effective learning and fitting to the training data over the steps.
However, the training loss is noisy, suggesting potential improvements with a larger batch size.
The validation loss starts low and remains relatively steady, hinting at a potential overfitting issue as it's consistently lower than the training loss.
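The link between batch size and a noisy training loss can be illustrated with a toy simulation. The sketch below treats per-sample losses as independent Gaussian draws (a simplifying assumption, not a measurement from this project) and compares how much the batch-averaged loss fluctuates for two batch sizes.

```python
import random
import statistics


def batch_mean_std(batch_size, num_batches=1000, seed=0):
    """Standard deviation of the batch-averaged loss in a toy setting."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_batches):
        # Toy per-sample losses: mean 2.0, standard deviation 1.0
        batch = [rng.gauss(2.0, 1.0) for _ in range(batch_size)]
        means.append(statistics.fmean(batch))
    return statistics.stdev(means)


small = batch_mean_std(32)
large = batch_mean_std(128)
print(f"batch 32: {small:.3f}, batch 128: {large:.3f}, ratio: {small / large:.2f}")
```

Under this assumption the fluctuation shrinks roughly with the square root of the batch size, so going from 32 to 128 samples should about halve it.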

Improvements

Introducing regularization techniques can help mitigate overfitting.
Augmenting the dataset or acquiring more data might provide a more generalized performance.

Model Deployment

The deployed model is attached from Debugging-Image-Classification-2023-09-17-11-28-25-671 and is based on the PyTorch framework, ResNet50:

from sagemaker.pytorch import PyTorch, PyTorchModel

estimator = PyTorch.attach('Debugging-Image-Classification-2023-09-17-11-28-25-671')

model_path = "s3://sagemaker-us-east-2-064258437334/Debugging-Image-Classification-2023-09-17-11-28-25-671/output/model.tar.gz"

pytorch_model = PyTorchModel(
    entry_point="inference.py",
    model_data=model_path, 
    role=role, 
    framework_version="1.8",
    py_version="py36"
)


predictor = pytorch_model.deploy(initial_instance_count=1, instance_type="ml.m5.large")

The model processes an image and returns an array of 133 raw scores, one per dog breed; a higher score means the breed is more likely.
The array can be interpreted by taking the index of the highest value, which corresponds to a specific breed in the list of known breeds.
The endpoint was deployed on an ml.m5.large instance.
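As a minimal sketch of that interpretation step, the raw scores can be turned into probabilities with a softmax and the top class picked by index. The three scores and the breed names below are made-up placeholders (the real response has 133 values), and `top_prediction` is a hypothetical helper.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    # Subtract the maximum before exponentiating for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(scores, class_names):
    """Return the name and probability of the highest-scoring class."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


scores = [-2.1, 2.5, -3.7]                 # placeholder scores
names = ["Breed_A", "Breed_B", "Breed_C"]  # placeholder class names
name, prob = top_prediction(scores, names)
print(name, round(prob, 3))
```

The softmax does not change which index is largest, so taking the argmax of the raw scores gives the same breed; the probabilities are only useful for reporting confidence.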

How to Query the Model

1 - Load the Image:

The test-photos-of-dogs folder contains several photos of dogs. To predict additional photos, upload them to this folder.

2 - Preprocess the Image:

The image is prepared for prediction by applying a fixed set of transformations: resizing it to a consistent dimension, center cropping it to a square, converting it from PIL format to a tensor, and normalizing its pixel values with predefined mean and standard deviation values.

import torchvision.transforms as T
from PIL import Image

def preprocess_and_predict(img_path, predictor):
    recomp = T.Compose([
        T.Resize(256),
        T.CenterCrop(224),
        T.ToTensor(),
        T.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225]
        )])
    
    dog_pil = Image.open(img_path).convert('RGB') 
    image = recomp(dog_pil).unsqueeze(0)
    
    if image.size(1) != 3:
        raise ValueError(f"Unexpected number of channels: {image.size(1)}. Expected 3 channels.")
    
    response = predictor.predict(image.numpy())
    return response, dog_pil
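To see what the normalization step does to a single pixel, here is a small worked example in plain Python. It mirrors the arithmetic of `T.ToTensor()` (scale 8-bit values to the 0 to 1 range) followed by `T.Normalize` (subtract the channel mean, divide by the channel standard deviation); `normalize_pixel` is a hypothetical helper written for this illustration.

```python
# Same per-channel statistics as in preprocess_and_predict
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel to its normalized channel values."""
    return [((value / 255.0) - mean) / std
            for value, mean, std in zip(rgb, MEAN, STD)]


print(normalize_pixel((0, 0, 0)))        # black pixel: all channels negative
print(normalize_pixel((255, 255, 255)))  # white pixel: all channels positive
```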

An example of a prediction for one picture:

Function invocation

preprocess_and_predict("./test-photos-of-dogs/dog5.jpeg", predictor)

OUTPUT

[[-7.90146399 -5.12475586 -6.25286102 -3.63396549 -5.11114645 -7.45391417
  -6.57456398 -5.94556093 -6.18455029 -4.69955254 -3.21500683 -4.01432657
  -2.26771617 -7.25555992 -4.97556448 -7.16805792 -7.40955591 -3.98202085
  -8.73099136 -2.11498475 -5.45636749 -2.21596789 -6.59912968 -8.85617828
  -5.91326189 -7.31432676 -4.55110884 -6.49787998 -6.51745558 -6.26583004
  -6.33802366 -6.69052315 -6.80027246 -6.47841787 -8.1390295  -5.53208208
  -9.42533016 -4.08472157 -5.75365925 -6.16152287 -4.66585827 -5.58548355
  -3.69104242 -3.68172479 -4.04420948 -6.97776127 -6.63336277 -5.4766264
  -7.80159807 -3.80039692 -6.64778137 -8.02877617 -8.22262859 -4.31276417
  -7.93589544 -5.1896553  -8.30073738 -8.06851673 -4.69244909 -4.91078377
  -6.25538588 -6.44481087 -7.28834724 -7.29276085 -6.37854815 -4.94758511
  -4.27422285 -5.81485653 -7.06046104 -6.53656006  2.52168727 -8.35471344
  -6.95983458 -6.27950096 -7.61815786 -4.6864543  -6.95156956 -5.55372953
  -8.81087971 -5.38375616 -5.68085432 -8.10847282 -6.4919734  -2.85764813
  -7.54979801 -7.41550636 -6.91618681 -7.1609664  -6.81252432 -8.19485474
  -8.53039646 -6.57389259 -8.59721088 -8.65538216 -7.78943777 -5.50999022
  -7.6509304  -2.56739855 -8.2444191  -6.48107529 -9.08440781 -6.80696392
  -5.06446266 -7.46158028 -5.99413681 -5.4945879  -5.59258795 -4.9009614
  -4.97772551 -5.27539158 -6.4203229  -4.73159361 -6.53904676 -4.56212473
  -8.80074024 -7.06140184 -7.47092199 -3.71955633 -7.04563475 -5.20756912
  -5.54778433 -7.97606468 -8.36039925 -7.88769293 -7.66006756 -6.01060581
  -6.13015509 -4.56613922 -4.70738506 -6.47524405 -9.38786793 -8.14974976
  -6.69619274]]
length of response: 133

To visualize the breeds and make the model more user friendly, the following code loops over all pictures in the folder and displays each photo with the predicted breed as its title. It also saves each figure in a folder called results_inference.

def preprocess_and_predict(img_path, predictor):
    recomp = T.Compose([
        T.Resize(256),
        T.CenterCrop(224),
        T.ToTensor(),
        T.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225]
        )])
    
    dog_pil = Image.open(img_path).convert('RGB') 
    image = recomp(dog_pil).unsqueeze(0)
    
    if image.size(1) != 3:
        raise ValueError(f"Unexpected number of channels: {image.size(1)}. Expected 3 channels.")
    
    response = predictor.predict(image.numpy())
    return response, dog_pil


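import os
import numpy as np
import matplotlib.pyplot as plt

# List with image paths for predictions
directory_path = './test-photos-of-dogs/'
image_paths = [os.path.join(directory_path, img) for img in os.listdir(directory_path) if img.endswith(('.png', '.jpg', '.jpeg'))]

# Make sure the output folder exists before saving figures
os.makedirs("results_inference", exist_ok=True)

# Class labels in sorted order; unique_breeds is assumed to be defined earlier in the notebook
breeds = sorted(list(unique_breeds))

# Predict each image, show it with the predicted breed and save the figure
for img_path in image_paths:
    prediction, dog_image = preprocess_and_predict(img_path, predictor)

    # The index of the highest score is the predicted class
    predicted_index = np.argmax(prediction)
    predicted_breed = breeds[predicted_index]
    # Keep the part after the first dot as the breed name
    name_breed = predicted_breed.split('.')[1]

    plt.imshow(dog_image)
    plt.title(f"Breed prediction: {name_breed}")
    plt.axis('off')

    img_name = os.path.basename(img_path)
    save_path = os.path.join("results_inference", f"predicted_{name_breed}_{img_name}")
    plt.savefig(save_path, bbox_inches='tight', pad_inches=0.1)

    plt.show()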

OUTPUT

This is my dog, she is not a goldie but she is a cutie
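The prediction loop relies on `unique_breeds`, which is not defined in this README. A minimal sketch of how such a list could be built is shown below, assuming the training split has one folder per breed named like `071.German_shepherd_dog`; the helper names and the folder naming are assumptions for illustration.

```python
import os


def list_breeds(train_dir):
    """Return the class folder names under train_dir in sorted order."""
    return sorted(
        name
        for name in os.listdir(train_dir)
        if os.path.isdir(os.path.join(train_dir, name))
    )


def breed_display_name(folder_name):
    """Strip the prefix before the first dot, if there is one."""
    _, _, name = folder_name.partition(".")
    return name if name else folder_name


print(breed_display_name("071.German_shepherd_dog"))  # German_shepherd_dog
```

The sorted order matters: the index returned by the model only maps to the right breed if the list is ordered the same way the classes were indexed during training.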

tbs89/aws-ml-image-classification
