Automatic Tagging of Clothing in E-Commerce Using TensorFlow and GCP

Classify the clothing products into various categories using Machine Learning.

By Devika Mishra

Shameless plug: We are a data annotation platform that makes it super easy for you to build ML datasets. Just upload data, invite your team, and build datasets super quick. Check us out.

Why E-commerce and tagging of clothes?

From startups and small businesses right through to major brands, a vast number of companies can benefit from their own e-commerce website, where they can sell their own products or services. In today's competitive, convenience-focused society, consumers no longer want to venture to the high street to buy items; they want to shop from their own homes, making e-commerce a flexible solution for both businesses and buyers.

With e-commerce gaining more popularity day after day, the number of products available for shopping is also increasing. At this scale it is extremely difficult to manually tag products like clothes, which come in so many varieties. This project is a small attempt to use machine learning to ease that task.

Image classification versus object detection

People often confuse image classification and object detection scenarios. In general, if you want to classify an image into a certain category, you use image classification. On the other hand, if you aim to identify the location of objects in an image, and, for example, count the number of instances of an object, you can use object detection.
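The difference is easiest to see in the shape of the outputs. A toy sketch (my illustration, not code from this project):

```python
# A classifier returns one label for the whole image, while a detector
# returns one (label, score, bounding box) entry per object it finds.

def classify(image):
    # whole-image decision: a single category
    return "Jeans"

def detect(image):
    # per-object decisions: location, class, and confidence for each instance
    return [
        {"label": "Jeans", "score": 0.94, "box": (40, 120, 260, 480)},
        {"label": "Shoes", "score": 0.91, "box": (60, 470, 240, 600)},
    ]

image = object()  # stand-in for pixel data
print(classify(image))     # one label for the image
print(len(detect(image)))  # counting instances comes free with detection
```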

ML Models

The models used were the built-in TensorFlow object detection models, customized for the classification of our data. The categories for classification were: Shirts, T-shirts, Jackets, Jeans, Trousers, Sunglasses, Shoes, Tops, and Skirts. There are multiple models available in TensorFlow, details of which can be found at this link.

For my purpose, I used a special class of convolutional neural networks called MobileNets. MobileNets do not produce as accurate a model as a full-fledged deep neural network; however, their accuracy is surprisingly high and good enough for many applications. The main difference between the MobileNet architecture and a "traditional" CNN is that instead of a single 3x3 convolution layer followed by batch norm and ReLU, MobileNets split the convolution into a 3x3 depthwise convolution and a 1x1 pointwise convolution.
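The parameter savings from this split are easy to quantify with a back-of-the-envelope calculation (my own arithmetic, ignoring bias and batch-norm parameters):

```python
# Parameter count of a single layer: standard 3x3 convolution versus
# MobileNet's depthwise-separable pair (depthwise 3x3 + pointwise 1x1).

def standard_conv_params(c_in, c_out, k=3):
    # a standard conv: every output channel has its own k x k x c_in filter
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k=3):
    # MobileNet-style: one k x k filter per input channel (depthwise),
    # then a 1x1 "pointwise" convolution to mix channels
    return k * k * c_in + c_in * c_out

c_in, c_out = 128, 256
std = standard_conv_params(c_in, c_out)        # 294912
sep = depthwise_separable_params(c_in, c_out)  # 33920
print(f"standard: {std}, separable: {sep}, ~{std / sep:.1f}x fewer parameters")
```

For this layer shape the separable version uses roughly 8.7x fewer weights, which is why the model stays small enough for mobile and embedded use.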

There are multiple sources available online with tutorials on building custom classifiers using MobileNets or any other built-in model.


The images were downloaded from leading e-commerce websites. This data was then cleaned (duplicate and unwanted images removed) and uploaded to the DataTurks platform. Using the annotation tool available on the platform, these images were annotated with rectangular bounding boxes.

The annotated data was downloaded (as a folder containing images and JSON documents) and split into training and test data (an 80-20 split). The training and test folders were then given as input to the Python script for converting the data to Pascal VOC format (demo here). The XML documents and images produced by the JSON-to-Pascal-VOC script were then converted to a set of CSV records and images. Finally, this data was converted to TFRecord files for training and testing.
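The 80-20 split step can be sketched as follows (a minimal version written for illustration; the folder layout and file names are my assumptions, not the project's actual script):

```python
import os
import random
import shutil

def split_dataset(src_dir, train_dir, test_dir, train_frac=0.8, seed=42):
    """Shuffle image/annotation pairs and copy them into train/test folders."""
    images = sorted(f for f in os.listdir(src_dir) if f.endswith((".jpg", ".jpeg")))
    random.Random(seed).shuffle(images)  # fixed seed keeps the split reproducible
    cutoff = int(len(images) * train_frac)
    splits = ((train_dir, images[:cutoff]), (test_dir, images[cutoff:]))
    for dst, subset in splits:
        os.makedirs(dst, exist_ok=True)
        for name in subset:
            shutil.copy(os.path.join(src_dir, name), dst)
            # copy the matching JSON annotation alongside the image, if present
            ann = os.path.splitext(name)[0] + ".json"
            if os.path.exists(os.path.join(src_dir, ann)):
                shutil.copy(os.path.join(src_dir, ann), dst)
    return cutoff, len(images) - cutoff
```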

Dataset : (Make sure the dataset contains at least 100–150 images per class for training after splitting the data.)

The code for the above project can be found here in the GitHub repository.

If you have any queries or suggestions, I would love to hear about it. Please write to me at .

Training :

The training of models was done on an Ubuntu 16.04 LTS GPU box with an NVIDIA Tesla K80, 7 GB RAM, and a 30 GB hard disk. The training was run for 10,000 steps, which took approximately 3 hours.

The entire process of training took a lot of effort and hard work: creating a dataset, getting it into the right format, setting up a Google Cloud (GCP) instance, training the model, and finally testing it. Given below is a detailed overview of how the steps after creating the dataset were carried out.

Setting up the GCP instance: Google Cloud Platform is a cloud computing infrastructure that provides secure, powerful, high-performance, and cost-effective frameworks. It's not just for data analytics and machine learning, but that's for another time. Check it out over here. Google gives away $300 of credit and 12 months as a free-tier user.

A step-by-step guide to setting up the instance is available in the Google Cloud Platform documentation and on multiple blogs (like here).

Once the instance is created, the setup will look like this:

Install TensorFlow GPU on Ubuntu 16.04 LTS:

    Step — 1

# ensure system is updated and has basic build tools
sudo apt-get update
sudo apt-get --assume-yes upgrade
sudo apt-get --assume-yes install tmux build-essential gcc g++ make binutils
sudo apt-get --assume-yes install software-properties-common

Step — 2

# Install your NVIDIA graphics driver.
Search for "Additional Drivers" in the menu and open it. Wait a minute, select the NVIDIA driver, hit Apply, and restart.

Step — 3

# Google CUDA toolkit archive.
Download the cuda-8.0 .deb package and install it:
sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb   # the .deb file you downloaded
sudo apt-get update
sudo apt-get install cuda
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Step — 4

# Download cuDNN v5.1 and run the following commands
tar -xzvf cudnn-8.0-linux-x64-v5.1.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

Step — 5

# Prepare TensorFlow dependencies
sudo apt-get install libcupti-dev

Step — 6

# Download Anaconda (Python) and then TensorFlow GPU
bash Anaconda3–
pip install tensorflow-gpu==1.2

Step — 7

# Check whether the GPU has been set up properly
# Download this git repo and unzip it.
python models-master/tutorials/image/imagenet/
python models-master/tutorials/image/cifar10/


Modifying the files for your custom model:

As I used the MobileNet model, I made changes to its configuration file. The pretrained models can be downloaded here.

Changes in config file:

model {
  ssd {
    num_classes: 9
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }

Later section:

train_config: {
  batch_size: 10
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"
  from_detection_checkpoint: true
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "data/train.record"
  }
  label_map_path: "data/clothes-detection.pbtxt"
}

eval_config: {
  num_examples: 40
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "data/test.record"
  }
  label_map_path: "training/clothes-detection.pbtxt"
  shuffle: false
  num_readers: 1
}


item {
  id: 1
  name: 'Jackets'
}
item {
  id: 2
  name: 'Jeans'
}
item {
  id: 3
  name: 'Shirts'
}
item {
  id: 4
  name: 'Shoes'
}
item {
  id: 5
  name: 'Skirts'
}
item {
  id: 6
  name: 'sunglasses'
}
item {
  id: 7
  name: 'Tops'
}
item {
  id: 8
  name: 'Trousers'
}
item {
  id: 9
  name: 'Tshirts'
}
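The object detection API ships utilities (label_map_util) for reading this file; purely for illustration, a minimal stand-alone parser for a pbtxt label map like the one above might look like this (my sketch, not the project's code):

```python
import re

def parse_label_map(text):
    """Extract {id: name} pairs from a pbtxt label map string."""
    # matches each "id: N" followed by its "name: '...'" entry
    pattern = re.compile(r"id:\s*(\d+)\s*name:\s*['\"]([^'\"]+)['\"]")
    return {int(i): name for i, name in pattern.findall(text)}

label_map = """
item { id: 1 name: 'Jackets' }
item { id: 2 name: 'Jeans' }
item { id: 9 name: 'Tshirts' }
"""
print(parse_label_map(label_map))  # {1: 'Jackets', 2: 'Jeans', 9: 'Tshirts'}
```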

Running the Training Job:

A local training job can be run with the following command:

# From the tensorflow/models/research/ directory
python object_detection/ \
    --logtostderr \
    --pipeline_config_path=${PATH_TO_YOUR_PIPELINE_CONFIG} \
    --train_dir=${PATH_TO_TRAIN_DIR}

where ${PATH_TO_YOUR_PIPELINE_CONFIG} points to the pipeline config and ${PATH_TO_TRAIN_DIR} points to the directory in which training checkpoints and events will be written. By default, the training job runs until the user kills it or it completes the configured number of steps.

Output when training runs: 
INFO:tensorflow:global step 11788: loss = 0.6717 (0.398 sec/step)
INFO:tensorflow:global step 11789: loss = 0.5310 (0.436 sec/step)
INFO:tensorflow:global step 11790: loss = 0.6614 (0.405 sec/step)
INFO:tensorflow:global step 11791: loss = 0.7758 (0.460 sec/step)
INFO:tensorflow:global step 11792: loss = 0.7164 (0.378 sec/step)
INFO:tensorflow:global step 11793: loss = 0.8096 (0.393 sec/step)
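If you want the loss curve outside of TensorBoard, log lines in this format can be scraped with a small helper (my sketch, not part of the project):

```python
import re

# matches lines like "INFO:tensorflow:global step 11788: loss = 0.6717 (0.398 sec/step)"
LOG_LINE = re.compile(r"global step (\d+): loss = ([\d.]+) \(([\d.]+) sec/step\)")

def parse_training_log(lines):
    """Return (step, loss) pairs from TensorFlow training log output."""
    points = []
    for line in lines:
        m = LOG_LINE.search(line)
        if m:
            points.append((int(m.group(1)), float(m.group(2))))
    return points

log = [
    "INFO:tensorflow:global step 11788: loss = 0.6717 (0.398 sec/step)",
    "INFO:tensorflow:global step 11789: loss = 0.5310 (0.436 sec/step)",
]
print(parse_training_log(log))  # [(11788, 0.6717), (11789, 0.531)]
```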

My total loss graph looks like this:

Running the Evaluation Job:

Evaluation is run as a separate job. The eval job will periodically poll the train directory for new checkpoints and evaluate them on a test dataset. The job can be run using the following command:

# From the tensorflow/models/research/ directory
python object_detection/ \
    --logtostderr \
    --pipeline_config_path=${PATH_TO_YOUR_PIPELINE_CONFIG} \
    --checkpoint_dir=${PATH_TO_TRAIN_DIR} \
    --eval_dir=${PATH_TO_EVAL_DIR}

where ${PATH_TO_YOUR_PIPELINE_CONFIG} points to the pipeline config, ${PATH_TO_TRAIN_DIR} points to the directory in which training checkpoints were saved (same as the training job), and ${PATH_TO_EVAL_DIR} points to the directory in which evaluation events will be saved. As with the training job, the eval job runs until terminated by default.

Running TensorBoard:

Progress for training and eval jobs can be inspected using TensorBoard. If using the recommended directory structure, TensorBoard can be run using the following command:

tensorboard --logdir=${PATH_TO_MODEL_DIRECTORY}

where ${PATH_TO_MODEL_DIRECTORY} points to the directory that contains the train and eval directories. Note that it may take TensorBoard a couple of minutes to populate with data.

To view TensorBoard in your local browser, run the following in another terminal to forward the port (replace the placeholders with your key file, username, and the instance's public IP):
ssh -i <your-key-file> -L 6006:localhost:6006 <username>@<public-ip>

Then open localhost:6006 in your browser.

When successfully loaded, TensorBoard looks like this:

Given below are a few graphs from TensorBoard for my model, such as the learning rate, batch size, and losses. These and other graphs can be found on the TensorBoard page in your browser. Hovering the cursor over a graph shows details like the smoothed value, step, and raw value.


In order to test the model locally, I exported the files to Google Drive so that testing with the object detection API's object_detection_tutorial.ipynb file becomes easier. The steps to do so are given below. This lets you run the saved model multiple times for testing without being charged for the instance, and access it locally at any time.

First zip the entire folder where your model checkpoints and graphs are saved.

1. SSH on to your linux box and download the Linux version of gdrive from GitHub.
cd ~

2. You should see a file in your home directory called something like uc?id=0B3X9GlR6EmbnWksyTEtCM0VfaFE. Rename this file to gdrive.
mv uc\?id\=0B3X9GlR6EmbnWksyTEtCM0VfaFE gdrive

3.  Assign this file executable rights.
chmod +x gdrive

4.  Install the file to your usr folder.
sudo install gdrive /usr/local/bin/gdrive

5. You'll need to tell Google Drive to allow this program to connect to your account. To do this, run the gdrive program with any parameter, copy the URL it prints into your browser, and then paste the response code Google gives you back into your SSH window. Run the following:
gdrive list

6. YOU ARE DONE! Now you can upload files as required.
gdrive upload trained.tar.gz

Now, the zipped file can be downloaded from the drive and tested locally.

A few changes to be made to the object_detection_tutorial.ipynb file are mentioned below:

the cell for variables under model preparation:

# What model to download.

MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'

# Path to frozen detection graph. This is the actual model that is used for the object detection.

PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'  # name of your inference graph
Any model exported using the tool can be loaded here simply by changing PATH_TO_CKPT to point to a new .pb file.

# List of the strings that is used to add correct label for each box.

PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')  # name of your pbtxt file
NUM_CLASSES = 90  # set to the number of classes you have

In the first cell under detection:

(Make sure the test images are named image1.jpg or image1.jpeg, image2.jpg or image2.jpeg, etc.)

PATH_TO_TEST_IMAGES_DIR = 'test_images' #path to your test images

TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 3) ] #i would range from 1 to number of test images+1 

# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8) #desired size. 
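As a variation on the naming requirement above, TEST_IMAGE_PATHS can instead be built with a glob, so the files don't need to follow the image{N}.jpg scheme (my own tweak, not from the notebook):

```python
import glob
import os

def list_test_images(image_dir):
    """Return every .jpg/.jpeg path in image_dir, in a stable sorted order."""
    return sorted(
        glob.glob(os.path.join(image_dir, '*.jpg'))
        + glob.glob(os.path.join(image_dir, '*.jpeg'))
    )

PATH_TO_TEST_IMAGES_DIR = 'test_images'  # path to your test images
TEST_IMAGE_PATHS = list_test_images(PATH_TO_TEST_IMAGES_DIR)
```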

This would complete the testing of your model. Following are a few results after I tested my model:


The results were quite impressive, giving the right classification for around 48 of 50 test images, with confidence levels ranging from 88% to 98%. The results seemed convincing for MobileNet models, which are not known for great accuracy. Multiple other models, like Faster R-CNN or Inception networks, all available on GitHub, can also be tried out. These customized image classification models can be deployed as Android apps or used by e-commerce websites. I would love to hear any suggestions or queries. Please write to me at
