This article was published as a part of the Data Science Blogathon.
Introduction
Researchers around the world are competing to create the most accurate and effective image recognition systems. Because of this, it rarely makes sense to design our own neural network architecture from scratch. Even better, after training these architectures on sizeable datasets, researchers have shared the trained versions of their networks. Such a pre-trained network can therefore be used directly, or as a starting point for further training.
Dataset
ImageNet organizes images according to the WordNet hierarchy (currently only the nouns), with thousands of photos representing each node in the hierarchy. The project has driven significant progress in deep learning and computer vision, and researchers can access the data free of charge for academic purposes.
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is an annual image recognition competition held by ImageNet. International teams from universities and companies compete to create the most accurate image recognition models. The pre-trained models included in Keras were developed using the smaller dataset used in this competition. That dataset contains images of 1,000 different types of objects, such as foods and animal breeds. For example, the Granny Smith apple is one of the object classes, and this one kind of apple alone has more than 1,200 images in the dataset.
Pre-trained Models
Some of the pre-trained models for image classification:
VGG
VGG is a family of deep neural networks with 16 or 19 layers. In 2014 it represented the cutting edge. Its convolutional neural network design is fairly straightforward, and because it is easy to use and understand, it is still frequently used as the basis for other models. However, modern designs are usually more efficient.
ResNet-50
ResNet-50, a 50-layer neural network that represented the state of the art in 2015, uses less memory and is more accurate than the VGG architecture. ResNet has a more complex design built around skip connections: upper layers of the network are connected not only to the layer immediately below but also to layers much further down.
Inception v3
Another high-performing design from 2015 is Inception v3. It has an even more complex layout, built around layers that split into several separate paths before merging back together. These networks illustrate how neural networks grew in complexity and size during 2014 and 2015 in the pursuit of better accuracy. Modern neural network architectures are often more specialized.
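As a quick illustration, here is a minimal sketch (assuming TensorFlow 2.x, where Keras lives under tensorflow.keras) of how each of these architectures can be loaded with ImageNet weights from keras.applications:

```python
# A minimal sketch (assuming TensorFlow 2.x and tf.keras): each of these
# architectures ships with Keras and can be loaded with ImageNet weights.
from tensorflow.keras.applications import VGG16, ResNet50, InceptionV3

vgg = VGG16(weights="imagenet")                # 16-layer VGG, expects 224x224 inputs
resnet = ResNet50(weights="imagenet")          # 50-layer ResNet with skip connections
inception = InceptionV3(weights="imagenet")    # Inception v3, expects 299x299 inputs
```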
What is a pre-trained model and what is its purpose?
- Pre-trained models are complex networks with a very large number of parameters that have already been trained on large datasets.
- Training such networks usually requires a lot of time and resources.
- Even if your task is slightly different, you can drop the top layer and train only the weights of a new top layer (transfer learning); see the sketch after this list.
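Here is a minimal transfer-learning sketch, assuming tf.keras and a hypothetical 10-class target task: load VGG16 without its top classification layer, freeze the convolutional base, and train only a new top layer.

```python
# A minimal transfer-learning sketch (hypothetical 10-class task).
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the pre-trained convolutional weights fixed

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(10, activation="softmax"),  # new top layer for our own classes
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # hypothetical training data
```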
Implementation
Now let’s implement a pre-trained model to recognize objects in images. All the pre-trained models are included with Keras, so we will use the Keras library. Only the VGG16 model is covered here.
Let’s get started.
First, let’s import all the required packages from Keras applications.
https://gist.github.com/callmemaze/3baef5f752bc78c0a83139888b3ae1e5
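The linked gist contains the imports; a minimal sketch of what they might look like (assuming TensorFlow 2.x, where Keras lives under tensorflow.keras):

```python
# Sketch of the required imports (assuming tf.keras).
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing import image   # image loading helpers
from tensorflow.keras.applications import vgg16    # the pre-trained VGG16 model
```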
Next, let’s load an image file to process. I used a picture of a dog, but feel free to use any picture you like. The loaded image is too large to feed directly into the neural network: the image size must match the number of input nodes when feeding photos to the network. Images passed to VGG must be 224 x 224 pixels, so set the target_size parameter to that value. In addition, use the image.img_to_array method to convert the image data into an array of numbers that can be fed into the neural network.
https://gist.github.com/callmemaze/005a07ead44112ac6f76962ed0ba6dbe
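A sketch of that step, with "dog.jpg" as a placeholder path for whatever picture you choose:

```python
# Load the image at the 224 x 224 size VGG expects and convert it to an array.
img = image.load_img("dog.jpg", target_size=(224, 224))  # "dog.jpg" is a placeholder
x = image.img_to_array(img)                               # shape: (224, 224, 3)
```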
Now give the image a fourth dimension. This is needed because Keras expects to receive an array (a batch) of many photos at once, so a single image becomes a batch containing one element. Images should also be normalized before they are fed to the neural network so that the pixel values are in the range the network expects. The preprocessing function built into the VGG model does just that: all we have to do is call vgg16.preprocess_input and pass it our data x. Finally, let’s visualize the image using the matplotlib library.
https://gist.github.com/callmemaze/891cfeb4faea3fca98555c4cf3d1f811
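A sketch of this step under the same assumptions as above:

```python
# Add a batch dimension, preprocess the pixels for VGG16, and show the image.
x = np.expand_dims(x, axis=0)     # shape becomes (1, 224, 224, 3)
x = vgg16.preprocess_input(x)     # shift/scale pixel values as VGG16 expects

plt.imshow(img)                   # visualize the original (unprocessed) image
plt.axis("off")
plt.show()
```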
Now create a new VGG16 object to get an instance of the model with its pre-trained weights.
https://gist.github.com/callmemaze/88684dfbe6351dbb8de576be4423bdb7
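In sketch form, this is a single call:

```python
# Instantiate VGG16 with its ImageNet-trained weights.
model = vgg16.VGG16(weights="imagenet")
```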
Now we are ready to run image prediction using the normalized data and the neural network. You can do this by passing the data to model.predict. The prediction you receive is a 1,000-element float array, where each element represents the probability of one of the 1,000 classes the model was trained to recognize. The VGG model’s decode_predictions function simplifies things by returning the names of the most probable matches: just call vgg16.decode_predictions and pass it the prediction object you already created. The top 5 most likely matches are returned automatically.
https://gist.github.com/callmemaze/884dd9dc0e7d1fc7a8c8226b4e18da6f
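A sketch of the prediction and decoding step:

```python
# Run the prediction and decode the five most likely ImageNet classes.
predictions = model.predict(x)                          # 1,000-element probability array
top5 = vgg16.decode_predictions(predictions, top=5)     # map class indices to names

for _, name, probability in top5[0]:
    print(f"{name}: {probability:.4f}")
```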
The prediction fits our image pretty well, in my opinion. Standard poodle, kuvasz, and Labrador retriever are a few of the other matches. Try it with your own photo; it’s interesting to observe the predictions it makes and the kinds of images that perplex it.
Conclusion
We have successfully implemented a pre-trained VGG model and used it to predict the contents of an image. This was just an overview of VGG’s pre-trained image classification models and how to use them. But this is an ever-expanding field, so there are always new models and new frontiers to explore. Please test the above model on different images with different parameter settings and report your results in the comments below.
Endnotes:
- Pre-trained models can feel like magic: just download them and run predictions immediately, with no training and no training data of your own.
- If the source task and target task are different but the domains are somewhat similar, you may need to train several layers, but this takes less time and requires considerably less data than starting from scratch.
- Importing an existing model and running image prediction immediately is mainly useful in the early stages of prototyping or model trials. Beyond that, fine-tuning your network is still the recommended practice.
- It is usually not necessary to train a neural network from scratch. Instead, you can take an existing neural network and modify it to solve new problems with transfer learning.
Where is the code?
The full code can be found on GitHub here. I used Google Colaboratory, but feel free to use whatever you are comfortable with. A star would be very helpful while you are there.
Please contact me
Don’t be shy, let’s connect!
Thank you for reading. Happy learning and happy coding! See you soon.
Media shown in this article are not owned by Analytics Vidhya and are used at the author’s discretion.