热门标签 | HotTags
当前位置:  开发笔记 > 编程语言 > 正文

APracticalGuideonConvolutionalNeuralNetworks(CNNs)withKeras

Theoreticalexplanationandareal-lifeexampleConvolutionalneuralnetworks(CNNs)arecommonlyusedindatasciencedomainespeciallyforcomputervisionandimageclassificationtasks.Consideranimageclassificationtask.Im

A Practical Guide on Convolutional Neural Networks (CNNs) with Keras

Theoretical explanation and a real-life example

Photo by Markus Spiske on Unsplash

Convolutional neural networks (CNNs) are commonly used in data science domain especially for computer vision and image classification tasks. Consider an image classification task. Images consist of pixels which are represented with numbers. In the convolution layer of CNNs, filters (or feature detectors) are applied to the image to extract distinctive features of the image by preserving the spatial relationships among pixels.

Convolution operation is done as follows:

We have a 10×10 image and a 3×3 filter. Filter, starting from the yellow-marked position, scans through the image and creates a feature map. At each step, the dot product of image pixels and filter is calculated and the resulting scalar is put in the corresponding position of the feature map.

Strideparameter controls the movement of the filter. When the stride is 1, the filter moves 1 pixel at each time.

The goal of the filter is to preserve the spatial relationships of a pixel so it needs to see the pixels on the left, right, up, and bottom. Therefore, with a 3×3 filter, we start with the pixel on the second row and second column. The pixels on the first and last row as well as the ones on the first and last column cannot be a part of the feature map because they do not have up, bottom, right or left neighbors.

This is the reason why the resulting feature map is 8×8. If we apply a 5×5 filter, the feature map becomes 6×6.

If we want to preserve the pixels on the borders, we can use padding and add zeros around the image.

10×10 image padded with zeros. Resulting image is 12×12.

In a convolution layer, not just one filter is used. Many different filters are applied to the image. Each filter is aimed to capture a different feature such as edges, horizontol lines and so on.

Images are highly non-linear so we need to increase the non-linearity in convolution layer. In the convolution layer, rectifier function is applied to increase the non-linearity in the image. Rectifier function acts as an additional filter to break up linearity.

Convolution layer

Then we have Pooling layer which reduces the size of the feature maps while maintaining the preserved features of the image.

Max Pooling with 2×2 box

In the pooling layer, a box with a specified size is captured and the maximum value in that box is taken. This is maximum pooling. We can also take the sum or mean of values in the box. This box scans through the entire feature map. The picture above represents the max pooling layer with a 2×2 box so the size of the feature map reduced to 4×4.

The advantages of pooling:

  • Reducing the size while preserving the features
  • Eliminating parts that are not significant
  • Introducing spatial variance
  • Reducing the number of features and thus reducing the risk of overfitting
In a convolutional neural network, there are multiple convolution and pooling layers depending on the complexity of the task.

Now, we need to flatten pooled feature maps in order to feed them to a fully connected layer. After the flattening step, the structure of the remaining part of a convolutional neural network is just like a feed-forward neural network. Flattening step is very simple.

The resulting array after flattening is used as the input to a Dense layer. The pooled feature maps are flattened and fed through a dense layer. Typical structure of a convolutional neural network is:

Figure source

Please note that subsampling is used for pooling. This convolutional neural network has two convolution layers and two pooling layers.

Let’s go through a real example.

Building a CNN to classify images

We will use images of motorcycles and airplanes from Caltech101 dataset. It is a great dataset to train and test a CNN. Many thanks to the community who prepared and let us use this dataset.

We will use Keras which is a high-level deep learning library built on TensorFlow. Let’s start with basic imports:

import numpy as np
import tensorflow as tftf.__version__
'2.2.0-rc3'

We will be using google colab environment for this task. I saved the images to my google drive. In order to directly access the files in the drive from colab, we just need to import drive:

from google.colab import drive
drive.mount('/content/gdrive')

This will prompt us to approve by copy-pasting a link. Then we can easily access the files in the google drive.

Let’s check a couple of images using matplotlib:

import matplotlib.pyplot as plt
import matplotlib.image as mpimg%matplotlib inline

Images are just arrays of numbers in 2D (black and white) and 3D (colored). We should also check the structure of an image in addition to seeing how it looks.

img = mpimg.imread('/content/gdrive/My Drive/airplane_motorbike/train/airplanes/image_0001.jpg')type(img)
numpy.ndarrayimg.shape
(164, 398, 3)

The first two dimensions show the size of the grid of pixels and the third dimension is indicating if it is colored or grayscale. So this image is a colored 164×398 pixels image. If the last dimension is 1, then the image is grayscale. Let’s see how it looks:

imgplot = plt.imshow(img)

Let’s also see an image of a motorbike:

img2 = mpimg.imread('/content/gdrive/My Drive/airplane_motorbike/train/motorbikes/image_0001.jpg')print(img2.shape)
(161, 262, 3)imgplot = plt.imshow(img2)

You may have noticed the number of pixels are different. The shape of the motorbike image is (161, 262, 3) but the shape of airplane image is (164, 398, 3). The images have to be in the same shape to be used in a CNN. We can manually adjust the sizes but it is a tedious task.

We can use ImageDataGenerator which is an image preprocessing class of Keras. We just need to organize images in a folder structe and ImageDataGenerator will handle the rest:

Folder structure

ImageDataGenerator generates batches of tensor image data with real-time data augmentation. ImageDataGenerator creates many batches of the images by applying random selections and transformations (such as rotating and shifting) in batches. So it increases the diversity of the dataset. Data augmentation increases the diversity of data which is very useful especially when the number of images is limited. Increasing the diversity of the dataset helps to get more accurate results and also prevents the model from overfitting.

The images are in the “airplanes” and “motorbikes” folders. I put 640 images of each category in train folder and 160 images in validation folder. Let’s implement ImageDataGenerator objects for both training and validation:

from tensorflow.keras.preprocessing.image import ImageDataGeneratorimport os

We first create ImageDataGenerator objects for train and validation sets:

We pass in rescale parameter to normalize the values in the pixels in [0,1] range. Normalization of data is an essential practice in neural networks.

Then we indicate the path to the files that contain images:

Then we create train and validation generators:

We used flow_from_directory method. Another option is to use flow method. The details of each method are explained in detail in keras documentation .

Let’s go over the parameters. The first one is the file path which we already created in the previous step. Target_size indicates the size of resulting images. Train and validation generators will resize all images to 150×150 pixels. Batch_size is the number of images in a batch. The class_mode is binary since we have two classes. As we can see in the output, there are 1280 images in train folde and 320 in validation folder belonging to 2 classes.

It is time to build our model:

We have 3 convolution layers and a pooling layer right after each convolution layer. In the first convolution layer, we define the number of filters and the filter size. I choose to use 16 filters with a size of 3×3. Then we define the activation function which is relu. For the first convolution layers ,we also need to define the input_shape because the model will not know the size of our images. For pooling layer, we only need to specify the size of box used for pooling.

We follow the same procedure for the other convolutional layers. The only difference is that we do not have to specify the input shape because the model will know the input from the output of previous layer.

Input_shape is required only for the first convolution layer.

Then we have flatten layer to flatten pooled feature maps. Flattened feature maps are fed to a Dense layer. Then we have an output layer.

It is important to note that there is not a strict rule to determine number of filters, filter size, number of neurons in a layer and so on. The values for these parameters are adjusted with experience or trial and error. You can adjust and try different values.

We have just built a CNN model. To see an overview of the model using summary method:

model.summary()

The size of input images are 150×150. We used 3×3 filters so the size of feature maps are 148×148. The size is reduced to half in the first pooling layer. At each convolution-pooling layer pair, the size is reduced but features in the images are preserved. In this model, we have more than 2 million parameters to train which is a lot. This is a simple image classification task. Imagine the number of parameters to train the model used in very complex tasks.

Now it is time to compile the model:

We need to specify the loss function , optimizer and a metric for evaluating the performance. For loss function, binary_crossentropy can be used since this is a binary classification task. There are many optimizers available in Keras , RMSprop and Adam optimizers are commonly used. The metric we are using is accuracy.

We finished preprocessing of images, built the model and compiled it. Now we can fit data to the model and train it:

We use the ImageDataGenerator objects that we created earlier as training and validation sets. Steps_per_epoch parameter is number of images divided by the batch size. Validation_steps is calculated similarly using number of images in validation set and batch size. An epoch is done when the model goes through the entire traning set. I use 5 epochs but you can change it to see how it affects the accuracy. Let’s see the results:

The model has 99.69% accuracy on training set and 99.37% accuracy on test set which is a great result. Please keep in mind that this is a simple image classification task. We will deal with much more complicated tasks. However, the way to build the network and logic behind is the same. Thus, if we learn the basics well, we can easily adopt to more complicated tasks.


以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 我们


推荐阅读
  • 本文讨论了clone的fork与pthread_create创建线程的不同之处。进程是一个指令执行流及其执行环境,其执行环境是一个系统资源的集合。在调用系统调用fork创建一个进程时,子进程只是完全复制父进程的资源,这样得到的子进程独立于父进程,具有良好的并发性。但是二者之间的通讯需要通过专门的通讯机制,另外通过fork创建子进程系统开销很大。因此,在某些情况下,使用clone或pthread_create创建线程可能更加高效。 ... [详细]
  • 本文介绍了设计师伊振华受邀参与沈阳市智慧城市运行管理中心项目的整体设计,并以数字赋能和创新驱动高质量发展的理念,建设了集成、智慧、高效的一体化城市综合管理平台,促进了城市的数字化转型。该中心被称为当代城市的智能心脏,为沈阳市的智慧城市建设做出了重要贡献。 ... [详细]
  • 本文介绍了九度OnlineJudge中的1002题目“Grading”的解决方法。该题目要求设计一个公平的评分过程,将每个考题分配给3个独立的专家,如果他们的评分不一致,则需要请一位裁判做出最终决定。文章详细描述了评分规则,并给出了解决该问题的程序。 ... [详细]
  • 本文讨论了使用差分约束系统求解House Man跳跃问题的思路与方法。给定一组不同高度,要求从最低点跳跃到最高点,每次跳跃的距离不超过D,并且不能改变给定的顺序。通过建立差分约束系统,将问题转化为图的建立和查询距离的问题。文章详细介绍了建立约束条件的方法,并使用SPFA算法判环并输出结果。同时还讨论了建边方向和跳跃顺序的关系。 ... [详细]
  • Android Studio Bumblebee | 2021.1.1(大黄蜂版本使用介绍)
    本文介绍了Android Studio Bumblebee | 2021.1.1(大黄蜂版本)的使用方法和相关知识,包括Gradle的介绍、设备管理器的配置、无线调试、新版本问题等内容。同时还提供了更新版本的下载地址和启动页面截图。 ... [详细]
  • 本文介绍了解决二叉树层序创建问题的方法。通过使用队列结构体和二叉树结构体,实现了入队和出队操作,并提供了判断队列是否为空的函数。详细介绍了解决该问题的步骤和流程。 ... [详细]
  • CF:3D City Model(小思维)问题解析和代码实现
    本文通过解析CF:3D City Model问题,介绍了问题的背景和要求,并给出了相应的代码实现。该问题涉及到在一个矩形的网格上建造城市的情景,每个网格单元可以作为建筑的基础,建筑由多个立方体叠加而成。文章详细讲解了问题的解决思路,并给出了相应的代码实现供读者参考。 ... [详细]
  • 如何搭建Java开发环境并开发WinCE项目
    本文介绍了如何搭建Java开发环境并开发WinCE项目,包括搭建开发环境的步骤和获取SDK的几种方式。同时还解答了一些关于WinCE开发的常见问题。通过阅读本文,您将了解如何使用Java进行嵌入式开发,并能够顺利开发WinCE应用程序。 ... [详细]
  • 李逍遥寻找仙药的迷阵之旅
    本文讲述了少年李逍遥为了救治婶婶的病情,前往仙灵岛寻找仙药的故事。他需要穿越一个由M×N个方格组成的迷阵,有些方格内有怪物,有些方格是安全的。李逍遥需要避开有怪物的方格,并经过最少的方格,找到仙药。在寻找的过程中,他还会遇到神秘人物。本文提供了一个迷阵样例及李逍遥找到仙药的路线。 ... [详细]
  • 先看官方文档TheJavaTutorialshavebeenwrittenforJDK8.Examplesandpracticesdescribedinthispagedontta ... [详细]
  • 本文介绍了Java工具类库Hutool,该工具包封装了对文件、流、加密解密、转码、正则、线程、XML等JDK方法的封装,并提供了各种Util工具类。同时,还介绍了Hutool的组件,包括动态代理、布隆过滤、缓存、定时任务等功能。该工具包可以简化Java代码,提高开发效率。 ... [详细]
  • 本文介绍了一种划分和计数油田地块的方法。根据给定的条件,通过遍历和DFS算法,将符合条件的地块标记为不符合条件的地块,并进行计数。同时,还介绍了如何判断点是否在给定范围内的方法。 ... [详细]
  • 本文介绍了Oracle数据库中tnsnames.ora文件的作用和配置方法。tnsnames.ora文件在数据库启动过程中会被读取,用于解析LOCAL_LISTENER,并且与侦听无关。文章还提供了配置LOCAL_LISTENER和1522端口的示例,并展示了listener.ora文件的内容。 ... [详细]
  • Go GUIlxn/walk 学习3.菜单栏和工具栏的具体实现
    本文介绍了使用Go语言的GUI库lxn/walk实现菜单栏和工具栏的具体方法,包括消息窗口的产生、文件放置动作响应和提示框的应用。部分代码来自上一篇博客和lxn/walk官方示例。文章提供了学习GUI开发的实际案例和代码示例。 ... [详细]
  • JDK源码学习之HashTable(附带面试题)的学习笔记
    本文介绍了JDK源码学习之HashTable(附带面试题)的学习笔记,包括HashTable的定义、数据类型、与HashMap的关系和区别。文章提供了干货,并附带了其他相关主题的学习笔记。 ... [详细]
author-avatar
手浪用户2602915623
这个家伙很懒,什么也没留下!
PHP1.CN | 中国最专业的PHP中文社区 | DevBox开发工具箱 | json解析格式化 |PHP资讯 | PHP教程 | 数据库技术 | 服务器技术 | 前端开发技术 | PHP框架 | 开发工具 | 在线工具
Copyright © 1998 - 2020 PHP1.CN. All Rights Reserved | 京公网安备 11010802041100号 | 京ICP备19059560号-4 | PHP1.CN 第一PHP社区 版权所有