In deep learning (neural nets), convolutional neural networks (CNN)s, also known as ConvNets, are specialized artificial neural networks that can “see” the world as humans perceive it. These algorithms take an input image, then assign importance to various objects within that image to differentiate one object from the other. Convolutional neural networks are part of the field of computer vision, which is the study of how computers use artificial intelligence (AI) to achieve a higher understanding of digital imagery or video.
How does a computer implement natural language processing to perceive the world? CNN architecture mimics the organization of the visual cortex, the area of the brain that processes visual information, as well as the connectivity of neural pathways in the human brain. Convolutional networks consist of fully-connected layers of artificial neurons or mathematic functions that can take pixel values and identify visual features of an image by creating activation maps (maps that highlight the features of an image). An activation map of the final convolution layer in the CNN generates a set of values used to assign a class to an input image.
More specifically, the base layer, or convolutional layer of a convolutional neural network establishes learning parameters, followed by pooling layers, which apply a given function or convolution operation to reduce spatial size of an image. Outputs from a pooling operation build on each other by passing through an activation function to create feature maps, extracting increasingly complex features along the way. Max pooling calculates maximum values of each feature map that results in movement from the dissemination of corners and edges, to that of more complex subjects such as the human face.
There are many aspects of conventional neural networks that are advantageous to their application.