Posted Dufre.WC
A toy ConvNet: X’s and O’s
Says whether a picture is of an X or an O
For example
Trickier Cases
- translation
- scaling
- rotation
- weight (line thickness)
Deciding is hard
What computers see
The red area is incorrect
ConvNets match pieces of the image
Features match pieces of the image
Filtering (The math behind the match)
- Line up the feature and the image patch
- Multiply each image pixel by the corresponding feature pixel
- Add them up
- Divide by the total number of pixels in the feature
(output is average value)
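The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the names `match_patch` and `filter_image` are made up for this example, pixels are assumed to be +1 (ink) or -1 (background), and sliding the feature over every position of the image is an assumption about how the per-patch match gets applied to a whole picture.

```python
import numpy as np

def match_patch(feature, patch):
    """Average of the pixel-wise products of a feature and an equally sized patch."""
    feature = np.asarray(feature, dtype=float)
    patch = np.asarray(patch, dtype=float)
    if feature.shape != patch.shape:
        raise ValueError("feature and patch must have the same shape")
    # Multiply corresponding pixels, add them up, divide by the pixel count.
    return float((feature * patch).sum() / feature.size)

def filter_image(image, feature):
    """Score the feature against every patch of the image (assumed sliding step of 1)."""
    image = np.asarray(image, dtype=float)
    feature = np.asarray(feature, dtype=float)
    fh, fw = feature.shape
    out_h = image.shape[0] - fh + 1
    out_w = image.shape[1] - fw + 1
    scores = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            scores[r, c] = match_patch(feature, image[r:r + fh, c:c + fw])
    return scores

# Hypothetical 3x3 feature: a diagonal stroke, +1 for ink and -1 for background.
diagonal = np.array([[ 1, -1, -1],
                     [-1,  1, -1],
                     [-1, -1,  1]])

perfect = match_patch(diagonal, diagonal)    # every product is +1, so the average is 1.0
opposite = match_patch(diagonal, -diagonal)  # every product is -1, so the average is -1.0
```

With this pixel convention, a score near 1 means the patch looks like the feature, a score near -1 means it looks like its negative, and a score near 0 means no particular match.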
Pooling (Shrinking the image stack)
- Pick a window size (usually 2 or 3)
- Pick a stride (usually 2)
- Walk your window across your filtered images
- From each window, take the maximum value
Pooling layer
A stack of images becomes a stack of smaller images.
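A rough sketch of max pooling as described above, using NumPy. The function names are hypothetical, and letting windows at the right and bottom edges be smaller than the window size is an assumption made here to keep the example short.

```python
import numpy as np

def max_pool(image, window=2, stride=2):
    """Walk a window across one filtered image and keep the maximum of each window."""
    image = np.asarray(image, dtype=float)
    rows = range(0, image.shape[0], stride)
    cols = range(0, image.shape[1], stride)
    pooled = np.empty((len(rows), len(cols)))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            # Assumption: edge windows may be clipped, so they are never empty.
            pooled[i, j] = image[r:r + window, c:c + window].max()
    return pooled

def pool_stack(stack, window=2, stride=2):
    """Pooling layer: a stack of images becomes a stack of smaller images."""
    return [max_pool(image, window, stride) for image in stack]

filtered = np.arange(16).reshape(4, 4)
smaller = max_pool(filtered)  # 4x4 input, window 2, stride 2 -> 2x2 output
```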
Keep the math from breaking by tweaking each of the values just a bit.
Change everything negative to zero
ReLU layer
A stack of images becomes a stack of images with no negative values.
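As a small sketch, changing everything negative to zero is a one-liner in NumPy. The function names `relu` and `relu_stack` are chosen for this example only.

```python
import numpy as np

def relu(image):
    """Replace every negative value with zero; leave everything else unchanged."""
    return np.maximum(np.asarray(image, dtype=float), 0.0)

def relu_stack(stack):
    """ReLU layer: apply the same rule to every image in the stack."""
    return [relu(image) for image in stack]

rectified = relu([[-0.5, 0.3],
                  [ 0.0, -1.0]])
```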
Layers get stacked
The output of one becomes the input of the next.
Deep stacking
Fully connected layer
Every value gets a vote
Vote depends on how strongly a value predicts X or O.
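One way to picture the voting is as a weighted sum per answer, sketched below. The values and weights are invented for illustration (in a trained network the weights would be learned), and the function name `vote` is hypothetical.

```python
import numpy as np

def vote(values, weights, labels=("X", "O")):
    """Each value votes for each label with its own weight; the largest total wins."""
    values = np.asarray(values, dtype=float).ravel()
    weights = np.asarray(weights, dtype=float)  # one row of weights per label
    totals = weights @ values
    winner = labels[int(np.argmax(totals))]
    return winner, dict(zip(labels, totals.tolist()))

# Made-up example: values 0 and 2 are assumed to predict X, values 1 and 3 to predict O.
values = [0.9, 0.1, 0.8, 0.2]
weights = [[1.0, 0.0, 1.0, 0.0],   # voting weights for X
           [0.0, 1.0, 0.0, 1.0]]   # voting weights for O
winner, totals = vote(values, weights)
```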
Gradient descent
For each feature pixel and voting weight, adjust it up and down a bit and see how the error changes.
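The "adjust it up and down a bit" idea can be sketched literally as a finite-difference step. This is only an illustration of the description above: real libraries compute the slopes with backpropagation rather than by nudging each value, and the error function, step sizes, and names below are assumptions made for the example.

```python
import numpy as np

def nudge_step(params, error_fn, delta=1e-3, learning_rate=0.1):
    """Nudge each parameter up and down, measure the error change, then step downhill."""
    params = np.asarray(params, dtype=float)
    slopes = np.zeros_like(params)
    for i in range(params.size):
        up = params.copy()
        up.flat[i] += delta
        down = params.copy()
        down.flat[i] -= delta
        # How much the error changes per unit change of this one parameter.
        slopes.flat[i] = (error_fn(up) - error_fn(down)) / (2 * delta)
    return params - learning_rate * slopes

# Hypothetical setup: four voting weights, and the correct total vote is 1.0.
values = np.array([0.9, 0.1, 0.8, 0.2])
target = 1.0

def squared_error(weights):
    return float((weights @ values - target) ** 2)

weights = np.zeros(4)
error_before = squared_error(weights)
for _ in range(50):
    weights = nudge_step(weights, squared_error)
error_after = squared_error(weights)
```

After the loop, `error_after` is much smaller than `error_before`, because every step moves each weight in the direction that lowers the error.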
Putting it all together
A set of pixels becomes a set of votes
(human-set parameters)
- Convolution
  - Number of features
  - Size of features
- Pooling
  - Window size
  - Window stride
- Fully Connected
  - Number of neurons
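These knobs could be collected in a small configuration object, sketched here as a Python dataclass. The class name, field names, and default values are all hypothetical, picked only to mirror the list above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConvNetHyperparameters:
    # Convolution
    num_features: int = 3    # hypothetical default
    feature_size: int = 3    # hypothetical default, in pixels per side
    # Pooling
    pool_window: int = 2
    pool_stride: int = 2
    # Fully connected
    fc_neurons: int = 2      # hypothetical default, e.g. one per answer (X, O)

    def __post_init__(self):
        # Every knob must be a positive whole number.
        for name, value in vars(self).items():
            if not isinstance(value, int) or value < 1:
                raise ValueError(f"{name} must be a positive integer, got {value!r}")

defaults = ConvNetHyperparameters()
wider_pooling = ConvNetHyperparameters(pool_window=3)
```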
Any 2D (or 3D) data
Things closer together are more closely related than things far away.
CNNs only capture local “spatial” patterns in data.
If the data can’t be made to look like an image, CNNs are less useful.
If your data is just as useful after swapping any of its columns with each other, then you can’t use a convolutional neural network.
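A tiny sketch of why column order matters for image-like data: the same feature matches a patch perfectly, but matches poorly once two of the patch's columns are swapped. The +1/-1 pixel convention and the names below are assumptions for this example.

```python
import numpy as np

def match(feature, patch):
    """Average of the pixel-wise products (same scoring rule as filtering)."""
    return float((np.asarray(feature, dtype=float) * np.asarray(patch, dtype=float)).mean())

# Hypothetical 3x3 patch: a diagonal stroke, +1 for ink and -1 for background.
diagonal = np.array([[ 1, -1, -1],
                     [-1,  1, -1],
                     [-1, -1,  1]])

# Swap the first and last columns: the stroke now runs the other way.
swapped = diagonal[:, [2, 1, 0]]

score_original = match(diagonal, diagonal)  # 9/9 = 1.0
score_swapped = match(diagonal, swapped)    # 1/9, a much weaker match
```

For a table whose columns are unrelated measurements, the swap would lose nothing, so there is no local spatial pattern for a feature to match.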