And welcome back to 7.5, which is about pooling. This is the next layer in our CNN. So far we've dealt with the convolution part and the ReLU; now let's look at pooling. Pooling, also called subsampling or downsampling, is a simple process where we reduce the size or dimensionality of the feature map. The purpose of this reduction is to reduce the number of parameters we need to train whilst retaining most of the important features and information in the image.

There are basically three types of pooling we can apply. Let's take a look at the three main types that are used. So here's an example of max pooling. Imagine this is the ReLU output, so the zeros you see here were actually negative values before the ReLU. Max pooling uses a two by two kernel here; we can define this window size to be anything we want, just like we did with the stride and the kernels in the convolutional layer. Using a two by two window, it splits the input up into a grid of two by two blocks. Max pooling then takes the maximum value out of each two by two block, for example 167, 241 and 235, and puts them into this smaller block here. This is what we call downsampling or subsampling.
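As a minimal sketch of the two by two max pooling just described, here is a NumPy version. The `feature_map` values are made up for illustration, except that its blocks are chosen so the maxima include the 167, 241 and 235 mentioned above; the zeros stand in for values that were negative before the ReLU.

```python
import numpy as np

# Hypothetical 4x4 ReLU output (the zeros were negative values before the ReLU).
feature_map = np.array([
    [12, 167,  20,   0],
    [ 0,  45, 241,  80],
    [33,   0,   0, 235],
    [ 9,  17,  60,  11],
])

def max_pool_2x2(x):
    """Max pooling with a 2x2 window and stride 2 (no padding)."""
    h, w = x.shape
    # Reshape into non-overlapping 2x2 blocks, then take the max of each block.
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

pooled = max_pool_2x2(feature_map)
print(pooled)  # the 4x4 map is downsampled to 2x2
```

The reshape trick works because with a stride equal to the window size the blocks never overlap, so each two by two block becomes one axis pair that we can reduce over.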
Basically, we have sort of compressed the image here while retaining the most important features. Actually, let's go back to the previous slide: previously we mentioned average and sum pooling. As you can imagine, average pooling would simply take the average of the values in each block here, and sum pooling would take their sum. So those are also ways we can pool; however, in the majority of convolutional neural nets we use max pooling.

So, just to do a recap so far: we have an input image with our kernel, and the kernel is slid across this image, producing multiple different feature maps here. All of them are the same size as the input image, and that's because of zero padding. Then we have the ReLU output, which is basically the same size matrix as this, except all the negative values are set to zero. And then we have the subsampling, or pooling, or downsampling, which reduces this matrix by half to 14 by 14, because, as you can see, using a two by two window a four by four becomes a two by two. We still have 12 filters; it's just that now they have been downsampled.
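The three pooling variants named above differ only in the reduction applied to each block, so one helper covers all of them. This is a sketch, reusing the same hypothetical 4x4 feature map as before:

```python
import numpy as np

# Same hypothetical 4x4 ReLU output as in the max-pooling example.
feature_map = np.array([
    [12, 167,  20,   0],
    [ 0,  45, 241,  80],
    [33,   0,   0, 235],
    [ 9,  17,  60,  11],
], dtype=float)

def pool_2x2(x, reducer):
    """Pool non-overlapping 2x2 blocks with the given reduction function."""
    h, w = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return reducer(blocks, axis=(1, 3))

max_pooled = pool_2x2(feature_map, np.max)   # max pooling (the usual choice)
avg_pooled = pool_2x2(feature_map, np.mean)  # average pooling
sum_pooled = pool_2x2(feature_map, np.sum)   # sum pooling
```

For the top-left block (12, 167, 0, 45) this gives a max of 167, an average of 56 and a sum of 224, which illustrates why the three variants produce differently scaled outputs from the same input.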
So let's talk a bit more about pooling. Typically, pooling is done using two by two windows with a stride of two and no padding applied. That's how we actually get from this four by four to this two by two here: the window takes a two by two jump each time, and so on. For smaller or larger input images we can use smaller or larger pool windows, whichever we want to do. Using the above settings, pooling has the effect of reducing dimensionality in width and height (those are the only two dimensions we have): we reduce the width and height of the previous layer by half, and in doing so remove three quarters, or 75 percent, of the activations seen in the previous layer.

So, keep moving on. This makes our model more invariant to small or minor transformations or distortions in the input image, since we're now averaging, or taking the max output, from a small area of the image. What this actually means is that instead of looking at specific pixels in an image, we downsample and look at the max in an area, so we sort of add some spatial invariance. So filters aren't super specific to certain areas; remember, they're being slid across the image. Imagine this filter has been slid across the image looking for a specific edge or whatever: pooling actually adds some invariance to it.
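The size arithmetic above can be checked with the standard no-padding pooling formula, output = (input - window) // stride + 1. A small sketch, using a hypothetical 28 by 28 feature map to match the "reduce by half to 14 by 14" example:

```python
def pooled_size(in_size, window=2, stride=2):
    """Output size for pooling with no padding: (in - window) // stride + 1."""
    return (in_size - window) // stride + 1

# A 28x28 feature map pooled with a 2x2 window and stride 2 becomes 14x14.
h = w = 28
ph, pw = pooled_size(h), pooled_size(w)

# Fraction of activations kept: (14 * 14) / (28 * 28) = 0.25,
# i.e. 75 percent of the previous layer's activations are removed.
kept = (ph * pw) / (h * w)
```

The same formula confirms the slide's four by four to two by two example: (4 - 2) // 2 + 1 = 2.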
So this actually increases the ability of our convolutional model to generalize to information it has never seen before.

So now let's move on to kind of the final layers. There are some in-between layers we'll discuss later on, but for now, this is the last layer of our CNN: the fully connected, or FC, layer.