And welcome back to 7.5, which is about pooling. This is the next layer in our CNN. So far we've dealt with the convolution part and the ReLU; now let's look at pooling. Pooling, also called subsampling or downsampling, is a simple process where we reduce the size or dimensionality of the feature map. The purpose of this reduction is to reduce the number of parameters we need to train whilst retaining most of the important features and information in the image.

There are basically three types of pooling we can apply. Let's take a look at the three main types that are used. So here's an example of max pooling. Imagine this is the ReLU output, so the zeros you see here were actually negative values before the ReLU. Max pooling uses a two by two kernel here; we can define this window size to be anything we want, just like we did with the stride and the kernels in the convolutional layer. Using a two by two window, it splits the input up into a grid of two by two blocks. Max pooling then takes the maximum value out of each two by two block, for example 167, 241 and 235, and puts them into this smaller block here. This is what we call downsampling or subsampling.
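As a minimal sketch of the two by two max pooling just described, here is a NumPy version. The `feature_map` values are made up for illustration, except that its blocks are chosen so the maxima include the 167, 241 and 235 mentioned above; the zeros stand in for values that were negative before the ReLU.

```python
import numpy as np

# Hypothetical 4x4 ReLU output (the zeros were negative values before the ReLU).
feature_map = np.array([
    [12, 167,  20,   0],
    [ 0,  45, 241,  80],
    [33,   0,   0, 235],
    [ 9,  17,  60,  11],
])

def max_pool_2x2(x):
    """Max pooling with a 2x2 window and stride 2 (no padding)."""
    h, w = x.shape
    # Reshape into non-overlapping 2x2 blocks, then take the max of each block.
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

pooled = max_pool_2x2(feature_map)
print(pooled)  # the 4x4 map is downsampled to 2x2
```

The reshape trick works because with a stride equal to the window size the blocks never overlap, so each two by two block becomes one axis pair that we can reduce over.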
Basically, we have sort of compressed the image here while retaining the most important features. Actually, let's go back to the previous slide: previously we mentioned average and sum pooling. As you can imagine, average pooling would simply take the average of the values in each block here, and sum pooling would take their sum. So those are also ways we can pool; however, in the majority of convolutional neural nets we use max pooling.

So, just to do a recap so far: we have an input image with our kernel, and the kernel is slid across this image, producing multiple different feature maps here. All of them are the same size as the input image, and that's because of zero padding. Then we have the ReLU output, which is basically the same size matrix as this, except all the negative values are set to zero. And then we have the subsampling, or pooling, or downsampling, which reduces this matrix by half to 14 by 14, because, as you can see, using a two by two window a four by four becomes a two by two. We still have 12 filters; it's just that now they have been downsampled.
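The three pooling variants named above differ only in the reduction applied to each block, so one helper covers all of them. This is a sketch, reusing the same hypothetical 4x4 feature map as before:

```python
import numpy as np

# Same hypothetical 4x4 ReLU output as in the max-pooling example.
feature_map = np.array([
    [12, 167,  20,   0],
    [ 0,  45, 241,  80],
    [33,   0,   0, 235],
    [ 9,  17,  60,  11],
], dtype=float)

def pool_2x2(x, reducer):
    """Pool non-overlapping 2x2 blocks with the given reduction function."""
    h, w = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return reducer(blocks, axis=(1, 3))

max_pooled = pool_2x2(feature_map, np.max)   # max pooling (the usual choice)
avg_pooled = pool_2x2(feature_map, np.mean)  # average pooling
sum_pooled = pool_2x2(feature_map, np.sum)   # sum pooling
```

For the top-left block (12, 167, 0, 45) this gives a max of 167, an average of 56 and a sum of 224, which illustrates why the three variants produce differently scaled outputs from the same input.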
So let's talk a bit more about pooling. Typically, pooling is done using two by two windows with a stride of two and no padding applied. That's how we actually get from this four by four to this two by two here: the window takes a two by two jump each time, and so on. For smaller or larger input images we can use smaller or larger pool windows, whichever we want to do. Using the above settings, pooling has the effect of reducing dimensionality in width and height (those are the only two dimensions we have): we reduce the width and height of the previous layer by half, and in doing so remove three quarters, or 75 percent, of the activations seen in the previous layer.

So, keep moving on. This makes our model more invariant to small or minor transformations or distortions in the input image, since we're now averaging, or taking the max output, from a small area of the image. What this actually means is that instead of looking at specific pixels in an image, we downsample and look at the max in an area, so we sort of add some spatial invariance. So filters aren't super specific to certain areas; remember, they're being slid across the image. Imagine this filter has been slid across the image looking for a specific edge or whatever: pooling actually adds some invariance to it.
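The size arithmetic above can be checked with the standard no-padding pooling formula, output = (input - window) // stride + 1. A small sketch, using a hypothetical 28 by 28 feature map to match the "reduce by half to 14 by 14" example:

```python
def pooled_size(in_size, window=2, stride=2):
    """Output size for pooling with no padding: (in - window) // stride + 1."""
    return (in_size - window) // stride + 1

# A 28x28 feature map pooled with a 2x2 window and stride 2 becomes 14x14.
h = w = 28
ph, pw = pooled_size(h), pooled_size(w)

# Fraction of activations kept: (14 * 14) / (28 * 28) = 0.25,
# i.e. 75 percent of the previous layer's activations are removed.
kept = (ph * pw) / (h * w)
```

The same formula confirms the slide's four by four to two by two example: (4 - 2) // 2 + 1 = 2.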
So this actually increases the ability of our convolutional model to generalize to information it has never seen before.

So now let's move on to kind of the final layers. There are some in-between layers we'll discuss later on, but for now, this is the last layer of our CNN: the fully connected, or FC, layer.