1
00:00:01,080 --> 00:00:08,280
And welcome back to 7.5, which is about pooling. This is the next in the sequence of layers in our CNN. So far
2
00:00:08,490 --> 00:00:15,240
we've dealt with the convolutional layer, the convolution part and ReLU. Now let's look at pooling, also
3
00:00:15,420 --> 00:00:22,660
known as subsampling. Pooling, as I just said, is also known as subsampling or downsampling, and is a
4
00:00:22,660 --> 00:00:27,170
simple process where we reduce the size or dimensionality of the feature map.
5
00:00:27,280 --> 00:00:31,690
The purpose of this reduction is to reduce the number of parameters that we need to train, whilst retaining
6
00:00:31,690 --> 00:00:36,670
most of the important features and information in the image.
7
00:00:36,870 --> 00:00:39,100
There are basically three types of pooling we can apply.
8
00:00:39,100 --> 00:00:43,800
There are actually some more, but let's take a look at these three main types that are used.
9
00:00:43,870 --> 00:00:46,250
So here's an example of max pooling.
10
00:00:46,300 --> 00:00:52,900
Imagine this is the ReLU output. This input here was produced from the ReLU
11
00:00:52,940 --> 00:00:53,510
layer.
12
00:00:53,800 --> 00:00:57,430
So you can imagine these values, the zeros here, were actually negative values.
13
00:00:57,820 --> 00:01:03,790
So max pool basically uses a two by two kernel here. We can define this kernel size as anything we want, just
14
00:01:03,790 --> 00:01:09,520
like we did with the stride and size of the kernels we used in the convolutional layer. And basically, using
15
00:01:09,520 --> 00:01:10,530
a two by two.
16
00:01:10,600 --> 00:01:15,250
It splits this up into two by two, two by two, two by two, two by two grids.
17
00:01:15,580 --> 00:01:24,190
So what it does: max pooling takes the max value out of each two by two area, 167, 241 and 235, and puts them
18
00:01:24,190 --> 00:01:25,380
into this block here.
19
00:01:25,750 --> 00:01:29,270
So this is what we call downsampling or subsampling.
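As a minimal sketch of the max pooling step just described: the 4 by 4 grid below uses made-up values (the slide's full grid cannot be recovered from this transcript), and `max_pool_2x2` is a hypothetical helper name, not something from the course code.

```python
def max_pool_2x2(feature_map):
    """Max pooling with a 2x2 window and a stride of 2, no padding."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - 1, 2):          # jump two rows at a time
        pooled_row = []
        for c in range(0, cols - 1, 2):      # jump two columns at a time
            window = [
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ]
            pooled_row.append(max(window))   # keep only the largest value
        pooled.append(pooled_row)
    return pooled


# Illustrative ReLU output: zeros stand in for values that were negative.
relu_output = [
    [0, 120, 0, 167],
    [45, 0, 30, 0],
    [241, 0, 0, 235],
    [0, 12, 80, 0],
]

pooled = max_pool_2x2(relu_output)
print(pooled)  # [[120, 167], [241, 235]]
```

Each two by two block of the input contributes exactly one value to the output, so the 4 by 4 grid becomes a 2 by 2 grid.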
20
00:01:29,320 --> 00:01:35,440
Basically, we have sort of compressed the image here and retained the most important features.
21
00:01:36,680 --> 00:01:37,470
actually.
22
00:01:37,470 --> 00:01:40,160
Let's go back to the previous slide. Previously,
23
00:01:40,210 --> 00:01:42,810
we mentioned average and sum pooling.
24
00:01:42,850 --> 00:01:48,850
Now, as you can imagine, average pooling would simply be the average of these values here, here,
25
00:01:49,120 --> 00:01:53,130
here and here, and sum pooling would just be the sum of these values.
26
00:01:53,460 --> 00:01:55,090
So it's also a way we can use pooling.
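A rough sketch of how the three pooling types differ only in the function applied to each window. The grid values and the names `pool_2x2` and `mean` are assumptions for illustration, not taken from the slides.

```python
def pool_2x2(feature_map, reduce_fn):
    """Apply reduce_fn to each 2x2 window, moving with a stride of 2."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        pooled_row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            window = [
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ]
            pooled_row.append(reduce_fn(window))
        pooled.append(pooled_row)
    return pooled


def mean(values):
    return sum(values) / len(values)


# Illustrative values only.
grid = [
    [0, 4, 8, 0],
    [4, 0, 0, 8],
    [2, 2, 1, 3],
    [2, 2, 3, 1],
]

max_pooled = pool_2x2(grid, max)    # [[4, 8], [2, 3]]
avg_pooled = pool_2x2(grid, mean)   # [[2.0, 4.0], [2.0, 2.0]]
sum_pooled = pool_2x2(grid, sum)    # [[8, 16], [8, 8]]
print(max_pooled, avg_pooled, sum_pooled)
```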
27
00:01:55,090 --> 00:02:01,900
However, in the majority of convolutional neural nets, we use max pooling.
28
00:02:01,940 --> 00:02:04,740
So this is our CNN so far, just to do a recap.
29
00:02:04,880 --> 00:02:10,370
We have an input image with our kernel that is being slid across this image, producing multiple
30
00:02:10,370 --> 00:02:11,380
different filters here.
31
00:02:11,450 --> 00:02:15,920
All of them are the same size as the input image, and that's because of zero padding.
32
00:02:16,250 --> 00:02:22,430
Then we have the ReLU output, which is basically the same size of matrix as this, except all the negative
33
00:02:22,430 --> 00:02:23,850
values are turned into zeros.
34
00:02:24,230 --> 00:02:30,470
And then we have the subsampling or pooling layer, or downsampling, which basically reduces this image,
35
00:02:30,530 --> 00:02:37,220
sorry, this matrix, by half, to 14 by 14, because, as you can see, using a two by two, we have a four by four
36
00:02:37,360 --> 00:02:41,570
and we get a two by two. And that's still 12 filters.
37
00:02:41,750 --> 00:02:44,540
However, they have now been downsampled.
38
00:02:44,540 --> 00:02:45,880
So let's move on now.
39
00:02:46,310 --> 00:02:52,100
So let's talk a bit more about pooling. Typically, pooling is done using two by two windows with a stride
40
00:02:52,100 --> 00:02:54,540
of two and no padding applied.
41
00:02:54,560 --> 00:02:58,280
That's how we actually get this four by four here.
42
00:02:58,280 --> 00:03:01,920
It takes a two by two window, makes a jump of two, and so on.
43
00:03:04,060 --> 00:03:08,170
So for smaller input images or larger images, we can use larger pools
44
00:03:09,020 --> 00:03:14,530
or smaller pools, whichever you want to do. And using the above settings, pooling has the effect of reducing
45
00:03:14,530 --> 00:03:16,890
dimensionality: width and height.
46
00:03:16,930 --> 00:03:18,330
Those are the only two dimensions we have.
47
00:03:18,340 --> 00:03:22,150
We reduce the width and height of the previous layer by half,
48
00:03:22,330 --> 00:03:26,950
and thus remove three quarters, or 75 percent, of the activations seen in the previous layer.
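The arithmetic behind the halving and the 75 percent figure can be sketched as follows. The 28 by 28 input size is an assumption inferred from the 14 by 14 result mentioned earlier, and `pooled_size` is a hypothetical helper name.

```python
def pooled_size(input_size, window=2, stride=2):
    """Output width (or height) of pooling with no padding."""
    return (input_size - window) // stride + 1


input_size = 28                       # assumed width and height of the feature map
output_size = pooled_size(input_size)

activations_before = input_size * input_size
activations_after = output_size * output_size
fraction_removed = 1 - activations_after / activations_before

print(output_size)       # 14
print(fraction_removed)  # 0.75
```

Halving both width and height leaves one quarter of the values, which is where the 75 percent reduction comes from. The number of filters is unchanged, since pooling is applied to each feature map separately.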
49
00:03:31,290 --> 00:03:32,940
So, moving on.
50
00:03:32,940 --> 00:03:39,470
This makes our model more invariant to small or minor transformations or distortions in our input image,
51
00:03:39,570 --> 00:03:45,000
since we're now averaging or taking the max output from a small area of an image. What this actually means
52
00:03:45,000 --> 00:03:51,020
is that, instead of looking at specific pixels here in an image, because we're actually downsampling
53
00:03:51,050 --> 00:03:57,480
and looking at a max in an area, we sort of add some sort of invariance, or spatial invariance, to
54
00:03:57,480 --> 00:03:58,150
our filters, so to say.
55
00:03:58,170 --> 00:04:04,800
So our filters aren't super specific to certain areas, and remember, they're being slid across the image.
56
00:04:04,800 --> 00:04:10,410
So imagine this filter being slid across the image, looking for a specific edge or whatever; pooling can actually
57
00:04:11,310 --> 00:04:13,160
add some invariance now to it.
58
00:04:13,170 --> 00:04:20,490
So this actually increases, basically, the ability of our convolutional model to generalize to information
59
00:04:20,490 --> 00:04:21,790
it has never seen before.
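A small sketch of what this invariance looks like in practice, using made-up feature maps with a single strong activation and a hypothetical `max_pool_2x2` helper. Note the limit: a shift that stays inside the same two by two window leaves the pooled output unchanged, while a shift that crosses into the next window does not.

```python
def max_pool_2x2(feature_map):
    """Max pooling with a 2x2 window and a stride of 2, no padding."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        pooled_row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            pooled_row.append(max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ))
        pooled.append(pooled_row)
    return pooled


def single_activation(row, col, size=4, value=9):
    """Build a size x size map of zeros with one activation at (row, col)."""
    grid = [[0] * size for _ in range(size)]
    grid[row][col] = value
    return grid


original = max_pool_2x2(single_activation(0, 0))
shifted_one_pixel = max_pool_2x2(single_activation(0, 1))   # same window
shifted_two_pixels = max_pool_2x2(single_activation(0, 2))  # next window

print(original == shifted_one_pixel)   # True
print(original == shifted_two_pixels)  # False
```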
60
00:04:23,540 --> 00:04:29,450
So now let's move on to what is kind of the final layer. There are some layers in between; we'll discuss them
61
00:04:29,450 --> 00:04:30,250
later on.
62
00:04:30,410 --> 00:04:35,990
But for now, seeing as this is of course a CNN, this is the last layer: the fully connected,
63
00:04:36,030 --> 00:04:36,730
or FC, layer.