Why are you using query/key layer norm AFTER rotary?

#11

by vince62s - opened Sep 4, 2025

Sep 4, 2025

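My understanding is that standard practice is to apply LN before rotary. If we want to use flash decoding with the specific flash kernel, applying LN after rotary is an issue, because rotary is embedded in the kernel and there is no point between the rotation and the attention computation where the norm could run.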
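To make the ordering concern concrete, here is a minimal pure-Python sketch that compares norm-then-rotary against rotary-then-norm on a single head vector. It assumes rotate-half style RoPE and a norm over the head dimension without learned gains. The helper names (`rope`, `layer_norm`, `rms_norm`, `max_abs_diff`) are hypothetical and are not taken from this model's code or from any flash kernel.

```python
import math

def rope(x, pos, base=10000.0):
    """Rotate-half RoPE on one head vector x (even length) at position pos."""
    half = len(x) // 2
    out = [0.0] * len(x)
    for i in range(half):
        theta = pos * base ** (-i / half)  # per-pair rotation angle
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out

def layer_norm(x, eps=1e-6):
    """LayerNorm over the head dimension, no learned gain or bias (assumption)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rms_norm(x, eps=1e-6):
    """RMSNorm over the head dimension, no learned gain (assumption)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def max_abs_diff(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

q = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 0.1, 3.0]  # toy query head vector
pos = 7

# Compare "norm then rotary" with "rotary then norm" for each norm type.
ln_gap = max_abs_diff(rope(layer_norm(q), pos), layer_norm(rope(q, pos)))
rms_gap = max_abs_diff(rope(rms_norm(q), pos), rms_norm(rope(q, pos)))

print(f"LayerNorm gap between the two orders: {ln_gap:.4f}")
print(f"Gain-free RMSNorm gap between the two orders: {rms_gap:.2e}")
```

Under these assumptions the two orders disagree for LayerNorm, because the mean subtraction does not commute with the rotation. A gain-free RMSNorm agrees up to floating-point error, since the rotation preserves the vector's L2 norm. With learned per-channel gains the orders would generally differ for RMSNorm too, so a kernel that fuses rotary into attention cannot simply be swapped in when the norm runs after rotary.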

QZHQN

Sep 9, 2025

mango

QZHQN

Sep 9, 2025

mango

percisestretch

Sep 9, 2025

mango.

percisestretch

Sep 9, 2025

mango.mango.mango.
