FlashMLA: Optimizing the MLA Decoding Kernel for Hopper GPUs (DeepSeek Open Source Week Day 1)
General Introduction

FlashMLA is an efficient MLA (Multi-head Latent Attention) decoding kernel developed by DeepSeek AI, optimized for NVIDIA Hopper architecture GPUs...