Optimizes transformer attention with Flash Attention for 2–4x speedup and 10–20x memory reduction. Use when training or running transformers with long sequences.