HydraHead: From Head-Level Functional Heterogeneity to Specialized Attention Hybridization
HydraHead is a novel attention hybridization architecture that combines Full Attention and Linear Attention at the head level, achieving superior long-context performance with redu…
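
To make the idea of head-level hybridization concrete, the following is a minimal NumPy sketch in which some heads of one layer use softmax (full) attention and the rest use kernelized linear attention. It is an illustration under stated assumptions, not HydraHead's actual implementation: the `elu(x) + 1` feature map, the non-causal formulation, the hand-picked `full_heads` set, and all function names are hypothetical choices made for this example. How heads are actually assigned to each attention type is not specified in the text above.

```python
import numpy as np


def softmax_attention(q, k, v):
    """Full attention for one head. q, k, v have shape (seq_len, head_dim)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def feature_map(x):
    """elu(x) + 1, a positive feature map (an assumption for this sketch)."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q, k, v):
    """Non-causal linear attention for one head.

    Cost is linear in seq_len because keys and values are first
    summarized into a (head_dim, head_dim) matrix.
    """
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v               # (head_dim, head_dim) summary
    z = phi_k.sum(axis=0)          # (head_dim,) normalizer
    return (phi_q @ kv) / (phi_q @ z)[:, None]


def hybrid_heads(q, k, v, full_heads):
    """Apply full attention to heads in `full_heads`, linear attention elsewhere.

    q, k, v have shape (num_heads, seq_len, head_dim).
    `full_heads` is a hypothetical, hand-chosen set of head indices.
    """
    out = np.empty_like(v)
    for h in range(q.shape[0]):
        attn = softmax_attention if h in full_heads else linear_attention
        out[h] = attn(q[h], k[h], v[h])
    return out
```

For example, calling `hybrid_heads(q, k, v, full_heads={0, 2})` on inputs of shape `(4, seq_len, head_dim)` routes heads 0 and 2 through full attention and heads 1 and 3 through linear attention, so only the full-attention heads pay the quadratic cost in sequence length.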