In this work, we propose a new hybrid architecture for voice activity detection that incorporates both a split-attention convolutional neural network and self-attention layers with rotary position embedding ...