Collective Scheduler
The system layer has a collective scheduler that schedules and dispatches collectives. The workload layer may resolve the dependencies of many independent collectives at the same time, as in a data-parallel workload. Even then, a single NPU cannot issue dozens of collectives at once because of hardware limits. At the core of the scheduler is a set of queues called active_Streams.
Each queue holds StreamBaseline objects, which are depicted in the top-right corner of the image. A StreamBaseline object represents a stream (i.e., a collective) that consists of multiple collective phases. The variable phases_to_go is a queue holding these phases, and the pointer my_current_phase points to the phase currently being executed.
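To make these structures concrete, here is a minimal Python model. It is a sketch under stated assumptions, not the scheduler's actual C++ implementation. The Phase record, its name and queue_id fields, the running flag, and pop_finished_phase are hypothetical. The model also assumes that the phase being executed stays at the front of phases_to_go until it finishes and is popped.

```python
from collections import deque, namedtuple

# Hypothetical phase record: a name plus the queue (vnet) the phase waits in.
Phase = namedtuple("Phase", ["name", "queue_id"])


class StreamBaseline:
    """Simplified model of one stream (collective) split into phases."""

    def __init__(self, stream_id, phases):
        self.stream_id = stream_id
        # Assumption: the phase being executed stays at the front of
        # phases_to_go until it finishes and is popped.
        self.phases_to_go = deque(phases)
        self.running = False  # hypothetical flag: is a phase executing now?

    @property
    def my_current_phase(self):
        # Front of phases_to_go, or None once every phase has been popped.
        return self.phases_to_go[0] if self.phases_to_go else None

    def pop_finished_phase(self):
        # Drop the finished phase; my_current_phase now refers to the next one.
        self.phases_to_go.popleft()
        self.running = False


# active_Streams: one list of streams per queue id, oldest stream first.
active_Streams = {0: [], 1: []}
```

Consider an all-reduce split into a reduce-scatter phase on queue 0 and an all-gather phase on queue 1. It starts with my_current_phase.queue_id equal to 0. After one call to pop_finished_phase(), its current phase is the all-gather on queue 1. After a second call, my_current_phase is None.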
For each stream, the function proceed_to_next_vnet_baseline is critical: it advances the stream through its collective phases and moves the stream object from one queue to another.
This function is called in the following cases:
1. When a stream has been removed from ready_list and is about to be inserted into active_Streams for the first time.
2. When a stream has finished one phase and is ready to wait for the next phase.
3. When a stream has finished its last phase.
Let’s first look at how proceed_to_next_vnet_baseline behaves in case #2. In the image above, refer to the pink circles (2-1), (2-2), … (2-5).
1. Look at the queue currently holding the stream.
2. Erase the StreamBaseline object from the queue using erase(). The queue is actually a list object. (Note that streams may not finish in the order they started executing.)
3. Modify the StreamBaseline object. The finished collective phase is popped from phases_to_go, and my_current_phase now points to the next phase to be executed.
4. Insert the StreamBaseline object into the next queue using insert_stream.
5. Call the function notify_stream_removed. This function looks at the head of the previous queue. The variable stream_pointer points to the frontmost stream that is not running (marked blue). The function starts the execution of this stream's next phase by calling StreamBaseline::init().
6. Similarly, use notify_stream_added to trigger the phase of the stream at the head of the new queue.
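The case #2 steps can be sketched as a small Python model. This is an illustration under stated assumptions, not the scheduler's implementation. The following names and behaviors are hypothetical:

- Phase, queue_id, the running flag, MAX_RUNNING_PER_QUEUE, and finish_phase_case2 are invented for the model.
- insert_stream is assumed to append the stream to the tail of the target list.
- notify_stream_removed and notify_stream_added share one head-of-queue scan that mirrors the stream_pointer description. The real functions may do more.

```python
from collections import deque, namedtuple

Phase = namedtuple("Phase", ["name", "queue_id"])  # hypothetical phase record
MAX_RUNNING_PER_QUEUE = 1  # hypothetical per-queue hardware limit


class StreamBaseline:
    def __init__(self, stream_id, phases):
        self.stream_id = stream_id
        self.phases_to_go = deque(phases)  # front = phase being (or about to be) executed
        self.running = False

    @property
    def my_current_phase(self):
        return self.phases_to_go[0] if self.phases_to_go else None

    def init(self):
        # Start executing my_current_phase (network activity is not modeled).
        self.running = True


def _start_frontmost_waiting(active_Streams, queue_id):
    queue = active_Streams.get(queue_id, [])
    if sum(s.running for s in queue) >= MAX_RUNNING_PER_QUEUE:
        return None  # the queue is already at its limit
    # stream_pointer: the frontmost stream in the queue that is not running
    stream_pointer = next((s for s in queue if not s.running), None)
    if stream_pointer is not None:
        stream_pointer.init()
    return stream_pointer


def notify_stream_removed(active_Streams, queue_id):
    return _start_frontmost_waiting(active_Streams, queue_id)


def notify_stream_added(active_Streams, queue_id):
    return _start_frontmost_waiting(active_Streams, queue_id)


def insert_stream(active_Streams, stream):
    # Assumption: a stream joins the tail of the queue its current phase uses.
    active_Streams.setdefault(stream.my_current_phase.queue_id, []).append(stream)


def finish_phase_case2(active_Streams, stream):
    prev_queue = stream.my_current_phase.queue_id  # 1. queue holding the stream
    active_Streams[prev_queue].remove(stream)      # 2. erase it (not necessarily the head)
    stream.phases_to_go.popleft()                  # 3. pop the finished phase
    stream.running = False
    insert_stream(active_Streams, stream)          # 4. move to the next queue
    notify_stream_removed(active_Streams, prev_queue)                       # 5. previous queue
    notify_stream_added(active_Streams, stream.my_current_phase.queue_id)  # 6. new queue
```

Take a limit of 1, with streams A and B both waiting in queue 0 and A running. Finishing A's first phase moves A to queue 1. It then starts B through notify_stream_removed on queue 0, and starts A's next phase through notify_stream_added on queue 1.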
In the other cases, proceed_to_next_vnet_baseline executes a subset of the above steps. In case #1 (the stream has just been removed from ready_list), the function initializes the stream (1-2), inserts it into the first queue (1-3), and triggers the streams at the head of this queue. In case #3 (the stream has finished), the function erases the stream from the previous queue (3-1) and triggers the streams at the head of the previous queue. Additionally, the StreamBaseline object is deleted, and notify_stream_finished is called to notify the Sys object that a stream has ended (3-6).
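Putting the three cases together, proceed_to_next_vnet_baseline can be modeled as a single dispatch. This is again a hedged Python sketch, not the actual implementation. The following parts are hypothetical:

- the in_active_streams flag used to recognize case #1;
- the Scheduler class, with its finished list standing in for the Sys object;
- the per-queue limit.

In this model, "deleting" the stream simply means dropping every reference to it. Popping the finished phase is also how the model detects case #3.

```python
from collections import deque, namedtuple

Phase = namedtuple("Phase", ["name", "queue_id"])  # hypothetical phase record
MAX_RUNNING_PER_QUEUE = 1  # hypothetical per-queue hardware limit


class StreamBaseline:
    def __init__(self, stream_id, phases):
        self.stream_id = stream_id
        self.phases_to_go = deque(phases)  # front = current phase
        self.running = False
        self.in_active_streams = False  # hypothetical: set once case #1 has run

    @property
    def my_current_phase(self):
        return self.phases_to_go[0] if self.phases_to_go else None

    def init(self):
        self.running = True  # start executing my_current_phase


class Scheduler:
    def __init__(self):
        self.active_Streams = {}  # queue id -> list of streams
        self.finished = []        # stands in for notifying the Sys object

    def _trigger_head(self, queue_id):
        queue = self.active_Streams.get(queue_id, [])
        if sum(s.running for s in queue) >= MAX_RUNNING_PER_QUEUE:
            return
        # stream_pointer: the frontmost stream that is not running
        stream_pointer = next((s for s in queue if not s.running), None)
        if stream_pointer is not None:
            stream_pointer.init()

    def notify_stream_removed(self, queue_id):
        self._trigger_head(queue_id)

    def notify_stream_added(self, queue_id):
        self._trigger_head(queue_id)

    def notify_stream_finished(self, stream):
        self.finished.append(stream.stream_id)

    def insert_stream(self, stream):
        queue_id = stream.my_current_phase.queue_id
        self.active_Streams.setdefault(queue_id, []).append(stream)

    def proceed_to_next_vnet_baseline(self, stream):
        if not stream.in_active_streams:
            # Case #1: the stream has just left ready_list.
            stream.in_active_streams = True  # (1-2) initialize (only a flag in this model)
            self.insert_stream(stream)       # (1-3) insert into the first queue
            self.notify_stream_added(stream.my_current_phase.queue_id)
            return

        # Cases #2 and #3 both start by erasing the stream and popping its phase.
        prev_queue = stream.my_current_phase.queue_id
        self.active_Streams[prev_queue].remove(stream)
        stream.phases_to_go.popleft()
        stream.running = False

        if stream.my_current_phase is None:
            # Case #3: the last phase has finished.
            self.notify_stream_removed(prev_queue)
            self.notify_stream_finished(stream)
            return  # no queue references the stream any more

        # Case #2: wait in the queue of the next phase.
        self.insert_stream(stream)
        self.notify_stream_removed(prev_queue)
        self.notify_stream_added(stream.my_current_phase.queue_id)
```

Feeding one two-phase stream through this function three times exercises cases #1, #2, and #3 in order. At the end, the stream is in no queue and its id is recorded in finished.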