Collective Algorithms: Native or Custom Implementation

Collective Algorithm Implementation

As discussed before, the simulator takes in a collective communication (e.g., AllReduce, AllGather, etc.) and breaks it down into send and receive messages. These send and receive messages are simulated by the network backend.

There are two ways the simulator breaks a collective into send and receive messages.

Native Collective: The simulator implements a predefined set of commonly used algorithms (e.g. Ring, DoubleBinary, HalvingDoubling, etc). This implementation logic resides within the simulator codebase and allows users to quickly explore a predefined set of algorithms.
Custom Collective: Users will provide their definition of the collective algorithm (i.e. the sequence of events to happen in the algorithm). ASTRA-sim’s System layer exposes a collective API through which users can define custom, arbitrary collective algorithms.

Both methods are implementations of the CollectivePhase::Algorithm object, which is the unit of scheduling in the System layer (refer to the previous scheduling page). Please refer to the code in CollectivePhase.hh for a definition of the Algorithm object, and [native_collectives] and [custom_collectives] for implementations. Because they implement the CollectivePhase::Algorithm object, both methods are executed through the stream based scheduler described in ::collective-scheduler.

Native Collective Implementation

This part is still under construction. For now, please refer to the code in the [native_collecties] directory, especially Ring.cc as a starting point.

Custom Collective Implementation

An inherent limitation of the above native method is that to simulate a new collective algorithm, one would have to implement the whole collective in ASTRA_sim native code. With an increasing number of work in non-regular collectives, such as TACOS (topology aware collectives)[1], MSCCLang (expressively written collectives based on DSL)[2], etc., the need to quickly simulate and iterate over a wide variety of arbitrary collective algorithms becomes ever more important.

Therefore, we expose a new collective API to accept the definition of any collective algorithms, not limited to the predefined set (Ring, etc.). For the representation, we use the Chakra ET schema for a separate graph. We represent the collective algorithm as a graph of COMM_SEND, COMM_RECV nodes in the Chakra ET schema. That is, instead of the system layer breaking down collectives into send and receive messages, the system layer simply follows the breakdown already represented in the Chakra graph. Since ASTRA-sim already uses Chakra ET to represent workload, using Chakra ET to additionally define collective algorithms provides an easy and simple way to navigate through the graph.

Refer to the figure on the top of this page. When the Workload layer issues an “AllReduce” collective, instead of running the native implementation logic already in the simulator codebase, the system layer will iterate through the Chakra ET representing the collective algorithm, which has been provided through the collective API. Note how the Chakra graph for the workload and the chakra graph for the collective is decoupled, and provided through different input points. It is the ASTRA-sim simulator that eventually ‘replaces’ the communication nodes with the collective implemenation. This graph substitution is made easy by the workload and communication using the same schema. We anticipate this to open up new avenues such as compute-communication co-optimization.

For a more detailed explanation of the background and motivation, please refer to our paper published in Hot Interconnects (HotI) 2024. (Paper Link)

For a more detailed explanation to generating the inputs to the collective API, please refer to the Collective API Input page.

Reference [1] Won, William, et al. “TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Machine Learning.”, In Proceedings of IEEE MICRO 2024 (to appear), https://arxiv.org/abs/2304.05301. [2] Meghan Cowan, et al. “MSCCLang: Microsoft Collective Communication Language.”, In Proceedings of ASPLOS 2023, https://doi.org/10.1145/3575693.3575724