Argument ${WORKLOAD_CONFIG}

--workload-configuration: path to the prefix of execution trace files

The example traces can be found at:

${ASTRA_SIM}/examples/network_analytical/workload

Note

The naming rule for execution traces follows the format {path_prefix}.{npu_id}.et.

Using Chakra Execution Trace

ASTRA-sim supports Chakra ET (Execution Trace) as inputs to the workload layer.

To run a preset workload, such as AllReduce_1MB, execute the following commands:

cd ${ASTRA_SIM}/examples/network_analytical
bash run_network_analytical.sh

This script will compile ASTRA-Sim, run the workload on an 8-NPU cluster, and display the number of cycles required to complete the simulation.

[2025-01-26 23:03:37.384] [workload] [info] sys[0] finished, 64440 cycles, exposed communication 64440 cycles.
[2025-01-26 23:03:37.384] [workload] [info] sys[1] finished, 64440 cycles, exposed communication 64440 cycles.
[2025-01-26 23:03:37.384] [workload] [info] sys[2] finished, 64440 cycles, exposed communication 64440 cycles.
[2025-01-26 23:03:37.384] [workload] [info] sys[3] finished, 64440 cycles, exposed communication 64440 cycles.
[2025-01-26 23:03:37.384] [workload] [info] sys[4] finished, 64440 cycles, exposed communication 64440 cycles.
[2025-01-26 23:03:37.384] [workload] [info] sys[5] finished, 64440 cycles, exposed communication 64440 cycles.
[2025-01-26 23:03:37.384] [workload] [info] sys[6] finished, 64440 cycles, exposed communication 64440 cycles.
[2025-01-26 23:03:37.384] [workload] [info] sys[7] finished, 64440 cycles, exposed communication 64440 cycles.

An example of how to generate Chakra traces and run ASTRA-sim is illustrated at Running Simulation with Chakra.

Using Execution Trace Converter (et_converter)

You can convert ASTRA-sim 1.0 text input files into Chakra traces with the following commands.

$ cd ./extern/graph_frontend/chakra/
$ pip3 install .
$ chakra_converter Text \
    --input ../../../examples/text_converter/text_workloads/Resnet50_DataParallel.txt \
    --output ../../../examples/text_converter/text_workloads/Resnet50_DataParallel \
    --num-npus 8 \
    --num-passes 1

In the project root, run the following command.

$ ./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Unaware \
  --workload-configuration=./examples/text_converter/text_workloads/Resnet50_DataParallel \
  --system-configuration=./examples/text_converter/system.json \
  --network-configuration=./examples/text_converter/network.yml \
  --remote-memory-configuration=./examples/text_converter/remote_memory.json

Upon completion, ASTRA-sim will display the number of cycles it took to run the simulation.

[2025-01-26 22:50:05.669] [workload] [info] sys[0] finished, 1082478600 cycles, exposed communication 28600 cycles.
[2025-01-26 22:50:05.669] [workload] [info] sys[1] finished, 1082478600 cycles, exposed communication 28600 cycles.
[2025-01-26 22:50:05.669] [workload] [info] sys[2] finished, 1082478600 cycles, exposed communication 28600 cycles.
[2025-01-26 22:50:05.669] [workload] [info] sys[3] finished, 1082478600 cycles, exposed communication 28600 cycles.
[2025-01-26 22:50:05.669] [workload] [info] sys[4] finished, 1082478600 cycles, exposed communication 28600 cycles.
[2025-01-26 22:50:05.669] [workload] [info] sys[5] finished, 1082478600 cycles, exposed communication 28600 cycles.
[2025-01-26 22:50:05.669] [workload] [info] sys[6] finished, 1082478600 cycles, exposed communication 28600 cycles.
[2025-01-26 22:50:05.669] [workload] [info] sys[7] finished, 1082478600 cycles, exposed communication 28600 cycles.

Enable Communicator Groups

ASTRA-sim 2.0 supports communicator groups. You can pass a communicator group configuration file by specifying the file path using --comm-group-configuration. If you do not pass a communicator group configuration file, by default, it will create a single group with all GPUs. A valid communication group file is a JSON file with the following format.

{
  "<communicator_group_id>" : [gpu_ids]
}

For example, you can create two communicator groups with the following configuration file. The first communicator group, with ID 0, includes GPU IDs from 0 to 3. The second communicator group, with ID 1, includes GPU IDs from 4 to 7.

{
  "0": [0, 1, 2, 3],
  "1": [4, 5, 6, 7]
}