Argument ${WORKLOAD_CONFIG}

--workload-configuration: path to the prefix of execution trace files

The example traces can be found at:

${ASTRA_SIM}/inputs/workload/

Note

The naming rule for execution traces follows the format {path_prefix}.{npu_id}.et.

Using Chakra Execution Trace

ASTRA-sim supports Chakra ET (Execution Trace) as inputs to the workload layer.

An example of how to generate Chakra traces and run ASTRA-sim is illustrated at Running Simulation with Chakra.

Using Execution Trace Converter (et_converter)

You can convert ASTRA-sim 1.0 text input files into Chakra traces with the following commands.

$ cd ./extern/graph_frontend/chakra/
$ pip3 install .
$ python3 -m chakra.et_converter.et_converter \
    --input_type Text \
    --input_filename ../../../inputs/workload/ASTRA-sim-1.0/Resnet50_DataParallel.txt \
    --output_filename ../../../inputs/workload/ASTRA-sim-2.0/Resnet50_DataParallel \
    --num_npus 64 \
    --num_dims 1 \
    --num_passes 1

Run the following command.

$ cd -
$ ./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Unaware \
  --workload-configuration=./inputs/workload/ASTRA-sim-2.0/Resnet50_DataParallel \
  --system-configuration=./inputs/system/Switch.json \
  --network-configuration=./inputs/network/analytical/Switch.yml \
  --remote-memory-configuration=./inputs/remote_memory/analytical/no_memory_expansion.json

Upon completion, ASTRA-sim will display the number of cycles it took to run the simulation.

sys[62] finished, 6749042 cycles
sys[61] finished, 6749042 cycles
...
sys[0] finished, 6749042 cycles
sys[63] finished, 6749042 cycles

Enable Communicator Groups

ASTRA-sim 2.0 supports communicator groups. You can pass a communicator group configuration file by specifying the file path using --comm-group-configuration. If you do not pass a communicator group configuration file, by default, it will create a single group with all GPUs. A valid communication group file is a JSON file with the following format.

{
  "<communicator_group_id>" : [gpu_ids]
}

For example, you can create two communicator groups with the following configuration file. The first communicator group, with ID 0, includes GPU IDs from 0 to 3. The second communicator group, with ID 1, includes GPU IDs from 4 to 7.

{
  "0": [0, 1, 2, 3],
  "1": [4, 5, 6, 7]
}