Argument {COMM_GROUP_CONFIG}
ASTRA-sim 2.0 supports communicator groups.
You can pass a communicator group configuration file by specifying the file path using --comm-group-configuration
.
If you do not pass a communicator group configuration file, by default, it will create a single group with all GPUs.
A valid communication group file is a JSON file with the following format.
{
"<communicator_group_id>" : [gpu_ids]
}
For example, you can create two communicator groups with the following configuration file. The first communicator group, with ID 0, includes GPU IDs from 0 to 3. The second communicator group, with ID 1, includes GPU IDs from 4 to 7.
{
"0": [0, 1, 2, 3],
"1": [4, 5, 6, 7]
}
When simulating the workload, ASTRA-sim looks for the communication group id in each communication ET node (i.e. different operators of the same rank may have different communicator group). ASTRA-sim will look for the attribute pg_name
in the communication ET node.
The following is part of a Chakra ET.
{
"id": "4",
"name": "in_emb_y@0_X1COMM",
"type": "COMM_COLL_NODE",
"attr": [
{
"name": "comm_size",
"int64Val": "26843545600"
},
{
"name": "comm_type",
"int64Val": "2"
},
{
"name": "pg_name",
"stringVal": "17"
},
{
"name": "is_cpu_op",
"int32Val": 0
}
]
}