llama-stack-mirror/llama_stack/providers
yyymeta b79e0435de
fix: avoid tensor memory error (#1688)
# What does this PR do?

we randomly get errors like the following, it's most likely due to
accessing an object that is already deallocated

```

E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732] Traceback (most recent call last):
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/.conda/envs/myenv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 90, in _wrap
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     fn(i, *args)
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/.conda/envs/myenv/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 611, in _wrap
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     ret = record(fn)(*args_)
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/.conda/envs/myenv/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     return f(*args, **kwargs)
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/internal-llama-stack/llama_stack/providers/inline/inference/meta_reference/parallel_utils.py", line 249, in worker_process_entrypoint
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     task = req_gen.send(result)
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/internal-llama-stack/llama_stack/providers/inline/inference/meta_reference/parallel_utils.py", line 156, in retrieve_requests
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     torch.distributed.broadcast_object_list(
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/.conda/envs/myenv/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 81, in wrapper
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     return func(*args, **kwargs)
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/.conda/envs/myenv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 3504, in broadcast_object_list
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     object_list[i] = _tensor_to_object(obj_view, obj_size, group)
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/.conda/envs/myenv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2961, in _tensor_to_object
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     return _unpickler(io.BytesIO(buf)).load()
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732] EOFError: Ran out of input
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]
Process SpawnProcess-1:
Traceback (most recent call last):
```

## Test Plan
start server

```
llama-stack-client eval run-benchmark mmmu_v1  --model-id meta-llama/Llama-4-17B-Omni-Instruct  --output-dir /tmp/mmmu_standard --num-examples 30
```

[//]: # (## Documentation)
2025-03-18 16:17:29 -07:00
..
inline fix: avoid tensor memory error (#1688) 2025-03-18 16:17:29 -07:00
registry feat: Qdrant inline provider (#1273) 2025-03-18 14:04:21 -07:00
remote feat: Qdrant inline provider (#1273) 2025-03-18 14:04:21 -07:00
tests refactor(test): introduce --stack-config and simplify options (#1404) 2025-03-05 17:02:02 -08:00
utils fix: agents with non-llama model (#1550) 2025-03-17 22:11:06 -07:00
__init__.py API Updates (#73) 2024-09-17 19:51:35 -07:00
datatypes.py chore: move all Llama Stack types from llama-models to llama-stack (#1098) 2025-02-14 09:10:59 -08:00