TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor ...
Shutting down the Triton Inference Server with the Python backend while Triton metrics are in use causes a segmentation fault in the python_backend process. This happens because Metric::Clear attempts to ...