Inference Service | Number of workers | Median Latency (ms) | 95th Percentile Latency (ms) | Throughput (rps) |
---|---|---|---|---|
PyTorch and FastAPI optimized | 1 | 290 | 350 | 6.8 |
PyTorch and FastAPI optimized | 2 | 250 | 300 | 7.8 |
PyTorch and FastAPI optimized | 3 | 250 | 310 | 7.8 |
PyTorch and FastAPI optimized | 4 | 260 | 300 | 7.8 |
PyTorch and FastAPI optimized | 5 | 250 | 300 | 7.8 |
PyTorch and FastAPI optimized | 6 | 270 | 320 | 7.3 |
PyTorch and FastAPI optimized | 7 | 280 | 340 | 7.0 |
PyTorch and FastAPI optimized | 8 | 280 | 350 | 7.1 |
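The metrics in the table above can be derived from raw per-request timings. As a minimal sketch (not the benchmark harness actually used here; the function names are illustrative), the median and 95th-percentile latency come from the sorted timing samples, and throughput is simply completed requests divided by wall-clock duration:

```python
import statistics

def summarize_latencies(latencies_ms):
    """Return (median, p95) latency in ms from per-request timings.

    Uses the nearest-rank method for the 95th percentile.
    """
    ordered = sorted(latencies_ms)
    median = statistics.median(ordered)
    # Nearest-rank 95th percentile: the value at rank ceil(0.95 * n)
    idx = max(0, int(round(0.95 * len(ordered))) - 1)
    p95 = ordered[idx]
    return median, p95

def throughput_rps(num_requests, total_seconds):
    """Requests per second over the whole benchmark run."""
    return num_requests / total_seconds
```

For example, a run that completes 78 requests in 10 seconds yields the 7.8 rps seen in the 2-worker row.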