Releases: IBM/fmwork
v1.0.4
infer/vllm/client
- Removed `--base-url http://localhost:8000`. This may require changes to downstream automation.
infer/vllm/process
- Added `--precision` with a `fp16` default value.
- Added code to detect batch mode (`'static'` or `'continuous'`) for Spyre integration. This requires `VLLM_SPYRE_USE_CB` to be explicitly defined and printed in the `server.log` file. Note that this should be done automatically by the `runner`-`server` integration.
- Changed `TTFT` metric from the server's TTFT (via `/metrics`) to the client's.
- Changed `ITL` metric from `Mean TPOT` to `Median ITL`, as reported by vLLM's serving benchmark.
- Better support for experiments with datasets other than `random` (which explicitly allows the definition of shapes): if no such definition is found in the log files (e.g., if the `sharegpt` dataset was used), `process` will read the appropriate lines from `client.log` to get the average input / output sizes.
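
As a rough illustration of the batch-mode detection and the `client.log` fallback described above, here is a minimal Python sketch. It is not the actual `process` implementation: the helper names (`detect_batch_mode`, `average_sizes`) are hypothetical, the `VLLM_SPYRE_USE_CB=<0|1>` line format in `server.log` is an assumption, and so is the mapping of `1` to `'continuous'`. The `client.log` side assumes the summary labels printed by vLLM's serving benchmark (`Successful requests`, `Total input tokens`, `Total generated tokens`); which lines `process` really reads is not stated in these notes.

```python
import re


def detect_batch_mode(server_log: str) -> str:
    """Return 'continuous' or 'static' from the server log text.

    Assumption: the log contains a line with `VLLM_SPYRE_USE_CB=<0|1>`,
    and 1 means continuous batching. The real log format may differ.
    """
    match = re.search(r"VLLM_SPYRE_USE_CB\s*=\s*(\d+)", server_log)
    if match is None:
        # The release notes require the variable to be explicitly defined.
        raise ValueError("VLLM_SPYRE_USE_CB not found in server log")
    return "continuous" if match.group(1) == "1" else "static"


def average_sizes(client_log: str) -> tuple[float, float]:
    """Return (average input tokens, average output tokens) per request.

    Assumption: the averages are derived from the benchmark's totals
    divided by the number of successful requests.
    """
    def field(label: str) -> float:
        pattern = rf"^{re.escape(label)}:\s*([\d.]+)"
        m = re.search(pattern, client_log, re.MULTILINE)
        if m is None:
            raise ValueError(f"{label!r} not found in client log")
        return float(m.group(1))

    requests = field("Successful requests")
    return (
        field("Total input tokens") / requests,
        field("Total generated tokens") / requests,
    )


# Made-up sample log excerpts, for illustration only.
server_log = "INFO env: VLLM_SPYRE_USE_CB=1\n"
client_log = (
    "Successful requests:                     4\n"
    "Total input tokens:                      2048\n"
    "Total generated tokens:                  512\n"
)

print(detect_batch_mode(server_log))  # continuous
print(average_sizes(client_log))      # (512.0, 128.0)
```

Raising an error when the variable is missing, rather than guessing a default, mirrors the requirement that `VLLM_SPYRE_USE_CB` be explicitly defined.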
v1.0.3
Finalized server-mode support for infer/vllm and added documentation.
v1.0.2
- Finalize support for `direct` and `server` modes for `infer/vllm`, including `process` script.
Documentation pending — to be added momentarily.
v1.0.1
General improvements to `embed/tf`.
- Improved output formatting for arguments.
- Added processing script.
- Oh, and a README ☺️
v1.0.0
Still a partial release — but now with the latest scripts to run encoder models on CPUs / GPUs / Spyre. Subsequent releases will cover decoder models, as well as more options / different engines.
v0.1.0
Freezing working version that is currently being used for internal experimental sweeps.