Releases: IBM/fmwork

v1.0.4

25 Aug 14:49
13971cc

infer/vllm/client

  • Removed --base-url http://localhost:8000. This may require changes to
    downstream automation.

infer/vllm/process

  • Added --precision with a default value of fp16.
  • Added code to detect the batch mode ('static' or 'continuous') for Spyre
    integration. This requires VLLM_SPYRE_USE_CB to be explicitly defined and
    printed in the server.log file; note that this should be done automatically
    by the runner/server integration.
  • Changed the TTFT metric from the server's TTFT (via /metrics) to the client's.
  • Changed the ITL metric from Mean TPOT to Median ITL, as reported by vLLM's
    serving benchmark.
  • To better support experiments with datasets other than random (which
    explicitly allows shapes to be defined): if no such definition is found
    in the log files (e.g., if the sharegpt dataset was used), process reads
    the appropriate lines from client.log to get the average input / output
    sizes.
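The batch-mode detection and the metric change above can be sketched as follows. This is a hedged illustration, not the actual process script: the exact server.log format of the VLLM_SPYRE_USE_CB line and the sample latency values are assumptions, and `detect_batch_mode` is a hypothetical helper name.

```python
import re
from statistics import mean, median

def detect_batch_mode(server_log_text: str):
    """Infer the Spyre batch mode from an explicitly printed
    VLLM_SPYRE_USE_CB flag in server.log (format assumed here).

    Returns 'continuous' if the flag is 1, 'static' if it is 0,
    and None if the flag was never printed.
    """
    m = re.search(r"VLLM_SPYRE_USE_CB\s*=\s*(\d)", server_log_text)
    if m is None:
        return None
    return "continuous" if m.group(1) == "1" else "static"

# Why Median ITL instead of Mean TPOT: the median is robust to
# occasional stragglers (hypothetical inter-token latencies in ms):
itl_ms = [12.1, 11.8, 95.0, 12.3, 12.0]
print(median(itl_ms))  # 12.1 -- representative of the typical token
print(mean(itl_ms))    # 28.64 -- dominated by the single outlier
```

If the flag is absent from server.log, returning None lets the caller fail loudly rather than silently assuming a mode.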

v1.0.3

13 Aug 06:57
87f7ae2

Finalized server-mode support for infer/vllm and added documentation.

v1.0.2

12 Aug 14:14
  • Finalized support for direct and server modes for infer/vllm, including the
    process script.

Documentation is pending and will be added shortly.

v1.0.1

01 Aug 20:39
99d4ea6

General improvements to embed/tf.

  • Improved output formatting for arguments.
  • Added processing script.
  • Oh, and a README ☺️

v1.0.0

01 Aug 08:14

Still a partial release, but now with the latest scripts to run encoder models on CPUs / GPUs / Spyre. Subsequent releases will cover decoder models, as well as more options and different engines.

v0.1.0

30 Apr 14:14

Freezing the working version currently used for internal experimental sweeps.