[ggml] Imported frequently used ggml functions into nntrainer #3429

p-debski2 · 2025-08-20T17:50:57Z

Summary

Imported the GGML functions into the nntr_ggml_impl directory and used them in the ggml_interface functions instead of calling the actual GGML.
Removed the GGML includes from ggml_interface.

Signed-off-by: p-debski2 [email protected]

jijoongmoon · 2025-08-25T23:46:30Z

I think it is OK to include the code to remove the ggml submodule.

gkisalapl

Please squash this into a single-commit PR.

djeong20

This patch would not work correctly on all platforms.
For instance, you need to modify the Android.mk files under jni and test/jni, where ggml dependencies still remain:

nntrainer/jni/Android.mk.in

Lines 95 to 131 in bce948a

    
    MESON_HAS_GGML := @MESON_HAS_GGML@

    ifeq ($(MESON_HAS_GGML),1)

    LOCAL_MODULE        := ggml
    LOCAL_SRC_FILES     := @MESON_GGML_ROOT@/src/ggml-backend.cpp \
                           @MESON_GGML_ROOT@/src/ggml-cpu/ggml-cpu-hbm.cpp \
                           @MESON_GGML_ROOT@/src/ggml-cpu/unary-ops.cpp \
                           @MESON_GGML_ROOT@/src/ggml-cpu/vec.cpp \
                           @MESON_GGML_ROOT@/src/ggml-cpu/ggml-cpu-traits.cpp \
                           @MESON_GGML_ROOT@/src/ggml-cpu/llamafile/sgemm.cpp \
                           @MESON_GGML_ROOT@/src/ggml-cpu/ops.cpp \
                           @MESON_GGML_ROOT@/src/ggml-cpu/amx/mmq.cpp \
                           @MESON_GGML_ROOT@/src/ggml-cpu/amx/amx.cpp \
                           @MESON_GGML_ROOT@/src/ggml-cpu/binary-ops.cpp \
                           @MESON_GGML_ROOT@/src/ggml-cpu/ggml-cpu-aarch64.cpp \
                           @MESON_GGML_ROOT@/src/ggml-cpu/cpu-feats-x86.cpp \
                           @MESON_GGML_ROOT@/src/ggml-backend-reg.cpp \
                           @MESON_GGML_ROOT@/src/ggml-opt.cpp \
                           @MESON_GGML_ROOT@/src/gguf.cpp \
                           @MESON_GGML_ROOT@/src/ggml-threading.cpp \
                           @MESON_GGML_ROOT@/src/ggml-alloc.c \
                           @MESON_GGML_ROOT@/src/ggml-quants.c \
                           @MESON_GGML_ROOT@/src/ggml-cpu/ggml-cpu.cpp \
                           @MESON_GGML_ROOT@/src/ggml-cpu/ggml-cpu_c.c \
                           @MESON_GGML_ROOT@/src/ggml-cpu/ggml-cpu-quants.c \
                           @MESON_GGML_ROOT@/src/ggml.c
    LOCAL_CXXFLAGS += -std=c++17 -O3 -fexceptions
    LOCAL_C_INCLUDES    := @MESON_GGML_ROOT@/include \
                           @MESON_GGML_ROOT@/src \
                           @MESON_GGML_ROOT@/src/ggml-cpu

    LOCAL_EXPORT_C_INCLUDES  := $(LOCAL_C_INCLUDES)

    include $(BUILD_SHARED_LIBRARY)

    endif # MESON_HAS_GGML

nntrainer/test/jni/Android.mk

Lines 57 to 62 in bce948a

    
    include $(CLEAR_VARS)

    LOCAL_MODULE := ggml
    LOCAL_SRC_FILES := $(NNTRAINER_ROOT)/builddir/jni/$(TARGET_ARCH_ABI)//libggml.so

    include $(PREBUILT_SHARED_LIBRARY)

djeong20 · 2025-09-04T13:52:53Z

Additionally, I don't think the enable_ggml option is needed anymore. Let's remove the meson option and the ENABLE_GGML macro as well.

djeong20 · 2025-09-04T14:00:00Z

packaging/nntrainer.spec

@@ -442,11 +436,6 @@ ln -sf %{_libdir}/pkgconfig/capi-nnstreamer.pc %{_libdir}/pkgconfig/capi-ml-comm
 # Setup Ruy
 tar -xf packaging/ruy.tar.gz -C subprojects

-# Setup GGML
-%if 0%{?enable_ggml}
-tar -xf packaging/ggml.tar.gz -C subprojects


Please delete the packaging/ggml.tar.gz file as well.

Removed GGML as a dependency and introduced the 'nntr_ggml_impl' directory with implementation of necessary functions

Signed-off-by: p-debski2 <[email protected]>

p-debski2 · 2025-09-05T15:03:20Z

All PR comments have been addressed, and I believe it should now work correctly on all platforms.

However, I keep having trouble with the Ubuntu Meson tests on CI after removing the enable-ggml flag. The build succeeds, but when the unit tests are launched, the cpu-backend one always times out.

Locally it works fine for me, and other builds with similar configurations also pass the unit test phase (see Ubuntu pdebuild and Ubuntu Meson with Clang). It is also concerning that in the workflows that pass CI this test usually takes ~20s to finish, while here it hits the 60s timeout.

Removed the enable-ggml option from Meson and use of ENABLE_GGML macro from code according to PR suggestions

Signed-off-by: p-debski2 <[email protected]>

p-debski2 · 2025-09-08T08:16:41Z

I've investigated the Ubuntu Meson timeouts locally and ran a series of tests on different configurations, before and after the changes made by this PR.

Here are the unit test results for the configuration that was run on CI before this PR (note that GGML is disabled in this configuration):

debskip@AMDN6357:~/repos/nntrainer$ meson test -C builddir/ --suite unittests
ninja: Entering directory `/home/debskip/repos/nntrainer/builddir'
ninja: no work to do.
 1/28 nntrainer:unittests / unittest_tizen_capi_layer                OK               0.07s
 2/28 nntrainer:unittests / unittest_nntrainer_exe_order             OK               0.07s
 3/28 nntrainer:unittests / unittest_base_properties                 OK               0.07s
 4/28 nntrainer:unittests / unittest_nntrainer_appcontext            OK               0.07s
 5/28 nntrainer:unittests / unittest_nntrainer_quantizer             OK               0.08s
 6/28 nntrainer:unittests / unittest_util_func                       OK               0.08s
 7/28 nntrainer:unittests / unittest_common_properties               OK               0.08s
 8/28 nntrainer:unittests / unittest_tizen_capi_dataset              OK               0.10s
 9/28 nntrainer:unittests / unittest_nntrainer_activations           OK               0.11s
10/28 nntrainer:unittests / unittest_nntrainer_internal              OK               0.11s
11/28 nntrainer:unittests / unittest_tizen_capi_lr_scheduler         OK               0.12s
12/28 nntrainer:unittests / unittest_nntrainer_lr_scheduler          OK               0.12s
13/28 nntrainer:unittests / unittest_nntrainer_lazy_tensor           OK               0.18s
14/28 nntrainer:unittests / unittest_tizen_capi_optimizer            OK               0.19s
15/28 nntrainer:unittests / integration_tests                        OK               0.19s
16/28 nntrainer:unittests / unittest_nntrainer_cpu_backend           OK               0.25s
17/28 nntrainer:unittests / unittest_nntrainer_tensor_pool           OK               0.24s
18/28 nntrainer:unittests / unittest_compiler                        OK               0.38s
19/28 nntrainer:unittests / unittest_layers                          OK               0.40s
20/28 nntrainer:unittests / unittest_nntrainer_graph                 OK               0.74s
21/28 nntrainer:unittests / unittest_nntrainer_models                OK               0.90s
22/28 nntrainer:unittests / unittest_memory                          OK               0.98s
23/28 nntrainer:unittests / unittest_nntrainer_modelfile             OK               1.12s
24/28 nntrainer:unittests / unittest_models                          OK               1.43s
25/28 nntrainer:unittests / unittest_nntrainer_tensor                OK               1.75s
26/28 nntrainer:unittests / unittest_datasets                        OK               1.88s
27/28 nntrainer:unittests / unittest_tizen_capi                      OK               3.22s
28/28 nntrainer:unittests / unittest_ccapi                           OK               4.82s


Ok:                 28
Expected Fail:      0
Fail:               0
Unexpected Pass:    0
Skipped:            0
Timeout:            0

Here are the results from the same commit, before my changes, but this time with GGML enabled:

debskip@AMDN6357:~/repos/nntrainer$ meson test -C builddir/ --suite unittests
ninja: Entering directory `/home/debskip/repos/nntrainer/builddir'
ninja: no work to do.
 1/28 nntrainer:unittests / unittest_tizen_capi_optimizer            OK               0.42s
 2/28 nntrainer:unittests / unittest_tizen_capi_lr_scheduler         OK               0.42s
 3/28 nntrainer:unittests / unittest_tizen_capi_dataset              OK               0.43s
 4/28 nntrainer:unittests / unittest_nntrainer_activations           OK               0.43s
 5/28 nntrainer:unittests / unittest_nntrainer_exe_order             OK               0.43s
 6/28 nntrainer:unittests / unittest_nntrainer_internal              OK               0.43s
 7/28 nntrainer:unittests / unittest_nntrainer_lazy_tensor           OK               0.43s
 8/28 nntrainer:unittests / unittest_nntrainer_quantizer             OK               0.42s
 9/28 nntrainer:unittests / unittest_util_func                       OK               0.42s
10/28 nntrainer:unittests / unittest_nntrainer_graph                 OK               0.42s
11/28 nntrainer:unittests / unittest_base_properties                 OK               0.40s
12/28 nntrainer:unittests / unittest_common_properties               OK               0.36s
13/28 nntrainer:unittests / unittest_nntrainer_tensor_pool           OK               0.32s
14/28 nntrainer:unittests / unittest_nntrainer_lr_scheduler          OK               0.29s
15/28 nntrainer:unittests / unittest_nntrainer_appcontext            OK               0.26s
16/28 nntrainer:unittests / unittest_tizen_capi_layer                OK               0.44s
17/28 nntrainer:unittests / unittest_compiler                        OK               0.20s
18/28 nntrainer:unittests / integration_tests                        OK               0.12s
19/28 nntrainer:unittests / unittest_layers                          OK               0.31s
20/28 nntrainer:unittests / unittest_nntrainer_models                OK               0.71s
21/28 nntrainer:unittests / unittest_memory                          OK               0.54s
22/28 nntrainer:unittests / unittest_nntrainer_modelfile             OK               1.09s
23/28 nntrainer:unittests / unittest_models                          OK               1.15s
24/28 nntrainer:unittests / unittest_datasets                        OK               1.97s
25/28 nntrainer:unittests / unittest_nntrainer_tensor                OK               2.63s
26/28 nntrainer:unittests / unittest_tizen_capi                      OK               3.44s
27/28 nntrainer:unittests / unittest_ccapi                           OK               5.73s
28/28 nntrainer:unittests / unittest_nntrainer_cpu_backend           OK               7.12s


Ok:                 28
Expected Fail:      0
Fail:               0
Unexpected Pass:    0
Skipped:            0
Timeout:            0

And finally, the default configuration with changes from this PR:

debskip@AMDN6357:~/repos/nntrainer$ meson test -C builddir/ --suite unittests
ninja: Entering directory `/home/debskip/repos/nntrainer/builddir'
ninja: no work to do.
 1/28 nntrainer:unittests / unittest_tizen_capi_layer                OK               0.08s
 2/28 nntrainer:unittests / unittest_util_func                       OK               0.07s
 3/28 nntrainer:unittests / unittest_nntrainer_internal              OK               0.07s
 4/28 nntrainer:unittests / unittest_nntrainer_quantizer             OK               0.07s
 5/28 nntrainer:unittests / unittest_tizen_capi_lr_scheduler         OK               0.09s
 6/28 nntrainer:unittests / unittest_nntrainer_lr_scheduler          OK               0.07s
 7/28 nntrainer:unittests / unittest_nntrainer_lazy_tensor           OK               0.09s
 8/28 nntrainer:unittests / unittest_nntrainer_exe_order             OK               0.10s
 9/28 nntrainer:unittests / unittest_common_properties               OK               0.09s
10/28 nntrainer:unittests / unittest_base_properties                 OK               0.11s
11/28 nntrainer:unittests / unittest_nntrainer_tensor_pool           OK               0.15s
12/28 nntrainer:unittests / unittest_tizen_capi_dataset              OK               0.17s
13/28 nntrainer:unittests / unittest_tizen_capi_optimizer            OK               0.17s
14/28 nntrainer:unittests / unittest_nntrainer_activations           OK               0.17s
15/28 nntrainer:unittests / unittest_nntrainer_appcontext            OK               0.15s
16/28 nntrainer:unittests / integration_tests                        OK               0.23s
17/28 nntrainer:unittests / unittest_compiler                        OK               0.34s
18/28 nntrainer:unittests / unittest_layers                          OK               0.43s
19/28 nntrainer:unittests / unittest_nntrainer_graph                 OK               0.79s
20/28 nntrainer:unittests / unittest_memory                          OK               0.94s
21/28 nntrainer:unittests / unittest_nntrainer_models                OK               1.01s
22/28 nntrainer:unittests / unittest_nntrainer_modelfile             OK               1.05s
23/28 nntrainer:unittests / unittest_models                          OK               1.29s
24/28 nntrainer:unittests / unittest_datasets                        OK               1.83s
25/28 nntrainer:unittests / unittest_nntrainer_tensor                OK               2.83s
26/28 nntrainer:unittests / unittest_tizen_capi                      OK               3.35s
27/28 nntrainer:unittests / unittest_ccapi                           OK               5.35s
28/28 nntrainer:unittests / unittest_nntrainer_cpu_backend           OK               7.48s


Ok:                 28
Expected Fail:      0
Fail:               0
Unexpected Pass:    0
Skipped:            0
Timeout:            0

We can see that the last two configurations, both with the GGML code enabled, give very similar timings (unittest_nntrainer_cpu_backend takes 7.12s and 7.48s respectively), but they also run longer than the first one (0.25s for the same test), which is the configuration that was previously run on CI for the Ubuntu Meson workflow.

This behavior seems to be the expected one. Maybe the GitHub Actions agents don't have enough resources to run the tests within 60s, so I tried raising the test-timeout value from 60s to 90s, and now it passes, with the longest-running test case taking ~78s. If you're not okay with this modification, please let me know and I'll undo it, but I don't see another way to solve this.

EunjuYang

Could you add dequantize_row_q4_0 as well?

Added dequantize_row_q4_0 and dequantize_row_q8_0 to the nntr_ggml_impl

Signed-off-by: p-debski2 <[email protected]>
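
For readers unfamiliar with the two functions named above, here is a minimal sketch of what Q4_0 and Q8_0 row dequantization compute. It is not the code added to nntr_ggml_impl. The struct names, the `_sketch` suffixes, and the use of a plain `float` scale are assumptions made for illustration (the GGML blocks store the scale as a 16-bit half float).

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins for the GGML block layouts (hypothetical names).
// Assumption: the scale `d` is a plain float here so that the sketch
// needs no fp16 conversion helper.
constexpr int QK4_0 = 32; // values per Q4_0 block
constexpr int QK8_0 = 32; // values per Q8_0 block

struct BlockQ4_0 {
  float d;               // per-block scale
  uint8_t qs[QK4_0 / 2]; // 32 unsigned 4-bit quants, two per byte
};

struct BlockQ8_0 {
  float d;          // per-block scale
  int8_t qs[QK8_0]; // 32 signed 8-bit quants
};

// Expands k quantized values (k must be a multiple of 32) into floats.
// Each 4-bit quant is shifted by -8 to recover a signed value in [-8, 7].
// The low nibbles fill the first half of the block, the high nibbles the
// second half.
void dequantize_row_q4_0_sketch(const BlockQ4_0 *x, float *y, int64_t k) {
  assert(k % QK4_0 == 0);
  const int64_t nb = k / QK4_0;
  for (int64_t i = 0; i < nb; ++i) {
    const float d = x[i].d;
    for (int j = 0; j < QK4_0 / 2; ++j) {
      const int lo = (x[i].qs[j] & 0x0F) - 8;
      const int hi = (x[i].qs[j] >> 4) - 8;
      y[i * QK4_0 + j] = lo * d;
      y[i * QK4_0 + j + QK4_0 / 2] = hi * d;
    }
  }
}

// Q8_0 is simpler: each signed byte is multiplied by the block scale.
void dequantize_row_q8_0_sketch(const BlockQ8_0 *x, float *y, int64_t k) {
  assert(k % QK8_0 == 0);
  const int64_t nb = k / QK8_0;
  for (int64_t i = 0; i < nb; ++i) {
    const float d = x[i].d;
    for (int j = 0; j < QK8_0; ++j) {
      y[i * QK8_0 + j] = x[i].qs[j] * d;
    }
  }
}
```

Both functions take the element count rather than the block count, which mirrors how the row-wise helpers are usually called with a tensor row length.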

EunjuYang · 2025-09-12T02:00:43Z

I checked that this PR is compatible with our recent application, CausalLM (Linux & Android).

EunjuYang

LGTM

github-actions bot added the Need Review label Aug 22, 2025

p-debski2 force-pushed the work/ggml-integration branch 4 times, most recently from 07adfd2 to d54ccb7 Compare August 25, 2025 15:19

p-debski2 marked this pull request as ready for review August 25, 2025 15:30

p-debski2 requested review from myungjoo, jijoongmoon, again4you, jaeyun-jung, leemgs, wooksong, gichan-jang, anyj0527, lhs8928, songgot, jihochu, DonghakPark, SeoHyungjun, baek2sm, skykongkong8, djeong20, EunjuYang, dkjung and haehun as code owners August 25, 2025 15:30

p-debski2 force-pushed the work/ggml-integration branch from ad3b72a to b95b19b Compare August 26, 2025 13:11

skykongkong8 mentioned this pull request Aug 29, 2025

Pr 3447+rmsnorm for pr #3449

Closed

gkisalapl reviewed Sep 4, 2025

p-debski2 force-pushed the work/ggml-integration branch from b95b19b to c2e18b2 Compare September 4, 2025 08:42

p-debski2 force-pushed the work/ggml-integration branch 2 times, most recently from 9ad293e to af4b4e0 Compare September 4, 2025 12:02

gkisalapl approved these changes Sep 4, 2025

djeong20 requested changes Sep 4, 2025

djeong20 reviewed Sep 4, 2025

Removed GGML from nntrainer

b3f033d

p-debski2 force-pushed the work/ggml-integration branch 5 times, most recently from 065a679 to d175017 Compare September 5, 2025 13:08

p-debski2 force-pushed the work/ggml-integration branch from d175017 to 0e88783 Compare September 8, 2025 07:15

Removed ggml-related build options

d4809fd

p-debski2 force-pushed the work/ggml-integration branch from 0e88783 to d4809fd Compare September 8, 2025 07:50

EunjuYang reviewed Sep 11, 2025

Added more dequantization functions

c9791db

EunjuYang approved these changes Sep 12, 2025

djeong20 added the rebase required label Sep 12, 2025
