Fix tests and add multi-target output for boostings #1353

dmitryglhf · 2024-12-27T17:54:57Z

This is a 🐛 bug fix.

Summary

Failing test handling

FAILED AssertionError: assert 13 == ((5 * 2) - 1):
assert mm_pipeline.length == mm_pipeline.depth * len(mm_data) - 1  # minus final ensemble
This check only works for the length and depth of the default initial assumption (scaling + rf). When the graph length increases, the assert no longer holds.
FAILED Currently only multi-regression, multilabel and survival objectives work with multidimensional target.:
The convert_to_dataframe method in the boosting implementations ravels the target array, so with a multi-target the lengths of the 'train' and 'target' arrays no longer match. In addition, CatBoost's default "loss_function" is Logloss instead of MultiLogloss (or MultiRMSE for regression tasks).
FAILED test #0 contains class label "0" that is not present in the learn dataset:
This happens in train_test_data_setup when the number of rows is too small.
FAILED ValueError: Length of values (5) does not match length of index (6)
In test_correct_api_dataset_with_pseudo_text_preprocessing, the length of the features array does not match the length of the target array.
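For the CatBoost part of the second failure, the choice of loss can be made from the target's shape. The sketch below is a hypothetical helper (not FEDOT's actual implementation): it returns CatBoost's multi-output loss names, 'MultiLogloss' for multilabel classification and 'MultiRMSE' for multi-output regression, when the target has more than one column, and None (keep CatBoost's default) otherwise.

```python
import numpy as np


def multi_target_catboost_loss(target, task_type):
    """Hypothetical helper: pick a CatBoost multi-output loss for a 2-D target.

    Returns None for a single-column target, meaning "leave CatBoost's default".
    """
    target = np.asarray(target)
    is_multi_target = target.ndim == 2 and target.shape[1] > 1
    if not is_multi_target:
        return None
    if task_type == 'classification':
        return 'MultiLogloss'  # multilabel classification
    if task_type == 'regression':
        return 'MultiRMSE'  # multi-output regression
    raise ValueError(f'Unsupported task type: {task_type}')


single = np.array([0, 1, 1])
multi = np.array([[0, 1], [1, 0], [1, 1]])
print(multi_target_catboost_loss(single, 'classification'))  # None
print(multi_target_catboost_loss(multi, 'classification'))   # MultiLogloss
print(multi_target_catboost_loss(multi, 'regression'))       # MultiRMSE
```

The returned string would then be passed as `loss_function` to `CatBoostClassifier` or `CatBoostRegressor`.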

Context

Fixes #1350

pep8speaks · 2024-12-27T17:55:04Z

Hello @dmitryglhf! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2025-01-08 17:01:25 UTC

nicl-nno · 2024-12-27T22:25:23Z

test/unit/api/test_assumption_builder.py

+    # TODO: do we need this assert?
+    # assert mm_pipeline.length == mm_pipeline.depth * len(mm_data) - 1  # minus final ensemble


This check is specifically for the multi-modal case. Does it fail for the new initial assumptions?

Yes, exactly. For the multi-modal check with the new initial assumption, the following pipeline is built: [pipeline diagram not preserved]

That is, for the current initial assumption scaling->rf with length = 2 and depth = 2, the transformation yields length = 7 (the 2 branch nodes duplicated for each of the 2 modalities, plus 2 data nodes and 1 ensemble node) and depth = 4.

Therefore the check mm_pipeline.length == mm_pipeline.depth * len(mm_data) - 1 holds:
7 == 4*2 - 1 -> 7 == 7.

But as the number of parallel nodes grows, this formula stops working.

OK, then the check can be modified. In general, it seems it can be relaxed to mm_pipeline.length > len(mm_data).
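The arithmetic in this thread can be checked with a small sketch. It assumes the simplified structure described above: each modality gets one data-source node and its own copy of the initial-assumption branch, and one final ensemble node joins them. The helper names are hypothetical, not FEDOT API. Under this model, with two modalities the strict formula reduces to branch_length == branch_depth, so it holds only for a plain chain; a hypothetical branch with 5 nodes and depth 3 reproduces the numbers from the failing assert (13 vs. 5 * 2 - 1).

```python
def multimodal_shape(branch_length, branch_depth, n_modalities):
    # Simplified model (assumption): one data node + one branch copy per modality,
    # plus a single final ensemble node.
    length = n_modalities * (branch_length + 1) + 1
    depth = branch_depth + 2  # data node + branch + ensemble
    return length, depth


def strict_check(length, depth, n_modalities):
    # The original assertion from test_assumption_builder.py
    return length == depth * n_modalities - 1


def relaxed_check(length, n_modalities):
    # The relaxed assertion suggested in the review
    return length > n_modalities


print(multimodal_shape(2, 2, 2))  # (7, 4): chain scaling -> rf
print(multimodal_shape(5, 3, 2))  # (13, 5): hypothetical wider branch
```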

nicl-nno · 2024-12-27T22:25:49Z

test/unit/preprocessing/test_preprocessing_through_api.py

-             data_with_spaces_and_nans_in_features, data_with_nans_in_target_column,
-             data_with_nans_in_multi_target]
+             data_with_spaces_and_nans_in_features, data_with_nans_in_target_column,]
+             # TODO: how deal with multi-target in xgboost and lightgbm?


The question is not quite clear: the same way as everywhere else.

It would probably be more appropriate to open an issue rather than a PR.

No specific solution is proposed at this point. The comment was added to isolate the case where the initial pipeline contains xgboost or lgbm and the task has a multi-target. The test fails because the target array is flattened when the input_data object is converted into a dataframe. For example, in an ordinary task the target array has shape 3x1, but with a multi-target it has shape 3x2, so calling np.ravel() in convert_to_dataframe turns it into a 1-D array of length 3x2 = 6, which triggers an array-length error.

```python
# excerpt from the convert_to_dataframe method
if copied_input_data.target is not None and copied_input_data.target.size > 0:
    rows_len = dataframe.shape[0]
    target = copied_input_data.target[:rows_len]
    dataframe['target'] = np.ravel(target)
```
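A standalone reproduction of this failure with plain NumPy and pandas: raveling a 3x2 target yields 6 values for a 3-row frame, and pandas rejects the assignment with a ValueError (the exact message depends on the pandas version). The `attach_target` function is a hypothetical fix that keeps one column per target instead of flattening; it is not the change that was merged in this PR.

```python
import numpy as np
import pandas as pd

features = np.array([[0.1, 1.0], [0.2, 2.0], [0.3, 3.0]])
multi_target = np.array([[0, 1], [1, 0], [1, 1]])  # 3 rows x 2 targets

dataframe = pd.DataFrame(features)
ravel_error = None
try:
    # Mirrors the excerpt: ravel turns (3, 2) into (6,), which no longer fits 3 rows
    dataframe['target'] = np.ravel(multi_target)
except ValueError as err:
    ravel_error = str(err)
print(ravel_error)


def attach_target(df, target):
    """Hypothetical fix: one column per target instead of flattening."""
    df = df.copy()
    target = np.asarray(target)
    if target.ndim == 1 or target.shape[1] == 1:
        df['target'] = np.ravel(target)
    else:
        for i in range(target.shape[1]):
            df[f'target_{i}'] = target[:, i]
    return df


print(attach_target(pd.DataFrame(features), multi_target).columns.tolist())
# [0, 1, 'target_0', 'target_1']
```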

I am currently working on handling this case.

Regarding issue #3: FAILED test #0 contains class label "0" that is not present in the learn dataset.
The feature dataset in this test is a small features table. When it is split into train and eval inside FEDOT via train_test_data_setup, some classes are lost, so after training we encounter the previously unseen class "0".

```python
def data_with_categorical_target(with_nan: bool = False):
    """
    Generate dataset for classification task where target column is defined as
    string categories (e.g. 'red', 'green'). Dataset is generated so that when
    split into training and test in the test sample in the target will always
    be a new category.

    :param with_nan: is there a need to generate target column with np.nan
    """
    task = Task(TaskTypesEnum.classification)
    features = np.array([[0, 0], [0, 1], [8, 8], [8, 9]])
    if with_nan:
        target = np.array(['blue', np.nan, np.nan, 'di'], dtype=object)
    else:
        target = np.array(['blue', 'da', 'ba', 'di'], dtype=str)
    train_input = InputData(idx=np.array([0, 1, 2, 3]), features=features, target=target,
                            task=task, data_type=DataTypesEnum.table,
                            supplementary_data=SupplementaryData())
    return train_input
```

```python
train_input, eval_input = train_test_data_setup(input_data)
X_train, y_train = self.convert_to_dataframe(
    train_input, identify_cats=self.params.get('enable_categorical')
)
X_eval, y_eval = self.convert_to_dataframe(
    eval_input, identify_cats=self.params.get('enable_categorical')
)
```

Proposed solution: in the test test_categorical_target_processed_correctly, double the features and target arrays in the data-generation function data_with_categorical_target. With that change the error no longer occurs and the test passes.
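To see why doubling helps, here is a sketch that uses a hypothetical sequential 80/20 holdout as a stand-in for train_test_data_setup (its real split ratio and shuffling behaviour are not assumed). With 4 rows the last label lands only in the test part; after duplicating the arrays with `np.tile` (features would be tiled the same way), every test label also appears in training.

```python
import numpy as np


def sequential_split(target, train_ratio=0.8):
    # Hypothetical stand-in for a holdout split: first part train, rest test, no shuffling
    n_train = int(len(target) * train_ratio)
    return target[:n_train], target[n_train:]


def unseen_test_labels(target, train_ratio=0.8):
    # Labels that occur in the test part but never in the train part
    train, test = sequential_split(target, train_ratio)
    return set(test.tolist()) - set(train.tolist())


target = np.array(['blue', 'da', 'ba', 'di'])
print(unseen_test_labels(target))              # {'di'}
print(unseen_test_labels(np.tile(target, 2)))  # set()
```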

Regarding issue #4: FAILED ValueError: Length of values (5) does not match length of index (6).
In the test, the features array has 6 rows, while the target array has length 5.

Proposed solution: shorten features by one row. With that change the test passes and the error no longer occurs.
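A minimal reproduction of issue #4 and the proposed fix, assuming the test ultimately assigns the target as a column of a features DataFrame. The array values and the `build_frame` helper are hypothetical; only the 6-vs-5 row counts come from the test. Assigning 5 values to a 6-row frame raises the ValueError reported above (message wording varies slightly across pandas versions); dropping the extra feature row makes the lengths agree.

```python
import numpy as np
import pandas as pd

features = np.arange(12).reshape(6, 2)  # 6 rows (hypothetical values)
target = np.array([0, 1, 0, 1, 1])      # 5 values


def build_frame(features, target):
    # Hypothetical helper: attach the target as a column of the features frame
    df = pd.DataFrame(features)
    df['target'] = target
    return df


mismatch_error = None
try:
    build_frame(features, target)
except ValueError as err:
    mismatch_error = str(err)
print(mismatch_error)

# The fix proposed in the PR: shorten features by one row
fixed = build_frame(features[:len(target)], target)
print(fixed.shape)  # (5, 3)
```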

nicl-nno · 2024-12-28T07:59:51Z

This needs to be fixed. Otherwise it looks fine.

dmitryglhf · 2025-01-07T13:35:50Z

/fix-pep8

codecov · 2025-01-07T13:50:11Z

Codecov Report

Attention: Patch coverage is 50.76923% with 32 lines in your changes missing coverage. Please review.

Project coverage is 80.18%. Comparing base (926990e) to head (afb70da).
Report is 19 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...mplementations/models/boostings_implementations.py | 52.54% | 28 Missing ⚠️ |
| fedot/core/operations/evaluation/boostings.py | 33.33% | 4 Missing ⚠️ |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #1353      +/-   ##
==========================================
- Coverage   80.32%   80.18%   -0.15%     
==========================================
  Files         146      146              
  Lines       10470    10491      +21     
==========================================
+ Hits         8410     8412       +2     
- Misses       2060     2079      +19
```


dmitryglhf · 2025-01-08T11:47:37Z

/fix-pep8

dmitryglhf added 2 commits December 27, 2024 20:16

Update test_preprocessing_through_api.py

d27d963

Update test_assumption_builder.py

5b42067

dmitryglhf requested a review from nicl-nno December 27, 2024 17:54

nicl-nno reviewed Dec 27, 2024


Added multi-output for boostings, tests fix

f81503e

github-actions bot and others added 2 commits January 7, 2025 13:36

Automated autopep8 fixes

8fcc017

Multi-modal test update

785c5f7

dmitryglhf requested a review from nicl-nno January 8, 2025 11:44

dmitryglhf added 2 commits January 8, 2025 14:58

Fix pep8

b6d56c7

Update test_preprocessing_through_api.py

31310b5

nicl-nno approved these changes Jan 8, 2025


dmitryglhf changed the title ~~Fix tests~~ Fix tests and add multi-target output for boostings Jan 8, 2025

Update test_preprocessing_through_api.py

afb70da

dmitryglhf merged commit 538f1ba into aimclub:master Jan 8, 2025
4 of 5 checks passed

dmitryglhf deleted the fix_tests branch January 9, 2025 10:48
