Generate source code mutants for python using a LLM.
-
Clone the git repository:
git clone https://github.com/robert-oleynik/AI4SE-Mutation-Testing mutator
-
Install this tool to your system:
pip install -e ./mutator
Note: You may want to run this and the following steps inside a
venv. -
Test the tool is installed successfully:
mutator --help mutator-runner --helpNote: The second command is required to execute the test and is installed as part of this project.
This tool provides two main capabilities:
- Generating and testing mutants for a software project.
- Collect datasets from repositories and LoRa-finetuning on these datasets.
Both expect projects with following structure:
src/contains all source filestests/contains all test files
To generate mutants, this tool will follow the following steps:
- Select all functions provided in the source files.
- Remove all function not matching the filter expression. We will refer to these functions as source functions.
- Generate mutants with all generator and generator config combinations.
- Mark all duplicate mutants.
- Run tests for all unique mutants.
Generators are functions, which transform source functions into mutants or mutated functions with the usage of LLMs. The generators used in this project can be divided into two groups:
-
Generating mutants based on function signature and some extra information:
- The first and most basic generator is the
docstringgenerator. This generator will prompt the LLM only with the signature and docstring. - The second generator
comment_rewritewill additionally write the original function commented out before the member function signature. - The
comment_rewrite_no_contextgenerator is a variation of the previous generator, that does not provide context for the surrounding class.
- The first and most basic generator is the
-
Reusing parts of the existing implementation and prompting the LLM to rewrite parts of it:
- The
prefixgenerator will arbitrarily cut off the source function after a random token and prompt the LLM with this result. - The
infillinggenerator will work similarly using the same context. But instead of cutting of tokens, it relies on TreeSitter to sample expressions or statements of the source function and prompt the LLM to regenerate them.
- The
While using different prompts is one approach for receiving different mutants, we can also modify the arguments passed to the LLMs generation method. We will refer to these arguments as generator configs. In addition to the LLM arguments these configs also contains the number of retries/repetitions a generator is supposed to do. Therefore, we provide following configs:
multi_sampleprompts the LLM to return multiple return sequences with sampling enabled.beam_searchprompts the LLM to return multiple return sequences with beam search and sampling enabled.- For each of the above, two additional configs with suffixes
_coldand_hold, that set the temperature lower and higher, respectively.
Each config will produce the same number of mutants - 8 per source function.
To generate mutants, we will first need a project we want to test. For this, we will use the Flask repository:
git clone https://github.com/pallets/flask
cd flaskNow we can use our tool on this repository to generate mutants.
We will start with the infilling generator with the multi_sample.
These can be set by the -g/--generator and -c/--config respectively.
mutator generate --generator infilling --config multi_sample --filter "flask.app:Flask.*"
Note:
We use a simple globing syntax to filter with functions. This is necessary as the generation testing may take some time.This globing reuses following syntax:
- Each function is converted into construct following a syntax like
<module path>:[<class name>.]<function_name>- Each specified filter is converted into a RegEx by escaping
.and replacing*with.*. In addition, these filter allow a!in the beginning to mark negative filter.- We will match all functions against these filter and exclude all functions matching negative filters.
Some other important flags for generating mutants:
-o/--out-dirChange the directory to write the mutants to.--cleanRemoves all old mutants.-m/--modelChange the LLM model to use. Note: this may cause compatibility issues.
To execute the test suite for each mutant we generate we can use the mutator test command.
It is important to note that each mutant has a timeout of 60s this value can be changed by
using the --timeout flag.
Like mutator generate the -o/--out-dir can be used to change mutants work directory.
The results of this test run can be viewed with mutator inspect.
This will open a TUI application showing the mutant as a diff and some additional
information including test output.
Use the list on the left to select a source function, the horizontal arrow keys to
select a mutant and ctrl + the horizontal arrow keys to cycle through the LLM's output stages.
Like mutator generate and mutator test the -o/--out-dir can be used to change
mutants work directory.
In some context, it is beneficial to use fine-tuning for improving LLM results. To do so, this tool provides some utilities from collecting datasets to training the model.
As we already discussed we have multiple approaches for generating prompts. Because of this, generating datasets need to account for the different input formats.
In the following we will generate a dataset from flask repository. To do so, we will use a bare git repository of flask.
git clone --bare https://github.com/pallets/flaskTo collect samples from this repository, we can use mutator collect as following:
mutator collect --bare ./flask.git --generator "infilling"This will collect mutants from the source files and format these for the infilling
generator. While it is possible to specify multiple generators, it is not recommended.
In addition, it is also possible to specify multiple bare git repositories.
It is also possible to use normal git repositories using the --repository flag
instead.
These samples can be filtered by using following:
--max-dlocMaximum change in lines of code (LOC) from source to mutant.--max-loc-ratioMaximum ratio between source and mutant LOC (>1means the mutant has more lines the source).--min-prompt-locMinimum LOC for the generated prompt.--max-prompt-locMaximum LOC for the generated prompt.
It is recommended to use these options in combination with --update.
This flag will load the dataset at <out_dir>/data, apply the limits and store
the result in <out_dir>/data-updated.
Using this command looks like:
mutator collect --update --max-dloc 10We can use the generated dataset with the following sub command:
mutator train --dataset "<collect_out_dir>/data-updated"For more details on the available arguments see mutator train --help.
In addition, it is worthy to note, that the directory specified with
--dataset is the data or data-updated subdirectory of the output
directory specified with collect.
To evaluate the loss per training sample use the following command:
mutator train-result --dir out/model --dataset out/dataset/data-updatedThis section contains tools used and the recommended project setup for development. It is not mandatory for running the tool.
Required tools:
Install required dependencies:
# Create a venv
python3 -m venv venv
# Activate venv
source ./venv/bin/activate
# Install package dependencies
pip install -e .