Knowledge graph retrieval to improve multi-hop Q&A performance, optimized with GNN + LLM models.
This repo contains experiments for combining Knowledge Graph Retrieval with GNN+LLM models to improve RAG. Currently leveraging Neo4j, G-Retriever, and the STaRK-Prime dataset for benchmarking.
This work was presented at:
- Stanford Graph Learning Workshop 2024: https://snap.stanford.edu/graphlearning-workshop-2024/
- NVIDIA Technical Blog: https://developer.nvidia.com/blog/boosting-qa-accuracy-with-graphrag-using-pyg-and-graph-databases/
- RAG on large knowledge graphs that require multi-hop retrieval and reasoning, beyond node classification and link prediction.
- General, extensible 2-part architecture: KG Retrieval & GNN+LLM (a conceptual sketch follows this list).
- Efficient, stable inference time and output for real-world use cases.
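As a rough illustration of that two-part flow (and not this repo's actual code), the retrieval stage pulls a question-specific subgraph out of the knowledge graph, and the GNN+LLM stage encodes that subgraph into a soft prompt for the language model, in the style of G-Retriever. All names in the sketch below (`SubgraphEncoder`, `subgraph`, `llm_generate`) are hypothetical placeholders.

```python
# Conceptual sketch of the 2-part architecture (hypothetical names, not this repo's code).
from torch import nn
from torch_geometric.nn import GATConv, global_mean_pool


class SubgraphEncoder(nn.Module):
    """GNN that encodes a retrieved subgraph into one embedding per graph."""

    def __init__(self, in_dim: int, hidden_dim: int, llm_dim: int):
        super().__init__()
        self.conv1 = GATConv(in_dim, hidden_dim)
        self.conv2 = GATConv(hidden_dim, hidden_dim)
        # Project the pooled graph embedding into the LLM's token-embedding space
        # so it can be prepended to the prompt as a "soft token" (G-Retriever style).
        self.proj = nn.Sequential(
            nn.Linear(hidden_dim, llm_dim), nn.ReLU(), nn.Linear(llm_dim, llm_dim)
        )

    def forward(self, x, edge_index, batch=None):
        h = self.conv1(x, edge_index).relu()
        h = self.conv2(h, edge_index).relu()
        g = global_mean_pool(h, batch)   # [num_graphs, hidden_dim]
        return self.proj(g)              # [num_graphs, llm_dim]


def answer(question: str, subgraph, gnn: SubgraphEncoder, llm_generate):
    """Stage 1: `subgraph` is a PyG Data/Batch retrieved from the KG for this question.
    Stage 2: the LLM generates an answer conditioned on the graph soft prompt.
    `llm_generate` stands in for the (Llama) generation step."""
    batch = getattr(subgraph, "batch", None)            # None => a single graph
    soft_prompt = gnn(subgraph.x, subgraph.edge_index, batch)
    return llm_generate(question, soft_prompt)
```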
Install the Neo4j database (and the relevant JDK) by following the official instructions. Once installed, you can verify the installed version:

```bash
neo4j --version
```
Then start your Neo4j instance via:

```bash
neo4j start
```

The startup output shows where Neo4j has been installed:

```
Directories in use:
home:         /var/lib/neo4j
config:       /etc/neo4j                  <-- location of config file
logs:         /var/log/neo4j
plugins:      /var/lib/neo4j/plugins      <-- location of plugins in neo4j home
import:       /var/lib/neo4j/import
data:         /var/lib/neo4j/data
certificates: /var/lib/neo4j/certificates
licenses:     /var/lib/neo4j/licenses
run:          /var/lib/neo4j/run
Starting Neo4j.
```
You'll also need to install the required Neo4j plugins, such as Graph Data Science (GDS).
This can be done by moving the `<plugin>.jar` files from the `products/` directory into the `plugins/` directory in your Neo4j home. Then add the following line to the bottom of your Neo4j config file (`/etc/neo4j/neo4j.conf`):

```
dbms.security.procedures.allowlist=gds.*
```

and restart Neo4j:

```bash
neo4j restart
```
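After restarting, you can optionally confirm that the GDS procedures are available, for example with a quick check from the Python driver (the URI and credentials below are placeholders for your local instance):

```python
# Quick check that the GDS plugin is installed and allowlisted.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "<password>"))
with driver.session() as session:
    version = session.run("RETURN gds.version() AS version").single()["version"]
    print(f"GDS version: {version}")
driver.close()
```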
With the database installed and running, you can load the STaRK-Prime dataset by running the Python notebook in `data-loading/stark_prime_neo4j_loading.ipynb` (a sketch of the general loading pattern follows the alternatives below).
Alternatives:
- Run the notebook as a script with `python load_data.py`; the script was produced by converting the notebook via `jupyter nbconvert --to script stark_prime_neo4j_loading.ipynb`.
- Obtain a database dump from AWS S3 (bucket `gds-public-dataset/stark-prime-neo4j523`) for database version 5.23.
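For orientation, the loading notebook essentially pushes STaRK-Prime nodes and relationships into Neo4j in batches. The sketch below shows the general pattern with the Python driver; the `:Entity` label, property names, and batch size are assumptions for illustration, not the notebook's exact schema.

```python
# Illustrative batched-loading pattern (assumed label/properties, not the notebook's exact schema).
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "<password>"))

NODE_QUERY = (
    "UNWIND $rows AS row "
    "MERGE (n:Entity {node_id: row.node_id}) "
    "SET n.name = row.name, n.type = row.type"
)

def load_nodes(rows, batch_size=1000):
    """rows: iterable of dicts, e.g. {'node_id': 0, 'name': '...', 'type': '...'}."""
    with driver.session() as session:
        batch = []
        for row in rows:
            batch.append(row)
            if len(batch) == batch_size:
                session.run(NODE_QUERY, rows=batch)
                batch = []
        if batch:                        # flush the final partial batch
            session.run(NODE_QUERY, rows=batch)
```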
Install all required libraries in `requirements.txt` with:

```bash
pip install -r requirements.txt
```

They should be compatible with Python 3.11.
Populate the `db.env` file with your local Neo4j URI, username, and password. Additionally, make sure you are authenticated with `huggingface-cli` (e.g., via `huggingface-cli login`) so the gated Llama 2 / Llama 3 models can be downloaded.
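As an example of wiring this up (the variable names below and the use of python-dotenv are assumptions; match them to whatever keys your `db.env` actually contains), you can load `db.env` and verify the connection like so:

```python
# Load connection settings from db.env (assumed variable names) and test the connection.
import os

from dotenv import load_dotenv
from neo4j import GraphDatabase

load_dotenv("db.env")  # e.g. NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD

driver = GraphDatabase.driver(
    os.environ["NEO4J_URI"],
    auth=(os.environ["NEO4J_USERNAME"], os.environ["NEO4J_PASSWORD"]),
)
driver.verify_connectivity()  # raises an exception if the database is unreachable
driver.close()
```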
- To train a model with default configurations, run the following command:

  ```bash
  python train.py --checkpointing --llama_version llama3.1-8b --retrieval_config_version 0 --algo_config_version 0 --g_retriever_config_version 0 --eval_batch_size 4
  ```
- To get results for the pipeline, run `eval_pcst_ordering.ipynb` using the intermediate dataset and the trained G-Retriever model.
- To exactly reproduce the results in the table below, use the `stanford-workshop-2024` branch. The `main` branch contains new incremental changes and improvements.
- For a high-level overview of Neo4j & GenAI, have a look at neo4j.com/genai.
- To learn how to get started using LLMs with Neo4j, see this online Graph Academy course, one of many Neo4j GenAI courses covering topics ranging from KG construction to graph+vector search and building GenAI chatbot applications.
- Pick your GenAI framework of choice to start building your own GenAI applications with Neo4j.
- Check out Neo4j GenAI technical blogs for other worked examples and integrations.