
neo4j-product-examples/neo4j-gnn-llm-example


Neo4j GraphRAG with GNN+LLM

Knowledge graph retrieval to improve multi-hop Q&A performance, optimized with GNN + LLM models.

This repo contains experiments that combine Knowledge Graph Retrieval with GNN+LLM models to improve RAG. It currently uses Neo4j, G-Retriever, and the STaRK-Prime dataset for benchmarking.

This work was presented at:

  • Stanford Graph Learning Workshop 2024: https://snap.stanford.edu/graphlearning-workshop-2024/
  • NVIDIA Technical Blog: https://developer.nvidia.com/blog/boosting-qa-accuracy-with-graphrag-using-pyg-and-graph-databases/

Architecture Overview


  • RAG on large knowledge graphs that require multi-hop retrieval and reasoning, beyond node classification and link prediction.
  • General, extensible 2-part architecture: KG Retrieval & GNN+LLM.
  • Efficient, stable inference time and output for real-world use cases.
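The KG Retrieval part can be pictured as expanding a neighborhood around the entities mentioned in a question, so that the GNN+LLM part sees the multi-hop context it needs. Below is a minimal, stdlib-only sketch of k-hop neighborhood expansion over (head, relation, tail) triples. It illustrates multi-hop retrieval in general and is not the repository's Neo4j-based retriever; the function name `k_hop_subgraph` and the toy triples are hypothetical.

```python
from collections import deque


def k_hop_subgraph(edges, seeds, k):
    """Return nodes within k hops of any seed (treating edges as undirected)
    and the triples whose endpoints both fall inside that neighborhood."""
    # Build an undirected adjacency list from (head, relation, tail) triples.
    adj = {}
    for head, _rel, tail in edges:
        adj.setdefault(head, set()).add(tail)
        adj.setdefault(tail, set()).add(head)

    # Breadth-first search that stops expanding once depth k is reached.
    visited = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nbr in adj.get(node, ()):
            if nbr not in visited:
                visited.add(nbr)
                frontier.append((nbr, depth + 1))

    # Keep only the triples fully contained in the retrieved neighborhood.
    sub_edges = [(h, r, t) for h, r, t in edges if h in visited and t in visited]
    return visited, sub_edges


# Toy knowledge graph (hypothetical entities).
triples = [
    ("drug_a", "treats", "disease_b"),
    ("disease_b", "associated_with", "gene_c"),
    ("gene_c", "expressed_in", "tissue_d"),
]
nodes, sub = k_hop_subgraph(triples, seeds=["drug_a"], k=2)
```

With `k=2`, the neighborhood of `drug_a` reaches `gene_c` but not `tissue_d`, so only the first two triples are kept. In the full architecture, a retrieved subgraph like this would then be passed to the GNN+LLM model.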

Installation

Installing Neo4j

Install the Neo4j database (and the relevant JDK) by following the official instructions.

Once it is installed, verify the installed version:

neo4j --version

Then start your Neo4j instance:

neo4j start
Directories in use:
home:         /var/lib/neo4j
config:       /etc/neo4j             <-- location of config file
logs:         /var/log/neo4j
plugins:      /var/lib/neo4j/plugins <-- location of plugins in neo4j home
import:       /var/lib/neo4j/import
data:         /var/lib/neo4j/data
certificates: /var/lib/neo4j/certificates
licenses:     /var/lib/neo4j/licenses
run:          /var/lib/neo4j/run
Starting Neo4j.

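This output shows where Neo4j has been installed. Note in particular the config directory (/etc/neo4j), which holds neo4j.conf, and the plugins directory (/var/lib/neo4j/plugins); both are used in the next step.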
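If you script the setup, these paths can be read from the startup output instead of copied by hand. A minimal sketch, assuming the "name: path" format shown above (it may differ between Neo4j versions and installation methods); `parse_neo4j_dirs` is a hypothetical helper, not part of Neo4j.

```python
import re

# Sample output from `neo4j start`, as shown above (paths vary by install).
SAMPLE_OUTPUT = """\
Directories in use:
home:         /var/lib/neo4j
config:       /etc/neo4j             <-- location of config file
logs:         /var/log/neo4j
plugins:      /var/lib/neo4j/plugins <-- location of plugins in neo4j home
import:       /var/lib/neo4j/import
data:         /var/lib/neo4j/data
certificates: /var/lib/neo4j/certificates
licenses:     /var/lib/neo4j/licenses
run:          /var/lib/neo4j/run
Starting Neo4j.
"""


def parse_neo4j_dirs(output):
    """Map each lowercase 'name:' entry to the first path that follows it."""
    dirs = {}
    for line in output.splitlines():
        # Header ("Directories in use:") and status lines don't match because
        # they start with an uppercase letter; trailing "<--" notes are ignored.
        m = re.match(r"^\s*([a-z]+):\s+(\S+)", line)
        if m:
            dirs[m.group(1)] = m.group(2)
    return dirs


dirs = parse_neo4j_dirs(SAMPLE_OUTPUT)
print(dirs["config"], dirs["plugins"])
```

To parse live output instead of the sample, the text could be captured with `subprocess.run(["neo4j", "start"], capture_output=True, text=True).stdout`.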

Installing Additional Plugins

You'll also need to install the following:

This can be done by moving the <plugin>.jar files from the products/ directory into the plugins/ directory in your Neo4j home. Then add the following line

dbms.security.procedures.allowlist=gds.*

to the bottom of your Neo4j config file (/etc/neo4j/neo4j.conf) and restart Neo4j:

neo4j restart
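The plugin and config steps can also be scripted. A minimal sketch, assuming you already know the products/ and plugins/ paths and the config file location (for example from the `neo4j start` output); `install_plugins` is a hypothetical helper. Unlike the manual step, it copies rather than moves the jars, and it appends the allowlist line only if it is missing, so reruns do not duplicate it. The demo runs against a throwaway temporary directory, not a real Neo4j home. You still need to run `neo4j restart` afterwards.

```python
import shutil
import tempfile
from pathlib import Path

ALLOWLIST = "dbms.security.procedures.allowlist=gds.*"


def install_plugins(products_dir, plugins_dir, conf_path, allowlist=ALLOWLIST):
    """Copy *.jar files into plugins_dir and append the allowlist line once."""
    plugins = Path(plugins_dir)
    plugins.mkdir(parents=True, exist_ok=True)
    copied = []
    for jar in sorted(Path(products_dir).glob("*.jar")):
        shutil.copy2(jar, plugins / jar.name)
        copied.append(jar.name)

    conf = Path(conf_path)
    text = conf.read_text() if conf.exists() else ""
    # Compare stripped lines so an existing entry is detected and not duplicated.
    if allowlist not in (line.strip() for line in text.splitlines()):
        if text and not text.endswith("\n"):
            text += "\n"
        conf.write_text(text + allowlist + "\n")
    return copied


# Demo in a temporary directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "products").mkdir()
    (root / "products" / "example-plugin.jar").write_bytes(b"")
    conf = root / "neo4j.conf"
    conf.write_text("# existing settings")
    copied = install_plugins(root / "products", root / "plugins", conf)
    install_plugins(root / "products", root / "plugins", conf)  # rerun is a no-op for the config
    conf_lines = conf.read_text().splitlines()
```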

The database & dataset

With the database installed and running, you can load the STaRK-Prime dataset by running the Python notebook data-loading/stark_prime_neo4j_loading.ipynb.

Alternatives:

  • Run the notebook as a script:
python load_data.py

    The script was created by converting the notebook:
jupyter nbconvert --to script stark_prime_neo4j_loading.ipynb
  • Obtain a database dump from AWS S3 (bucket gds-public-dataset/stark-prime-neo4j523), built for Neo4j version 5.23.

Other requirements

Install the required libraries listed in requirements.txt with pip install -r requirements.txt.

They should be compatible with Python 3.11.

Populate the db.env file with your local Neo4j URI, username, and password. Also make sure huggingface-cli authentication is set up so the relevant models (Llama2, Llama3) can be used.
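Before a long run, it can help to check that db.env is filled in. A minimal, stdlib-only sketch of a dotenv-style parser, assuming simple KEY=VALUE lines. The variable names NEO4J_URI, NEO4J_USERNAME and NEO4J_PASSWORD are assumptions, not confirmed by this README; check the repository code for the names it actually reads. `load_env` and `missing_keys` are hypothetical helpers, and the demo writes a throwaway file to a temporary directory.

```python
import tempfile
from pathlib import Path

# Hypothetical key names; replace with the ones the repository actually uses.
REQUIRED_KEYS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def load_env(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. NEO4J_PASSWORD="secret".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [k for k in required if not env.get(k)]


# Demo with a throwaway file (placeholder credentials).
with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / "db.env"
    env_file.write_text(
        "# local instance\n"
        "NEO4J_URI=neo4j://localhost:7687\n"
        "NEO4J_USERNAME=neo4j\n"
        'NEO4J_PASSWORD="change-me"\n'
    )
    env = load_env(env_file)

print(missing_keys(env))
```

Because `partition("=")` splits only on the first `=`, values containing further `=` characters are preserved.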

Reproduce results

  1. To train a model with default configurations, run the following command: python train.py --checkpointing --llama_version llama3.1-8b --retrieval_config_version 0 --algo_config_version 0 --g_retriever_config_version 0 --eval_batch_size 4
  2. To get results for the Pipeline, run eval_pcst_ordering.ipynb using the intermediate dataset and the G-Retriever model.
  3. To exactly reproduce the results in the table below, use the stanford-workshop-2024 branch. The main branch contains newer incremental changes and improvements.
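To script several training runs, the command from step 1 can be assembled in Python. A minimal sketch; `build_train_cmd` is a hypothetical helper that uses only the flags and default values from the command above. Overriding a config-version flag only makes sense for versions the repository actually defines.

```python
import shlex
import sys

# Flags and values copied from the default training command in step 1.
DEFAULT_ARGS = {
    "--llama_version": "llama3.1-8b",
    "--retrieval_config_version": "0",
    "--algo_config_version": "0",
    "--g_retriever_config_version": "0",
    "--eval_batch_size": "4",
}


def build_train_cmd(overrides=None, checkpointing=True, python=sys.executable):
    """Return an argv list for train.py with the defaults, optionally overridden."""
    # Merging keeps the original flag order for keys that are overridden.
    args = {**DEFAULT_ARGS, **(overrides or {})}
    cmd = [python, "train.py"]
    if checkpointing:
        cmd.append("--checkpointing")
    for flag, value in args.items():
        cmd += [flag, str(value)]
    return cmd


cmd = build_train_cmd(python="python")
print(shlex.join(cmd))
# To launch it from the repository root: subprocess.run(cmd, check=True)
```

The printed string matches the command in step 1. Passing `python=sys.executable` (the default) runs training with the interpreter of the current virtual environment.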

Table Description

Additional Neo4j GraphRAG Resources

About

GraphRAG on Neo4j by finetuning GNN+LLM


Contributors 5