
NVIDIA NeMo Curator 0.8.0

Released by @ryantwolf on 09 May, 01:11 (commit cf12d34)
  • Llama-Based PII Redaction
  • Trafilatura Text Extractor
  • Chinese & Japanese Stopwords for Text Extractors
  • Writing gzip-compressed JSONL datasets
  • Training dataset curation for retriever customization using hard-negative mining
  • Memory-efficient pairwise similarity computation in Semantic Deduplication
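The release notes do not describe how the memory-efficient pairwise similarity is implemented. The following is a rough illustration of the general idea in plain NumPy. It is not NeMo Curator's API, and the function name `max_similarity_to_earlier` and its `batch_size` parameter are hypothetical. Following SemDeDup-style deduplication (assumed here), it computes, for each embedding in a cluster, the maximum cosine similarity to any earlier item. It only materializes a `batch_size × n` block of the similarity matrix at a time, rather than the full `n × n` matrix.

```python
import numpy as np


def max_similarity_to_earlier(embeddings: np.ndarray, batch_size: int = 1024) -> np.ndarray:
    """Hypothetical helper (illustrative sketch, not NeMo Curator's implementation).

    For each row j, return the max cosine similarity to any row i < j.
    Row 0 has no earlier item, so its value is -inf.
    """
    # L2-normalize so that dot products are cosine similarities.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    n = x.shape[0]
    max_sim = np.full(n, -np.inf)
    cols = np.arange(n)[None, :]
    for start in range(0, n, batch_size):
        end = min(start + batch_size, n)
        # Only a (batch, n) slice of the similarity matrix is held in memory.
        block = x[start:end] @ x.T
        rows = np.arange(start, end)[:, None]
        # Keep the strict upper triangle (row index i < column index j).
        block = np.where(cols > rows, block, -np.inf)
        # Fold this batch into the running per-column maximum.
        max_sim = np.maximum(max_sim, block.max(axis=0))
    return max_sim
```

With a threshold `1 - eps` (for example `eps = 0.01`), items whose `max_sim` exceeds the threshold would be treated as near-duplicates of an earlier item. Lowering `batch_size` trades throughput for a smaller peak memory footprint.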