R-Eval: A Unified Toolkit for Evaluating Domain Knowledge of Retrieval Augmented Large Language Models
[Slides]
R-Eval is a Python toolkit designed to streamline the evaluation of different RAG workflows in conjunction with LLMs. It supports popular built-in RAG workflows and allows users to incorporate customized testing data for their specific domains.

* User-friendly: R-Eval provides easy-to-use scripts for automatically running and analyzing experiments with the given models and datasets (see the usage sketch after this list).
* Modular: R-Eval is designed to be modular, so users can easily extend the framework with new models, datasets, and analysis tools.
* Extensible: The domain-agnostic design of R-Eval makes it easy to evaluate Retrieval Augmented Large Language Models on new domains within our framework.
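To make the first point concrete, here is a minimal sketch of what an evaluation run might look like. The package, class, and method names used here (`r_eval`, `REvalBenchmark`, `register_workflow`, `run`, `report`) are illustrative assumptions rather than the toolkit's confirmed API; consult the repository's scripts and documentation for the actual entry points.

```python
# Hypothetical usage sketch -- all names below are assumptions for illustration,
# not R-Eval's confirmed public API.

from r_eval import REvalBenchmark               # assumed module and class name
from my_project.workflows import MyRAGWorkflow  # a user-defined RAG workflow

# Select a built-in domain dataset and the tasks to evaluate on.
benchmark = REvalBenchmark(domain="wikipedia", tasks=["qa", "fact_checking"])

# Register a customized RAG workflow alongside the built-in ones.
benchmark.register_workflow("my_rag", MyRAGWorkflow(llm="gpt-3.5-turbo"))

# Run all registered workflows on the selected tasks and write a summary report.
results = benchmark.run()
benchmark.report(results, output_dir="./reports")
```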