[Translation] A Cheat Sheet and Some Recipes for Building Advanced RAG
Introduction
This is a comprehensive RAG cheat sheet detailing the motivations for RAG as well as techniques and strategies for going beyond basic or naive RAG builds. (high-resolution version)
The RAG cheat sheet shared above was greatly inspired by a recent RAG survey paper ("Retrieval-Augmented Generation for Large Language Models: A Survey", Gao, Yunfan, et al. 2023).
Basic RAG
Mainstream RAG, as defined today, involves retrieving documents from an external knowledge database and passing these documents, along with the user's query, to an LLM in order to generate a response. In other words, RAG consists of a retrieval component, an external knowledge database, and a generation component.
LlamaIndex Basic RAG Recipe:
from llama_index import SimpleDirectoryReader, VectorStoreIndex
# load data
documents = SimpleDirectoryReader(input_dir="...").load_data()
# build VectorStoreIndex that takes care of chunking documents
# and encoding chunks to embeddings for future retrieval
index = VectorStoreIndex.from_documents(documents=documents)
# The QueryEngine class is equipped with the generator
# and facilitates the retrieval and generation steps
query_engine = index.as_query_engine()
# Use your Default RAG
response = query_engine.query("A user's query")
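In this basic recipe, VectorStoreIndex handles chunking and embedding the documents, while as_query_engine() wires a retriever over that index to an LLM-based response synthesizer, giving the default retrieve-then-generate flow described above.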
Success Requirements for RAG
For a RAG system to be considered successful (i.e., to provide answers that are useful and relevant to user questions), there are really only two high-level requirements:
- Retrieval must be able to find the documents most relevant to the user's query.
- Generation must be able to make good use of the retrieved documents to sufficiently answer the user's query.
Advanced RAG
With the success requirements defined, we can say that building advanced RAG is really about applying more sophisticated techniques and strategies (to the retrieval or generation components) to ensure that these requirements are ultimately met. Furthermore, we can categorize a sophisticated technique as one that addresses one of the two high-level success requirements more or less independently, or as one that addresses both of them at the same time.
Advanced techniques for retrieval (must be able to find the documents most relevant to the user's query)
Below, we briefly describe a couple of the more sophisticated techniques that help achieve the first success requirement.
1. Chunk-Size Optimization: Since LLMs are limited by context length, documents must be chunked when building the external knowledge database. Chunks that are too large or too small can cause problems for the generation component and lead to inaccurate responses.
LlamaIndex Chunk Size Optimization Recipe (notebook guide):
import numpy as np

from llama_index import ServiceContext
from llama_index.param_tuner.base import ParamTuner, RunResult
from llama_index.evaluation import SemanticSimilarityEvaluator, BatchEvalRunner
from llama_index.evaluation.eval_utils import get_responses  # batch-inference helper
### Recipe
### Perform hyperparameter tuning as in traditional ML via grid-search
### 1. Define an objective function that ranks different parameter combos
### 2. Build ParamTuner object
### 3. Execute hyperparameter tuning with ParamTuner.tune()
# 1. Define objective function
def objective_function(params_dict):
    chunk_size = params_dict["chunk_size"]
    docs = params_dict["docs"]
    top_k = params_dict["top_k"]
    eval_qs = params_dict["eval_qs"]
    ref_response_strs = params_dict["ref_response_strs"]

    # build RAG pipeline
    index = _build_index(chunk_size, docs)  # helper function not shown here
    query_engine = index.as_query_engine(similarity_top_k=top_k)

    # perform inference with the RAG pipeline on the provided questions `eval_qs`
    pred_response_objs = get_responses(
        eval_qs, query_engine, show_progress=True
    )

    # evaluate predictions by comparing them to the reference
    # responses `ref_response_strs`
    evaluator = SemanticSimilarityEvaluator(...)
    eval_batch_runner = BatchEvalRunner(
        {"semantic_similarity": evaluator}, workers=2, show_progress=True
    )
    eval_results = eval_batch_runner.evaluate_responses(
        eval_qs, responses=pred_response_objs, reference=ref_response_strs
    )

    # get semantic similarity metric
    mean_score = np.array(
        [r.score for r in eval_results["semantic_similarity"]]
    ).mean()

    return RunResult(score=mean_score, params=params_dict)
# 2. Build ParamTuner object
param_dict = {"chunk_size": [256, 512, 1024]} # params/values to search over
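# NOTE: `docs`, `eval_qs`, and `ref_response_strs` are assumed to have been
# loaded beforehand (your corpus, evaluation questions, and reference answers)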
fixed_param_dict = {  # fixed hyperparams
    "top_k": 2,
    "docs": docs,
    "eval_qs": eval_qs[:10],
    "ref_response_strs": ref_response_strs[:10],
}
param_tuner = ParamTuner(
    param_fn=objective_function,
    param_dict=param_dict,
    fixed_param_dict=fixed_param_dict,
    show_progress=True,
)
# 3. Execute hyperparameter search
results = param_tuner.tune()
best_result = results.best_run_result
best_chunk_size = results.best_run_result.params["chunk_size"]
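Here, tune() runs the objective function once for each candidate chunk_size (256, 512, and 1024) and returns the run with the highest mean semantic-similarity score, from which the best-performing chunk size is read back.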
2. Structured External Knowledge: In complex cases, it may be necessary to build your external knowledge in a more structured fashion than a basic vector index, so as to allow recursive retrieval or routed retrieval over sensibly separated external knowledge sources (a routed-retrieval sketch follows the recipe below).
LlamaIndex Recursive Retrieval Recipe (notebook guide):
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.node_parser import SentenceSplitter
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.retrievers import RecursiveRetriever
from llama_index.schema import IndexNode
### Recipe
### Build a recursive retriever that retrieves using small chunks
### but passes associated larger chunks to the generation stage
# load data
documents = SimpleDirectoryReader(
    input_file="some_data_path/llama2.pdf"
).load_data()
# build parent chunks via NodeParser
node_parser = SentenceSplitter(chunk_size=1024)
base_nodes = node_parser.get_nodes_from_documents(documents)
# define smaller child chunks
sub_chunk_sizes = [256, 512]
sub_node_parsers = [
    SentenceSplitter(chunk_size=c, chunk_overlap=20) for c in sub_chunk_sizes
]
all_nodes = []
for base_node in base_nodes:
    for n in sub_node_parsers:
        sub_nodes = n.get_nodes_from_documents([base_node])
        sub_inodes = [
            IndexNode.from_text_node(sn, base_node.node_id) for sn in sub_nodes
        ]
        all_nodes.extend(sub_inodes)
    # also add the original node to the node list
    original_node = IndexNode.from_text_node(base_node, base_node.node_id)
    all_nodes.append(original_node)
# define a ServiceContext and a VectorStoreIndex with all of the nodes
service_context = ServiceContext.from_defaults()  # default LLM and embedding model
vector_index_chunk = VectorStoreIndex(
    all_nodes, service_context=service_context
)
vector_retriever_chunk = vector_index_chunk.as_retriever(similarity_top_k=2)
# build RecursiveRetriever
all_nodes_dict = {n.node_id: n for n in all_nodes}
retriever_chunk = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": vector_retriever_chunk},
    node_dict=all_nodes_dict,
    verbose=True,
)
# build RetrieverQueryEngine using recursive_retriever
query_engine_chunk = RetrieverQueryEngine.from_args(
    retriever_chunk, service_context=service_context
)
# perform inference with advanced RAG (i.e. query engine)
response = query_engine_chunk.query(
    "Can you tell me about the key concepts for safety finetuning"
)
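The item above also mentions routed retrieval over sensibly separated knowledge sources. As a complement to the recursive retrieval recipe, here is a minimal, hypothetical sketch of routing with a RouterQueryEngine; it assumes the same legacy llama_index API used in the recipes above (exact class names and signatures may differ across versions), and the notes_index / slides_index variables are made up purely for illustration.
from llama_index.query_engine import RouterQueryEngine
from llama_index.selectors.llm_selectors import LLMSingleSelector
from llama_index.tools import QueryEngineTool
### Sketch (hypothetical indexes; API assumed as noted above)
### Route each user query to the knowledge source most likely to answer it
# wrap one query engine per (sensibly separated) knowledge source as a tool,
# each with a natural-language description the selector can reason over
notes_tool = QueryEngineTool.from_defaults(
    query_engine=notes_index.as_query_engine(),
    description="Useful for questions about the lecture notes",
)
slides_tool = QueryEngineTool.from_defaults(
    query_engine=slides_index.as_query_engine(),
    description="Useful for questions about the slide decks",
)
# an LLM-based selector reads the descriptions and picks the best tool per query
router_query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[notes_tool, slides_tool],
)
response = router_query_engine.query("A user's query")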
Other Useful Links
We have several guides demonstrating how to apply other advanced techniques to ensure accurate retrieval in complex cases. Here is a selection of links to them:
Advanced techniques for generation (must be able to make good use of retrieved documents)
Similar to the previous section, we provide a couple of examples of sophisticated techniques in this category, which can be described as making sure that the retrieved documents are well aligned with the generator's LLM.
1. Information Compression: Not only are LLMs limited by context length, but response quality can also degrade if the retrieved documents carry too much noise (i.e., irrelevant information).
LlamaIndex Information Compression Recipe (notebook guide):
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.postprocessor import LongLLMLinguaPostprocessor
### Recipe
### Define a Postprocessor object, here LongLLMLinguaPostprocessor
### Build QueryEngine that uses this Postprocessor on retrieved docs
# Define Postprocessor
node_postprocessor = LongLLMLinguaPostprocessor(
    instruction_str="Given the context, please answer the final question",
    target_token=300,
    rank_method="longllmlingua",
    additional_compress_kwargs={
        "condition_compare": True,
        "condition_in_question": "after",
        "context_budget": "+100",
        "reorder_context": "sort",  # enable document reorder
    },
)
# Define VectorStoreIndex
documents = SimpleDirectoryReader(input_dir="...").load_data()
index = VectorStoreIndex.from_documents(documents)
# Define QueryEngine
retriever = index.as_retriever(similarity_top_k=2)
retriever_query_engine = RetrieverQueryEngine.from_args(
    retriever, node_postprocessors=[node_postprocessor]
)
# Use your advanced RAG
response = retriever_query_engine.query("A user query")
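In this recipe, the two retrieved nodes (similarity_top_k=2) are compressed by LongLLMLingua down to roughly target_token=300 tokens, and reorder_context="sort" re-orders the surviving context by relevance, so the generator receives a shorter and less noisy prompt.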
2. Result Re-Ranking: LLMs suffer from the so-called "lost in the middle" phenomenon, in which they tend to focus on the extreme ends of the prompt. For this reason, it is beneficial to re-rank the retrieved documents before passing them to the generation component.
LlamaIndex Re-Ranking for Better Generation Recipe (notebook guide):
import os

from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.postprocessor.cohere_rerank import CohereRerank
### Recipe
### Define a Postprocessor object, here CohereRerank
### Build QueryEngine that uses this Postprocessor on retrieved docs
# Build CohereRerank post retrieval processor
api_key = os.environ["COHERE_API_KEY"]
cohere_rerank = CohereRerank(api_key=api_key, top_n=2)
# Build QueryEngine (RAG) using the post processor
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(documents=documents)
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[cohere_rerank],
)
# Use your advanced RAG
response = query_engine.query(
    "What did Sam Altman do in this essay?"
)
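Note the deliberate over-retrieval here: similarity_top_k=10 pulls in ten candidate nodes, and CohereRerank with top_n=2 then keeps only the two judged most relevant before generation, which helps counteract the lost-in-the-middle effect described above.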
Advanced techniques for simultaneously addressing the retrieval and generation success requirements
In this subsection, we consider sophisticated methods that use the synergy between retrieval and generation to achieve both better retrieval and more accurate generated responses to user queries.
1. Generator-Enhanced Retrieval: These techniques make use of the LLM's inherent reasoning abilities to refine the user query before retrieval is performed, so as to better indicate exactly what is needed to provide a useful response.
LlamaIndex Generator-Enhanced Retrieval Recipe (notebook guide):
from llama_index.llms import OpenAI
from llama_index.query_engine import FLAREInstructQueryEngine
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    ServiceContext,
)
### Recipe
### Build a FLAREInstructQueryEngine which has the generator LLM play
### a more active role in retrieval by prompting it to elicit retrieval
### instructions on what it needs to answer the user query.
# Build FLAREInstructQueryEngine
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
index = VectorStoreIndex.from_documents(documents)
index_query_engine = index.as_query_engine(similarity_top_k=2)
service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-4"))
flare_query_engine = FLAREInstructQueryEngine(
    query_engine=index_query_engine,
    service_context=service_context,
    max_iterations=7,
    verbose=True,
)
# Use your advanced RAG
response = flare_query_engine.query(
    "Can you tell me about the author's trajectory in the startup world?"
)
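With this setup, the generator (gpt-4 here) plays an active role in retrieval: following the FLARE approach, it is prompted to emit instructions for the information it still needs, and max_iterations=7 caps the number of retrieval-generation rounds.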
2. Iterative Retrieval-Generation RAG: For some complex cases, multi-step reasoning may be required to provide a useful and relevant answer to the user's query.
LlamaIndex Iterative Retrieval-Generation Recipe (notebook guide):
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.query_engine import RetryQueryEngine
from llama_index.evaluation import RelevancyEvaluator
### Recipe
### Build a RetryQueryEngine which performs retrieval-generation cycles
### until it either achieves a passing evaluation or a max number of
### cycles has been reached
# Build RetryQueryEngine
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
index = VectorStoreIndex.from_documents(documents)
base_query_engine = index.as_query_engine()
# evaluator used to critique each retrieval-generation cycle
query_response_evaluator = RelevancyEvaluator()
retry_query_engine = RetryQueryEngine(
    base_query_engine, query_response_evaluator
)
# Use your advanced RAG
retry_response = retry_query_engine.query("A user query")
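Here RelevancyEvaluator acts as the critic: RetryQueryEngine queries the base engine, evaluates the response, and repeats the retrieval-generation cycle until the evaluation passes or the maximum number of retries is reached.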
Measurement Aspects of RAG
Evaluating RAG systems is, of course, critically important. In their survey paper, Gao, Yunfan et al. point out the 7 measurement aspects shown in the top-right portion of the attached RAG cheat sheet. The llama-index library includes several evaluation abstractions, as well as an integration with RAGAs, to help builders understand, through the lens of these measurement aspects, the degree to which their RAG system meets the success requirements. Below, we list a few selected evaluation notebook guides.
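In addition to those guides, a minimal sketch of checking a single response for faithfulness and relevancy with the llama-index evaluation abstractions might look like the following; this assumes the same legacy llama_index API used in the recipes above, and exact signatures may differ between versions.
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.evaluation import FaithfulnessEvaluator, RelevancyEvaluator
from llama_index.llms import OpenAI
### Sketch (API assumed as noted above)
### Judge whether a RAG response is grounded in the retrieved context
### (faithfulness) and actually answers the query (relevancy)
# use a strong LLM as the judge
eval_service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-4"))
faithfulness_evaluator = FaithfulnessEvaluator(service_context=eval_service_context)
relevancy_evaluator = RelevancyEvaluator(service_context=eval_service_context)
# build a basic RAG pipeline and produce a response to evaluate
documents = SimpleDirectoryReader(input_dir="...").load_data()
query_engine = VectorStoreIndex.from_documents(documents).as_query_engine()
query = "A user's query"
response = query_engine.query(query)
# each evaluator returns a result with a passing flag and textual feedback
faithfulness_result = faithfulness_evaluator.evaluate_response(response=response)
relevancy_result = relevancy_evaluator.evaluate_response(query=query, response=response)
print(faithfulness_result.passing, relevancy_result.passing)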
Original article: https://blog.llamaindex.ai/a-cheat-sheet-and-some-recipes-for-building-advanced-rag-803a9d94c41b