Retrieval-augmented generation

Use cases include providing chatbot access to internal company data or giving factual information only from an authoritative source.[1]

Typically, the data to be referenced is converted into LLM embeddings, numerical representations in the form of large vectors.[1]
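
As a minimal sketch of this step, assuming the sentence-transformers library and the all-MiniLM-L6-v2 model (both choices are illustrative, not prescribed by RAG itself), documents can be embedded into vectors and matched to a query by cosine similarity:

```python
# Sketch: embed documents and retrieve the nearest ones for a query.
# Assumes the sentence-transformers library; the model name is illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "RAG retrieves documents relevant to a query.",
    "Embeddings map text to large numerical vectors.",
    "The moon orbits the Earth.",
]

# Encode documents and the query into vectors (normalized so that
# the dot product equals cosine similarity).
doc_vecs = model.encode(documents, normalize_embeddings=True)
query_vec = model.encode("How does RAG find relevant text?",
                         normalize_embeddings=True)

# Rank documents by similarity to the query and keep the top k.
scores = doc_vecs @ query_vec
top_k = np.argsort(scores)[::-1][:2]
retrieved = [documents[i] for i in top_k]
print(retrieved)
```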

This relevant retrieved information is then fed into the LLM via prompt engineering of the user's original query.[1]
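
A sketch of this prompt-augmentation step is below; the template wording is an assumption, as real systems use many variants:

```python
# Sketch: splice retrieved passages into the prompt sent to the LLM.
# The template below is illustrative; deployed systems differ widely.
def build_augmented_prompt(query: str, retrieved: list[str]) -> str:
    context = "\n\n".join(f"[{i + 1}] {passage}"
                          for i, passage in enumerate(retrieved))
    return (
        "Answer the question using only the sources below.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

prompt = build_augmented_prompt(
    "How does RAG find relevant text?",
    ["RAG retrieves documents relevant to a query.",
     "Embeddings map text to large numerical vectors."],
)
print(prompt)  # This string replaces the raw query as the LLM input.
```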

Newer implementations (as of 2023) can also incorporate specific augmentation modules with abilities such as expanding queries into multiple domains and using memory and self-improvement to learn from previous retrievals.[2]
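
Query expansion can be sketched as follows; `ask_llm` is a hypothetical stand-in for any text-generation call, and its hard-coded output here is purely illustrative:

```python
# Sketch: expand one query into several domain-specific variants before
# retrieval. Actual augmentation modules differ per system.
def ask_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call a language model here.
    return ("What are retrieval-augmented generation use cases in medicine?\n"
            "How is RAG applied to legal document search?\n"
            "RAG applications for enterprise knowledge bases")

def expand_query(query: str) -> list[str]:
    prompt = (f"Rewrite the query '{query}' as three search queries, "
              "one per line, each targeting a different domain.")
    variants = [line.strip() for line in ask_llm(prompt).splitlines()
                if line.strip()]
    return [query] + variants  # also retrieve with the original query

for q in expand_query("Where is RAG used?"):
    print(q)
```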

Some models incorporate extra steps to improve output, such as the re-ranking of retrieved information, context selection, and fine-tuning.[5]
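
Re-ranking is often done with a cross-encoder, which scores each (query, passage) pair jointly and is typically more accurate than the initial vector search. A minimal sketch, again assuming the sentence-transformers library with an illustrative model choice:

```python
# Sketch: re-rank retrieved passages with a cross-encoder.
# Library and model name are illustrative choices, not the only options.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How does RAG find relevant text?"
retrieved = [
    "The moon orbits the Earth.",
    "RAG retrieves documents relevant to a query.",
    "Embeddings map text to large numerical vectors.",
]

# Score every (query, passage) pair, then sort passages from most to
# least relevant; only the best passages go into the final prompt.
scores = reranker.predict([(query, passage) for passage in retrieved])
reranked = [p for _, p in sorted(zip(scores, retrieved), reverse=True)]
print(reranked[0])
```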

By redesigning the language model with the retriever in mind, a network 25 times smaller can achieve perplexity comparable to that of its much larger counterparts.

One can start with a set of documents, books, or other bodies of text, and convert them to a knowledge graph using one of many methods, including language models.
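
One such method can be sketched as below; `extract_triples` stands in for any extraction approach (rule-based, or prompting a language model), and its hard-coded output is purely illustrative:

```python
# Sketch: build a knowledge graph from text using networkx.
import networkx as nx

def extract_triples(text: str) -> list[tuple[str, str, str]]:
    # Placeholder: a real system would extract (subject, relation, object)
    # triples from `text`, e.g. by prompting an LLM.
    return [("RAG", "uses", "retriever"),
            ("retriever", "searches", "documents")]

graph = nx.DiGraph()
for document in ["RAG uses a retriever that searches documents."]:
    for subject, relation, obj in extract_triples(document):
        graph.add_edge(subject, obj, relation=relation)

# The graph can later be queried instead of (or alongside) a vector index.
print(list(graph.edges(data=True)))
```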

Figure: Overview of the RAG process, combining external documents and user input into an LLM prompt to get tailored output.