Prompt engineering is the process of structuring or crafting an instruction in order to produce the best possible output from a generative artificial intelligence (AI) model.
[2] A prompt for a text-to-text language model can be a query, a command, or a longer statement including context, instructions, and conversation history.
Prompt engineering may involve phrasing a query, specifying a style, choice of words and grammar,[3] providing relevant context, or describing a character for the AI to mimic.
[1] When communicating with a text-to-image or a text-to-audio model, a typical prompt is a description of a desired output such as "a high-quality photo of an astronaut riding a horse"[4] or "Lo-fi slow BPM electro chill with organic samples".
[5] Prompting a text-to-image model may involve adding, removing, emphasizing, and re-ordering words to achieve a desired subject, style,[6] layout, lighting,[7] and aesthetic.
In 2018, researchers first proposed that all previously separate tasks in natural language processing (NLP) could be cast as a question-answering problem over a context.
In addition, they trained a first single, joint, multi-task model that would answer any task-related question like "What is the sentiment" or "Translate this sentence to German" or "Who is the president?
[9] After the release of ChatGPT in 2022, prompt engineering was soon seen as an important business skill, albeit one with an uncertain economic future.
According to Google Research, chain-of-thought (CoT) prompting is a technique that allows large language models (LLMs) to solve a problem as a series of intermediate steps before giving a final answer.
", Google claims that a CoT prompt might induce the LLM to answer "A: The cafeteria had 23 apples originally.
[11] It is possible to fine-tune models on CoT reasoning datasets to enhance this capability further and stimulate better interpretability.
However, according to researchers at Google and the University of Tokyo, simply appending the words "Let's think step-by-step",[21] has also proven effective, which makes CoT a zero-shot prompting technique.
OpenAI claims that this prompt allows for better scaling as a user no longer needs to formulate many specific CoT Q&A examples.
The model may output text that appears confident, though the underlying token predictions have low likelihood scores.
Research consistently demonstrates that LLMs are highly sensitive to subtle variations in prompt formatting, structure, and linguistic properties.
[3][35] Clausal syntax, for example, improves consistency and reduces uncertainty in knowledge retrieval.
[36] This sensitivity persists even with larger model sizes, additional few-shot examples, or instruction tuning.
FormatSpread facilitates systematic analysis by evaluating a range of plausible prompt formats, offering a more comprehensive performance interval.
GraphRAG[40] (coined by Microsoft Research) is a technique that extends RAG with the use of a knowledge graph (usually, LLM-generated) to allow the model to connect disparate pieces of information, synthesize insights, and holistically understand summarized semantic concepts over large data collections.
It was shown to be effective on datasets like the Violent Incident Information from News Articles (VIINA).
[41] Earlier work showed the effectiveness of using a knowledge graph for question answering using text-to-query generation.
[42] These techniques can be combined to search across both unstructured and structured data, providing expanded context, and improved ranking.
[50] A text-to-image prompt commonly includes a description of the subject of the art, the desired medium (such as digital painting or photography), style (such as hyperrealistic or pop-art), lighting (such as rim lighting or crepuscular rays), color, and texture.
[6] The Midjourney documentation encourages short, descriptive prompts: instead of "Show me a picture of lots of blooming California poppies, make them bright, vibrant orange, and draw them in an illustrated style with colored pencils", an effective prompt might be "Bright orange California poppies drawn with colored pencils".
[52] Famous artists such as Vincent van Gogh and Salvador Dalí have also been used for styling and testing.
[53] Some approaches augment or replace natural language text prompts with non-text input.
For text-to-image models, textual inversion[54] performs an optimization process to create a new word embedding based on a set of example images.
This embedding vector acts as a "pseudo-word" which can be included in a prompt to express the content or style of the examples.
During training, the tunable embeddings, input, and output tokens are concatenated into a single sequence
Prompt injection is a family of related computer security exploits carried out by getting a machine learning model (such as an LLM) which was trained to follow human-given instructions to follow instructions provided by a malicious user.