Frontmatter the agent reads.

Every article in 10-cortex/ opens with a small set of YAML fields and ends with a typed list of sources. cortex-compile writes them; cortex-lint enforces them. This is the standard.

20/Required fields

Nine fields. Each one earns its place by changing how the LLM picks, ranks, or refreshes the article.

Field	Type	Why it's there
title	string	The display name. cortex-lint flags articles missing it as errors.
description	string	Two sentences. What this is, why the vault owner cares. Used in 10-cortex/_index.md and the harness's preview pane.
topic	slug	Routes the article into 10-cortex/<topic>/. cortex-compile uses this to plan merges vs new articles.
created	YYYY-MM-DD	First write date. Never updated.
last_compiled	YYYY-MM-DD	Last time cortex-compile ran on this article. Auto-updated.
verified_at	YYYY-MM-DD	Last time you confirmed the article still matches reality. cortex-lint flags articles where this is older than 90 days.
confidence	low | medium | high	high = primary research / official docs. medium = credible analysis. low = single source / speculation.
staleness_signal	string	A one-line condition that, if true, means the article is stale. cortex-lint makes a best-effort match of this against the LLM's world knowledge.
sources	list	Plain list of source paths. The body Sources section adds typed-edge prefixes (see below).
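
The field table above can be checked mechanically. What follows is a minimal Python sketch, not cortex-lint's actual implementation. It assumes the frontmatter has already been parsed into a dict; the function name `check_frontmatter` and the `(severity, rule)` finding format are hypothetical. Severities follow the missing-field:* rule listed under cortex-lint rules: title and topic are errors, the rest are warnings.

```python
# The nine required fields, in the order the table lists them.
REQUIRED_FIELDS = [
    "title", "description", "topic", "created", "last_compiled",
    "verified_at", "confidence", "staleness_signal", "sources",
]
# Fields whose absence is an error rather than a warning.
ERROR_FIELDS = {"title", "topic"}


def check_frontmatter(meta: dict) -> list:
    """Return (severity, rule) findings for an already-parsed frontmatter dict.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    findings = []
    for field in REQUIRED_FIELDS:
        value = meta.get(field)
        # Treat absent, empty-string, and empty-list values as missing.
        if value is None or value == "" or value == []:
            severity = "error" if field in ERROR_FIELDS else "warn"
            findings.append((severity, f"missing-field:{field}"))
    return findings
```

Called on a dict that lacks title and confidence, this sketch would report one error and one warning, in table order.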

30/Typed-edge sources

The Sources section uses five wikilink prefixes so the relationship between article and source is explicit. cortex-lint flags untyped sources as warnings.

supports::

The source's findings back up the article's claims. Most common.

contradicts::

The source disputes part of the article. Useful when you want to remember the disagreement.

extends::

The source builds on the article -- newer or deeper material.

mentions::

Passing reference. The article's subject isn't the source's main topic, but the source mentions it.

inspired-by::

The article exists because of this source. Often a seed thought from 40-raw/plain/.
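
The five prefixes above lend themselves to a small parser. The sketch below is one possible way to read a Sources line, assuming the `- prefix:: [[target]] -- note` layout used in this document's example article. The regular expression and the function name `parse_source_line` are hypothetical and are not taken from cortex-lint.

```python
import re

# The five typed-edge prefixes described above.
EDGE_TYPES = ("supports", "contradicts", "extends", "mentions", "inspired-by")

# Assumed line shape: "- <edge>:: [[<target>]] -- <optional note>"
SOURCE_LINE = re.compile(
    r"^-\s+(?:(?P<edge>[a-z-]+)::\s+)?"   # optional typed-edge prefix
    r"\[\[(?P<target>[^\]]+)\]\]"         # wikilink target
    r"(?:\s+--\s+(?P<note>.*))?$"         # optional trailing note
)


def parse_source_line(line: str):
    """Return (edge, target, note), or None if the line is not a source item.

    edge is None when the prefix is missing or not one of the five known types,
    which is the case an untyped-source warning would cover.
    """
    match = SOURCE_LINE.match(line.strip())
    if match is None:
        return None
    edge = match.group("edge")
    if edge not in EDGE_TYPES:
        edge = None
    return edge, match.group("target"), match.group("note")
```

Returning `None` for the edge, rather than raising, keeps the parser usable for reporting: a caller can collect every untyped item in one pass.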

40/Example article

What a clean cortex article looks like end-to-end.

---
title: "Retrieval-augmented generation"
description: "Grounding LLM responses in your own documents. The standard architecture for memory-aware agents."
topic: "ai"
sources:
  - "40-raw/youtube/rag-explained.md"
  - "40-raw/papers/lewis-et-al-2020-rag.md"
created: 2026-04-12
last_compiled: 2026-05-08
verified_at: 2026-05-08
confidence: high
staleness_signal: "RAG architecture moves to graph-RAG by default, or vector DBs are replaced by long-context models"
---

# Retrieval-augmented generation

LLMs answer better when they retrieve relevant documents first.

## TL;DR

RAG pairs an LLM with a retriever that fetches relevant chunks from your own
corpus before generation. The model's response is grounded in real documents
instead of pure parametric memory, which reduces hallucination and lets you
update knowledge without retraining. Cost: retrieval quality is now your
bottleneck.

## Summary

[2-3 paragraphs.]

## Key Facts

- Lewis et al. (2020) introduced the term, pairing BART with a dense retriever.
- Modern stacks pick top-k chunks via embedding similarity, then concatenate.
- Retrieval quality dominates response quality once the LLM is good enough.

## Connections

- Related: [[10-cortex/ai/embeddings]], [[10-cortex/ai/long-context]]
- Used in: vault-side memory loop (cortex-compile + cortex-connect)
- Contrasts with: [[10-cortex/ai/finetuning]] -- finetuning bakes knowledge in;
  RAG keeps it swappable.

## Sources

- supports:: [[40-raw/papers/lewis-et-al-2020-rag.md]] -- original architecture description
- extends:: [[40-raw/youtube/rag-explained.md]] -- modern stack walkthrough (top-k, reranking, hybrid search)
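
The example above has two parts: a frontmatter block delimited by `---` lines, and a Markdown body. As a rough illustration of how a tool might separate them before parsing either part, here is a standard-library-only Python sketch. The function name `split_frontmatter` is hypothetical, and the sketch returns the frontmatter as raw text; it does not parse the YAML.

```python
def split_frontmatter(text: str):
    """Split an article into (frontmatter_text, body).

    Returns (None, text) when the article does not open with a '---' line
    or when the opening delimiter is never closed.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None, text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            # Everything between the delimiters is frontmatter; the rest is body.
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return None, text
```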

50/cortex-lint rules

Read-only diagnostic. Run /cortex-lint to surface decay and drift across 10-cortex/. Never mutates a file; the report is for you to read and act on.

Rule	Severity	What it catches
missing-tldr	error	Article body has no `## TL;DR` heading. Blocks re-compile until fixed.
missing-sources	error	Article body has no `## Sources` heading or the section has no items.
missing-field:*	error / warn	Required frontmatter fields. title and topic are errors; the rest are warnings.
untyped-source	warn	Sources list item that doesn't start with one of the 5 typed-edge prefixes.
tldr-out-of-range	warn	TL;DR word count is below 30 or above 150. Target is 50-100.
stale-verified-at	info	verified_at is older than 90 days. Surface for review, not blocking.
staleness-signal-triggered	info	Best-effort match -- the LLM noticed the staleness_signal mentions something it knows has changed.
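
Three of the rules above (missing-tldr, tldr-out-of-range, stale-verified-at) reduce to simple arithmetic on the article body and one date. The Python sketch below shows one way they could be evaluated, using the thresholds from the table. It is an illustration under assumptions, not cortex-lint's code: the function names, the whitespace-based word count, and the `(severity, rule)` tuples are all hypothetical.

```python
import re
from datetime import date


def tldr_word_count(body: str):
    """Count words under the '## TL;DR' heading, or return None if it is absent."""
    # Capture everything up to the next '## ' heading or the end of the body.
    match = re.search(r"^## TL;DR\s*\n(.*?)(?=^## |\Z)", body, flags=re.M | re.S)
    if match is None:
        return None
    return len(match.group(1).split())


def lint_article(body: str, verified_at: date, today: date) -> list:
    """Return (severity, rule) findings for the TL;DR and verified_at rules."""
    findings = []
    words = tldr_word_count(body)
    if words is None:
        findings.append(("error", "missing-tldr"))
    elif words < 30 or words > 150:
        findings.append(("warn", "tldr-out-of-range"))
    # Older than 90 days: surfaced for review, not blocking.
    if (today - verified_at).days > 90:
        findings.append(("info", "stale-verified-at"))
    return findings
```

Passing `today` as an argument, rather than reading the clock inside the function, keeps the check deterministic and easy to test.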