RAG Development Part 2 — Chunking and Embedding Strategy for Better Retrieval
Chunking and embeddings define the floor of retrieval quality. This post covers chunk size, overlap, heading preservation, code block handling, embedding model selection, and indexing strategy.
Why Chunking Matters
When RAG quality is disappointing, many teams immediately blame the model. In practice, chunking is often the bigger factor.
Bad chunking causes:
- Important context split across chunk boundaries
- Retrieval that finds partial but unusable evidence
- Oversized chunks with too much noise
- Repetitive chunks dominating the top results
Chunking is not text splitting. It is building searchable semantic units.
How to Think About Chunk Size
There is no universal perfect number, but good starting ranges exist:
- General docs: 300 to 700 tokens
- FAQ / policy snippets: 150 to 300 tokens
- Long operational guides: 500 to 900 tokens
- Code-heavy docs: keep explanations and code together
The right chunk is the smallest unit that can still answer a meaningful question.
Chunking Strategies to Avoid
Fixed Character Splits
def bad_chunk(text: str, size: int = 1000) -> list[str]:
    # Slices every `size` characters, ignoring sentences, headings, and code.
    return [text[i:i + size] for i in range(0, len(text), size)]
This often cuts headings away from bodies and breaks sentences in awkward places.
Ignoring Document Structure
Technical docs often rely on sections like “Installation”, “Authentication”, or “Troubleshooting”. If those boundaries disappear, retrieval quality usually drops.
A Better Chunking Sequence
In practice, a good sequence is:
- Parse structure
- Split by sections
- Split only oversized sections further
- Add overlap
- Inherit headings and metadata
Example:
def build_chunk(title: str, section_title: str, body: str) -> str:
    # Prefix the body with its document and section headings.
    return f"# {title}\n## {section_title}\n{body}"
That way, each chunk still carries enough context on its own.
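The sequence above can be sketched end to end. This is a minimal illustration, not a reference implementation: it assumes Markdown-style `#` headings, uses word counts as a rough stand-in for tokens, and leaves the overlap step out. The names `split_sections`, `split_oversized`, and `chunk_document` are hypothetical, and the heading regex does not account for `#` lines inside code blocks.

```python
import re
from typing import Iterator

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def split_sections(markdown: str) -> Iterator[tuple[str, str]]:
    """Yield (section_title, body) pairs, splitting at Markdown headings."""
    title, lines = "", []
    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            # Flush the previous section if it has a title or any content.
            if title or any(l.strip() for l in lines):
                yield title, "\n".join(lines).strip()
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    if title or any(l.strip() for l in lines):
        yield title, "\n".join(lines).strip()


def split_oversized(body: str, max_words: int) -> list[str]:
    """Pack paragraphs into pieces of at most max_words (word count ~ tokens)."""
    pieces, current, count = [], [], 0
    for para in re.split(r"\n\s*\n", body):
        if not para.strip():
            continue
        n = len(para.split())
        if current and count + n > max_words:
            pieces.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        pieces.append("\n\n".join(current))
    return pieces


def chunk_document(title: str, markdown: str, max_words: int = 400) -> list[str]:
    """Split by section, split only oversized sections, and inherit headings."""
    chunks = []
    for section_title, body in split_sections(markdown):
        for piece in split_oversized(body, max_words):
            chunks.append(f"# {title}\n## {section_title}\n{piece}")
    return chunks
```

A section that fits within `max_words` stays whole; only longer sections are split, and every resulting piece still carries the document and section titles.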
Why Overlap Helps
Overlap reduces the chance that an important statement falls exactly on a chunk boundary.
Example:
Chunk A
- access token expiration returns 401
- refresh token can be used if still valid
Chunk B
- refresh token can be used if still valid
- new access token is returned after refresh
Too much overlap, however, fills the index with near-duplicate chunks, and those duplicates can crowd the top results.
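One common way to implement overlap is a sliding window whose step is smaller than its size. The sketch below assumes the text is already tokenized into a list; the function name `sliding_windows` and its parameters are illustrative, not part of any specific library.

```python
def sliding_windows(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Return windows of `size` tokens where neighbours share `overlap` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in the range [0, size)")
    step = size - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + size])
        # Stop once a window reaches the end, so no tail-only duplicate is emitted.
        if start + size >= len(tokens):
            break
    return windows
```

Requiring `overlap < size` guarantees the window always advances, and keeping the overlap a small fraction of the size limits the duplication described above.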
Code Blocks and Tables Need Special Care
Code Blocks
- Do not split code in the middle
- Keep explanations with examples
- Preserve function or config boundaries
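The first two rules can be sketched as a splitter that treats a fenced code block as atomic and attaches it to the paragraph before it. This assumes Markdown sources that mark code with triple-backtick fences; `split_keeping_code` is a hypothetical helper, and it does not handle indented code blocks or function-level boundaries.

```python
FENCE = "`" * 3  # triple-backtick fence marker, built this way to avoid nesting fences


def split_keeping_code(text: str) -> list[str]:
    """Split on blank lines, never inside a fenced code block.

    Each code block is then merged into the paragraph that precedes it,
    so the explanation and its example end up in the same unit.
    """
    blocks, current, in_code = [], [], False
    for line in text.splitlines():
        if line.strip().startswith(FENCE):
            in_code = not in_code  # entering or leaving a fenced block
            current.append(line)
            continue
        if not in_code and not line.strip():
            # A blank line outside code marks a paragraph boundary.
            if current:
                blocks.append("\n".join(current))
                current = []
            continue
        current.append(line)
    if current:
        blocks.append("\n".join(current))

    merged: list[str] = []
    for block in blocks:
        if block.lstrip().startswith(FENCE) and merged:
            merged[-1] = merged[-1] + "\n\n" + block
        else:
            merged.append(block)
    return merged
```

Blank lines inside a code block are preserved because boundary detection is switched off while `in_code` is true.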
Tables
- Preserve header names
- Avoid flattening them into meaningless text
- Sometimes store an additional sentence-form summary
In technical RAG systems, these structures are often critical evidence.
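For tables, a sentence-form summary can be generated by pairing each cell with its header name, so the header survives even when the row is embedded on its own. The sketch below assumes the table has already been parsed into headers and rows; `table_to_sentences` is an illustrative name, and the "X is Y" template is only one possible phrasing.

```python
def table_to_sentences(headers: list[str], rows: list[list[str]]) -> list[str]:
    """Turn each table row into one sentence that repeats the header names."""
    sentences = []
    for row in rows:
        if len(row) != len(headers):
            raise ValueError("row length does not match the number of headers")
        pairs = [f"{header} is {value}" for header, value in zip(headers, row)]
        sentences.append("; ".join(pairs) + ".")
    return sentences
```

For example, headers `["Error code", "Meaning"]` with the row `["401", "access token expired"]` yield "Error code is 401; Meaning is access token expired." This text can be stored alongside the original table rather than replacing it.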
Choosing an Embedding Model
Useful evaluation criteria:
- Language quality
- Domain fit
- Cost
- Query latency
- Vector size
- Re-indexing overhead
What matters most is a repeatable evaluation process, not simply choosing the largest model.
Store Chunk Metadata Too
Chunk-level metadata supports filtering and citations.
{
  "doc_id": "auth-guide",
  "chunk_id": "auth-guide-12",
  "title": "Authentication API Guide",
  "section": "Token Refresh",
  "language": "ko",
  "updated_at": "2026-04-17T12:00:00Z"
}
Without section-level metadata, it becomes much harder to cite a source or explain why a chunk was retrieved.
Indexing Strategy
A typical vector record includes:
- id
- text
- embedding
- metadata
Operationally, it is also helpful to store:
- doc_id
- chunk_id
- content_hash
- embedding_version
That makes re-indexing and rollback much easier.
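As a rough sketch of why these fields help, the record below computes a content hash from the chunk text and uses it, together with the embedding version, to decide whether a chunk needs to be re-embedded. The class `VectorRecord`, the helpers `hash_text` and `needs_reembedding`, and the choice of SHA-256 are assumptions for illustration, not the schema of any particular vector store.

```python
import hashlib
from dataclasses import dataclass, field


def hash_text(text: str) -> str:
    """Stable fingerprint of the chunk text (SHA-256 chosen for illustration)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class VectorRecord:
    id: str
    doc_id: str
    chunk_id: str
    text: str
    embedding: list[float]
    embedding_version: str
    metadata: dict = field(default_factory=dict)
    content_hash: str = ""

    def __post_init__(self) -> None:
        # Derive the hash from the text unless one was supplied explicitly.
        if not self.content_hash:
            self.content_hash = hash_text(self.text)


def needs_reembedding(record: VectorRecord, new_text: str, current_version: str) -> bool:
    """True if the text changed or the record was built with another model version."""
    return (
        record.content_hash != hash_text(new_text)
        or record.embedding_version != current_version
    )
```

With this check, a re-index run can skip unchanged chunks, and a rollback can select records by `embedding_version`.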
Document-Type-Specific Chunking
API Docs
- Endpoint-level chunks
- Keep request/response examples nearby
- Preserve error code sections
Incident Guides
- Symptoms
- Root causes
- Diagnosis steps
- Recovery procedures
FAQ
- One question + one answer per unit
Code Docs
- Keep code examples with the explanation
- Avoid making import lines standalone chunks
Common Mistakes
- Chunks too small
- Chunks too large
- Missing headings in chunk text
- No embedding version management
These are some of the most common causes of unstable retrieval.
How to Evaluate Chunking Choices
Do not pick a chunking strategy by intuition.
Test variants such as:
- 300 tokens / overlap 30
- 500 tokens / overlap 50
- Section-based chunking
- Section-based chunking with heading augmentation
Compare top-3 or top-5 retrieval quality against real questions.
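One simple way to run that comparison is a hit rate at k: the share of questions for which at least one of the top-k chunks contains the expected evidence. The sketch below matches evidence by substring so that the same evaluation set works across chunking variants with different chunk IDs. `hit_rate_at_k` and `make_keyword_retriever` are hypothetical names, and the keyword retriever is only a stand-in for a real vector search.

```python
from typing import Callable

Retriever = Callable[[str, int], list[str]]  # (question, k) -> ranked chunk texts


def hit_rate_at_k(retrieve: Retriever, eval_set: list[tuple[str, str]], k: int = 5) -> float:
    """Fraction of (question, evidence) pairs whose evidence appears in the top-k chunks."""
    if not eval_set:
        return 0.0
    hits = 0
    for question, evidence in eval_set:
        top_chunks = retrieve(question, k)[:k]
        if any(evidence.lower() in chunk.lower() for chunk in top_chunks):
            hits += 1
    return hits / len(eval_set)


def make_keyword_retriever(chunks: list[str]) -> Retriever:
    """Toy retriever ranking chunks by shared words; replace with real vector search."""
    def retrieve(question: str, k: int) -> list[str]:
        query_words = set(question.lower().split())
        ranked = sorted(
            chunks,
            key=lambda chunk: len(query_words & set(chunk.lower().split())),
            reverse=True,
        )
        return ranked[:k]
    return retrieve
```

Building one retriever per chunking variant and calling `hit_rate_at_k` with the same evaluation set and the same `k` gives directly comparable numbers.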
Closing Thoughts
Chunking and embeddings set the baseline for retrieval quality.
A strong baseline usually means:
- Preserving structure
- Keeping chunk size balanced
- Carrying headings and metadata forward
- Comparing strategies with an evaluation set
That is what makes later retrieval and generation work much better.