How it works
RAG = Retrieval-Augmented Generation. Instead of relying on the model's training data (which can be wrong or stale), you store your own documents as vectors, retrieve the relevant ones at query time, and pass them to Claude as context.
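Three AWS services. Grounding answers in retrieved documents reduces hallucination, though it does not eliminate it. The pipeline is auditable: every answer can be traced back to the documents retrieved for it.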
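To make the retrieve-at-query-time idea concrete, here is a minimal in-memory sketch of what the database does during retrieval. It is not the app's code: the function names and the toy 2-dimensional vectors are made up for illustration, and Aurora with pgvector does this ranking in SQL instead.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity, the quantity pgvector's <=> operator computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          store: List[Tuple[str, Sequence[float]]],
          k: int = 3) -> List[str]:
    """Return the k documents whose vectors are closest to the query vector."""
    ranked = sorted(store, key=lambda item: cosine_distance(query, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy 2-dimensional "embeddings"; real Titan V2 vectors have 1024 dimensions.
store = [
    ("sequences", [1.0, 0.0]),
    ("indexes", [0.0, 1.0]),
    ("identity columns", [0.9, 0.1]),
]
nearest = top_k([1.0, 0.05], store, k=2)
```

The documents returned by `top_k` are what gets pasted into the prompt as context.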
Setup guide
Go to RDS → Create database and choose:
- Engine: Aurora (PostgreSQL-compatible)
- Version: 15.4 or later (includes pgvector, which setup.sql enables with CREATE EXTENSION)
- Template: Dev/Test
- Instance class: db.t3.medium
- Initial database name: ragdb
Or with the AWS CLI:
aws rds create-db-cluster \
  --db-cluster-identifier rag-cluster \
  --engine aurora-postgresql \
  --engine-version 15.4 \
  --master-username postgres \
  --master-user-password YOUR_PASSWORD \
  --database-name ragdb \
  --region us-east-1

aws rds create-db-instance \
  --db-instance-identifier rag-cluster-instance \
  --db-cluster-identifier rag-cluster \
  --engine aurora-postgresql \
  --db-instance-class db.t3.medium \
  --region us-east-1
Once the cluster is available, connect and run setup.sql:
psql -h YOUR_CLUSTER_ENDPOINT -U postgres -d ragdb -f setup.sql
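-- setup.sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS docs (
    id        SERIAL PRIMARY KEY,
    content   TEXT NOT NULL,
    embedding vector(1024)  -- must match the Titan "dimensions" value
);

-- IVFFlat builds its clusters from existing rows, so recall is best when the
-- index is created (or rebuilt with REINDEX) after the data is loaded.
CREATE INDEX IF NOT EXISTS docs_emb_idx
    ON docs USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 50);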
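The lists = 50 value is a tuning knob, not a constant. As a rough sketch, the starting points suggested in the pgvector README (rows / 1000 up to 1M rows, sqrt(rows) above that, and probes near sqrt(lists)) can be written as below. The function name is hypothetical, and these are starting values to benchmark, not guarantees.

```python
import math


def suggest_ivfflat_params(row_count: int) -> dict:
    """Starting values for IVFFlat `lists` and `probes` based on table size."""
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = int(math.sqrt(row_count))
    probes = max(1, int(math.sqrt(lists)))
    return {"lists": lists, "probes": probes}


small = suggest_ivfflat_params(5)        # the 5-document demo
medium = suggest_ivfflat_params(50_000)  # roughly where lists = 50 fits
```

For the 5-document demo the index adds little; an exact scan over a handful of rows is already fast.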
- Go to Amazon Bedrock → Model access
- Click Manage model access
- Enable Titan Embeddings V2 and Claude Sonnet
- Click Save changes. Access is usually granted within a few minutes
Your execution environment needs bedrock:InvokeModel on the two models:
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["bedrock:InvokeModel"],
    "Resource": [
      "arn:aws:bedrock:*::foundation-model/amazon.titan-embed-text-v2:0",
      "arn:aws:bedrock:*::foundation-model/anthropic.claude-sonnet-4-6"
    ]
  }]
}
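For local dev, run aws configure with a user that has this policy. Model IDs change between releases, and some models are invoked through an inference profile rather than a bare model ID, so confirm both identifiers in the Bedrock console before copying the ARNs.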
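If you would rather generate the policy than hand-edit JSON, a small sketch like this builds the same document and lets you pin it to one region instead of the wildcard. The helper name is hypothetical; the model IDs are the ones from the policy above.

```python
import json
from typing import List


def build_invoke_policy(model_ids: List[str], region: str = "*") -> dict:
    """Build an IAM policy document allowing bedrock:InvokeModel on the given models."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel"],
            "Resource": [
                f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
                for model_id in model_ids
            ],
        }],
    }


policy = build_invoke_policy(
    ["amazon.titan-embed-text-v2:0", "anthropic.claude-sonnet-4-6"],
    region="us-east-1",
)
policy_json = json.dumps(policy, indent=2)  # paste into the IAM console or a file
```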
Install the dependencies:
pip install -r requirements.txt
requirements.txt:
boto3>=1.34.0
psycopg2-binary>=2.9.9
Set the environment variables the app reads:
export DB_HOST=your-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com
export DB_NAME=ragdb
export DB_USER=postgres
export DB_PASSWORD=your-password
export AWS_REGION=us-east-1
Find your cluster endpoint: RDS → Clusters → your cluster → Endpoints → Writer endpoint
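The get_db() helper used by the code further down is not shown in this guide. One plausible way to build it from these variables is sketched here, as an assumption rather than the app's actual implementation. DB_PORT is an extra, optional variable introduced for this sketch.

```python
import os
from contextlib import contextmanager
from typing import Any, Iterator, Mapping

REQUIRED_VARS = ("DB_HOST", "DB_NAME", "DB_USER", "DB_PASSWORD")


def db_config_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Collect psycopg2 connection settings, failing fast on missing variables."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "host": env["DB_HOST"],
        "dbname": env["DB_NAME"],
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
        "port": int(env.get("DB_PORT", "5432")),  # DB_PORT is an assumed optional variable
    }


@contextmanager
def get_db() -> Iterator[Any]:
    """Open a connection, commit on success, roll back on error, always close."""
    import psycopg2  # imported lazily so the config helper works without the driver

    conn = psycopg2.connect(**db_config_from_env())
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
```

Wrapping the connection this way matters because psycopg2's own `with conn:` block ends the transaction but does not close the connection.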
Run the app:
python rag_app.py
Expected output:
============================================================
RAG App: Amazon Bedrock + Aurora PostgreSQL (pgvector)
============================================================
[1/3] Setting up Aurora pgvector...
Aurora pgvector ready.
[2/3] Ingesting Oracle migration docs...
Ingested 5 documents into Aurora pgvector.
[3/3] Asking: What is the difference between Oracle sequences
and Aurora PostgreSQL sequences?
------------------------------------------------------------
Answer:
------------------------------------------------------------
In Oracle, you call sequences using .NEXTVAL FROM DUAL.
In Aurora PostgreSQL, use SELECT nextval('seq_name').
No more DUAL table. You can also use SERIAL or
GENERATED AS IDENTITY for auto-increment columns.
------------------------------------------------------------
The code
Three functions. That's the entire RAG pipeline.
embed() — convert text to a vector
def embed(text: str) -> List[float]:
    # Request a 1024-dimension vector to match vector(1024) in setup.sql
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text, "dimensions": 1024}),
    )
    return json.loads(resp["body"].read())["embedding"]
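The excerpt above uses a `bedrock` object without defining it; presumably it is a boto3 `bedrock-runtime` client. The sketch below shows one way to create it and adds a dimension check so a mismatched vector fails before it reaches the database. `make_bedrock_client`, `embed_checked`, and `FakeBedrock` are hypothetical names, and the fake client exists only so the logic can be exercised offline.

```python
import io
import json
from typing import Any, List

EMBED_MODEL_ID = "amazon.titan-embed-text-v2:0"
EMBED_DIM = 1024  # must match vector(1024) in setup.sql


def make_bedrock_client(region: str = "us-east-1") -> Any:
    """Create a Bedrock runtime client (assumed to be how `bedrock` is built)."""
    import boto3  # imported lazily so the rest of this sketch runs without AWS libraries

    return boto3.client("bedrock-runtime", region_name=region)


def embed_checked(client: Any, text: str) -> List[float]:
    """Embed text and verify the vector length before it reaches the database."""
    if not text.strip():
        raise ValueError("cannot embed empty text")
    resp = client.invoke_model(
        modelId=EMBED_MODEL_ID,
        body=json.dumps({"inputText": text, "dimensions": EMBED_DIM}),
    )
    vec = json.loads(resp["body"].read())["embedding"]
    if len(vec) != EMBED_DIM:
        raise ValueError(f"expected {EMBED_DIM} dimensions, got {len(vec)}")
    return vec


class FakeBedrock:
    """Stand-in client for offline testing; returns a constant vector."""

    def invoke_model(self, modelId: str, body: str) -> dict:
        payload = json.dumps({"embedding": [0.0] * EMBED_DIM}).encode()
        return {"body": io.BytesIO(payload)}


vector = embed_checked(FakeBedrock(), "hello")
```

In the real app you would pass `make_bedrock_client()` instead of `FakeBedrock()`.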
ingest() — embed and store documents
def ingest(documents: List[str]):
    with get_db() as conn:
        with conn.cursor() as cur:
            for doc in documents:
                vec = embed(doc)
                # psycopg2 sends the list as an array; ::vector casts it to pgvector
                cur.execute(
                    "INSERT INTO docs (content, embedding) VALUES (%s, %s::vector)",
                    (doc, vec)
                )
        conn.commit()
ask() — retrieve context, generate answer
def ask(question: str) -> str:
    # 1. Cosine similarity search in Aurora
    q_vec = embed(question)
    with get_db() as conn:
        with conn.cursor() as cur:
            cur.execute("""
                SELECT content FROM docs
                ORDER BY embedding <=> %s::vector
                LIMIT 3
            """, (q_vec,))
            chunks = [row[0] for row in cur.fetchall()]

    # 2. Pass retrieved docs as context to Claude
    context = "\n\n---\n\n".join(chunks)
    resp = bedrock.invoke_model(
        modelId="anthropic.claude-sonnet-4-6",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": [{"role": "user", "content":
                f"Answer using ONLY this context:\n\n{context}\n\nQuestion: {question}"
            }]
        })
    )
    return json.loads(resp["body"].read())["content"][0]["text"]
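As written, ask() returns only the answer text, so tracing an answer back to its sources means keeping the retrieved rows. A hedged variant is sketched below: it assumes the query is changed to SELECT id, content, tags each chunk with its docs.id, and asks the model to cite those tags. `build_claude_body` and the `[doc N]` tag format are illustrative choices, not part of the app.

```python
import json
from typing import List, Tuple


def build_claude_body(question: str, rows: List[Tuple[int, str]], max_tokens: int = 512) -> str:
    """Build the Messages API request body, tagging each chunk with its docs.id."""
    if not rows:
        raise ValueError("no context retrieved; refusing to call the model without sources")
    context = "\n\n---\n\n".join(f"[doc {doc_id}]\n{content}" for doc_id, content in rows)
    prompt = (
        "Answer using ONLY this context, and cite the [doc N] tags you relied on:\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })


# Rows as they would come back from: SELECT id, content FROM docs ORDER BY ... LIMIT 3
rows = [
    (2, "Use nextval('seq_name') in PostgreSQL."),
    (5, "Oracle uses seq.NEXTVAL FROM DUAL."),
]
body = build_claude_body("How do sequences differ?", rows)
source_ids = [doc_id for doc_id, _ in rows]  # log these alongside the answer for auditing
```

The returned string can be passed as the `body` argument of `invoke_model` in place of the inline `json.dumps` call.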
Customising for your own docs
Replace ORACLE_MIGRATION_DOCS with your content:
MY_DOCS = [
    "Your first document...",
    "Your second document...",
]
ingest(MY_DOCS)
To load from files:
from pathlib import Path

docs = []
for path in sorted(Path("./my-docs").iterdir()):
    if path.is_file():  # skip subdirectories, which open() cannot read
        docs.append(path.read_text(encoding="utf-8"))
ingest(docs)
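Whole files can exceed the embedding model's input limit (Titan Embeddings V2 documents a cap of about 8K tokens), and long documents retrieve poorly as a single vector. A simple character-window splitter, sketched here with arbitrary default sizes, is one way to break files up before calling ingest(). The function name and defaults are assumptions to tune for your content.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> List[str]:
    """Split text into overlapping character windows."""
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    step = max_chars - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + max_chars].strip()
        if piece:
            chunks.append(piece)
        if start + max_chars >= len(text):
            break  # the current window already reached the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters
pieces = chunk_text(sample, max_chars=12, overlap=2)
```

With this in place, the file loop would extend `docs` with `chunk_text(...)` of each file instead of appending the whole file.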
Cost estimate (dev/demo)
| Service | Usage | Approx cost |
|---|---|---|
| Aurora PostgreSQL db.t3.medium | 8 hrs/day | ~$1.50/day |
| Bedrock Titan Embeddings V2 | 5 docs + 10 queries | < $0.01 |
| Bedrock Claude Sonnet | 10 queries | < $0.05 |
Common errors
| Error | Fix |
|---|---|
| could not connect to server | Check DB_HOST env var and security group — inbound port 5432 must allow your IP |
| AccessDeniedException | Enable model access in the Bedrock console (see the Model access steps in the setup guide) and check that your IAM policy allows bedrock:InvokeModel |
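| could not open extension control file "vector" | This Aurora version does not include pgvector; use Aurora PostgreSQL 15.4 or later, as in the setup guide |
| could not load library "vector" | Same cause and fix: move to an Aurora PostgreSQL version that includes pgvector |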