Desk2Mob

Desk2Mob

Desk2Mob

Building a Managed RAG Platform with Amazon Bedrock

Building a Managed RAG Platform with Amazon Bedrock

Amazon Bedrock provides managed services that simplify the implementation of Retrieval-Augmented Generation systems. Instead of building chunking, embeddings, retrieval, and orchestration from scratch, organizations can use Knowledge Bases for Amazon Bedrock with managed foundation models.

Key AWS Services

  • Amazon S3
  • Amazon Bedrock
  • Knowledge Bases for Amazon Bedrock
  • Amazon OpenSearch Serverless, commonly used as the managed vector store
  • Optional vector stores such as Aurora PostgreSQL Serverless, Amazon S3 Vectors, Neptune Analytics, or a supported existing vector store
  • Amazon Textract
  • AWS Lambda
  • Amazon ECS
  • Amazon API Gateway
  • Amazon CloudWatch

Reference Architecture

PDF Upload
    |
    v
Amazon S3
    |
    v
Knowledge Base for Amazon Bedrock
    |
    +--> Chunking
    |
    +--> Embeddings
    |
    +--> Vector Storage
    |
    v
OpenSearch Serverless

User Question
    |
    v
Application API
    |
    v
RetrieveAndGenerate API
    |
    v
Foundation Model
    |
    v
Response with Citations

Document Ingestion Workflow

PDF
 |
 v
S3
 |
 v
Knowledge Base Sync
 |
 v
Text Extraction
 |
 v
Chunk Creation
 |
 v
Embedding Generation
 |
 v
Vector Index Storage

Large PDF Processing

For simple text-based PDFs, Knowledge Bases for Amazon Bedrock can ingest and process the document from S3. For scanned PDFs, table-heavy documents, diagrams, or visually rich content, add a preprocessing step such as Amazon Textract or an appropriate Bedrock parsing option before indexing. The extracted text, metadata, and page references should then be stored and synced into the knowledge base.

PDF
 |
 v
Textract
 |
 v
Structured Text
 |
 v
Knowledge Base
 |
 v
Vector Database

Question Answering Flow

User Question
      |
      v
RetrieveAndGenerate API
      |
      v
Knowledge Base Search
      |
      v
Top Chunks
      |
      v
Claude / Nova / Llama
      |
      v
Generated Response

Retrieve vs RetrieveAndGenerate

  • Use Retrieve when your application wants only the matching source chunks and will build the prompt itself.
  • Use RetrieveAndGenerate when you want Amazon Bedrock to retrieve relevant chunks, call the foundation model, and return a generated answer with source references.
  • Use RetrieveAndGenerateStream when the chat UI should stream the answer back to the user.

Example Retrieval Request Flow

POST /chat

{
  "question": "Explain the disaster recovery strategy."
}

Internal Processing

1. Receive user question
2. Retrieve relevant chunks from the knowledge base
3. Optionally rerank the retrieved chunks
4. Build the grounded prompt
5. Invoke the selected foundation model
6. Return answer with source references

Advantages of Bedrock-Managed RAG

  • Minimal infrastructure management
  • Fully managed embeddings
  • Built-in retrieval workflows
  • Enterprise-grade security
  • AWS IAM integration
  • Simplified scaling
  • Reduced operational burden

Security Architecture

  • S3 encryption
  • KMS-managed keys
  • IAM access controls
  • VPC endpoints
  • CloudTrail auditing
  • Private networking

Production Best Practices

  • Store metadata with every chunk.
  • Include page numbers.
  • Use guardrails for user input and generated responses, but remember that retrieved source references still need separate data governance controls.
  • Use document versioning.
  • Enable citations.
  • Monitor retrieval quality.
  • Track latency and token usage.

Ideal Use Cases

  • Enterprise document search
  • Policy assistants
  • Internal knowledge bases
  • Compliance documentation
  • Customer support systems
  • Employee self-service assistants

Posted on June 08, 2026 by Amit Pandya in AWS, AI, RAG


All Posts