Saturday, September 13, 2025

Build a RAG-Based AI Chatbot with Multi-File Upload and Chroma DB (Vector DB) Integration

 Retrieval-Augmented Generation (RAG) has become a popular approach in building AI chatbots that provide accurate and context-aware answers. In this tutorial, we will build a RAG-based chatbot where users can upload multiple documents (PDF, Word, etc.) to create a custom knowledge base. When users query, the chatbot will fetch relevant context from these documents using Chroma DB (a vector database) and provide precise answers.

Watch the full step-by-step video tutorial here:
👉 YouTube Video: Build RAG-Based Chatbot with Multi-File Upload + Chroma DB


What is RAG (Retrieval-Augmented Generation)?

RAG combines:

  • Retrieval → Fetches relevant information from a knowledge base.

  • Generation → Uses an LLM (Large Language Model) like OpenAI GPT to generate context-aware answers.

This approach helps ensure:

  • Fewer hallucinations (fabricated answers), since responses are grounded in retrieved text.

  • Context-based responses from your documents.

Architecture Overview:
  1. User uploads documents (PDF, Word, PPT, etc.)
  2. Text extraction → Convert documents into text.
  3. Embeddings creation → Convert text into vector embeddings.
  4. Store embeddings in Chroma DB
  5. Query flow:
  • Convert user query into an embedding.
  • Fetch top N relevant chunks from Chroma DB.
  • Pass context + query to LLM for response. 
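
To make the query flow above concrete, here is a minimal sketch of the "fetch top N relevant chunks" step using plain Python and cosine similarity. The three-dimensional vectors and the helper names (cosine_similarity, top_n_chunks, toy_store) are made up for illustration. In the actual chatbot, Chroma DB does this search for you with its own index and a configurable distance metric.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_chunks(query_vec: list[float], chunk_vecs: dict[str, list[float]], n: int = 2) -> list[str]:
    """Return the n chunk texts whose vectors are most similar to the query vector."""
    ranked = sorted(
        chunk_vecs,
        key=lambda text: cosine_similarity(query_vec, chunk_vecs[text]),
        reverse=True,
    )
    return ranked[:n]


# Hypothetical 3-dimensional "embeddings"; real models produce hundreds of dimensions.
toy_store = {
    "Refunds are processed within 5 days.": [0.9, 0.1, 0.0],
    "Our office is closed on Sundays.": [0.0, 0.2, 0.9],
    "Refund requests need an order ID.": [0.8, 0.3, 0.1],
}
query_vector = [1.0, 0.2, 0.0]  # stands in for an embedded question about refunds

print(top_n_chunks(query_vector, toy_store, n=2))
```

With these toy vectors, the two refund-related chunks rank above the unrelated one, which is the behaviour you want from the retrieval step.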


 

Tools & Libraries Used:

  • Python for backend
  • Chroma DB for vector storage
  • Hugging Face LLM
  • Hugging Face Embedding Model
  • Flask for API
Below are code snippets for the different steps. Watch my YouTube video for the complete details and the GitHub repository link.

Initialize Chroma DB and Create a Collection

import chromadb

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="docs")


Generate Embeddings:


import requests
from typing import List
import os
import logging

headers = {
    "Authorization": f"Bearer {os.getenv('HUGGINGFACEHUB_API_TOKEN')}",
    "Content-Type": "application/json"
}

logger = logging.getLogger(__name__)
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "sentence-transformers/distilbert-base-nli-mean-tokens")    

# Get the embedding for a given text using the Hugging Face Inference API.
# EMBEDDING_MODEL can be overridden through the environment variable of the same name;
# it defaults to "sentence-transformers/distilbert-base-nli-mean-tokens".

def get_embedding(text: str) -> List[float]:
    try:
        url = f"https://api-inference.huggingface.co/models/{EMBEDDING_MODEL}"
        logger.debug(f"Embedding URL: {url}")
        # Keep TLS certificate verification enabled (the requests default)
        response = requests.post(url, headers=headers, json={"inputs": text}, timeout=60)
        response.raise_for_status()
        embedding = response.json()
        return embedding  # Expected to be a list of floats for a single input string
    except Exception as e:
        logger.error(f"Error getting embedding: {e}")
        raise ValueError("Failed to get embedding from the model.") from e
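
Depending on the model and pipeline, an embedding endpoint may return a flat list of floats, a single vector wrapped in an outer list, or one vector per token. The sketch below is a hypothetical helper (to_flat_vector is not part of the project code) that coerces those three assumed shapes into one flat vector, using mean pooling for the per-token case. Treat it as an illustration of a defensive check, not as a description of what any particular model returns.

```python
from typing import Any, List


def to_flat_vector(payload: Any) -> List[float]:
    """Coerce an embedding response into one flat list of floats.

    Assumed shapes (not guaranteed by any particular model):
      [f, f, ...]           -> returned as floats
      [[f, f, ...]]         -> the single inner vector
      [[f, ...], [f, ...]]  -> per-token vectors, averaged element-wise (mean pooling)
    """
    if not isinstance(payload, list) or not payload:
        raise ValueError("Embedding payload must be a non-empty list.")

    # Case 1: already a flat vector
    if all(isinstance(x, (int, float)) for x in payload):
        return [float(x) for x in payload]

    # Cases 2 and 3: a list of vectors; averaging one vector returns it unchanged
    if all(isinstance(row, list) for row in payload):
        width = len(payload[0])
        if width == 0 or any(len(row) != width for row in payload):
            raise ValueError("Inner vectors must be non-empty and of equal length.")
        return [sum(column) / len(payload) for column in zip(*payload)]

    raise ValueError("Unrecognized embedding payload shape.")
```

One possible use is to call to_flat_vector(response.json()) inside get_embedding, so that error payloads (for example a JSON object with an error message) fail loudly instead of being stored in Chroma DB.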

Store embedded data in Chroma DB:

def add_documents_to_vector_store(docs: list[str], ids: list[str], filename: str) -> None:
    try:
        vectors = [get_embedding(text) for text in docs]
        collection.add(
            documents=docs,
            embeddings=vectors,
            ids=ids,
            metadatas=[{"filename": filename} for _ in docs]
        )
    except Exception as e:
        logger.error(f"Error adding documents to vector store: {e}")
        raise ValueError(f"Failed to add documents to vector store: {e}")
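
The function above expects docs to be a list of text pieces and ids to be unique per piece. How the text is split is not shown in this post, so here is a minimal sketch of one common approach: fixed-size character chunks with a small overlap. The helper names (chunk_text, make_chunk_ids) and the default sizes are my assumptions for illustration; tune them for your documents and embedding model.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character chunks of at most chunk_size, overlapping by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive.")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size.")

    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size].strip()
        if piece:
            chunks.append(piece)
        # Stop once this window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


def make_chunk_ids(filename: str, count: int) -> list[str]:
    """Build one ID per chunk, e.g. 'report.pdf-0', 'report.pdf-1', ..."""
    return [f"{filename}-{i}" for i in range(count)]


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

After extracting the text of an uploaded file, you could then call add_documents_to_vector_store(chunks, make_chunk_ids(filename, len(chunks)), filename), where chunks is the result of chunk_text.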

Query Chroma DB:

def query_similar_documents(query: str, top_k: int = 3) -> list[str]:
    try:
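        # get_uploaded_files() is defined elsewhere in the project; it is expected to return the uploaded file names
        context_files = get_uploaded_files()
        logger.debug(f"Context files available: {context_files}")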
        if len(context_files) == 0:
            return []
        query_vector = get_embedding(query)
        results = collection.query(
            query_embeddings=[query_vector],
            n_results=top_k,
            where={"filename": {"$in": context_files}}
        )
        return results["documents"][0]
    except Exception as e:
        logger.error(f"Error querying similar documents: {e}")
        raise ValueError(f"Failed to query similar documents: {e}")

Combine with LLM for Answer Generation:

import os
import logging
from app.core.chroma_store import query_similar_documents, add_documents_to_vector_store

from huggingface_hub import InferenceClient

logger = logging.getLogger(__name__)

HUGGINGFACE_API_TOKEN = os.getenv("HUGGINGFACEHUB_API_TOKEN")
LLM_MODEL = os.getenv("LLM_MODEL")  # e.g. mistralai/Mixtral-8x7B-Instruct-v0.1

def build_client() -> InferenceClient:
    return InferenceClient(
        model=LLM_MODEL,
        token=HUGGINGFACE_API_TOKEN,
        timeout=300
    )

def get_llm_response(prompt: str):
    try:

        client = build_client()
        SYSTEM_PROMPT = """You are a personal assistant specialized in providing well-formatted answers from the provided context."""
        messages = [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Provide a well-formatted answer from the given context.\n\n{prompt}"}
        ]
        result = client.chat.completions.create(
            model=LLM_MODEL,
            messages=messages,
            temperature=0.1,
            max_tokens=200
        )
        text = result.choices[0].message.content
        if isinstance(text, list):
            text = "".join(part.get("text", "") if isinstance(part, dict) else str(part) for part in text)
        return text

    except Exception as e:
        logger.error(f"Error getting LLM response: {e}")
        raise ValueError("Failed to get response from the LLM model.")
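
The snippets above cover retrieval and generation separately. Here is a minimal sketch of how the two could be glued together: build a prompt from the retrieved chunks, then hand it to the LLM. The names build_rag_prompt and answer_question, and the prompt wording, are my assumptions for illustration. The retrieve and generate arguments are passed in as plain functions so the sketch runs without Chroma DB or a Hugging Face token.

```python
from typing import Callable


def build_rag_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine numbered context chunks and the user question into a single prompt."""
    if context_chunks:
        context = "\n\n".join(
            f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
        )
    else:
        context = "(no relevant context found)"
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def answer_question(
    question: str,
    retrieve: Callable[[str, int], list[str]],
    generate: Callable[[str], str],
    top_k: int = 3,
) -> str:
    """Retrieve the top_k chunks for the question, then ask the LLM with the combined prompt."""
    chunks = retrieve(question, top_k)
    return generate(build_rag_prompt(question, chunks))


# Stand-in functions so the sketch runs on its own
fake_retrieve = lambda question, top_k: ["Refunds are processed within 5 days."][:top_k]
fake_generate = lambda prompt: f"(LLM would answer here, prompt length {len(prompt)})"
print(answer_question("How long do refunds take?", fake_retrieve, fake_generate))
```

In the real app you would call answer_question(user_query, query_similar_documents, get_llm_response), since both functions already match these signatures.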


Full Video Tutorial

📌 Watch the complete step-by-step guide with implementation here:
👉 Build RAG-Based Chatbot with Multi-File Upload + Chroma DB





Thursday, September 4, 2025

Deploy AWS Lambda Using AWS CDK in Python (Step-by-Step Guide)

 If you want to learn how to deploy an AWS Lambda function using the AWS Cloud Development Kit (CDK) in Python, you’ve landed in the right place! In this post, I’ll walk you through the prerequisites, setup, and exact steps you need to follow. I’ll also share some code snippets and give a reference to my YouTube video tutorial for better understanding.

🎥 Watch the complete tutorial on YouTube here: AWS CDK Tutorial in Python | Deploy Lambda from S3




Prerequisites

Before we get started, make sure you have the following ready:

  1. Programming Language Setup – Install Python (preferably 3.10+).

  2. Free-tier AWS Account – Sign up at AWS.

  3. Node.js and NPM Installed – Required for CDK installation.

Steps to Deploy Lambda with AWS CDK

Step 1) Configure AWS CLI: Run the command below in CMD and provide your AWS Access Key, Secret Key, Region, and output format.

                    aws configure

Step 2) Install AWS CDK: Run the NPM command below to install the AWS CDK globally:

                    npm install -g aws-cdk

Step 3) Verify installation: Run the command below in CMD; if it prints a version number, the installation is complete:

                    cdk --version

Step 4) Create a New CDK Project: Go to the folder where you want to create your project and run the command below in CMD to create the CDK project structure in Python:

                    cdk init app --language python

Step 5) Open Project in VSCode: Navigate into your project directory and open it in Visual Studio Code. 

Step 6) Add Lambda Deployment Code: In your Stack file, add the infrastructure code as follows:

from aws_cdk import (
    Duration,
    Stack,
    aws_lambda as _lambda,
    aws_lambda,
    aws_s3 as s3,
    aws_logs as logs,
    RemovalPolicy,
    aws_apigateway as apigwv,
    CfnOutput
)
from constructs import Construct
from typing import cast

class Hello_Lambda_CDK_Stack(Stack):

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        bucket_name = "jitendra-lambda-bucket"
        object_key = "mylambda.zip"

        # Define a log group for the Lambda function
        log_group = logs.LogGroup(
            self, f"HelloLambdaFN-LogGroup-ID",
            log_group_name=f"/aws/lambda/HelloLambdaFN-LogGroup",
            removal_policy=RemovalPolicy.DESTROY,
            retention=logs.RetentionDays.THREE_DAYS
        )

        # Define a Lambda function resource here
        my_lambda = _lambda.Function(
            self, f"HelloLambdaFN-ID",
            runtime=_lambda.Runtime.PYTHON_3_13,
            handler="lambda_function.lambda_handler",
            code=_lambda.Code.from_bucket(
                bucket = s3.Bucket.from_bucket_name(
                    self, f"HelloLambdaFN-Bucket-ID",
                    bucket_name
                ),
                key = object_key
            ),
            log_group=log_group,
            timeout=Duration.seconds(30),
            memory_size=128,
            function_name="HelloLambdaFN"
        )
       
        # Add permissions for the Lambda function to write to the log group
        log_group.grant_write(my_lambda)

        # Define an API Gateway REST API that proxies all requests to the Lambda function
        apigatewayobj = apigwv.LambdaRestApi(
            self, "HelloLambdaFN-API-Gateway",
            rest_api_name="HelloLambdaFN-API",
            handler=cast(aws_lambda.IFunction, my_lambda),
            proxy=True
        )  

        CfnOutput(
            self, "APIEndpoint",
            value=apigatewayobj.url,
            description="The endpoint URL of the HelloLambdaFN API Gateway",
        )
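
The stack above points at handler="lambda_function.lambda_handler" inside mylambda.zip, but the Lambda code itself is not shown in this post. Here is a minimal sketch of what lambda_function.py could look like behind the proxy integration. The greeting logic and the name query parameter are my assumptions for illustration; only the file name, the function name, and the response fields (statusCode, headers, body) follow from the stack and the API Gateway proxy format.

```python
import json


def lambda_handler(event, context):
    """Entry point referenced by handler="lambda_function.lambda_handler" in the stack."""
    # With proxy integration, queryStringParameters is None when no query string is sent
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")

    # Proxy integration expects statusCode, optional headers, and a string body
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!", "path": event.get("path")}),
    }
```

You can try it locally before zipping and uploading it to the S3 bucket by calling lambda_handler({"path": "/hello", "queryStringParameters": {"name": "CDK"}}, None) from a Python shell.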

                
Step 7) Bootstrap AWS Environment: This prepares your AWS account to use CDK.
                
                            cdk bootstrap

Step 8) Build Your Application (Optional): For Python, explicit build isn’t needed.

Step 9) Synthesize CloudFormation Template: This generates the underlying CloudFormation template and surfaces any errors in your infrastructure code.

                            cdk synth

Step 10) Deploy the Stack: Run the command below in CMD to start creating your AWS infrastructure as defined in your code. Before the final deployment, it asks for confirmation; once you confirm, the deployment proceeds:

                            cdk deploy

Step 11) Delete the Application: To delete your deployed stack, run the CDK command below:

                            cdk destroy

With AWS CDK in Python, deploying a Lambda function directly from an S3 bucket is simple and efficient. By following these steps, you can manage your infrastructure as code and automate serverless deployments.

📺 Don’t forget to watch the complete tutorial video for a practical walkthrough: Click here to watch