~/about_me.sh

> whoami

Rohan Sardar

Aspiring Data Scientist & ML Engineer

Designing and building end-to-end AI systems, with core expertise spanning predictive analytics, Computer Vision, NLP, and advanced generative models like LLMs and VLMs.

Python, SQL, R, Scikit-learn, PyTorch, Hugging Face, FastAPI, LangChain, Git, Docker, Agent Development Kit (ADK), Google Cloud, PySpark

> _projects

Custom Asynchronous RAG System

Built a custom RAG pipeline from scratch, covering ingestion, chunking, and vector storage, and integrated local FlashRank reranking to improve retrieval accuracy without added API costs. Engineered a non-blocking architecture with isolated FAISS stores and sliding-window memory for resilient, long-running conversational queries.
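
As an illustration of the sliding-window memory idea, here is a minimal sketch that keeps only the most recent turns of a conversation. The class name `SlidingWindowMemory` and the `max_turns` parameter are hypothetical and are not taken from the project itself.

```python
from collections import deque


class SlidingWindowMemory:
    """Keep only the most recent `max_turns` conversation turns (hypothetical helper)."""

    def __init__(self, max_turns: int = 3):
        # A deque with maxlen silently drops the oldest item when full.
        self.turns = deque(maxlen=max_turns)

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))

    def as_prompt(self) -> str:
        # Render the retained turns as context for the next query.
        return "\n".join(f"{role}: {text}" for role, text in self.turns)


memory = SlidingWindowMemory(max_turns=2)
memory.add("user", "What is FAISS?")
memory.add("assistant", "A library for vector similarity search.")
memory.add("user", "How does reranking help?")
print(memory.as_prompt())
```

Bounding the window keeps the prompt size roughly constant however long the conversation runs, at the cost of forgetting older turns.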

Transformer Architecture from Scratch

Engineered a sequence-to-sequence Transformer architecture from scratch, including custom encoder, decoder, and attention mechanisms, based on the original "Attention Is All You Need" paper. Built an end-to-end training and evaluation pipeline for automated translation on a custom English-Bengali dataset.
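
The attention mechanism at the core of that paper is scaled dot-product attention, softmax(QKᵀ / √d_k)·V. The sketch below shows it in plain Python for small lists of vectors. It is an illustration only, not the project's PyTorch code, and the function names are assumptions.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Q, K, V are lists of equal-length vectors; returns (outputs, weights)."""
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights


Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
outputs, weights = scaled_dot_product_attention(Q, K, V)
print(weights[0], outputs[0])
```

The query aligns with the first key, so the first value vector receives the larger weight.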

Multimodal Semantic Image Search

Developed a high-performance search engine utilizing OpenAI CLIP and Qdrant vector database for both natural language and image-based retrieval. Leveraged high-dimensional vector embeddings to bypass traditional keyword search, accurately matching images based on contextual meaning.
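
Embedding-based retrieval of this kind comes down to ranking stored vectors by their similarity to a query vector. The sketch below uses cosine similarity over a toy in-memory index. The three-dimensional vectors and file names are made up for illustration, and a real system would get its embeddings from CLIP and delegate the search to Qdrant.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=2):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy index: hypothetical image names mapped to made-up embeddings.
index = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "cat.jpg": [0.7, 0.3, 0.1],
    "car.jpg": [0.0, 0.2, 0.9],
}
results = search([1.0, 0.0, 0.0], index, top_k=2)
print(results)
```

Because text and images share one embedding space in CLIP, the same ranking step can serve both natural-language and image queries.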

Multi-Label Toxic Comment Classifier

Developed a real-time, multi-label text classification REST API utilizing custom regex preprocessing and TF-IDF vectorization to detect toxic language. Trained and tuned balanced Logistic Regression models on the Jigsaw dataset, fully containerizing the modular application for scalable deployment.
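
To illustrate the preprocessing and vectorization steps, here is a minimal sketch of regex cleaning followed by an unsmoothed TF-IDF weighting (term frequency times log(N / document frequency)). The regex rules and sample comments are assumptions, and this simplified formula differs from the smoothed variant that scikit-learn uses by default.

```python
import math
import re
from collections import Counter


def clean(text):
    """Lowercase, strip URLs and non-letters, then split into tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    return text.split()


def tfidf(docs):
    """Return one {token: weight} dict per document."""
    tokenized = [clean(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


docs = [
    "You are an idiot!!!",
    "You are very kind",
    "See http://example.com you there",
]
vectors = tfidf(docs)
print(vectors[0])
```

Tokens that appear in every comment, such as "you" here, get a weight of zero, while rarer tokens dominate the vector that a classifier would see.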

BERT Similarity Search

Developed dual Streamlit applications within a single repository to perform semantic word searches using BERT embeddings, allowing users to visualize semantic relationships through either word clouds or interactive network graphs.

Class Activation Mapping (CAM) using ResNet18

Class Activation Maps (CAMs) identify which regions of an input image contribute most to a convolutional neural network’s decision for a specific class. They highlight the spatial locations in the feature maps that are most influential for that class, giving an interpretable visualization of the model’s attention. Implemented CAM on a pretrained ResNet18 to visualize which image regions drive its predictions.

> _publications

Visualizing ResNet18 Attention with Class Activation Mapping (CAM) in PyTorch

Dec 2025

Class Activation Maps (CAM) provide a way to identify which regions of an input image contribute most to a convolutional neural network’s decision for a specific class. CAMs highlight the spatial locations in the feature maps that are most influential for predicting class c, allowing an interpretable visualization of the model’s attention.
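
For reference, the standard CAM formulation for a network that ends in global average pooling followed by a linear layer can be written as below. The symbols are assumptions introduced here: f_k(x, y) is the activation of channel k of the last convolutional layer at location (x, y), w_k^c is the final-layer weight linking channel k to class c, b_c is the class bias, and H and W are the feature-map height and width.

```latex
M_c(x, y) = \sum_{k} w_k^{c} \, f_k(x, y),
\qquad
S_c = \sum_{k} w_k^{c} \cdot \frac{1}{HW} \sum_{x, y} f_k(x, y) + b_c
```

Under these definitions the class score S_c is the spatial average of M_c plus the bias, which is why M_c can be read as a per-location contribution to the prediction for class c.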


Visualizing BERT Word Embeddings with NetworkX and Plotly using PyTorch

Dec 2025

Build an interactive tool to explore the semantic connections inside a BERT model.


Stratified K-Fold: Resampling Technique for Variance Reduction in Imbalanced Data

Mar 2026

Accurate model validation relies on the assumption that our training and testing subsets are representative of the underlying population distribution. With imbalanced data, purely random splits can violate that assumption. Stratified K-Fold addresses this by ensuring that each fold preserves the class distribution of the dataset, providing a more stable and statistically reliable estimate of a model’s generalization performance.
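
The fold-assignment idea can be sketched in a few lines: group sample indices by class, then deal each class out to the folds round-robin. This is a simplified illustration with a hypothetical function name and no shuffling. In practice `StratifiedKFold` from scikit-learn is the usual tool.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign sample indices to k folds so each fold mirrors the class mix."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal this class's samples out to the folds one at a time.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Imbalanced toy dataset: 8 samples of class 0, 4 samples of class 1.
labels = [0] * 8 + [1] * 4
folds = stratified_folds(labels, k=4)
for fold in folds:
    print(fold, [labels[i] for i in fold])
```

Each of the four folds ends up with two samples of class 0 and one of class 1, matching the 2:1 ratio of the full dataset.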


> _achievements

  • Kaggle Notebook Expert
  • 🏆 Ranked in the top 1% of Kaggle Notebooks

> _certifications

  • 📜 Oracle Cloud Infrastructure 2024 Generative AI Certified Professional
  • 📜 Fundamentals of Accelerated Computing with CUDA Python from NVIDIA
  • 📜 5-Day AI Agents Intensive Course with Google from Kaggle