I'm also a national-level swimmer and won 4 gold, 5 silver and 11 bronze medals across 4 years at the Inter-IIT Aquatics Meet, for which I was awarded the Institute Sports Color in 2016 and the Hostel Color in 2017. I led my team in 2018-19 and was awarded the Institute Sports Citation in 2019 for being the best outgoing swimmer. I'm an NTSE 2013 and KVPY 2013 scholar.
Sandboxing of untrusted language procedures within RDS PostgreSQL
The goal was to facilitate sandboxed execution of untrusted functions, stored procedures and subtransactions inside Docker & LXC containers. This was to prevent customer programs from crashing the main database server instance and enforce finer access privileges while maintaining average transaction latency at par with equivalent extensions running locally on PostgreSQL
Integrated the open-source extension PL/Container into PostgreSQL 12 & 13 using shared memory segments, gRPC channels & Unix sockets for communication between customer and container backend
Performed extensive benchmarking with different container configurations to identify bottlenecks like usage of separate Docker containers for every customer session leading to frequent spawning of containers
Developed a new prototype leveraging a single container across all customer sessions, leading to a 63% reduction in memory usage while scaling to thousands of customer connections
Added runtime support for Go language, and created 3 separate extensions for each of R, Python and Go utilizing separate containers for better isolation
Received a Full Time Offer in recognition of the above work
Hardware Acceleration of Proxies
The aim was to decrease the latency incurred at the level of proxies running as host-based processes
Worked on Layer 4 and Layer 7 load balancing with proxies like Envoy, Nginx & HAProxy to demarcate functionalities for host and SmartNIC offloading
Performed benchmarking experiments using wrk2 to determine feasibility of SSL certificate verification offloading and scalability, with a detailed study of Envoy worker threads
Characteristics-Tailored Summary Generation
Unlike typical abstractive text summarisation, the aim was to tune summaries to characteristics like being more formal, as required by news agencies, or focusing on certain financial aspects, as desired by corporate organisations
Adapted Facebook AI Research's Convolutional seq2seq model for translation to topic-tuned summary generation with modified attention weights to focus on specific input embeddings
Altered beam search paradigm for tweaking decoder state probability distributions, thus enhancing word-level features like descriptiveness with token-based learning for length based summarisation
Incorporated a Reinforcement Learning term in loss function and achieved a 6.4% increase in ROUGE scores
Implemented the above insights on pointer-generator framework and submitted a patent application (P8322-US) for the same at United States Patent and Trademark Office
Received a Full Time Offer in recognition of the above work
Domain-specific Customer Care Chatbot
Modern chatbots perform well in conversations comprising simple question-answer pairs. The aim was to develop a semantic control algorithm to track context switches to predict favourable next steps in the conversation
Designed a chatbot leveraging word2vec, Latent Semantic Indexing and Latent Dirichlet Allocation for topics relevant to user query with tf-idf weighted word n-grams for improving accuracy
Incorporated probabilistic finite automata to model conversation state changes guided by sentiment scores
Built an emotion classifier SVM, and ontologies for knowledge representation from RDF sources with SPARQL queries for fetching data
Recently, research efforts have gained pace to cater to varied user preferences while generating text summaries. While there have been attempts to incorporate a few handpicked characteristics such as length or entities, a holistic view around these preferences is missing and crucial insights on why certain characteristics should be incorporated in a specific manner are absent. With this objective, we provide a categorization around these characteristics relevant to the task of text summarization: first, focusing on what content needs to be generated and second, focusing on the stylistic aspects of the output summaries. We use our insights to provide guidelines on appropriate methods to incorporate various classes of characteristics in a sequence-to-sequence summarization framework. Our experiments with incorporating topics, readability and simplicity indicate the viability of the proposed prescriptions.
Log-Structured Merge (LSM) based key-value stores have become so popular today that they are used as the backend for NewSQL database abstractions like TiDB. They use indexes for faster data lookup, whose memory overhead increases with database size, leaving less memory for caching data blocks. We employ learned techniques to reduce this space overhead without compromising read latencies. We also explore learning approaches to prioritize compactions and devise new compaction policies
Block-based storage media like SSDs read in granularity of data blocks. Traditional indexes store the last key of each such block and perform binary search on these entries to fetch corresponding data block for lookup
Employed Learned Indexes to train a model to learn offsets from last keys of data blocks inside SSTables when created during compactions to reduce lookup time from O(log n) to O(1)
Obtained a 53% reduction in indexing memory footprint over traditional indexes with <5% increase in point lookup latency using both Fuzzy and Greedy Piecewise Linear Regression in RocksDB
Range query latencies show 11.4% reduction by picking SSTables with most user reads at a level for compaction thus minimizing seeks per read over default policy
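A minimal sketch of an error-bounded piecewise linear index over the last keys of data blocks, to illustrate the idea behind the learned index above. It uses a greedy "shrinking cone" segmentation, which is an assumption for illustration and not necessarily the Fuzzy or Greedy PLR variant used in RocksDB for this project. The names build_segments and lookup and the error bound eps are hypothetical. Note that this sketch still locates the segment by binary search; only the in-segment step is a constant-size probe.

```python
import bisect
import math

def _mid(lo, hi):
    # A one-point segment has an unbounded cone; any slope works, so use 0.
    return 0.0 if math.isinf(lo) or math.isinf(hi) else (lo + hi) / 2.0

def build_segments(keys, eps):
    """Greedily split strictly increasing integer keys into linear segments.

    Each segment (first_key, first_pos, slope) predicts the position of
    every key it covers to within +/- eps.
    """
    segments = []
    x0, y0 = keys[0], 0
    lo, hi = -math.inf, math.inf          # feasible slope range (the cone)
    for pos in range(1, len(keys)):
        dx = keys[pos] - x0
        new_lo = max(lo, (pos - eps - y0) / dx)
        new_hi = min(hi, (pos + eps - y0) / dx)
        if new_lo > new_hi:               # cone is empty: close the segment
            segments.append((x0, y0, _mid(lo, hi)))
            x0, y0 = keys[pos], pos
            lo, hi = -math.inf, math.inf
        else:
            lo, hi = new_lo, new_hi
    segments.append((x0, y0, _mid(lo, hi)))
    return segments

def lookup(keys, segments, key, eps):
    """Return the position of key in keys, or None if it is absent."""
    starts = [s[0] for s in segments]
    s = bisect.bisect_right(starts, key) - 1
    if s < 0:
        return None
    x0, y0, slope = segments[s]
    guess = math.floor(y0 + slope * (key - x0))
    # Probe only a small window around the prediction (one slot of slack
    # on each side absorbs rounding).
    lo = max(0, guess - eps - 1)
    hi = min(len(keys), guess + eps + 2)
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < len(keys) and keys[i] == key else None
```

The memory saving comes from storing one (key, position, slope) triple per segment instead of one entry per data block; a larger eps gives fewer segments at the cost of a wider probe window.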
On failures, routers typically have inconsistent state, which leads to high convergence times. In such cases, the central software controller could be a bottleneck and finding policy-compliant paths is hard. We propose computing such paths in the data plane, with a central policy plane across end-host interfaces
Performed several experiments on search algorithms used to compute routes in the data plane using P4 stacks and recirculation, using software emulation on Mininet for the RocketFuel set of topologies
Analysed performance relative to other latency-aware routing algorithms to establish stretch as a function of number of links
Provided support for Weighted Cost Multipath load balancing with dynamic weights on a per-packet basis
Added per-session, per-flow and per-packet consistency using register-caching of policies to avoid excessive recirculations
Typical traffic trace dumps from network simulators, or mathematical simulations of network topologies with variation in queuing, sending rates, etc., can aid in online learning of weights in an effort to load-balance traffic across several outgoing links. We explored such approaches utilizing SmartNIC / P4 switch computations
Explored the use of RL-based actor-critic algorithms in designing traffic matrices to perform in-network load balancing to ensure max-min fairness and minimal queuing for Clos data-center networks
Implemented a compressed neural network version of the same on P4 switches with weighted multi-path load balancing
Harnessed SmartNIC-compute power to perform computation of weights at end-hosts to maintain line-rate processing at switches
The project aims at replicating locally stored states in the primary switch to the secondary switch in real-time to avoid loss of state information in case of failures. Locally stored states aid in packet processing at line rate
Constructed a synchronous-cum-asynchronous, write-consistent bmv2 model to store "hard" network states (which can't be recovered from flow statistics) on the switch with consistent migration across backup switches in the data plane without control plane intervention
Achieved faster flow switchover compared to root controller-mediated state updates (where the controller stores and syncs such states across all switches)
Proposed an annotation-based API for a generalized fault-tolerant primitive to be incorporated in p4c
Typical Raman spectroscopy takes a very long acquisition time and is used for diagnosing critical diseases like cancer. The aim of this project is to reduce the acquisition time without compromising on quality
Learned a compact representation of paraffin subspace for spectral separation of biopsy sample using Nonnegative Sparse Coding, employing Blind Dictionary Learning with PCA for signal and noise separation
Performed inpainting to enable compressed sensing of Raman spectral images, to speed up image acquisition
Extended the same to the super-resolution use case with significant improvements over simple bicubic interpolation
Achieved better results with Gaussian Mixture Models trained on a smaller representative set using the Expectation-Maximization (EM) algorithm
VPP and Open vSwitch are among the fastest DPDK-based software switches currently available. The aim was to determine the minimal resources required for optimal performance of a switch for different use cases
Tested latency, throughput, efficiency in terms of cycles per packet with increasing cores, routing table entries and hierarchical cache sizes using uniform and skewed Gaussian traffic loads of 10 Gbps generated with DPDK-based packet generator MoonGen
Analyzed VPP's batch packet processing paradigm and tested batch size as a function of different parameters
Studied Cisco Express Forwarding implemented using multiway prefix trees in VPP, patented by Cisco
The task involved a study of different model-counting algorithms, which count the solutions to a Boolean formula. The aim was to identify performance bottlenecks in the implemented model for optimization
Studied the SPARSE-COUNT algorithm and extended the same using GMP & MPFR libraries to support an arbitrarily large number of variables and multi-precision computations
Implemented the above in ApproxMC, a similar framework for model-counting, using Θ(log n) low-density parity constraints with tolerance guarantees for results within a specified confidence interval
Results were validated using the IJCAI'16-CMV benchmarks
Data-driven optimizations for Log-structured Merge Trees Supervisor: Prof. Xiangyao Yu & Paris Koutris, University of Wisconsin-Madison
Modern database systems leverage key-value stores based on Log-Structured Merge (LSM) trees for storing metadata requiring fast lookups and updates. However, recent studies show that application throughput can be compromised by internal LSM tree operations that periodically write data to disk. The existing background work scheduler on Google's LevelDB was improved for better application performance. Runtime parameters like memtable and SSTable sizes, triggers for compaction and stalling writes were auto-tuned using a Bayesian optimizer, MLOS, to adapt to various workloads. Foreground writes were decoupled from background memtable flushes and compactions by scheduling these background operations during idle periods, or when there were few or no writes to the database, yielding a 2.24-2.34x improvement in observed client write throughput for industrial write-bursty workloads
Centralized vs Decentralized Stochastic Optimization Algorithms Supervisor: Prof. Shivaram Venkataraman, University of Wisconsin-Madison
Federated learning entails training statistical models directly over numerous remote devices using local data leveraging their storage and computation capabilities. Security and data privacy concerns have pushed computation to the edge in contrast to classic ML training over centralized servers within datacenters. Centralized approaches like Parameter Server, Elastic Averaging SGD are compared with variants of Decentralized-PSGD. Biased and unbiased gradient compression operators like top-K, random-K, quantization via ECD-PSGD, DCD-PSGD and ChocoSGD are explored, with communication-computation overlap via Asynchronous D-PSGD to reduce idle time using bounded stale gradients. Training statistical efficiency is observed over time for bits transmitted across topologies like torus, ring with Stochastic Gradient Push (SGP) for directed graphs. Decentralized SGD algorithms with compressive communication are on par in convergence guarantees with their centralized counterparts
Count-Min Sketches for Network Traffic Scheduling Supervisor: Prof. Aditya Akella, University of Wisconsin-Madison
Recent active queue length management algorithms like RED, ECN, CoDel probe queue length to throttle sending rate across all senders. However, they do not aim to identify contributing flows as the root cause of queue build-up. Here, I explored Count-Min Sketches, which overcome scalability issues of per-flow counters and dynamic allocation issues of hashmaps, to accurately record per-flow queue occupancy. Distributed across snapshots comprising a fixed number of packets (alternately, of specified time intervals), each snapshot utilizes a count-min sketch based on register arrays in P4 to track flows in that interval, reusing the arrays after a certain total packet count. I tested my implementation on Mininet with ECN-based feedback notifications to senders, using Flow Completion Time as a metric to demonstrate the effectiveness of this approach. I also used a C++ simulator with PcapPlusPlus on the UW Data Center Measurement Trace to evaluate Precision and Recall of the "contributing flow" classifier
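A minimal software sketch of the count-min sketch described above, assuming SHA-256-derived row hashes and list-backed counters in place of P4 register arrays. The class name, the depth and width defaults, and the reset method (standing in for reusing the arrays between snapshots) are hypothetical choices for illustration.

```python
import hashlib

class CountMinSketch:
    """Fixed-size approximate per-flow counter; never underestimates."""

    def __init__(self, depth=4, width=1024):
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, flow_id):
        # One independent-looking hash per row, derived by salting with the row.
        digest = hashlib.sha256(f"{row}:{flow_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, flow_id, count=1):
        for r in range(self.depth):
            self.rows[r][self._index(r, flow_id)] += count

    def estimate(self, flow_id):
        # Collisions only inflate counters, so the minimum is the best bound.
        return min(self.rows[r][self._index(r, flow_id)]
                   for r in range(self.depth))

    def reset(self):
        # Start a new snapshot by clearing every counter in place.
        for row in self.rows:
            for i in range(self.width):
                row[i] = 0
```

Memory stays at depth x width counters regardless of the number of flows, which is what makes the structure attractive compared with per-flow counters or a dynamically growing hashmap.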
Accelerating Image Segmentation with Parallel Computing Supervisor: Prof. Dan Negrut, University of Wisconsin-Madison
Image segmentation is used in medical imaging for tumour detection and edge detection for tracing blood capillaries and roadside kerbs for autonomous driving. Here, we accelerate these algorithms using CUDA on GPUs and hybrid OpenMP+MPI approaches on multicore CPUs. We demonstrate optimizations like SIMD, loop unrolling, use of templates, forced inlining along with use of shared and unified memory, while exposing the need for constructs like atomic/critical sections and thread synchronization/barriers through the implementation of Sobel & Canny edge-detectors and the Fuzzy C-Means algorithm for segmentation. Our work focuses on using CUDA streams, dynamic parallelism and the Thrust library along with OMP tasks to achieve high speedup
Tetrisbot Supervisor: Prof. Zick Yair, National University of Singapore
We designed a utility-based agent based on genetic algorithms, using a set of 10 state-dependent features like number of holes, height differences between adjacent columns, max height of a column, etc. We used the single-point crossover heuristic and implemented a multithreaded training approach that evaluated random independent block sequences in parallel. Particle swarm optimization was also employed along with this for optimal convergence of weights to add an exploratory component. We achieved a maximum of over 856,000 cleared rows. Additionally, we implemented an auto-encoder approach with Q-learning for a low dimensional game state representation. Though this was not quite successful with Tetris, we applied the above approach to a simple game, "Catch the Ball", to demonstrate its effectiveness
Legal Case Retrieval System Supervisor: Prof. Zhao Jin, National University of Singapore
We designed a freetext search engine supporting both phrasal and boolean queries, leveraging NLTK to retrieve and rank legal case judgments. We finished 2nd out of 33 teams on the leaderboard based on the assignment given by the Singapore-based legal intelligence firm, Intellex. Positional indices were implemented to aid proximity search, with additional zone and field indices like court hierarchy and legal case dates to aid in retrieval. We were able to get a high F1 score using various query expansion techniques like pseudo relevance feedback using the Rocchio algorithm, WordNet synonyms and a co-occurrence thesaurus generated from the corpus dictionary. We used the LNC model of tf-idf for freetext search
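A minimal sketch of lnc weighting in SMART notation (logarithmic term frequency, no idf, cosine normalization), as one reading of the "LNC model" mentioned above. The function names are hypothetical, and weighting the query with the same lnc scheme is an assumption made for brevity.

```python
import math
from collections import Counter

def lnc_weights(tokens):
    """Return {term: weight} with w = 1 + log10(tf), scaled to unit length."""
    if not tokens:
        return {}
    raw = {t: 1.0 + math.log10(c) for t, c in Counter(tokens).items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: w / norm for t, w in raw.items()}

def cosine_score(query_tokens, doc_weights):
    """Dot product of unit-length query and document vectors."""
    q = lnc_weights(query_tokens)
    return sum(w * doc_weights.get(t, 0.0) for t, w in q.items())
```

Because both vectors have unit length, the dot product is the cosine similarity, so longer judgments do not score higher simply for repeating a term more often.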
I implemented normalized graph cuts with α-expansion for image segmentation and denoising using multilabel 8-connected Markov Random Fields, and compared the same with the mean-shift algorithm. I employed the PatchMatch algorithm to establish patch correspondences, for better alignment for homography. The other component involved obtaining dense correspondences from two images belonging to different viewpoints using manual methods and the KLT tracker, to estimate the Fundamental Matrix using the 8-point algorithm
Generation of Nintendo Entertainment System Game layouts Supervisor: Prof. Ganesh Ramakrishnan, Indian Institute of Technology Bombay
We built a Deep Convolutional GAN model in PyTorch for generating new game levels, i.e. tile sheets from previous game layouts. We used Leaky ReLU as the activation function for both the discriminator and generator with the Adam Optimizer for stochastic gradient descent
A Detailed Study and Comparison of General-Purpose Fuzzers Supervisor: Prof. Barton Miller, University of Wisconsin-Madison
We made a comparison of general-purpose mutation-based grey-box fuzzers like libFuzzer, American Fuzzy Lop (AFL) and honggfuzz and evaluated their performance on the Google fuzzer-test-suite across 24 applications on metrics like code coverage (basic blocks and edges) and bug-finding capabilities. We found a new unreported bug in pcre2-10.0 with the key finding that only libFuzzer can find memory leaks with the help of LeakSanitizer. Also proposed a new framework for ensemble fuzzing which uses different base fuzzers in tandem
Explored the Probabilistic Programming Monad in Figaro, which combines the object-oriented paradigm with the functional programming paradigm in Scala. Modeled real-life problems using Bayesian Networks with inference algorithms like Variable Elimination, Belief Propagation and Dynamic Reasoning algorithms like Factored Frontier. Simulated a simple market model using Decision Models to calculate the optimal policy. Extended the language by implementing a new Element class to model the distribution of the maximum value of a random variable, sampled from 0 to a given upper bound
We employed the Efros & Leung algorithm to synthesize larger textures, and used the same algorithm with a modified cost function for iterative texture transfer to target images using correspondence maps. We also implemented the minimal error boundary cut using dynamic programming to avoid block-seam artifacts
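A minimal sketch of the minimal error boundary cut via dynamic programming, assuming the overlap error between two adjacent blocks has already been computed as a 2D list (rows by columns) and that the seam runs top to bottom, moving at most one column per row. The function name and the input layout are hypothetical.

```python
def min_error_boundary_cut(err):
    """Return (path, total) where path[r] is the seam column in row r."""
    rows, cols = len(err), len(err[0])
    # cost[r][c] = cheapest cumulative error of any seam ending at (r, c)
    cost = [list(err[0])]
    for r in range(1, rows):
        prev = cost[-1]
        cost.append([
            err[r][c] + min(prev[max(c - 1, 0):min(c + 2, cols)])
            for c in range(cols)
        ])
    # Backtrack from the cheapest cell in the last row.
    c = min(range(cols), key=cost[-1].__getitem__)
    total = cost[-1][c]
    path = [c]
    for r in range(rows - 1, 0, -1):
        neighbours = range(max(c - 1, 0), min(c + 2, cols))
        c = min(neighbours, key=cost[r - 1].__getitem__)
        path.append(c)
    path.reverse()
    return path, total
```

Pixels to the left of the seam would come from one block and pixels to the right from the other, so the join follows the path where the two blocks already agree most closely.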
Systems: Adv & Intro to Operating Systems, Adv & Intro to Computer Networks, Adv & Intro to Databases, Big Data Systems, Machine Learning-Optimized Systems, High Performance Computing, Foundations of Data Management, Intro to Information Security, Computer Architecture, Implementation of Programming Languages
AI/ML: Computer Vision, Advanced & Digital Image Processing, Foundations of Machine Learning, Machine Learning (Graduate Level), Artificial Intelligence, Information Retrieval, Optimization
Statistics & Maths: Regression Analysis, Statistical Inference, Probability Theory, Derivative Pricing, Numerical Analysis, Data Analysis and Interpretation, Linear Algebra, Differential Equations, Calculus
Theory: Cryptography, Automata Theory, Logic in Computer Science, Discrete Structures, Design and Analysis of Algorithms, Data Structures & Algorithms, Abstractions and Paradigms for Programming
Others: Computer Graphics, Software Systems, Digital Logic Design, Computer Programming and Utilization, Electrical and Electronic Circuits, Quantum Physics and its Applications, Economics
Designed an abstract syntax tree comprising unary and binary operations, conditionals, functions, recursive functions, applications and let constructs on various data types for the compiler, utilizing the Gram parser for parsing instructions. Implemented a virtual machine instruction interpreter with an operand stack for control flow. Performed type-checking using Hindley-Milner type inference with support for optional data types. The compiler was further optimized to leverage tail recursion and contiguous stack frames
OpenGL based 3D Animation Film Supervisor: Prof. Cheng Ho-lun Alan, National University of Singapore
Dynamic rendering techniques were used to create this animation film, based on OpenGL's various timer functions. Different camera transformations were used like dolly zoom to add artistic effects. I used motion simulation along Bezier curves, adding soft shadows and transparency effects using Ray tracing. For object modeling, Phong illumination and Phong shading were used with texture mapping and bump mapping to mimic real-life surfaces
PokeDB : A Pokemon RPG Game Supervisor: Prof. S Sudarshan, Indian Institute of Technology, Bombay
We built a multiplayer Pokemon game on a PostgreSQL backend via the JDBC API, populated from PokeAPI JSON data with over 14,000 tuples. Online gym battles, navigable maps with probability models for capturing wild pokemon and evolution of pokemon with battle experience were also added
Feed'er : An All-purpose Academic App Supervisor: Prof. Sharat Chandran, Indian Institute of Technology, Bombay
Developed an integrated Android and Django based web app for displaying submission deadlines, exam dates and other important reminders via push-notifications. Implemented automatic sync and signup with social logins, with security measures against XSS, CSRF etc
Ethernet-enabled ATM Controller Supervisor: Prof. Supratik Chakraborty, Indian Institute of Technology, Bombay
Developed an Ethernet-enabled FPGA module in VHDL, using Xilinx ISE, to dispense cash with a greedy algorithm, with the Tiny Encryption Algorithm to provide secure exchange of user data. Enforced insufficient-balance and incorrect-PIN checks with LED displays, and added frontend caching to protect against server crashes
LDAP Authenticated Chat Application Supervisor: Prof. Varsha Apte, Indian Institute of Technology, Bombay
A server-client model with an X11 based GUI was developed using socket programming, with LDAP authentication using OpenLDAP. Group chats, an offline inbox via hashmaps and multimedia message exchanges were also supported
Movie Recommendation Engine Supervisor: Prof. Sharat Chandran, Indian Institute of Technology, Bombay
Designed a Python program that computes the similarity between user and critic ratings based on Euclidean distances. Using the critic ratings, generated a list of recommended movies for the user, sorted by ratings weighted by the Pearson correlation coefficient between the user's and each critic's ratings
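A minimal sketch of Pearson-weighted recommendations, assuming ratings are stored as dictionaries mapping movie titles to scores. The function names, the data layout and the choice to ignore critics with non-positive correlation are assumptions for illustration, not details taken from the original program.

```python
import math

def pearson(a, b):
    """Pearson correlation over the movies rated in both dictionaries."""
    common = [m for m in a if m in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[m] for m in common) / len(common)
    mean_b = sum(b[m] for m in common) / len(common)
    cov = sum((a[m] - mean_a) * (b[m] - mean_b) for m in common)
    var_a = sum((a[m] - mean_a) ** 2 for m in common)
    var_b = sum((b[m] - mean_b) ** 2 for m in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def recommend(user, critics):
    """Rank unseen movies by the similarity-weighted mean of critic ratings."""
    weighted, weights = {}, {}
    for ratings in critics.values():
        sim = pearson(user, ratings)
        if sim <= 0:
            continue  # skip critics whose taste does not match the user's
        for movie, score in ratings.items():
            if movie not in user:
                weighted[movie] = weighted.get(movie, 0.0) + sim * score
                weights[movie] = weights.get(movie, 0.0) + sim
    ranked = [(m, weighted[m] / weights[m]) for m in weighted]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Dividing by the summed similarities keeps a movie reviewed by many critics from outranking a better-rated movie reviewed by few.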
Simulation of Rube Goldberg Model Supervisor: Prof. Sharat Chandran, Indian Institute of Technology, Bombay
Designed and simulated a Rube Goldberg Machine using Box2D, a physics simulation engine in C++, which involves compilation and linking to libraries like GLUI (GLUT based C++ user interface library). Designed a Star Wars arena by rendering attraction and repulsion among magnetic objects
Sudoku GamePlay Software Supervisor: Prof. Amitabha Sanyal, Indian Institute of Technology, Bombay
Built a GUI based solver on MIT Scheme with features like Undo, Auto-solve, and filters for seeding games of varying difficulty levels. Employed backtracking algorithm to solve any given initial configuration
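A minimal sketch of the backtracking idea, written in Python rather than MIT Scheme for brevity. It assumes the board is a 9x9 list of lists with 0 marking an empty cell; the function names are hypothetical.

```python
def is_valid(grid, r, c, d):
    """True if digit d can be placed at (r, c) without a conflict."""
    if any(grid[r][j] == d for j in range(9)):
        return False
    if any(grid[i][c] == d for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[i][j] != d
               for i in range(br, br + 3)
               for j in range(bc, bc + 3))

def solve(grid):
    """Fill grid in place; return True if a complete solution was found."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for d in range(1, 10):
                    if is_valid(grid, r, c, d):
                        grid[r][c] = d
                        if solve(grid):
                            return True
                        grid[r][c] = 0  # undo and try the next digit
                return False  # no digit fits here: backtrack
    return True  # no empty cell left
```

The solver returns False for configurations with no solution, leaving the grid as it was given.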
Text Processor Supervisor: Prof. Varsha Apte, Indian Institute of Technology, Bombay
Built a class for enumeration of characters and words with support for Find and Replace using the Knuth-Morris-Pratt algorithm, regular expressions, LZW compression, and encryption and decryption via Caesar cipher
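A minimal sketch of the Knuth-Morris-Pratt search behind the Find feature, in Python for illustration. The function names are hypothetical, and returning all (possibly overlapping) match positions is an assumption about the desired behaviour.

```python
def prefix_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def find_all(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    fail = prefix_table(pattern)
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]      # fall back without re-reading text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]      # keep going to allow overlapping matches
    return hits
```

The text is scanned once, so the search runs in time linear in the combined length of the text and the pattern.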
Body Fat Estimation Supervisor: Prof. Chan Yiu Man, National University of Singapore
Estimated body fat mass using stepwise regression with statistical tests to check for multicollinearity, lack of fit, outliers and influential points derived from Cook's distance, DFFITS, DFBETAS and studentised residuals, implemented in R. The same was validated with a partial F-test for the significance of the model, and with the Durbin-Watson and Kolmogorov-Smirnov tests for the independence and normality of residuals, respectively