Arijit Pramanik

I graduated with an MS in Computer Sciences from University of Wisconsin-Madison (UW-Madison) in May 2021, where I worked on Storage Systems (Key-Value Stores) and Machine Learning for Systems (Learned Indexes) under Prof. Remzi and Andrea Arpaci-Dusseau.

Prior to that, I spent four amazing years (2015-19) at the Indian Institute of Technology Bombay (IIT Bombay), graduating with a B.Tech (Hons) in Computer Science and Engineering (CSE) and a minor in Applied Statistics and Informatics. I received the Institute Academic Excellence Award for securing Dept. Rank 1 in 2017-18. I also spent a wonderful semester on exchange at the School of Computing, National University of Singapore (NUS).

I'm also a national-level swimmer and won 4 gold, 5 silver and 11 bronze medals across 4 years at the Inter-IIT Aquatics Meet, for which I was awarded the Institute Sports Color in 2016 and the Hostel Color in 2017. I led my team in 2018-19 and was awarded the Institute Sports Citation in 2019 as the best outgoing swimmer. I'm an NTSE 2013 and KVPY 2013 scholar.

Email / Resume / Github / LinkedIn / Other links

Updates

[Feb 2022] Started at Google (Core Data Infrastructure), Sunnyvale, CA as SWE
[Jun 2021] Started at Amazon Web Services (Relational Database Service), E. Palo Alto, CA as SDE
[Jul 2020] Accepted Research Assistantship under Prof. Remzi and Andrea Arpaci-Dusseau for Aug'20-May'21
[Feb 2020] Incoming SDE Intern at Amazon Web Services (RDS Open Source), E. Palo Alto, CA for Summer'20
[Jan 2020] Continuing as Teaching Assistant at UW-Madison for CS 559 (Computer Graphics) in Spring'20
[Aug 2019] Received Teaching Assistantship at UW-Madison for CS 559 (Computer Graphics) in Fall'19
[Feb 2019] Accepted admit offer from University of Wisconsin-Madison with special CS scholarship

Key Internships

AWS Relational Database Service, Palo Alto, CA [May'20-August'20]
Manager: Jignesh Shah, Engineering Leader - Amazon RDS for PostgreSQL

Sandboxing of untrusted language procedures within RDS PostgreSQL

The goal was to enable sandboxed execution of untrusted functions, stored procedures and subtransactions inside Docker and LXC containers. This prevents customer programs from crashing the main database server instance and enforces finer-grained access privileges, while keeping average transaction latency on par with equivalent extensions running locally in PostgreSQL.

Integrated the open-source extension PL/Container into PostgreSQL 12 & 13, using shared memory segments, gRPC channels and Unix sockets for communication between customer sessions and the container backend
Performed extensive benchmarking with different container configurations to identify bottlenecks, such as the use of a separate Docker container for every customer session, which led to frequent spawning of containers
Developed a new prototype that shares a single container across all customer sessions, reducing memory usage by 63% while scaling to thousands of customer connections
Added runtime support for the Go language, and created 3 separate extensions, one each for R, Python and Go, using separate containers for better isolation
Received a *Full Time Offer* in recognition of the above work

University of Washington, Seattle, WA [May'19-August'19]
Supervisor: Prof. Arvind Krishnamurthy

Hardware Acceleration of Proxies

The aim was to reduce the latency incurred by proxies running as host-based processes.

Worked on Layer 4 and Layer 7 load balancing with proxies such as Envoy, Nginx and HAProxy to demarcate which functionality stays on the host and which can be offloaded to a SmartNIC
Ran benchmarking experiments with wrk2 to determine the feasibility and scalability of offloading SSL certificate verification, including a detailed study of Envoy worker threads

Adobe Systems (Research), Bengaluru [May'18-July'18]
Supervisor: Dr. Balaji Srinivasan, Sr Research Scientist, BigData Experience Labs

Characteristics-Tailored Summary Generation

Unlike typical abstractive text summarisation, the aim was to tune summaries to target characteristics, such as being more formal as required by news agencies, or focusing on certain financial aspects as desired by corporate organisations.

Adapted Facebook AI Research's convolutional seq2seq translation model to topic-tuned summary generation, with modified attention weights that focus on specific input embeddings
Altered the beam search paradigm to tweak decoder state probability distributions, enhancing word-level features like descriptiveness, with token-based learning for length-controlled summarisation
Incorporated a Reinforcement Learning term in the loss function, achieving a 6.4% increase in ROUGE scores
Applied the above insights to the pointer-generator framework and filed a patent application (P8322-US) with the United States Patent and Trademark Office
Received a *Full Time Offer* in recognition of the above work

Philips Innovation Campus, Bengaluru [May'17-July'17]
Supervisor: Dr. Rajendra Sisodia, Principal Scientist, Philips Research

Domain-specific Customer Care Chatbot

Modern chatbots perform well in conversations made up of simple question-answer pairs. The aim was to develop a semantic control algorithm that tracks context switches to predict favourable next steps in the conversation.

Designed a chatbot that uses word2vec, Latent Semantic Indexing and Latent Dirichlet Allocation to find topics relevant to the user query, with tf-idf weighted word n-grams for improved accuracy
Incorporated probabilistic finite automata to model conversation state changes guided by sentiment scores
Built an SVM emotion classifier, and ontologies for knowledge representation from RDF sources, with SPARQL queries for fetching data

Brief Report

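To illustrate the tf-idf weighted word n-grams mentioned for the Philips chatbot above, here is a minimal sketch that routes a user query to the closest topic by cosine similarity. It is a hypothetical, standard-library-only illustration: the topic names, their example texts and the helpers `ngrams`, `tfidf_vectors` and `best_topic` are made up for this example, and the word2vec, LSI and LDA components of the actual chatbot are not shown.

```python
import math
from collections import Counter

def ngrams(text, n_max=2):
    """Word unigrams and bigrams of a lower-cased, whitespace-tokenized text."""
    words = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return grams

def tfidf_vectors(docs):
    """One sparse tf-idf vector (dict) per document, plus the idf table."""
    counts = [Counter(ngrams(d)) for d in docs]
    df = Counter(g for c in counts for g in c)
    n_docs = len(docs)
    # Smoothed idf: n-grams that occur in every document keep a small weight.
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1 for g in df}
    vecs = [{g: tf * idf[g] for g, tf in c.items()} for c in counts]
    return vecs, idf

def cosine(a, b):
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_topic(query, topics):
    """Return the topic whose example text is most similar to the query."""
    names = list(topics)
    vecs, idf = tfidf_vectors([topics[t] for t in names])
    # N-grams never seen in any topic text have no idf entry and are dropped.
    q = {g: tf * idf[g] for g, tf in Counter(ngrams(query)).items() if g in idf}
    scores = [cosine(q, v) for v in vecs]
    return names[max(range(len(names)), key=scores.__getitem__)]

# Hypothetical topics for a device customer-care bot.
topics = {
    "billing": "invoice payment refund charged twice billing cycle",
    "setup": "install device setup connect wifi power on",
    "repair": "device broken screen repair warranty service center",
}
print(best_topic("I was charged twice for my invoice", topics))  # billing
```

Bigrams such as "charged twice" let the matcher reward phrases rather than isolated words, which is the usual motivation for weighting n-grams instead of single tokens.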
Publications

Generating summaries tailored to target characteristics
20th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2019)

Recently, research efforts have gained pace to cater to varied user preferences while generating text summaries. While there have been attempts to incorporate a few handpicked characteristics such as length or entities, a holistic view of these preferences is missing, and crucial insights on why certain characteristics should be incorporated in a specific manner are absent. With this objective, we provide a categorization of these characteristics relevant to the task of text summarization: first, focusing on what content needs to be generated, and second, focusing on the stylistic aspects of the output summaries. We use our insights to provide guidelines on appropriate methods to incorporate various classes of characteristics in a sequence-to-sequence summarization framework. Our experiments with incorporating topics, readability and simplicity indicate the viability of the proposed prescriptions.

pdf version

Research Projects

Learned strategies for Key-Value Stores
Supervisor: Prof. Remzi & Andrea Arpaci-Dusseau, University of Wisconsin-Madison

Log-Structured Merge (LSM) based key-value stores have become so popular that they are used as the backend for NewSQL databases like TiDB. They use indexes for faster data lookup, but the memory overhead of these indexes grows with database size, leaving less memory for caching data blocks. We employ learned techniques to reduce this space overhead without compromising read latencies. We also explore learning approaches to prioritize compactions and devise new compaction policies.

Block-based storage media like SSDs are read at the granularity of data blocks. Traditional indexes store the last key of each block and binary-search these entries to find the data block for a lookup
Employed Learned Indexes, training a model when SSTables are created during compactions, to map the last keys of data blocks to their offsets and reduce lookup time from O(log n) to O(1)
Obtained a 53% reduction in indexing memory footprint over traditional indexes with a <5% increase in point lookup latency, using both Fuzzy and Greedy Piecewise Linear Regression in RocksDB
Reduced range query latencies by 11.4% over the default policy by picking the SSTables with the most user reads at a level for compaction, thus minimizing seeks per read

Learned Compactions Slides | Learned Indexes Slides | Report

Dataplane-Only Policy-Compliant Routing under Failures
Supervisor: Prof. Aditya Akella, University of Wisconsin-Madison

On failures, routers typically have inconsistent state, which leads to long convergence times. In such cases, the central software controller can become a bottleneck, and finding policy-compliant paths is hard. We propose computing such paths in the data plane, with a central policy plane across end-host interfaces.

Performed several experiments on search algorithms that compute routes in the data plane, using P4 stacks and recirculation in software emulation on Mininet for the Rocketfuel set of topologies
Analysed performance relative to other latency-aware routing algorithms to establish stretch as a function of the number of links
Provided support for Weighted-Cost Multipath load balancing with dynamic weights on a per-packet basis
Added per-session, per-flow and per-packet consistency using register-caching of policies to avoid excessive recirculations

Slide Deck 1 | Slide Deck 2 | D2R preprint

Adaptive SmartNIC-accelerated micro-load balancing
Supervisor: Prof. Aditya Akella, University of Wisconsin-Madison

Traffic trace dumps from network simulators, or mathematical simulations of network topologies with varying queuing, sending rates, etc., can aid in online learning of weights for load-balancing traffic across several outgoing links. We explored such approaches using SmartNIC / P4 switch computations.

Explored RL-based actor-critic algorithms for designing traffic matrices to perform in-network load balancing, ensuring max-min fairness and minimal queuing in Clos data-center networks
Implemented a compressed neural-network version of the same on P4 switches with weighted multi-path load balancing
Harnessed SmartNIC compute power to calculate weights at end-hosts, maintaining line-rate processing at the switches

Slide Deck for P4 version

State Replication and Fault Tolerance in P4
Supervisor: Prof. Mythili Vutukuru, Indian Institute of Technology Bombay

The project aims to replicate locally stored state from the primary switch to the secondary switch in real time, to avoid losing state information on failures. Locally stored state aids packet processing at line rate.

Constructed a hybrid synchronous/asynchronous, write-consistent bmv2 model that stores "hard" network state (which cannot be recovered from flow statistics) on the switch, with consistent migration across backup switches in the data plane without control plane intervention
Achieved faster flow switchover compared to root-controller-mediated state updates (where the controller stores and syncs such state across all switches)
Proposed an annotation-based API for a generalized fault-tolerant primitive to be incorporated in p4c

Report | Presentation

Imaging Techniques with Raman Spectroscopic Imaging
Supervisor: Prof. Ajit Rajwade, Indian Institute of Technology Bombay

Raman spectroscopy, which is used in diagnosing critical diseases like cancer, typically requires very long acquisition times. The aim of this project is to reduce the acquisition time without compromising on quality.

Learned a compact representation of the paraffin subspace for spectral separation of biopsy samples using Nonnegative Sparse Coding, employing Blind Dictionary Learning with PCA for signal and noise separation
Performed inpainting to enable compressed sensing of Raman spectral images, speeding up image acquisition
Extended the same to the super-resolution use case, with significant improvements over simple bicubic interpolation
Achieved better results with Gaussian Mixture Models trained on a smaller representative set using the Expectation-Maximization (EM) algorithm

Report | Presentation | ICIP 2019 Paper Draft

Benchmarking of Software Switches
Supervisor: Prof. Mythili Vutukuru, Indian Institute of Technology Bombay

VPP and Open vSwitch are among the fastest DPDK-based software switches available. The aim was to determine the minimal resources required for optimal switch performance across different use cases.

Tested latency, throughput and efficiency (in cycles per packet) with increasing cores, routing table entries and hierarchical cache sizes, using uniform and skewed Gaussian traffic loads of 10 Gbps generated with the DPDK-based packet generator MoonGen
Analyzed VPP's batch packet processing paradigm and measured batch size as a function of different parameters
Studied Cisco Express Forwarding, implemented in VPP using multiway prefix trees and patented by Cisco

Report | Presentation

Optimizing Performance of Model-Counting Algorithms
Supervisor: Prof. Kuldeep Meel, National University of Singapore

The task involved a study of different model-counting algorithms, which count the solutions of a Boolean formula. The aim was to identify performance bottlenecks in the implementation for optimization.

Studied the SPARSE-COUNT algorithm and extended it using the GMP & MPFR libraries to support an arbitrarily large number of variables and multi-precision computations
Implemented the above in ApproxMC, a related model-counting framework, using Θ(log n) low-density parity constraints, with tolerance guarantees that results lie within a specified confidence interval
Validated results using the IJCAI'16-CMV benchmarks

Report
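The learned-index idea from the key-value store project above (replacing binary search over the last keys of data blocks with a piecewise linear model) can be sketched as follows. This is a simplified illustration, not the RocksDB implementation: it assumes strictly increasing last keys, uses a basic greedy segmentation with a fixed error bound `max_err` (a simple relative of the Greedy PLR mentioned above), and all names (`Segment`, `LearnedBlockIndex`, `find_block`) are hypothetical. Each segment predicts a block index within `max_err`, and a short bounded search corrects the prediction. In this sketch, picking the segment is itself a small binary search over segment start keys.

```python
import bisect
import math
from typing import NamedTuple

class Segment(NamedTuple):
    start_key: int
    start_idx: int
    slope: float
    last_key: int
    last_idx: int

class LearnedBlockIndex:
    """keys[i] is the (strictly increasing) last key of data block i."""

    def __init__(self, keys, max_err=2):
        self.keys, self.max_err = list(keys), max_err
        self.segments = self._build()
        self.starts = [s.start_key for s in self.segments]

    def _segment(self, start, end, lo, hi):
        # Any slope in [lo, hi] keeps every point of the segment within max_err.
        slope = 0.0 if lo == -math.inf else (lo + hi) / 2
        return Segment(self.keys[start], start, slope, self.keys[end], end)

    def _build(self):
        keys, err = self.keys, self.max_err
        segments, start = [], 0
        lo, hi = -math.inf, math.inf  # feasible slopes for the current segment
        for i in range(1, len(keys)):
            dx, dy = keys[i] - keys[start], i - start
            new_lo, new_hi = max(lo, (dy - err) / dx), min(hi, (dy + err) / dx)
            if new_lo > new_hi:  # point i cannot join: close segment, restart at i
                segments.append(self._segment(start, i - 1, lo, hi))
                start, lo, hi = i, -math.inf, math.inf
            else:
                lo, hi = new_lo, new_hi
        segments.append(self._segment(start, len(keys) - 1, lo, hi))
        return segments

    def find_block(self, key):
        """Index of the first block whose last key >= key (None if past the end)."""
        if key > self.keys[-1]:
            return None
        s = bisect.bisect_right(self.starts, key) - 1
        if s < 0:
            return 0  # key precedes every last key, so only block 0 can hold it
        seg = self.segments[s]
        if key > seg.last_key:
            return seg.last_idx + 1  # key falls in the gap before the next segment
        pred = round(seg.start_idx + seg.slope * (key - seg.start_key))
        lo = max(0, pred - self.max_err - 1)
        hi = min(len(self.keys), pred + self.max_err + 2)
        return bisect.bisect_left(self.keys, key, lo, hi)  # bounded correction

idx = LearnedBlockIndex([10 * i + i % 7 for i in range(1000)], max_err=2)
print(idx.find_block(25))    # 3
print(len(idx.segments))     # 1 (this toy data is almost linear)
```

The memory saving comes from storing a handful of `Segment` tuples instead of one index entry per block; the price is the extra bounded search, whose window grows with `max_err`.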

Course Projects

Data-driven optimizations for Log-structured Merge Trees
Supervisor: Prof. Xiangyao Yu & Paris Koutris, University of Wisconsin-Madison

Modern database systems leverage key-value stores based on Log-Structured Merge (LSM) trees for storing metadata that requires fast lookups and updates. However, recent studies show that application throughput can be compromised by internal LSM-tree operations that periodically write data to disk. We improved the existing background work scheduler in Google's LevelDB for better application performance. Runtime parameters such as memtable and SSTable sizes, and the triggers for compaction and for stalling writes, were auto-tuned with MLOS, a Bayesian optimizer, to adapt to various workloads. Foreground writes were decoupled from background memtable flushes and compactions, which were scheduled during idle periods or when there were few or no writes to the database. This improved observed client write throughput by 2.24-2.34X for industrial write-bursty workloads.

Project Report | Slides

Centralized vs Decentralized Stochastic Optimization Algorithms
Supervisor: Prof. Shivaram Venkataraman, University of Wisconsin-Madison

Federated learning trains statistical models directly on numerous remote devices, using their local data and leveraging their storage and computation capabilities. Security and data-privacy concerns have pushed computation to the edge, in contrast to classic ML training on centralized servers within datacenters. We compare centralized approaches like Parameter Server and Elastic Averaging SGD with variants of Decentralized Parallel SGD (D-PSGD). We explore biased and unbiased gradient compression operators such as top-K, random-K and quantization via ECD-PSGD, DCD-PSGD and ChocoSGD, along with communication-computation overlap via Asynchronous D-PSGD, which reduces idle time using bounded-stale gradients. We measure the statistical efficiency of training against both time and bits transmitted, across topologies like torus and ring, with Stochastic Gradient Push (SGP) for directed graphs. Decentralized SGD algorithms with compressed communication are on par in convergence guarantees with their centralized counterparts.

Project Report | Slides | Github repo

Count-Min Sketches for Network Traffic Scheduling
Supervisor: Prof. Aditya Akella, University of Wisconsin-Madison

Recent active queue management schemes like RED, ECN and CoDel probe queue length to throttle the sending rate across all senders. However, they do not identify the contributing flows that are the root cause of queue build-up. Here, I explored Count-Min Sketches, which avoid the scalability issues of per-flow counters and the dynamic-allocation issues of hashmaps, to accurately record per-flow queue occupancy. Tracking is distributed across snapshots, each covering a fixed number of packets (or, alternatively, a specified time interval). Each snapshot uses a count-min sketch built on register arrays in P4 to track the flows in that interval, and the register arrays are reused after a certain total packet count. I tested my implementation on Mininet with ECN-based feedback notifications to senders, using Flow Completion Time as a metric to demonstrate the effectiveness of this approach. I also used a C++ simulator with PcapPlusPlus on the UW Data Center Measurement Trace to evaluate the precision and recall of the "contributing flow" classifier.

Project Report | Github repo

Accelerating Image Segmentation with Parallel Computing
Supervisor: Prof. Dan Negrut, University of Wisconsin-Madison

Image segmentation is used in medical imaging for tumour detection, and edge detection is used for tracing blood capillaries and roadside kerbs in autonomous driving. Here, we accelerate these algorithms using CUDA on GPUs and hybrid OpenMP+MPI approaches on multicores. Through implementations of the Sobel & Canny edge detectors and the Fuzzy C-Means segmentation algorithm, we demonstrate optimizations like SIMD, loop unrolling, templates and forced inlining, along with the use of shared and unified memory, while exposing the need for constructs like atomic/critical sections and thread synchronization/barriers. Our work focuses on using CUDA streams, dynamic parallelism and the Thrust library, along with OpenMP tasks, to achieve high speedup.

Project Report | Github repo

Tetrisbot
Supervisor: Prof. Yair Zick, National University of Singapore

We designed a utility-based agent using genetic algorithms, with a set of 10 state-dependent features such as the number of holes, height differences between adjacent columns and the maximum column height. We used single-point crossover and implemented a multithreaded training approach that plays random independent block sequences in parallel. Particle swarm optimization was also employed to add an exploratory component and improve the convergence of the weights. We achieved a maximum of over 856,000 cleared rows. Additionally, we implemented an auto-encoder approach with Q-learning for a low-dimensional game-state representation. Though not quite successful with Tetris, we demonstrated its effectiveness on a simple game, "Catch the Ball".

Report | Github repo | A study of genetic algorithms

Legal Case Retrieval System
Supervisor: Prof. Zhao Jin, National University of Singapore

We designed a free-text search engine supporting both phrasal and boolean queries, using NLTK to retrieve and rank legal case judgments. We finished 2nd out of 33 teams on the leaderboard for the assignment set by Intellex, a Singapore-based legal intelligence firm. Positional indices were implemented to aid proximity search, with additional zone and field indices (such as court hierarchy and case dates) to aid retrieval. We achieved a high F1 score using query expansion techniques such as pseudo relevance feedback with the Rocchio algorithm, WordNet synonyms and a co-occurrence thesaurus generated from the corpus dictionary. We used the lnc variant of tf-idf weighting for free-text search.

Github repo

Stereo Image Reconstruction using Energy Minimization
Supervisor: Prof. Cheong Loong Fah & Prof. Feng Jiashi, National University of Singapore

I implemented normalized graph cuts with α-expansion for image segmentation and denoising using multilabel 8-connected Markov Random Fields, and compared them with the mean-shift algorithm. I employed the PatchMatch algorithm to establish patch correspondences for better homography alignment. The other component involved obtaining dense correspondences between two images taken from different viewpoints, using manual methods and the KLT tracker, to estimate the Fundamental Matrix with the 8-point algorithm.

Component 1 Report | Component 2 Report

Generation of Nintendo Entertainment System Game layouts
Supervisor: Prof. Ganesh Ramakrishnan, Indian Institute of Technology Bombay

We built a Deep Convolutional GAN in PyTorch to generate new game levels, i.e., tile sheets, from previous game layouts. We used Leaky ReLU as the activation function in both the discriminator and the generator, with the Adam optimizer for stochastic gradient descent.

Brief Presentation

A Detailed Study and Comparison of General-Purpose Fuzzers
Supervisor: Prof. Barton Miller, University of Wisconsin-Madison

We compared general-purpose mutation-based grey-box fuzzers, namely libFuzzer, American Fuzzy Lop (AFL) and honggfuzz, and evaluated their performance on the Google fuzzer-test-suite across 24 applications, on metrics like code coverage (basic blocks and edges) and bug-finding capability. We found a previously unreported bug in pcre2-10.0, and a key finding was that only libFuzzer can find memory leaks, with the help of LeakSanitizer. We also proposed a new ensemble-fuzzing framework that uses different base fuzzers in tandem.

Project Report | Poster

Figaro: A Probabilistic Programming Language
Supervisor: Prof. Razvan Voicu & Prof. Chin Wei Ngan, National University of Singapore

Explored the Probabilistic Programming Monad in Figaro, which combines the object-oriented and functional programming paradigms in Scala. Modeled real-life problems using Bayesian Networks with inference algorithms like Variable Elimination and Belief Propagation, and dynamic reasoning algorithms like Factored Frontier. Simulated a simple market model using Decision Models to calculate the optimal policy. Extended the language with a new Element class that models the distribution of the maximum value of a random variable sampled from 0 to a given upper bound.

Report | Presentation | Github repo

Image Quilting for Texture Synthesis and Transfer
Supervisor: Prof. Suyash Awate & Prof. Ajit Rajwade, Indian Institute of Technology Bombay

We employed the Efros & Leung algorithm to synthesize larger textures, and used the same algorithm with a modified cost function for iterative texture transfer to target images using correspondence maps. We also implemented the minimum error boundary cut using dynamic programming to avoid block-seam artifacts.

Report
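The per-snapshot data structure in the Count-Min Sketch project above can be sketched in a few lines of Python. This is an illustration of a count-min sketch in general, not the P4 implementation: the width and depth, the use of salted BLAKE2b hashes as the hash family, and the flow-ID strings are assumptions, and `reset` stands in for reusing the register arrays between snapshots.

```python
import hashlib

class CountMinSketch:
    """depth rows of width counters; estimates never undercount, collisions may overcount."""

    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        # One counter array per hash function, like one register array per stage.
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, flow):
        # Salting BLAKE2b with the row number gives a different hash per row.
        h = hashlib.blake2b(flow.encode(), digest_size=8, salt=row.to_bytes(16, "little"))
        return int.from_bytes(h.digest(), "little") % self.width

    def add(self, flow, count=1):
        for r in range(self.depth):
            self.rows[r][self._index(r, flow)] += count

    def estimate(self, flow):
        # Collisions only ever add to a counter, so the minimum is the tightest bound.
        return min(self.rows[r][self._index(r, flow)] for r in range(self.depth))

    def reset(self):
        # Start a new snapshot by reusing the same counter arrays.
        for row in self.rows:
            row[:] = [0] * self.width

snapshot = CountMinSketch()
elephant = "10.0.0.1:5000->10.0.0.9:80"  # hypothetical heavy flow
for _ in range(500):
    snapshot.add(elephant)
for i in range(50):
    snapshot.add(f"10.0.1.{i}:6000->10.0.0.9:80", count=3)  # hypothetical small flows
print(snapshot.estimate(elephant) >= 500)  # True: the sketch never undercounts
```

Memory is fixed at `width * depth` counters regardless of how many flows appear, which is the property that makes the structure a fit for switch register arrays; the trade-off is a one-sided overestimate that shrinks as `width` grows.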

Relevant Coursework

Systems: Adv & Intro to Operating Systems, Adv & Intro to Computer Networks, Adv & Intro to Databases, Big Data Systems, Machine Learning-Optimized Systems, High Performance Computing, Foundations of Data Management, Intro to Information Security, Computer Architecture, Implementation of Programming Languages

AI/ML: Computer Vision, Advanced & Digital Image Processing, Foundations of Machine Learning, Machine Learning (Graduate Level), Artificial Intelligence, Information Retrieval, Optimization

Statistics & Maths: Regression Analysis, Statistical Inference, Probability Theory, Derivative Pricing, Numerical Analysis, Data Analysis and Interpretation, Linear Algebra, Differential Equations, Calculus

Theory: Cryptography, Automata Theory, Logic in Computer Science, Discrete Structures, Design and Analysis of Algorithms, Data Structures & Algorithms, Abstractions and Paradigms for Programming

Others: Computer Graphics, Software Systems, Digital Logic Design, Computer Programming and Utilization, Electrical and Electronic Circuits, Quantum Physics and its Applications, Economics

Teaching Assistantships

CS 559 (Computer Graphics) at UW-Madison under Prof. Michael Gleicher [Spring '20]
CS 559 (Computer Graphics) at UW-Madison under Florian Heimerl [Fall '19]
CS 101 (Computer Programming and Utilization) at IIT Bombay under Prof. Ganesh Ramakrishnan [Spring '19]
CS 341 (Computer Architecture) at IIT Bombay under Prof. Bernard Menezes [Fall '18]
BB 101 (Introduction to Biology) at IIT Bombay under Prof. Ambarish Kunwar [Spring '17]

Other Internships

Focus Analytics, Mumbai [Nov'17-Dec'17]
Supervisor: Sudin Kadam, Head of Research

Contextual Marketing for Retail Analytics

Leveraged topic modeling and word2vec similarity scores for customer segmentation and retail-affinity estimation using gensim and spaCy
Implemented a recommendation engine based on probabilistic graphical models, contributing to the pgmpy GitHub repo
Created a new query language with pyparsing for an internal database system built on Neo4j, using EBNF grammar rules

Brief Report

OliveSync, Zone Startups India, Mumbai [Dec'16]
Supervisor: Ketan Ghatode, CTO, OliveSync Pvt. Ltd.

Automated Timetable Generation

Designed a scheduling algorithm based on genetic algorithms to generate an optimal timetable for institutions
Added live sync to a MySQL database with PHP to track the occurrence of classes and course adjustments
Employed the Gale-Shapley algorithm for allotting time slot priorities to students and professors

Brief Report
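The Gale-Shapley step in the timetable project above can be illustrated with the textbook deferred-acceptance algorithm. This is a generic sketch, assuming one-to-one matching and complete preference lists on both sides; how OliveSync actually mapped students, professors and time slots onto proposers and acceptors is not described, so the professors and slots below are purely hypothetical.

```python
def gale_shapley(proposer_prefs, acceptor_prefs):
    """Stable one-to-one matching; proposers propose in order of preference."""
    # rank[a][p] = position of proposer p in acceptor a's list (lower is better)
    rank = {a: {p: i for i, p in enumerate(prefs)} for a, prefs in acceptor_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}
    engaged = {}  # acceptor -> proposer currently held
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        a = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(a)
        if current is None:
            engaged[a] = p                  # acceptor was free: tentatively accept
        elif rank[a][p] < rank[a][current]:
            engaged[a] = p                  # acceptor trades up; old partner is free again
            free.append(current)
        else:
            free.append(p)                  # rejected: p will try its next choice
    return {p: a for a, p in engaged.items()}

# Hypothetical preferences: professors propose to lecture slots.
prof_prefs = {
    "ProfA": ["Mon9", "Tue9", "Wed9"],
    "ProfB": ["Mon9", "Wed9", "Tue9"],
    "ProfC": ["Tue9", "Mon9", "Wed9"],
}
slot_prefs = {
    "Mon9": ["ProfB", "ProfA", "ProfC"],
    "Tue9": ["ProfA", "ProfC", "ProfB"],
    "Wed9": ["ProfC", "ProfB", "ProfA"],
}
print(gale_shapley(prof_prefs, slot_prefs))  # {'ProfA': 'Tue9', 'ProfB': 'Mon9', 'ProfC': 'Wed9'}
```

The result is stable (no professor and slot would both rather be paired with each other) and, whatever order free proposers are processed in, it is the matching that is best for the proposing side.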

Other Projects

A Java-like Compiler for OCaml
Supervisor: Prof. Razvan Voicu & Prof. Chin Wei Ngan, National University of Singapore

Designed an abstract syntax tree for the compiler comprising unary and binary operations, conditionals, functions, recursive functions, applications and let constructs over various data types, using the Gram parser for parsing instructions. Implemented a virtual machine instruction interpreter with an operand stack for control flow. Performed type checking using Hindley-Milner type inference, with support for optional data types. The compiler was further optimized to leverage tail recursion and contiguous stack frames

OpenGL based 3D Animation Film
Supervisor: Prof. Cheng Ho-lun Alan, National University of Singapore

Dynamic rendering techniques based on OpenGL's timer functions were used to create this animation film. Different camera transformations, such as the dolly zoom, were used to add artistic effects. I used motion simulation along Bezier curves, and added soft shadows and transparency effects using ray tracing. For object modeling, Phong illumination and Phong shading were used with texture mapping and bump mapping to mimic real-life surfaces

PokeDB : A Pokemon RPG Game
Supervisor: Prof. S Sudarshan, Indian Institute of Technology, Bombay

We built a multiplayer Pokemon game on a PostgreSQL backend, accessed through the JDBC API and populated with over 14,000 tuples from pokeAPI JSON data. Online gym battles, navigable maps with probability models for capturing wild pokemon, and evolution of pokemon with battle experience were also added

Report

Feed'er : An All-purpose Academic App
Supervisor: Prof. Sharat Chandran, Indian Institute of Technology, Bombay

Developed an integrated Android and Django based web app that displays submission deadlines, exam dates and other important reminders via push notifications. Implemented automatic sync and signup with social logins, along with security measures against XSS, CSRF, etc.

User Manual | Presentation

Ethernet-enabled ATM Controller
Supervisor: Prof. Supratik Chakraborty, Indian Institute of Technology, Bombay

Developed an Ethernet-enabled FPGA module in VHDL on Xilinx ISE that dispenses cash using a greedy algorithm, with the Tiny Encryption Algorithm for secure exchange of user data. Signalled insufficient balance and incorrect PIN errors on LED displays, and used frontend caching to protect against server crashes
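The greedy cash-dispensing logic of the ATM controller above was written in VHDL; the same idea is sketched here in Python for readability. The note denominations are an assumption (Indian rupee notes) and `dispense` is a hypothetical helper. For a canonical set of denominations like this one, greedy uses the fewest notes when stock is unlimited, but with limited note stock it can fail even when a valid combination exists, as the second example shows.

```python
DENOMINATIONS = (2000, 500, 200, 100)  # assumed note values, largest first

def dispense(amount, stock):
    """Greedy: take as many of the largest note as allowed, then move down.

    stock maps note value -> notes available in the machine.
    Returns {note: count}, or None if greedy cannot pay the exact amount.
    """
    plan = {}
    remaining = amount
    for note in DENOMINATIONS:
        count = min(remaining // note, stock.get(note, 0))
        if count:
            plan[note] = count
            remaining -= count * note
    return plan if remaining == 0 else None

print(dispense(2700, {2000: 5, 500: 5, 200: 5, 100: 5}))  # {2000: 1, 500: 1, 200: 1}
print(dispense(600, {500: 1, 200: 3}))  # None, although 3 x 200 would work
```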

LDAP Authenticated Chat Application
Supervisor: Prof. Varsha Apte, Indian Institute of Technology, Bombay

A client-server application with an X11-based GUI was developed using socket programming, with LDAP authentication via OpenLDAP. Group chats, an offline inbox backed by hashmaps, and multimedia message exchange were also supported

Slides

Movie Recommendation Engine
Supervisor: Prof. Sharat Chandran, Indian Institute of Technology, Bombay

Wrote a Python program that computes the similarity between user and critic ratings based on Euclidean distances. Using the critic ratings, it generates a list of recommended movies for the user, sorted by ratings weighted by the Pearson correlation coefficient between the user's and each critic's ratings
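The weighting step of the movie recommendation engine above can be sketched as follows. This minimal version shows only the Pearson-weighted variant (a Euclidean-distance similarity such as 1 / (1 + d) could be swapped in for `pearson`); the ratings, the rule of ignoring critics with non-positive correlation, and the helper names are assumptions for illustration.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation over the movies both people rated (0.0 if undefined)."""
    common = [m for m in a if m in b]
    n = len(common)
    if n < 2:
        return 0.0
    xs, ys = [a[m] for m in common], [b[m] for m in common]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def recommend(user, critics):
    """Rank unseen movies by critic ratings weighted by similarity to the user."""
    totals, weights = {}, {}
    for ratings in critics.values():
        sim = pearson(user, ratings)
        if sim <= 0:
            continue  # ignore critics with opposite or no measurable taste overlap
        for movie, score in ratings.items():
            if movie not in user:
                totals[movie] = totals.get(movie, 0.0) + sim * score
                weights[movie] = weights.get(movie, 0.0) + sim
    return sorted(((totals[m] / weights[m], m) for m in totals), reverse=True)

# Hypothetical ratings on a 1-5 scale.
user = {"Inception": 5, "Up": 3, "Heat": 4}
critics = {
    "critic1": {"Inception": 4, "Up": 2, "Heat": 3, "Alien": 5, "Cars": 2},
    "critic2": {"Inception": 1, "Up": 5, "Heat": 2, "Alien": 1, "Cars": 5},
}
print([movie for _, movie in recommend(user, critics)])  # ['Alien', 'Cars']
```

Pearson correlation rewards critics whose ratings move with the user's even if they are systematically harsher (critic1 rates every shared movie one point lower), which plain Euclidean distance would penalise.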

Simulation of Rube Goldberg Model
Supervisor: Prof. Sharat Chandran, Indian Institute of Technology, Bombay

Designed and simulated a Rube Goldberg Machine using Box2D, a physics simulation engine in C++, which involves compilation and linking to libraries like GLUI (GLUT based C++ user interface library). Designed a Star Wars arena by rendering attraction, repulsion among magnetic objects

Sudoku GamePlay Software
Supervisor: Prof. Amitabha Sanyal, Indian Institute of Technology, Bombay

Built a GUI based solver on MIT Scheme with features like Undo, Auto-solve, and filters for seeding games of varying difficulty levels. Employed backtracking algorithm to solve any given initial configuration

Report
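The Sudoku solver above was written in MIT Scheme; the same backtracking idea is sketched here in Python for illustration. It is a plain depth-first search without the Undo, Auto-solve or difficulty-seeding features, the helper names are hypothetical, and the example puzzle is a commonly used sample rather than one from the project.

```python
def find_empty(grid):
    """First empty cell (value 0) in row-major order, or None if the grid is full."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                return r, c
    return None

def valid(grid, r, c, d):
    """Can digit d go at (r, c) without repeating in its row, column or 3x3 box?"""
    if any(grid[r][j] == d for j in range(9)) or any(grid[i][c] == d for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[i][j] != d for i in range(br, br + 3) for j in range(bc, bc + 3))

def solve(grid):
    """Fill the zeros in place by backtracking; returns True if a solution exists."""
    cell = find_empty(grid)
    if cell is None:
        return True
    r, c = cell
    for d in range(1, 10):
        if valid(grid, r, c, d):
            grid[r][c] = d
            if solve(grid):
                return True
            grid[r][c] = 0  # undo this choice and try the next digit
    return False

PUZZLE = [
    "53..7....", "6..195...", ".98....6.",
    "8...6...3", "4..8.3..1", "7...2...6",
    ".6....28.", "...419..5", "....8..79",
]
grid = [[0 if ch == "." else int(ch) for ch in row] for row in PUZZLE]
if solve(grid):
    print("\n".join("".join(map(str, row)) for row in grid))
```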

Text Processor
Supervisor: Prof. Varsha Apte, Indian Institute of Technology, Bombay

Built a class for counting characters and words, with support for Find and Replace using the Knuth-Morris-Pratt algorithm and regular expressions, LZW compression, and encryption and decryption via the Caesar cipher
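The Knuth-Morris-Pratt Find and Replace in the text processor above can be sketched as below. This is an illustration only: the original class (and its regex, LZW and Caesar cipher features) is not reproduced, and `prefix_function`, `kmp_find_all` and `replace_all` are hypothetical names.

```python
def prefix_function(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def kmp_find_all(text, pattern):
    """Start indices of every (possibly overlapping) occurrence, in O(len(text) + len(pattern))."""
    if not pattern:
        return []
    fail = prefix_function(pattern)
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]  # fall back instead of re-scanning the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits

def replace_all(text, pattern, repl):
    """Left-to-right, non-overlapping replacement built on kmp_find_all."""
    out, last = [], 0
    for start in kmp_find_all(text, pattern):
        if start >= last:  # skip matches overlapping one already replaced
            out.append(text[last:start])
            out.append(repl)
            last = start + len(pattern)
    out.append(text[last:])
    return "".join(out)

print(kmp_find_all("abababa", "aba"))             # [0, 2, 4]
print(replace_all("the cat sat", "at", "og"))     # the cog sog
```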

Body Fat Estimation
Supervisor: Prof. Chan Yiu Man, National University of Singapore

Estimated body fat mass using stepwise regression in R, with statistical tests for multicollinearity, lack of fit, outliers and influential points derived from Cook's distance, DFFITS, DFBETAS and studentised residuals. The model was validated with a partial F-test for its significance, and with the Durbin-Watson and Kolmogorov-Smirnov tests for the independence and normality of residuals, respectively

Report

A Study of Statistical Tests and Sampling Algorithms
Supervisor: Prof. Radhendushka Srivastava, Indian Institute of Technology, Bombay

Performed a critical study and simulation of the Random Excursions test and the well-known Metropolis-Hastings sampling algorithm

Random Excursions Report | Metropolis-Hastings Algorithm Report
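For reference alongside the Metropolis-Hastings report above, here is a minimal random-walk sketch. It assumes a one-dimensional target known only up to a normalising constant and a symmetric Gaussian proposal, so the Hastings correction cancels and the acceptance ratio reduces to the target ratio; the Gaussian target, step size, seed and burn-in length are illustrative choices, not values from the report.

```python
import math
import random

def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        logp_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)), compared in log space.
        # 1.0 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < logp_new - logp:
            x, logp = proposal, logp_new
        samples.append(x)  # on rejection the current state is repeated
    return samples

# Target: unnormalised N(3, 1) density.
samples = metropolis_hastings(lambda x: -0.5 * (x - 3.0) ** 2, x0=0.0, n_samples=50_000)
burned = samples[1000:]  # drop burn-in from the deliberately poor starting point
mean = sum(burned) / len(burned)
var = sum((x - mean) ** 2 for x in burned) / len(burned)
print(f"mean ~ {mean:.2f}, variance ~ {var:.2f}")
```

Repeating the current state on rejection is what makes the chain's stationary distribution equal the target; dropping rejected steps instead would bias the samples.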
