I am currently a Data Scientist at C3 AI, where I collaborate with internal and external industry experts to find and deploy scalable Machine Learning and Artificial Intelligence solutions for our customers' digital transformation. Prior to this, I completed my Ph.D. in the Department of Electrical and Systems Engineering at Washington University in St. Louis, Missouri, USA. My research focused on deriving the dynamical system representation of sensory systems, thereby engaging neuroscience and engineering in a closed loop. I completed my undergraduate degree at Jadavpur University, India, majoring in Electrical Engineering.
Electrical & Systems Engineering, Washington University in St. Louis, MO, USA
August 2016 - August 2021
GPA - 3.93/4.0
Electrical & Systems Engineering, Washington University in St. Louis, MO, USA
August 2016 - December 2018
GPA - 3.89/4.0
Electrical Engineering
Jadavpur University, Kolkata, India
August 2011 - June 2015
GPA - 4.0/4.0
C3 AI, Redwood City, USA
Sept 2021 - present
Responsible for: (1) designing and deploying Machine Learning algorithms for industrial applications to enable our customers' digital transformation; (2) driving adoption of Deep Learning systems into the next generation of C3 AI products.
Fluor Daniel India Pvt. Ltd., Gurgaon, India
July 2015 - May 2016
Responsible for designing electrical solutions for industrial construction projects in the oil and gas sector.
Optimization, Machine Learning and Artificial Intelligence (including Deep Learning), Computational and Systems Neuroscience
Python, MATLAB, C/C++, R
pandas, statsmodels, scikit-learn, TensorFlow, etc.
git, SVN
Introduction to Artificial Intelligence, Introduction to Machine Learning, Optimization, Detection and Estimation, Probability and Stochastic Processes, Bayesian Machine Learning, Biological Neural Computation, Linear & Nonlinear Dynamical Systems
Oxford Machine Learning Summer School 2021 (Virtual)
Fall 2020, Spring 2021
Fall 2018, Fall 2019: Introduction to Electrical & Systems Engineering (ESE 105)
Spring 2018: Nonlinear Dynamical Systems (ESE 559)
One of the persistent challenges in contemporary neuroscience research involves understanding how neurons, through their activity and interactions, perform complex computations. A central question in this regard is: how do we form representations about the world around us? It is an important question not only because it allows us to better gauge how the brain functions, but also because it allows us to develop new, efficient computational algorithms. In this context, the central arc of my doctoral research is the development of modeling paradigms embedded in optimization theory, in order to investigate neural population dynamics and information coding in the brain and to construct new engineering solution approaches via algorithm design.
My research comprises two parts:
In the first part, we draw on formulations from optimal control theory to understand the functional relevance of observed neural activity patterns in specific brain regions. We found that sensory responses are designed to minimize unnecessary and wasteful activation. The theoretical model predictions agree with observations in actual experiments, which we substantiated through experimental collaborations working with two model organisms of differing complexity (i.e., locusts and C. elegans).
In the second part, we leveraged our learnings from biological networks toward the design of network-based control laws that have engineering significance. In this part of the study, we investigated how networks should behave when they need to solve engineering problems with incomplete and mathematically complex information about the world.
Summary: Machine Learning and Deep Learning for Medical Diagnosis
I participated in the WiDS Datathon 2021 on Kaggle (Jan - Mar 2021). The challenge was to develop a predictive algorithm that takes as input patient demographic information, patient vitals, existence of comorbidity factors and lab examinations within the first 24 hours of admission, and produces as output whether or not the patient is diabetic. This kind of predictive pipeline can drastically improve patient outcomes in the hospital. The machine learning model we proposed for this use case comprised a weighted ensemble of a Gradient Boosting Model, a Random Forest and a Deep Neural Network. The submission was ranked in the top 23% globally on the competition leaderboard.
Python Packages used (Scikit-Learn, XGBoost, CatBoost, TensorFlow, Keras)
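As a rough illustration of the weighted-ensemble idea described above, the sketch below blends per-model predicted probabilities using normalized weights. It is a minimal example, not the competition code: the function name `weighted_ensemble`, the probability values, the weights and the 0.5 decision threshold are all hypothetical, and it uses only NumPy rather than the packages listed above.

```python
import numpy as np


def weighted_ensemble(probabilities, weights):
    """Blend per-model positive-class probabilities using normalized weights."""
    probs = np.asarray(probabilities, dtype=float)  # shape: (n_models, n_samples)
    w = np.asarray(weights, dtype=float)
    if probs.ndim != 2 or w.shape != (probs.shape[0],):
        raise ValueError("expected one weight per model")
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()  # normalize so the blend stays a valid probability
    return w @ probs


# Hypothetical predicted probabilities from three models for four patients
gbm_probs = np.array([0.9, 0.2, 0.6, 0.1])
rf_probs = np.array([0.8, 0.3, 0.4, 0.2])
dnn_probs = np.array([0.7, 0.1, 0.7, 0.3])

# Hypothetical weights; in practice they would be tuned on validation data
blended = weighted_ensemble([gbm_probs, rf_probs, dnn_probs], weights=[2.0, 1.0, 1.0])
labels = (blended >= 0.5).astype(int)  # assumed 0.5 decision threshold
```

Normalizing the weights keeps the blended score in the [0, 1] range, so the same threshold can be applied regardless of how the raw weights are scaled.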
Summary: Deep Learning for Pharmaceuticals
I participated in this Kaggle competition (Oct - Nov 2020). The challenge was to develop an algorithm to predict the Mechanism of Action of a drug compound given its cellular and genetic signature. The dataset for this project was collected through a collaboration between the Laboratory for Innovation Science at Harvard (LISH) and the NIH Common Fund's Library of Integrated Network-Based Cellular Signatures (LINCS). It comprised a training set of 23k+ examples with features such as genetic expression and cell viability, in addition to information pertaining to the treatment plan (dosage, duration, etc.). The machine learning model I proposed was a feed-forward deep neural network. The final model reported a cross-entropy loss of 0.01678 (compared to 0.01599 for the best submission).
Python Packages used (Scikit-Learn, TensorFlow, Keras)
Summary: Computer Vision
I participated in this introductory Kaggle competition during the summer of 2020. The task was to develop a model for classifying floral images (~16.5k training samples and 100+ floral classes). One of the key challenges for this dataset was the great deal of structural similarity (e.g., color, shape) among the training images provided. The final architecture comprised a weighted ensemble of pretrained deep neural networks such as ResNet50, DenseNet201 and Xception. With this, after only 15 epochs of training the final fully connected layers, the model reported an improved accuracy of 93%.
Python packages used (Scikit-Learn, TensorFlow, Keras)
Using Bayesian methods, we analyzed fMRI data obtained from 155 subjects (122 children, 33 adults) while they were watching a short animated movie with no verbal dialogue. The aim of this project was to determine whether individual differences in processing cognitive stimuli can be clustered into age groups. Through Gibbs sampling, we started observing specific patterns emerging in the brain for each candidate.
Language: MATLAB
We implemented a kernel-based soft-margin Support Vector Machine to distinguish between healthy and epileptic EEG signals. For this project, we used the open-source Epileptic Seizure Recognition dataset made available by the University of Bonn and preprocessed by the University of California, Irvine. We looked at 1-second-long (178 data points/features) EEG traces recorded from different brain regions in 5 subjects.
Language: MATLAB
In this project, we implemented an algorithm for face recognition (please note that this research was conducted before the availability of datasets and resources for computer vision). Face recognition is a two-step process: face localization and face verification. To perform this task, we first created a custom database that contained a set of images for each subject, accounting for the effects of scaling and rotation. Our algorithm worked by iteratively computing scaled and rotated versions of the template image and finding the region with the maximum cross-correlation coefficient with the modified template. To improve algorithm performance (i.e., reduce the Type I and Type II errors produced by the baseline method), we extracted features of the input and template images using pre-existing algorithms and passed the features through the pipeline.
Tools: MATLAB
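The localization step described above can be sketched as a search for the image window with the highest correlation coefficient against a template. This is a minimal illustration in Python rather than the original MATLAB implementation: the function name `ncc_match` and the random test data are hypothetical, and the sketch searches over position only, leaving out the scaled and rotated template variants.

```python
import numpy as np


def ncc_match(image, template):
    """Return (row, col, score) of the window with the maximum correlation coefficient."""
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    th, tw = template.shape
    t = template - template.mean()  # zero-mean template
    t_norm = np.sqrt((t ** 2).sum())
    best = (0, 0, -np.inf)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            window = image[r:r + th, c:c + tw]
            w = window - window.mean()  # zero-mean window
            denom = t_norm * np.sqrt((w ** 2).sum())
            if denom == 0:
                continue  # flat window or template: coefficient is undefined
            score = (w * t).sum() / denom
            if score > best[2]:
                best = (r, c, score)
    return best


# Hypothetical data: cut a patch out of a random scene and locate it again
rng = np.random.default_rng(0)
scene = rng.random((20, 20))
patch = scene[5:10, 8:13].copy()
row, col, score = ncc_match(scene, patch)
```

Subtracting the mean and dividing by the norms makes the score insensitive to uniform brightness and contrast changes. Handling scale and rotation would mean repeating the search for each transformed template and keeping the best score overall.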
Short-term sensory memory mediates paradoxical neural-behavioral transformation in C. elegans, CRCNS PI Meeting 2021
Elucidating and Leveraging Dynamics-Function Relationships in Neural Circuits through Modeling and Optimal Control
Episodically optimized dynamical networks for robust motor control, ICML WiML Workshop 2021
Top-down modeling of distributed neural dynamics for motion control, American Control Conference 2021
Optimal tracking as a framework for normative synthesis of sensory networks, Bernstein Conference 2020
Neural Circuit Dynamics for Sensory Detection, Journal of Neuroscience 2020
A two timescale normative model of C. elegans sensory adaptation and behavior, CoSyne 2020
Normative modeling of sensory network dynamics for stimulus tracking, Neuroscience 2019
A two-timescale normative model for sensory tracking and adaptation, CRCNS PI Meeting 2019
Optimizing time-limited waveforms for non-invasive, focal neural stimulation, SIAM Annual Review 2018
Face localization by closed loop discriminator estimation and improved detection using contemporary feature extraction techniques, IEEE Conference on Computer Graphics, Vision and Information Security 2015
Instructed a graduate class!
Received my Master's diploma!
Presented my work at Neuroscience 2019!
Presented my work at SIAM Annual Review 2018!
Class of 2015!
Organizing committee of Annual Tech Summit
Outside of work, I enjoy painting, baking and traveling.
Waves
Synchrony
Northern Lights
Sunset