Sruti Mallik

Data Scientist at C3.AI
WUSTL Ph.D. 2021 | WUSTL M.S. 2018 | JU B.E. 2015

Welcome to my webpage!


Bio

I am currently a Data Scientist at C3.AI, where I collaborate with industry experts (internal and external) to identify and deploy scalable Machine Learning and Artificial Intelligence solutions for our customers' digital transformation. Prior to this, I completed my Ph.D. in the Department of Electrical and Systems Engineering at Washington University in St. Louis, Missouri, USA. My research focused on deriving dynamical system representations of sensory systems, thereby engaging neuroscience and engineering in a closed loop. I completed my undergraduate degree in Electrical Engineering at Jadavpur University, India.


About Me


Education


Ph.D.

Electrical & Systems Engineering, Washington University in St. Louis, MO, USA

August 2016 - August 2021

GPA - 3.93/4.0

M.S.

Electrical & Systems Engineering, Washington University in St. Louis, MO, USA

August 2016 - December 2018

GPA - 3.89/4.0

B.E.

Electrical Engineering, Jadavpur University, Kolkata, India

August 2011 - June 2015

GPA - 4.0/4.0

Professional Experience


Data Scientist

C3 AI, Redwood City, USA

September 2021 - present

Responsible for:

1. Designing and deploying Machine Learning algorithms for industrial applications to enable our customers' digital transformation.

2. Driving the adoption of Deep Learning systems in the next generation of C3 AI products.

Associate Design Engineer

Fluor Daniel India Pvt. Ltd., Gurgaon, India

July 2015 - May 2016

Responsible for designing electrical solutions for industrial construction projects in the oil and gas sector.

Technical Expertise


Domain knowledge

Optimization, Machine Learning and Artificial Intelligence (including Deep Learning), Computational and Systems Neuroscience

Programming Languages

Python, MATLAB, C/C++, R

Machine Learning and Deep Learning Packages

pandas, statsmodels, scikit-learn, TensorFlow, etc.

Distributed development

git, SVN

Relevant Courses

Introduction to Artificial Intelligence, Introduction to Machine Learning, Optimization, Detection and Estimation, Probability and Stochastic Processes, Bayesian Machine Learning, Biological Neural Computation, Linear & Nonlinear Dynamical Systems

Certifications

Oxford Machine Learning Summer School 2021 (Virtual)

People Leadership Experience


Student Representative on the Committee for Diversity, Equity and Inclusion

Fall 2020, Spring 2021

Teaching Assistant

Fall 2018, Fall 2019: Introduction to Electrical & Systems Engineering (ESE 105)

Spring 2018: Nonlinear Dynamical Systems (ESE 559)

Research & Projects


Thesis Summary


Elucidating and leveraging dynamics-function relationships in neural circuits through modeling and optimal control

One of the persistent challenges in contemporary neuroscience is understanding how neurons, through their activity and interactions, perform complex computations. A central question in this regard is: how do we form representations of the world around us? This question is important not only because it helps us gauge how the brain functions, but also because it allows us to develop new, efficient computational algorithms. In this context, the central arc of my doctoral research is the development of modeling paradigms grounded in optimization theory, used to investigate neural population dynamics and information coding in the brain and to construct new engineering solution approaches via algorithm design.

My research comprises two parts:

1. Formulation of an optimization based framework to study brain dynamics and functions.

In the first part, we draw on formulations from optimal control theory to understand the functional relevance of observed neural activity patterns in specific brain regions. We found that sensory responses are designed to minimize unnecessary and wasteful activation; a schematic version of this type of objective is sketched below, after the two parts. The theoretical model predictions agree with observations in actual experiments, which we were able to substantiate through experimental collaborations working with two model organisms of differing complexity (locusts and C. elegans).

2. Formulation of neuroscience inspired models for engineering problems.

In the second part, we leveraged insights from biological networks to design network-based control laws of engineering significance. Here we investigated how networks should behave when they need to solve engineering problems with incomplete and mathematically complex information about the world.
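To make the first part concrete, the objectives involved have the flavor of a minimum-energy optimal control problem. The formulation below is purely illustrative and is not the exact cost, dynamics, or constraint set used in the thesis; x denotes the neural population state, u the sensory input, and y a readout that must reach a target value.

```latex
% Illustrative only: trade off circuit activation against input effort,
% subject to the circuit dynamics and a coding (readout) constraint.
\min_{u(\cdot)} \; \int_{0}^{T} \Big( \|x(t)\|^{2} + \lambda\,\|u(t)\|^{2} \Big)\, dt
\qquad \text{subject to} \qquad
\dot{x}(t) = f\big(x(t)\big) + B\,u(t), \qquad y\big(x(T)\big) = y^{*}.
```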


Personal Projects


WiDS Datathon 2021

Summary: Machine Learning and Deep Learning for Medical Diagnosis

I participated in the WiDS Datathon 2021 on Kaggle (Jan - Mar 2021). The challenge was to develop a predictive algorithm that takes as input patient demographics, vitals, comorbidity factors, and lab examinations from the first 24 hours of admission, and outputs whether or not the patient is diabetic. This kind of predictive pipeline can drastically improve patient outcomes in the hospital. The machine learning model we proposed comprised a weighted ensemble of a Gradient Boosting Model, a Random Forest, and a Deep Neural Network. The submission ranked in the top 23% globally on the competition leaderboard.

Python packages used: scikit-learn, XGBoost, CatBoost, TensorFlow, Keras
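A minimal sketch of the weighted probability ensemble described above, with synthetic data standing in for the patient records. The hyperparameters and ensemble weights here are illustrative assumptions; in the actual entry the weights were tuned on a validation set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from tensorflow import keras
from xgboost import XGBClassifier

# Stand-in for demographics, vitals, comorbidities, and first-24h labs.
X, y = make_classification(n_samples=5000, n_features=40, n_informative=12,
                           random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2,
                                            stratify=y, random_state=0)

gbm = XGBClassifier(n_estimators=300, learning_rate=0.05).fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

dnn = keras.Sequential([
    keras.layers.Input(shape=(X.shape[1],)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),   # P(patient is diabetic)
])
dnn.compile(optimizer="adam", loss="binary_crossentropy")
dnn.fit(X_tr, y_tr, epochs=10, batch_size=256, verbose=0)

# Weighted average of the three models' predicted probabilities.
w = np.array([0.5, 0.2, 0.3])                      # assumed weights
p = (w[0] * gbm.predict_proba(X_val)[:, 1]
     + w[1] * rf.predict_proba(X_val)[:, 1]
     + w[2] * dnn.predict(X_val, verbose=0).ravel())
print("ensemble positives:", (p > 0.5).sum())
```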

Mechanism of Action of Drugs (Kaggle)

Summary: Deep Learning for Pharmaceuticals

I participated in this Kaggle competition (Oct - Nov 2020). The challenge was to develop an algorithm to predict the Mechanism of Action (MoA) of a drug compound given its cellular and genetic signature. The dataset was collected in a collaboration between the Laboratory for Innovation Science at Harvard (LISH) and the NIH Common Fund's Library of Integrated Network-Based Cellular Signatures (LINCS), and comprised a training set of 23k+ examples with features such as gene expression and cell viability, in addition to information about the treatment plan (dosage, duration, etc.). The machine learning model I proposed was a feed-forward deep neural network. The final model reported a cross-entropy loss of 0.01678 (compared to 0.01599 for the best submission).

Python packages used: scikit-learn, TensorFlow, Keras
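A minimal sketch of a multi-label feed-forward network of the kind described above; the layer sizes are assumptions, and random arrays stand in for the LINCS features. The key points are the one-sigmoid-per-mechanism output (a compound can have several MoAs at once) and the binary cross-entropy loss, which matches the competition's mean column-wise log-loss metric.

```python
import numpy as np
from tensorflow import keras

n_features, n_targets = 875, 206   # approximate MoA feature/target counts
X = np.random.randn(1024, n_features).astype("float32")        # stand-in features
Y = (np.random.rand(1024, n_targets) < 0.01).astype("float32") # sparse multi-label targets

model = keras.Sequential([
    keras.layers.Input(shape=(n_features,)),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(512, activation="relu"),
    keras.layers.Dropout(0.4),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dropout(0.4),
    keras.layers.Dense(n_targets, activation="sigmoid"),  # one probability per mechanism
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, Y, epochs=5, batch_size=128, verbose=0)
```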

Classification of a dataset of floral images (Kaggle)

Summary: Computer Vision

I participated in this introductory Kaggle competition during the summer of 2020. The task was to develop a model for classifying floral images (~16.5k training samples and 100+ floral classes). A key challenge was the great deal of structural similarity (e.g., color, shape) across the training images provided. The final architecture comprised a weighted ensemble of pretrained deep neural networks (ResNet50, DenseNet201, and Xception). With this, after only 15 epochs of training the final fully connected layers, the model reached an accuracy of 93%.

Python packages used: scikit-learn, TensorFlow, Keras
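A minimal transfer-learning sketch in the spirit of the approach above: freeze one pretrained backbone and train only a new classification head. The image size and class count are assumptions, and only a single backbone is shown; the actual entry averaged the softmax outputs of ResNet50, DenseNet201, and Xception.

```python
from tensorflow import keras

NUM_CLASSES = 104                      # assumed class count (the dataset has 100+)
IMG_SHAPE = (224, 224, 3)              # assumed input size

# ImageNet weights download on first use; pooling="avg" yields a feature vector.
backbone = keras.applications.ResNet50(include_top=False, weights="imagenet",
                                       input_shape=IMG_SHAPE, pooling="avg")
backbone.trainable = False             # train only the new head (~15 epochs sufficed)

model = keras.Sequential([
    backbone,
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# The full entry built one such model per backbone and averaged their
# softmax outputs with tuned weights:
#   p = w1 * m1.predict(x) + w2 * m2.predict(x) + w3 * m3.predict(x)
```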

Course Projects


Functional Bayesian Analysis of fMRI data

Using Bayesian methods, we analyzed fMRI data obtained from 155 subjects (122 children, 33 adults) while they watched a short animated movie with no verbal dialogue. The aim of this project was to identify whether individual differences in processing cognitive stimuli can be clustered into age groups. Through Gibbs sampling, we began to observe specific patterns emerging in the brain for each subject.

Language: MATLAB
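As a toy illustration of the Gibbs sampling machinery (the project itself was in MATLAB and used richer functional models of the fMRI time series), the sketch below alternates between resampling cluster means and cluster assignments for a two-component Gaussian mixture, the same scheme one would use to cluster subject-level summaries into groups. The data and priors are assumptions.

```python
import numpy as np

# Toy Gibbs sampler: unit-variance Gaussian components, N(0, 10) priors on
# the means, uniform mixing weights, synthetic "subject scores" as data.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 60), rng.normal(2, 1, 40)])
n, K = len(x), 2
z = rng.integers(K, size=n)              # cluster assignments
mu = rng.normal(0, 1, K)                 # component means

for sweep in range(500):
    # 1) Resample each mean from its Gaussian full conditional.
    for k in range(K):
        xk = x[z == k]
        prec = 1 / 10 + len(xk)          # prior precision + data precision
        mu[k] = rng.normal(xk.sum() / prec, np.sqrt(1 / prec))
    # 2) Resample each assignment given the current means.
    logp = -0.5 * (x[:, None] - mu[None, :]) ** 2
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    z = (rng.random(n)[:, None] > np.cumsum(p, axis=1)).sum(axis=1)

print("cluster sizes:", np.bincount(z, minlength=K), "means:", mu.round(2))
```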

Epileptic Seizure Recognition

We implemented a kernel-based soft-margin Support Vector Machine to distinguish between healthy and epileptic EEG signals. For this project, we used the open-source Epileptic Seizure Recognition dataset made available by the University of Bonn and preprocessed by the University of California, Irvine. We looked at 1-second-long EEG traces (178 data points/features) recorded from different brain regions in 5 subjects.

Language: MATLAB
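A sketch of the same setup in Python (the project used MATLAB): an RBF-kernel soft-margin SVM on 178-dimensional traces, with synthetic signals standing in for the Bonn recordings. The seizure class is crudely modeled as higher-variance signals, and C and gamma are assumed defaults.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d = 1000, 178                                           # traces x samples per 1 s trace
X = np.vstack([rng.normal(0.0, 1.0, size=(n // 2, d)),     # "healthy"
               rng.normal(0.0, 3.0, size=(n // 2, d))])    # "seizure"
y = np.repeat([0, 1], n // 2)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

# RBF kernel + soft margin: C trades margin width against training violations.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_tr, y_tr)
print(f"test accuracy: {clf.score(X_te, y_te):.3f}")
```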

Undergraduate Research


Cross Correlation based Facial Recognition

In this project, we implemented an algorithm for face recognition (note that this research was conducted before today's large datasets and deep learning resources for computer vision were widely available). Face recognition is a two-step process: face localization and face verification. We first created a custom database containing a set of images for each subject, accounting for the effects of scaling and rotation. Our algorithm worked by iteratively computing scaled and rotated versions of the template image and finding the region of the input with the maximum cross-correlation coefficient against the modified template. To improve performance (i.e., reduce the Type I and Type II errors produced by the baseline method), we extracted features of the input and template images using pre-existing algorithms and passed the features through the same pipeline.

Tools: MATLAB
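A Python sketch of the core localization step, assuming 2-D grayscale float images: sweep scaled and rotated versions of the template and keep the location with the highest normalized cross-correlation. skimage's match_template is used here for brevity, and the sweep ranges are assumptions; the original project implemented the correlation (and the feature extraction) by hand in MATLAB.

```python
import numpy as np
from skimage.feature import match_template
from skimage.transform import rescale, rotate

def locate_face(image: np.ndarray, template: np.ndarray):
    """Return (score, (row, col), scale, angle) of the best template match."""
    best = (-np.inf, None, None, None)
    for scale in (0.8, 1.0, 1.2):            # assumed scale sweep
        for angle in (-10, 0, 10):           # assumed rotation sweep, degrees
            t = rotate(rescale(template, scale), angle)
            if t.shape[0] > image.shape[0] or t.shape[1] > image.shape[1]:
                continue                     # template must fit inside the image
            cc = match_template(image, t)    # normalized cross-correlation map
            peak = np.unravel_index(cc.argmax(), cc.shape)
            if cc.max() > best[0]:
                best = (float(cc.max()), peak, scale, angle)
    return best
```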
