About me
I am currently a fourth-year Ph.D. student in the CARAML Lab at the University of Texas at Dallas, advised by Dr. Rishabh Iyer. Before this, I worked as a Data Scientist at Mercedes-Benz Research and Development, India, from January 2018 to November 2020.
My research centers on developing techniques and algorithms that enable data-efficient, compute-efficient, and robust machine learning systems. To achieve these goals, I design techniques that exploit the underlying structure of the data and analyze the importance of individual data samples for model learning. My current work on data subset selection focuses on selecting small subsets of data on which machine learning models can be trained with negligible loss in accuracy on unseen data (i.e., generalization), while achieving 5x to 10x speedups along with corresponding savings in energy and CO2 emissions. Another critical aspect of these data subset selection methods is that they enable robust model training when the dataset has noisy labels or class imbalance. I am also interested in designing techniques for label-efficient machine learning through the lens of active learning, semi-supervised learning, data programming, and labeling-function paradigms. My research directly applies to building machine learning systems that can efficiently learn from prohibitively massive amounts of data in a scalable and robust manner, and it brings us one step closer to achieving Green AI [1].
For more details on my research, please check my research page. For details on my publications, please check my publications page.
News
- Our paper “Orient: Submodular Mutual Information Measures for Data Subset Selection under Distribution Shift” got accepted at NeurIPS 2022!
- Our paper “AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning” got accepted at NeurIPS 2022!
- Our paper “How Out of Distribution Data Hurts Semi-Supervised Learning” got accepted at ICDM 2022!
- Our paper “GCR: Gradient Coreset based Replay Buffer Selection for Continual Learning” got accepted at CVPR 2022!
- Our paper “Learning to Robustly Aggregate Labeling Functions for Semi-supervised Data Programming” got accepted at Findings of ACL 2022!
- Our paper “A Nested Bi-level Optimization Framework for Robust Few Shot Learning” got accepted at AAAI 2022!
- Served as Program Committee member for AAAI 2022
- Our paper “RETRIEVE: Coreset Selection for Efficient and Robust Semi-Supervised Learning” got accepted at NeurIPS 2021!
- Our paper “SIMILAR: Submodular Information Measures Based Active Learning In Realistic Scenarios” got accepted at NeurIPS 2021!
- Served as Program Committee member for NeurIPS 2021
- Excited to release CORDS (Github), a PyTorch-based open-source library for efficient deep model training and AutoML!
- Our paper “GRAD-MATCH: Gradient Matching based Data Subset Selection for Efficient Deep Model Training” got accepted at ICML 2021!
- Our paper “Semi-Supervised Data Programming with Subset Selection” got accepted at Findings of ACL 2021!
- Served as Program Committee member for AISTATS 2021
- Our paper “GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning” got accepted at AAAI 2021!
- Served as Program Committee member for AAAI 2021
References
- [1] R. Schwartz, J. Dodge, N. Smith, and O. Etzioni, “Green AI,” Communications of the ACM, vol. 63, pp. 54–63, 2020.