IBM Research, Watson
Architected and implemented the Nimble framework for massive data analytics on Hadoop clusters. Led development of machine learning abstractions and parallel computing infrastructure.
IBM Research, Bangalore
Designed and built the CRM-Analytics Framework for the banking industry, focusing on high-performance data pipeline architecture and real-time analytics processing systems.
IBM Software Labs, Bangalore
Lead developer for IBM Communications Server, an enterprise gateway product. Architected API frameworks, implemented SNA protocol stacks, and optimized data link layer performance.
IBM Software Labs, Raleigh, NC
Established distributed development processes between US and India teams. Led architecture design and implementation of communications protocols.
IBM Software Labs, Bangalore
Developed core components of IBM Personal Communications terminal emulation software. Implemented data link protocols and real-time tracing systems.
Exascale Machine Learning Technologies - CUDA-accelerated distributed training frameworks
Parallel Low Rank Approximation with Non-negativity Constraints - GPU-optimized sparse and dense matrix operations
Distributed GPU-Accelerated Graph Analytics - First exascale Graph AI demonstration, Gordon Bell Award finalist. Leverages NVIDIA CUTLASS for semiring GEMM.
Low Rank Approximation with Constraints at Exascale - PyTorch with CUDA, NCCL, and MPI GPUDirect backends
High Performance Neuromorphic Simulator - GPU-accelerated simulator built on cuSPARSE, tensor cores, and NCCL
Expert-level proficiency in GPU-accelerated computing, sparse data science, and numerical computing
Production systems for large-scale model development
High-performance system programming, scientific and numerical computing
Distributed computing at exascale
Advanced analytics for graph and language models
Large-scale data processing and analytics