Professional Experience

Technical Staff Member

Jan 2010 - Aug 2011

IBM Research, Watson

Architected and implemented the Nimble framework for massive data analytics on Hadoop clusters. Led development of machine learning abstractions and parallel computing infrastructure.

Lead Data Architect

Jan 2009 - Dec 2009

IBM Research, Bangalore

Designed and built a CRM-Analytics Framework for the banking industry, focusing on high-performance data pipeline architecture and real-time analytics processing systems.

Senior Software Engineer

Oct 2003 - Dec 2008

IBM Software Labs, Bangalore

Lead developer for IBM Communications Server, an enterprise gateway product. Architected API frameworks, implemented SNA protocol stacks, and optimized data link layer performance.

Technical Lead

Dec 2002 - Oct 2003

IBM Software Labs, Raleigh, NC

Established distributed development processes between US and India teams. Led architecture design and implementation of communications protocols.

Software Engineer

Aug 2000 - Dec 2002

IBM Software Labs, Bangalore

Developed core components of IBM Personal Communications terminal emulation software. Implemented data link protocols and real-time tracing systems.

Research Products

Exalearn

Exascale Machine Learning Technologies - CUDA-accelerated distributed training frameworks

PLANC

Parallel Low Rank Approximation with Non-negativity Constraints - GPU-optimized sparse and dense matrix operations

DSNAPSHOT

Distributed GPU-Accelerated Graph Analytics - First exascale Graph AI demonstration, Gordon Bell Award finalist. Uses NVIDIA CUTLASS for semiring GEMM.

LORACX

Low Rank Approximation with Constraints at Exascale - PyTorch with CUDA, NCCL, and MPI GPU Direct backends

HyperNeuro

High Performance Neuromorphic Simulator - GPU-accelerated simulator built on cuSPARSE, tensor cores, and NCCL

Technical Skills

GPU Computing

CUDA, cuML, cuGraph, RAPIDS Ecosystem

Expert-level proficiency in GPU-accelerated computing, sparse data science, and numerical computing

AI/ML Frameworks

PyTorch, JAX, Scikit-learn, XGBoost, Transformers

Production-scale systems for model development

Programming Languages

Python, C/C++, CUDA C++

High-performance system programming, scientific and numerical computing

HPC & Parallel Computing

MPI, OpenMP, NCCL, Torch Distributed, Dask

Distributed computing at exascale

Sparse AI/ML & Advanced Analytics

Graph Neural Networks, Large Language Models, Sparse Matrix, Knowledge-Guided ML, Tensor Factorization

Advanced analytics for graph and language models

Data Platforms

Pandas, MPI-IO/ADIOS, MLflow, HDF5, MongoDB, PostgreSQL

Large-scale data processing and analytics
