Srividya Bandari
Software Engineer — Backend & Distributed Systems
Summary
Software Engineer with 4+ years of experience building distributed backend services, event-driven workflows, real-time streaming systems, and ML-assisted decision pipelines. Strong background in .NET Core, Java, Kafka, Pulsar, Flink, Redis, PostgreSQL, AWS/Azure, observability, and production reliability. Delivered measurable impact across high-volume systems, including 35% workflow latency reduction, 70% VIN decode p95 improvement, 45% faster issue detection, and stronger release safety.
Work Experience
Tech: .NET Core, Apache Pulsar, Redis, PostgreSQL, Terraform, Honeycomb
- Owned core services in a distributed .NET Core inspection workflow platform processing 300K+ daily vehicle inspections and 3.5M+ validation, integration, audit, and observability events/day across inspection, inventory, VIN intelligence, and downstream systems.
- Designed Apache Pulsar-based event propagation with replay-safe consumers, correlation IDs, schema-versioned payloads, and failure-isolated retry paths, reducing workflow latency by 35% while improving real-time state consistency.
- Built ML-assisted inspection decision pipelines by enriching workflow events with VIN intelligence, vehicle metadata, inspection signals, and historical state-transition features, generating risk scores used by rule engines and manual-review queues to detect anomalous transitions, duplicate processing signals, and inconsistent vehicle attributes.
- Implemented idempotent event consumers with deduplication keys, exponential backoff, dead-letter handling, and vendor-failure isolation, reducing duplicate processing risk and preventing invalid inspection state transitions during downstream degradation.
- Optimized high-volume VIN decoding and reference-data lookups using Redis-backed caching, TTL-based invalidation, and request coalescing, improving p95 latency by 70% while reducing repeated third-party vendor calls.
- Led production rollout with Terraform-managed feature flags, staged enablement, canary deployments, rollback paths, and Honeycomb distributed tracing, reducing MTTD by 45% and improving release safety across critical inspection workflows.
Tech: Python, PyTorch, TensorFlow, Multi-view CNNs, SageMaker, Datadog
- Designed a modular multi-view CNN training and inference pipeline in PyTorch, fusing image embeddings with structured metadata features to support reproducible model development across medical-imaging datasets.
- Improved model performance through class-imbalance handling, loss-function tuning, threshold calibration, and 1K+ hard-case error analysis, raising accuracy from 87% to 95% while reducing false-negative rate from 13% to 5%.
- Added PyTorch hooks, tensor-level tracing, and Datadog dashboards for feature extraction, inference latency, and failure diagnostics, reducing MTTR from 2 days to 4 hours and increasing trace-event coverage from 40% to 100%.
Tech: Java, Go, Spring Boot, Kafka, Flink, Redis, RocksDB, AWS
- Built a real-time DDoS detection pipeline using Kafka, stateful Flink, and Java on Kubernetes, ingesting 2M+ network flow records/sec and cutting detect-to-mitigate latency from 25s to under 8s.
- Implemented a Go/Java mitigation orchestration service over gRPC to push staged BGP Flowspec, RTBH, and ACL actions with audit logging, operator approval, and rollback safeguards, automating 90%+ of approved mitigations.
- Developed adaptive baselining and heavy-hitter detection with Count-Min Sketch, EWMA, RocksDB-backed Flink state, and Redis caching, reducing compute and memory footprint by 40% with no loss in precision or recall.
- Drove observability and reliability with Prometheus, Grafana, OpenTelemetry, Splunk, SLO dashboards, and chaos drills, achieving 99.97% availability and reducing mean time to mitigation from 11s to 5s.
- Implemented secure multi-tenant rollout workflows on AWS using EKS, S3, CloudWatch, IAM, KMS, Terraform, Argo Rollouts, and feature flags, enabling tenant-level isolation, staged releases, and safer rollback across client environments.
Projects
- Built a FastAPI-based RAG service for PDF ingestion, semantic chunking, FAISS retrieval, context ranking, and grounded Q&A; added retrieval-quality checks that reduced irrelevant responses by 40% versus baseline generation.
- Built a POSIX-style file system simulator in C++ with concurrent reads/writes, block allocation, inode-style metadata indexing, journaling, and crash recovery; improved throughput by 70% over a single-threaded baseline under contention.
Technical Skills
Education
Master of Science, Computer Engineering
Bachelor of Engineering, Electrical and Electronics Engineering