Srividya Bandari

Software Engineer — Backend & Distributed Systems

Summary

Software Engineer with 4+ years of experience building distributed backend services, event-driven workflows, real-time streaming systems, and ML-assisted decision pipelines. Strong background in .NET Core, Java, Kafka, Pulsar, Flink, Redis, PostgreSQL, AWS/Azure, observability, and production reliability. Delivered measurable impact across high-volume systems, including a 35% reduction in workflow latency, a 70% improvement in VIN-decode p95 latency, and 45% faster issue detection, along with safer releases.

Work Experience

Software Engineer · Openlane
July 2023 – Present  ·  Austin, TX

Tech: .NET Core, Apache Pulsar, Redis, PostgreSQL, Terraform, Honeycomb

  • Owned core services in a distributed .NET Core inspection workflow platform processing 300K+ daily vehicle inspections and 3.5M+ validation, integration, audit, and observability events/day across inspection, inventory, VIN intelligence, and downstream systems.
  • Designed Apache Pulsar-based event propagation with replay-safe consumers, correlation IDs, schema-versioned payloads, and failure-isolated retry paths, reducing workflow latency by 35% while improving real-time state consistency.
  • Built ML-assisted inspection decision pipelines that enrich workflow events with VIN intelligence, vehicle metadata, inspection signals, and historical state-transition features; the resulting risk scores feed rule engines and manual-review queues to flag anomalous transitions, duplicate-processing signals, and inconsistent vehicle attributes.
  • Implemented idempotent event consumers with deduplication keys, exponential backoff, dead-letter handling, and vendor-failure isolation, reducing duplicate processing risk and preventing invalid inspection state transitions during downstream degradation.
  • Optimized high-volume VIN decoding and reference-data lookups using Redis-backed caching, TTL-based invalidation, and request coalescing, improving p95 latency by 70% while reducing repeated third-party vendor calls.
  • Led production rollout with Terraform-managed feature flags, staged enablement, canary deployments, rollback paths, and Honeycomb distributed tracing, reducing MTTD by 45% and improving release safety across critical inspection workflows.
Research Software Engineer Intern · Mayo Clinic
Jan 2022 – July 2022  ·  Phoenix, AZ

Tech: Python, PyTorch, TensorFlow, Multi-view CNNs, SageMaker, Datadog

  • Designed a modular multi-view CNN training and inference pipeline in PyTorch, fusing image embeddings with structured metadata features to support reproducible model development across medical-imaging datasets.
  • Improved model performance through class-imbalance handling, loss-function tuning, threshold calibration, and 1K+ hard-case error analysis, raising accuracy from 87% to 95% while reducing false-negative rate from 13% to 5%.
  • Added PyTorch hooks, tensor-level tracing, and Datadog dashboards for feature extraction, inference latency, and failure diagnostics, reducing MTTR from 2 days to 4 hours and increasing trace-event coverage from 40% to 100%.
Software Engineer · Infosys
Jan 2020 – Aug 2021  ·  Hyderabad, India

Tech: Java, Go, Spring Boot, Kafka, Flink, Redis, RocksDB, AWS

  • Built a real-time DDoS detection pipeline using Kafka, stateful Flink, and Java on Kubernetes, ingesting 2M+ network flow records/sec and cutting detect-to-mitigate latency from 25s to under 8s.
  • Implemented a Go/Java mitigation orchestration service over gRPC to push staged BGP Flowspec, RTBH, and ACL actions with audit logging, operator approval, and rollback safeguards, automating 90%+ of approved mitigations.
  • Developed adaptive baselining and heavy-hitter detection with Count-Min Sketch, EWMA, RocksDB-backed Flink state, and Redis caching, reducing compute and memory footprint by 40% with no loss in precision or recall.
  • Drove observability and reliability with Prometheus, Grafana, OpenTelemetry, Splunk, SLO dashboards, and chaos drills, achieving 99.97% availability and reducing mean time to mitigation from 11s to 5s.
  • Implemented secure multi-tenant rollout workflows on AWS using EKS, S3, CloudWatch, IAM, KMS, Terraform, Argo Rollouts, and feature flags, enabling tenant-level isolation, staged releases, and safer rollback across client environments.

Projects

RAG-Powered Document Q&A System
Tech: Python, FAISS, FastAPI, Docker, Vector Search
  • Built a FastAPI-based RAG service for PDF ingestion, semantic chunking, FAISS retrieval, context ranking, and grounded Q&A; added retrieval-quality checks that reduced irrelevant responses by 40% versus baseline generation.
Multi-Threaded File System Simulator
Tech: C++, POSIX APIs, Read-Write Locks, Journaling, Concurrency
  • Built a POSIX-style file system simulator in C++ with concurrent reads/writes, block allocation, inode-style metadata indexing, journaling, and crash recovery; improved throughput by 70% over a single-threaded baseline under contention.

Technical Skills

Languages: C#, Java, Python, SQL, Go, JavaScript, TypeScript
Backend: ASP.NET Core/.NET, Spring Boot, REST, gRPC, Microservices, Distributed Systems
ML & Data: PyTorch, TensorFlow, SageMaker, FAISS, Snowflake
Streaming & Messaging: Apache Pulsar, Kafka, RabbitMQ, Event-Driven Architecture
Databases & Caching: PostgreSQL, SQL Server, Redis, MongoDB, Elasticsearch/OpenSearch
Cloud & DevOps: AWS, Azure, Docker, Kubernetes, Terraform, CI/CD, GitHub Actions, Jenkins
Observability: Honeycomb, Datadog, Grafana, Prometheus, Splunk, OpenTelemetry

Education

Arizona State University

Master of Science, Computer Engineering

Aug 2021 – May 2023
Tempe, USA
Osmania University

Bachelor of Engineering, Electrical and Electronics Engineering

Aug 2016 – Jan 2020
Hyderabad, India