Staff Platform Engineer with 14+ years in distributed systems and cloud infrastructure. I design scalable, resilient, and cost-efficient platforms for data and compute workloads. At Rialtic I built a Go-based pipeline that processed millions of records daily (10× throughput gain) and engineered distributed batch systems on EKS. Before that I founded Shipmile, grew it to a 25-member team and $2M ARR, and built an event-driven backend handling 50k+ daily shipments. Earlier roles include Morgan Stanley (low-latency search over 1TB dataset at <5ms p99) and Meditab (leading a web-stack migration).
Side projects include Conv-SNN in Rust (96.7% accuracy, 2× efficiency vs ANN), a profitable RL-based trading engine, and a CUDA Bitcoin finder achieving 285× speedup. I enjoy building efficient distributed systems, performance profiling, and high-throughput data pipelines. Open to Staff+ roles at startups or scale-ups where infrastructure and system design are critical.
Senior Systems Engineer (14 YoE) applying systems expertise to ML. I don't build models;
I build the systems that make ML fast and cheap.
Recent work:
- Conv-SNN from scratch in Rust: 96.7% accuracy, 2x efficiency vs ANNs
- Profitable RL trading system running since Dec 2020
- 20x GPU speedup using custom CUDA kernels
Currently at Rialtic scaling data pipelines. Seeking Staff-level ML Platform/MLOps roles.
In my experience, Claude is better at writing Rust than Gemini. The Gemini CLI gets confused easily, but it has a good higher-level picture. In my workflow Gemini acts as the architect: it even hands full code changes to Claude (they usually don't compile directly and miss moving structs), and Claude applies those changes. That division of labor works better for me.
In non-oil-producing countries, the cost of consumption adds to the fiscal deficit, so they have a huge incentive to move away from oil, but the alternative has to be cheaper or open-IP. The producers, who have a cash cow, have no reason to stop production even if they agree on the environmental issues.
PING 1.1.1.1 (1.1.1.1): 56 data bytes
64 bytes from 1.1.1.1: icmp_seq=0 ttl=61 time=15.860 ms
64 bytes from 1.1.1.1: icmp_seq=1 ttl=61 time=15.799 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=61 time=15.616 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=61 time=15.769 ms
64 bytes from 1.1.1.1: icmp_seq=4 ttl=61 time=15.431 ms
64 bytes from 1.1.1.1: icmp_seq=5 ttl=61 time=16.459 ms
64 bytes from 1.1.1.1: icmp_seq=6 ttl=61 time=15.860 ms
64 bytes from 1.1.1.1: icmp_seq=7 ttl=61 time=15.930 ms
We once maintained a hashmap whose key and value were the same String instance to avoid duplication in a search application. Wouldn't that be more beneficial than leaving it to the GC if the application uses a lot of strings?
Edit: changed avoid deduplication to avoid duplication
Why did you want to avoid deduplication? You can't even tell it's happened, since it only works on the char[] that is internal to the String. Did you find it didn't work as expected?
It was in a typeahead search application built on 20GB of names. Those names share common first and last names, which were stored as separate String instances. With deduplication, string memory was reduced to 20% of the original.
Will benchmark that application with -XX:+UseStringDeduplication.
Ah right. The reason they do it in the GC rather than in the mutator threads is that it only has an impact on strings that live long enough to be evacuated. Short-lived strings don't get deduplicated, and probably don't need to be. Without the GC, I don't know how you'd automatically determine that deduplication was worthwhile.
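For illustration, here is a minimal sketch of the manual hashmap approach described above (the StringPool class and dedup method names are hypothetical, not from the original application). Unlike the GC's -XX:+UseStringDeduplication, which only shares the backing char[]/byte[] array, this shares the whole String object:

```java
import java.util.HashMap;
import java.util.Map;

// Manual string deduplication: a map whose key and value are the same
// String instance, so repeated names all resolve to one shared object.
class StringPool {
    private final Map<String, String> pool = new HashMap<>();

    // Returns the canonical instance for s, registering s if it's new.
    String dedup(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }
}

public class Main {
    public static void main(String[] args) {
        StringPool pool = new StringPool();
        // new String(...) forces distinct instances with equal contents
        String a = pool.dedup(new String("Smith"));
        String b = pool.dedup(new String("Smith"));
        System.out.println(a == b); // prints true: both refer to one instance
    }
}
```

The trade-off versus the GC approach is that the pool holds strong references, so entries never go away unless you evict them; that's fine for a bounded vocabulary like names, but not for arbitrary short-lived strings.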
Remote: Yes
Willing to relocate: Open
Technologies: Rust, Go, Python, Distributed Systems, Kubernetes, AWS/GCP, CUDA, PostgreSQL, Terraform, Helm
Résumé/CV: PDF resume here: [https://drive.google.com/file/d/1r3xT_5YWZf0IcYyFLOlP5qdp0MB...]
GitHub: https://github.com/karthikkolli
LinkedIn: https://linkedin.com/in/karthikkolli
Email: karthik@karthikkolli.com