Engineering Team Lead, SRE - Real-time Data
Posted 11 days 6 hours ago by Bloomberg
Location: London
Business Area: Engineering and CTO
Bloomberg's Real-time Data group is responsible for distributing low-latency, high-volume financial data to users around the world. From equity prices to FX rates, our infrastructure handles over 60 billion messages per day from 370+ global exchanges, powering 375,000 Terminals and 3,000+ BPIPE clients across on-prem and cloud environments. The London Real-time Data SRE team plays a critical role in making this possible by developing the core services and tooling that ensure our systems are reliable, scalable, and observable.
The Opportunity
We're looking for an experienced engineering manager to lead a team of software and SRE engineers. This is a hands-on leadership role where you'll be accountable for both technical execution and people development. You'll shape the team's roadmap, drive production readiness, and grow a high-performing, collaborative engineering culture.
What You'll Own
You'll lead a team that supports several key components of the Real-time Data platform:
- Configuration Delivery Services: Enables thousands of servers and BPIPE endpoints to "call home" and receive correct settings.
- Peer Discovery Infrastructure: Groups servers into discoverable clusters and provides tools to manage them.
- Observability and Monitoring Frameworks: Ensures we have high visibility across a vast estate of global infrastructure.
- Data Quality Tooling: UI and backend systems for diagnosing distribution issues across the real-time data network.
- Cross-team Reliability Work: You'll help improve the reliability of systems beyond the team's formal ownership.
As the team's leader, you will:
- Manage the career growth, performance, and mentorship of software engineers
- Drive hiring, onboarding, and long-term team culture
- Stay hands-on: participate in technical design, and lead incident response when necessary
You'll balance operational excellence with software development, helping your team deliver tools, services, and processes that scale with the business.
The team's mission aligns with five SRE pillars:
- Latency Monitoring & Management - Define SLIs/SLOs, track latency, and build tools to diagnose issues.
- Capacity Management - Maintain disaster readiness and scalability through monitoring and forecasting.
- System Observability - Proactively detect issues, build alerting systems, and centralize health dashboards.
- Production Risk Management - Ensure safe software releases and drive infrastructure improvements.
- Incident Response - Lead or support fast, effective remediation during live incidents; build automation for common operational issues.
We're seeking a leader who can combine strong technical execution with people-first leadership. You'll guide the team's roadmap, help individuals grow, and contribute to the broader reliability strategy across Real-time Data.
You'll need to have:
- Proven experience directly managing software engineers in a production environment
- Strong hands-on development skills in an object-oriented language (Python or C++ preferred)
- A background in building reliable, well-tested software for production systems
- Confidence diagnosing and resolving live operational issues
- Strong communication skills, with the ability to work across teams and influence peers
- A track record of helping teams plan, prioritize, and deliver complex technical projects
- The ability to define a long-term vision for the team's technology and culture
- Background in SRE, infrastructure, or high-throughput distributed systems
- Familiarity with observability tooling, configuration management, or peer discovery patterns