Team Lead Database Reliability Engineering

Responsibilities

1. Team Leadership and Process Management (Operations & People Management) (70% of working time)

  • Team Leadership: Manage a small team of DBRE engineers (3-5 people): setting goals, task delegation, monitoring execution, development, and mentorship.

  • Operational Excellence: Establish, implement, and maintain effective operational processes, including Incident Management, Root Cause Analysis (Post-Mortems/RCA), and problem management.

  • Incident Management: Lead and coordinate actions during critical incidents, ensuring rapid communication, thorough documentation, and Post-Mortem execution.

  • Engineering Collaboration: Work closely with Development teams to ensure the operational readiness of new products (Production Readiness) and implement Shift-Left methodologies.

  • SRE Processes: Implement and enforce key SRE practices, such as SLI/SLO definition, Error Budget management, and automation of manual tasks (Toil Reduction).

2. Technical Expertise and Observability (30% of working time)

  • Monitoring and Alerting (Key Focus): Lead the configuration and optimization of monitoring and alerting systems. Ensure Actionable Alerting—alerts must signal user impact (SLI/SLO) rather than resource saturation issues.

  • On-Call Management: Manage alerting processes and rotations (On-Call Rotation), mitigating Pager Fatigue by continuously improving alert quality.

  • Cloud Platforms: Proven experience with cloud providers, preferably AWS, at a level sufficient to advise the team on technical decisions.

  • Containerization: Understanding and hands-on experience with Kubernetes (k8s)—deployment, administration, monitoring, and troubleshooting.

  • Automation: Programming experience in Python or Go for automating operational tasks.

  • Infrastructure as Code (IaC): Experience using Terraform for infrastructure and configuration management.

Qualifications

Experience

  • Minimum of 2-3 years of proven experience as a Team Lead, or in a similar leadership role, in DBRE, DBA, or SRE operations.

  • Experience managing operational processes and production incident resolution workflows.

  • Experience operating mission-critical, high-availability systems.

Technical Skills

  • MySQL: Solid understanding of building reliable MySQL clusters, covering sharding, replication, backups, and performance and query tuning. Experience building efficient monitoring, alerting, and reporting dashboards, and managing database capacity.

  • MongoDB: Administration experience, with a solid understanding of reliable configuration and efficient performance tuning for this kind of database. Understanding of cloud-hosted solutions such as DocumentDB, the migration process, and its pros and cons.

  • Kubernetes (k8s): Proficient working knowledge.

  • AWS: Experience with key cloud compute services and advanced experience with AWS managed database services, especially for MySQL. Understanding of cost management for AWS services.

  • IaC: Experience with Terraform.

  • Programming: Development skills in Python/Go (Middle Engineer level).

  • Monitoring: Deep experience configuring systems (Prometheus, Grafana, Datadog) and implementing Observability practices (metrics, logs, traces).

Personal and Leadership Qualities

  • Management Skills: Ability to inspire, motivate, and develop the individual competencies of engineers, to provide feedback, and to set clear expectations. Understanding of project management basics.

  • Communication: Excellent written and verbal communication skills for interacting with technical teams, management, and leadership. Understanding of cultural differences and the ability to unite people distributed around the world.

  • Systemic Thinking: Ability to see the big picture, analyze complex operational issues, and propose long-term, scalable solutions.

We offer:

  • Well-coordinated professional team

  • Cutting-edge technologies, interesting and challenging tasks, a dynamic project, and great opportunities for self-realization and for professional and career growth

  • Additional Health and Life Insurance Package

  • Employee Assistance Program

  • 25 vacation days

  • ReBenefit Platform Account.

  • This role requires on-site presence at our office 4 days a week to support effective collaboration and teamwork.

Write to us at jobs@jettycloud.com or send a message to our recruiters
