Site Reliability Engineer (Observability)

We are looking for a Site Reliability Engineer (SRE) to join the RingCentral Operations Intelligence team.

As an SRE, you will be responsible for maintaining, improving, and scaling our monitoring and observability infrastructure. Our systems provide operational insights across metrics, traces, logs, and alerts, ensuring the reliability, performance, and availability of RingCentral’s global platform.

You’ll collaborate closely with engineering and operations teams to automate deployments, improve observability, and strengthen system resilience across our Kubernetes and cloud environments.

This position combines software engineering, infrastructure management, and monitoring expertise to deliver a high-quality, data-driven operational experience.

The ideal candidate has a background in cloud operations and monitoring technologies such as ELK/EFK, Prometheus, and VictoriaMetrics, as well as experience with container orchestration using Kubernetes, message brokers such as Kafka, and SQL/NoSQL databases. Programming experience is required for this position.

Responsibilities:

  • Maintain the availability, reliability, and scalability of the global monitoring and logging infrastructure.

  • Integrate and evolve observability tools (ELK, Grafana, ClickHouse, VictoriaMetrics, Prometheus).

  • Develop automation and deployment pipelines for Kubernetes-based monitoring components.

  • Collaborate with engineering teams to embed observability and alerting into the development lifecycle.

  • Perform capacity planning and proactively manage scaling in large distributed environments.

  • Participate in global incident response and an on-call rotation.

  • Maintain accurate and clear documentation of systems, configurations, and procedures.

Skills:

  • 2+ years of experience as an SRE, Systems Engineer, or DevOps Engineer in production environments.

  • Strong Kubernetes experience (Deployments, DaemonSets, ConfigMaps) and related tooling (Helm, ArgoCD, Terraform, etc.).

  • Hands-on experience with Fluent Bit, ELK, or similar log pipelines.

  • Solid understanding of observability and monitoring principles (metrics, logs, tracing).

  • Proficiency in Python or Go for automation and systems integration.

  • Strong Linux administration and troubleshooting skills.

  • Experience with AWS or other major cloud providers.

  • Familiarity with CI/CD tools (GitLab CI, Spinnaker) and infrastructure as code (Terraform, Ansible).

  • Excellent communication skills and ability to work in distributed teams.

Preferred technology stack:

  • OS: Linux (CentOS/Red Hat/Oracle Linux)

  • Programming: Python, Go

  • Cloud: AWS

  • Containers: Kubernetes, Docker

  • Logging & Monitoring: ELK, Fluent Bit, Prometheus, Grafana, CloudWatch, ClickHouse, VictoriaMetrics

  • Configuration Management & Delivery: Terraform, Ansible, ArgoCD, Spinnaker

  • Databases: ClickHouse, MongoDB, PostgreSQL, MySQL

  • HA: HAProxy, Keepalived

Qualifications:

  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent practical experience).

  • Strong analytical, problem-solving, and ownership mindset.

  • Intermediate English proficiency.

We offer:

  • Well-coordinated professional team

  • Cutting-edge technologies, interesting and challenging tasks, a dynamic project, and great opportunities for self-realization and professional and career growth

  • Additional Health and Life Insurance Package

  • Employee Assistance Program

  • 25 vacation days

  • ReBenefit Platform Account

  • This role requires on-site presence at our office 4 days a week to support effective collaboration and teamwork.

Write to us at jobs@jettycloud.com or send a message to our recruiters.
