We are looking for a Site Reliability Engineer (SRE) to join the RingCentral Operations Intelligence team.
As an SRE, you will be responsible for maintaining, improving, and scaling our monitoring and observability infrastructure. Our systems provide operational insights across metrics, traces, logs, and alerts, ensuring the reliability, performance, and availability of RingCentral’s global platform.
You’ll collaborate closely with engineering and operations teams to automate deployments, improve observability, and strengthen system resilience across our Kubernetes and cloud environments.
This position combines software engineering, infrastructure management, and monitoring expertise to deliver a high-quality, data-driven operational experience.
The ideal candidate should have a background in cloud operations and monitoring technologies such as ELK/EFK, Prometheus, and VictoriaMetrics, as well as experience with containerization using Kubernetes, message brokers like Kafka, and SQL/NoSQL databases. Programming experience is required for the position.
Responsibilities:
Maintain the availability, reliability, and scalability of the global monitoring and logging infrastructure.
Integrate and evolve observability tools (ELK, Grafana, ClickHouse, VictoriaMetrics, Prometheus).
Develop automation and deployment pipelines for Kubernetes-based monitoring components.
Collaborate with engineering teams to embed observability and alerting into the development lifecycle.
Perform capacity planning and proactively manage scaling in large distributed environments.
Participate in global incident response and on-call rotation.
Maintain accurate and clear documentation of systems, configurations, and procedures.
Skills:
2+ years of experience as an SRE, Systems Engineer, or DevOps Engineer in production environments.
Strong Kubernetes experience (Deployments, DaemonSets, ConfigMaps) and related tooling (Helm, ArgoCD, Terraform).
Hands-on experience with Fluent Bit, ELK, or similar log pipelines.
Solid understanding of observability and monitoring principles (metrics, logs, tracing).
Proficiency in Python or Go for automation and systems integration.
Strong Linux administration and troubleshooting skills.
Experience with AWS or other major cloud providers.
Familiarity with CI/CD tools (GitLab CI, Spinnaker) and infrastructure as code (Terraform, Ansible).
Excellent communication skills and ability to work in distributed teams.
Preferred technology stack:
OS: Linux (CentOS/Red Hat/Oracle Linux)
Programming: Python, Go
Cloud: AWS
Containers: Kubernetes, Docker
Logging & Monitoring: ELK, Fluent Bit, Prometheus, Grafana, CloudWatch, ClickHouse, VictoriaMetrics
Configuration Management & Deployment: Terraform, Ansible, ArgoCD, Spinnaker
Databases: ClickHouse, MongoDB, PostgreSQL, MySQL
HA: HAProxy, Keepalived
Qualifications:
Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent practical experience).
Strong analytical, problem-solving, and ownership mindset.
Intermediate English proficiency.
We offer:
Well-coordinated professional team
Cutting-edge technologies, interesting and challenging tasks, a dynamic project, and strong opportunities for professional and career growth
Additional Health and Life Insurance Package
Employee Assistance Program
25 vacation days
ReBenefit Platform Account
This role requires on-site presence at our office four days a week to support effective collaboration and teamwork.
