Site Reliability Engineer

  • C/C++
  • Java
  • Bash/Perl
  • Unix/Linux
  • Python

We are looking for a Site Reliability Engineer (SRE) to help improve availability, latency, performance, efficiency, monitoring and capacity planning across different teams and areas in the company, as we continue to work on next-generation, world-leading online gaming platforms.

  • Investigating and fixing performance/resilience issues
  • Conducting capacity planning and reviewing the production estate for major events
  • Reviewing support, technical debt and technical directive tickets
  • Reviewing development changes where the change will have an impact on the performance of an application
  • Conducting and supporting performance tests on multiple environments
  • Generating performance reports and statistics
  • Attending analysis/design meetings to provide input/sign-off on technical solutions for projects. For some technical changes, this may involve carrying out the full analysis
  • Coaching and supporting other team members on how to develop resilient and high-performance software
  • Resolving or providing assistance to the support team on critical production issues
  • Providing and maintaining solutions for improving real-time monitoring and alerting, e.g. Grafana, Prometheus
  • Working closely with the customer to evidence and promote the success of the role and drive improvements
  • Leading the dev teams' approach and strategies for supporting major events
  • Occasionally working out of hours
  • Supporting other SREs in the company by sharing best practices and improvements previously achieved in other teams
  • Interacting directly with different customer stakeholders to understand their needs and future goals around stability and performance
  • Proficiency in coding in at least one major language (Java, J2EE, C, C++, Python, PHP)
  • Strong experience in developing software and operating high volume transactional systems
  • Strong knowledge of relational database design and operation (any vendor)
  • Experience with database performance tuning: query plans, locks, query optimizer directives
  • Strong knowledge of provisioning, configuration management, and application-deployment software (mainly Ansible)
  • Excellent communication skills, both written and spoken
  • Capacity to consider different solutions to complex problems and associated trade-offs
  • Ability to occasionally provide services out of normal working hours
  • Experience of applications operating on Unix/Linux
  • Scripting skills: Bash, Perl, Python, JavaScript
  • Experience with Continuous Integration and Delivery
  • Ability to learn and critique new technologies quickly
  • Operation and development of performance testing solutions as well as analysis of results
  • Experience working with Docker containers

e-Zest is a leading digital innovation partner for enterprises and technology companies that uses emerging technologies to create engaging customer experiences. As a customer-focused and technology-driven company, it helps clients craft holistic business value from their software development efforts. It offers software development and consulting services for cloud computing, enterprise mobility, big data and analytics, user experience and digital commerce.