Key Responsibilities

Reliability and Performance:

  • Design, implement, and maintain systems and processes that enhance the reliability, availability, and performance of our services.
  • Design, implement, and maintain CI/CD tools and processes to increase reliability.
  • Design, implement, and maintain cloud constructs to increase reliability.
  • Develop and manage monitoring, alerting, and incident response strategies to minimize downtime and ensure rapid recovery from incidents.
  • Conduct root cause analysis of system failures and implement preventative measures.
  • Optimize system performance and automate repetitive tasks to improve operational efficiency.

Collaboration and Communication:

  • Work closely with software engineering, infrastructure, and product teams to integrate reliability practices into the development lifecycle.
  • Advocate for SRE best practices and foster a culture of reliability and operational excellence across the organization.
  • Communicate effectively with stakeholders, providing regular updates on reliability metrics, incidents, and improvement initiatives.

Innovation and Improvement:

  • Stay abreast of the latest industry trends and technologies in SRE, reliability, and performance.
  • Continuously evaluate and improve existing systems and processes to enhance reliability and efficiency.
  • Drive the adoption of new tools and technologies that can improve operational capabilities.

Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • 5+ years of experience in site reliability engineering, DevOps, or a related field.
  • Strong understanding of reliability engineering principles, practices, and tools.
  • Proficiency in monitoring and alerting tools (e.g., Prometheus, Grafana, Nagios).
  • Experience with cloud platforms (AWS, Azure, GCP), containers (Docker), and container orchestration systems (Kubernetes).
  • Proficiency in scripting and automation tools, such as Python, Bash, Ansible, or Terraform.
  • Excellent problem-solving skills and the ability to work under pressure in a fast-paced environment.
  • Strong communication and interpersonal skills, with the ability to influence and lead teams.

Preferred Qualifications

  • Experience with continuous integration and continuous deployment (CI/CD) practices and tools.
  • Knowledge of configuration management tools (e.g., Puppet, Chef).
  • Experience with database management and optimization.
  • Familiarity with compliance frameworks and security best practices.
  • Relevant certifications such as AWS Certified DevOps Engineer, Google Cloud Professional Cloud DevOps Engineer, or equivalent.

Salary

198,000 - 220,000 USD per year

Remote Job

Worldwide

Job Benefits

  • Company retreats
  • Paid time off
  • Pay in crypto

Job Overview

  • Job Posted: 1 month ago
  • Job Expires: 1 month, 1 week
  • Job Type: Full Time
  • Job Role: Engineer
  • Education: Bachelor Degree
  • Experience: 5+ Years
  • Total Vacancies: 1

Location

United States