Job Description
We are seeking a skilled and motivated Linux Site Reliability Engineer (SRE) to join our team.
The ideal candidate will have a strong background in Linux system administration, automation, and cloud infrastructure, with a passion for building reliable and scalable systems.
You will collaborate with development and operations teams to ensure our services are highly available, performant, and fault-tolerant.
Responsibilities
Onboarding of New Customers: Ensure smooth deployment and operational readiness, document processes, and provide initial support during the transition.
System Administration: Manage, monitor, and optimize Linux servers in production and development environments. Identify and resolve bottlenecks in application and system performance.
Automation: Develop and maintain infrastructure automation using tools like Ansible, Terraform, or similar. Create and maintain hardening and washing scripts (Ansible).
Performance Optimization: Diagnose and resolve performance bottlenecks at the OS, application, and network levels. Analyze system demands and plan for scaling.
Incident Management: Lead efforts to quickly resolve production incidents, conduct post-mortems, and implement solutions to prevent future occurrences.
Scalability: Work on infrastructure scalability and reliability for high-traffic services.
Collaboration: Partner with development teams to create CI/CD pipelines and integrate reliability practices into the development lifecycle. Coordinate changes with operations teams.
Security: Ensure system security through best practices in access control, patch management, and system hardening.

Qualifications
Extensive experience with Linux distributions like RHEL, CentOS, or Ubuntu
Shell scripting (Bash, Zsh)
System configuration and tuning
Package management (apt, yum, rpm)
User and group management
File system management
Network configuration (TCP/IP, DNS, firewall)
Security best practices (e.g. hardening systems, user permissions)
Programming languages (Python, Go, Ruby)
Version control (Git)
CI/CD pipelines (Jenkins, GitLab CI/CD)
Cloud platforms (AWS, GCP, Azure)
Containerization
Configuration management tools (Ansible, Puppet, Chef)
Monitoring and alerting tools (Prometheus, Grafana, Nagios)
Experience with automation frameworks and tools
Knowledge of automated installation of different Linux distributions (Kickstart, Preseed, Cloud-init)
Knowledge of DevOps principles and practices
Ability to write scripts and tools to automate tasks

Nice to have/preferred skills and experience (not required)
Exposure to high-availability architectures and disaster recovery strategies
Security: penetration testing, vulnerability scanning, security incident response
Networking: load balancing, VPNs, network protocols (TCP/IP)
Database administration (MySQL, PostgreSQL)
Big Data (Hadoop, Spark, Kafka)
German language knowledge is an advantage.