Job Overview:
We are looking for an experienced System Engineer with deep expertise in Linux systems, virtualization, and modern infrastructure technologies. The ideal candidate will have a solid track record in building and maintaining scalable, secure, and high-performance computing environments to support cutting-edge AI/ML workloads and hybrid infrastructures.
Key Responsibilities:
- Design, implement, and optimize Linux-based infrastructure and systems operations.
- Manage containerization and orchestration platforms, particularly Kubernetes.
- Support and maintain hypervisor architectures, including VMware, Proxmox, KVM, and QEMU.
- Enable and support AI/ML environments, with a focus on GPU configuration and performance tuning.
- Administer and scale software-defined storage solutions, especially Ceph.
- Build and manage infrastructure using Infrastructure as Code tools such as Ansible and Terraform.
- Collaborate with other teams to ensure the stability, security, and efficiency of deployed infrastructure.
Required Experience:
- Strong hands-on experience with Linux infrastructure setup and optimization.
- Proven knowledge of virtualization technologies such as VMware, Proxmox, KVM, and QEMU.
- Familiarity with software-defined storage platforms, with an emphasis on Ceph.
- Demonstrated expertise with AI/ML infrastructure, GPU support, and workload configuration.
- Proficiency in container management and Kubernetes orchestration.
- Experience using Ansible, Terraform, or other IaC tools for automated deployments.
Preferred Knowledge:
- Exposure to Cisco UCS and Cisco Fabric Interconnect infrastructures.
- Understanding of Hyperconverged Infrastructure environments.
- Experience with NetApp storage solutions.
- Basic Windows system administration.
- Knowledge of cloud platforms (e.g., AWS, Azure, GCP) and hybrid environment management.
Qualifications:
- Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience).
- 5+ years of relevant experience in infrastructure engineering roles.
Nice to Have:
- Certifications in Kubernetes, Linux, or VMware.
- Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).