I came across a job ad titled Systems Reliability Engineer, which turns out to call for a hybrid engineering skill set. Its details are copied below. Bear with me while I break things down.
The hybrid part is that the role requires a combination of:
– Linux/Unix/Virtualisation, which generally falls under SysAdmin roles.
– Networking, which falls under Network Engineer roles.
– Storage and Server, which generally falls under Storage/Backup Engineer roles.
– Kubernetes, which is a container orchestrator and provides a platform for a distributed application. This is a new field, but I think it's safe to say that DevOps Engineer or Platform Engineer roles handle this responsibility.
– AWS/Azure/GCP Cloud, which are public cloud IaaS, PaaS or FaaS services. This falls under Cloud Engineer or DevOps Engineer roles.
A combination of all of the above knowledge is required to function as a Systems Reliability Engineer here.
And so we can say that a Systems Reliability Engineer is a composite of a SysAdmin, Network Engineer, Storage Engineer, DevOps Engineer, Platform Engineer and Cloud Engineer.
Can we break this down a bit more?
Starting with the application workload, it is safe to assume that the heavyweight applications this engineer will support are cloud native, microservices based applications that require a networked distributed system to run. At that scale the application needs CPU cores, RAM, storage, IOPS and bandwidth.
Digging in further, the individual components require an OS and virtualisation (Linux/Unix/KVM etc – SysAdmin). Networking these individual components requires L2/L3 networks (routers, switches – NetEng). On top of that, what can be called a Distributed System OS is required, which presents the application not with individual components but with the combined servers/OS/routers/switches/vswitches/storage as one system. Kubernetes can be said to be that Distributed System OS, providing orchestration and management of namespaces and containers. A distributed file system and storage servers will also be present. Certain parts of the application may interact with public clouds (AWS/Azure/GCP) to run some workloads there instead of on the local infrastructure.
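To make the layering described above a bit more concrete, here is a minimal Python sketch of the stack as a list of layers, each tagged with the role that traditionally owns it. The class name, field names and the bottom-to-top ordering are my own assumptions for illustration; the job ad does not define any such model, and public cloud in particular sits beside the local stack as much as on top of it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Layer:
    name: str                    # hypothetical label for the abstraction layer
    examples: Tuple[str, ...]    # technologies mentioned in the post
    traditional_role: str        # role that usually owns this layer


# Illustrative ordering only: roughly from hardware up to the application.
STACK: List[Layer] = [
    Layer("storage and servers", ("storage servers", "distributed file system"), "Storage Engineer"),
    Layer("network", ("routers", "switches", "vswitches"), "Network Engineer"),
    Layer("os and virtualisation", ("Linux", "Unix", "KVM"), "SysAdmin"),
    Layer("orchestration", ("Kubernetes",), "DevOps/Platform Engineer"),
    Layer("public cloud", ("AWS", "Azure", "GCP"), "Cloud Engineer"),
]


def roles_spanned(stack: List[Layer]) -> List[str]:
    """Return the distinct traditional roles needed to cover every layer, in stack order."""
    seen: List[str] = []
    for layer in stack:
        if layer.traditional_role not in seen:
            seen.append(layer.traditional_role)
    return seen


if __name__ == "__main__":
    for role in roles_spanned(STACK):
        print(role)
```

Listing the roles this way shows the point of the post in one glance: a single job title spans what used to be five separate job families.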
Oh dear, what a combination of knowledge bank and skillset this person needs!
In Computer Science we work on the principle of Abstraction Layers: each layer has its own science and phenomena inside it, and it provides a function or service to another layer. The whole system is composed of multiple Abstraction Layers interacting with each other. In this case the Systems Reliability Engineer requires knowledge spanning multiple Abstraction Layers. Traditional engineers have been functioning within their own Abstraction Layer, and their jobs have been complicated enough to require the tips and tricks of that one layer to make things work. An engineer working in the networking abstraction layer knows how to troubleshoot links, routing, SFPs etc, and an engineer working in the SysAdmin layer knows what to do with the Linux OS, KVM etc. Similarly, an engineer working on the public cloud may know the tips and tricks of 1 or 2 public clouds and not all 3. Kubernetes and container management is itself now an Abstraction Layer.
This job advertisement not only lists multiple abstraction layers but also lists multiple tools within each of them. For example, within Virtualisation it lists KVM, ESXi and Hyper-V, all three well-known hypervisors, and within Public Cloud it lists GCP, Azure and AWS, all three. So not only does it span abstraction layers, but within each abstraction layer it asks for familiarity with several alternative products.
In IT Operations, knowing the right command or the right place to click sometimes matters a lot. Things don't proceed if you don't know the command, where to click or what parameter to enter. Spanning Abstraction Layers, and multiple tools within each Abstraction Layer, is a tricky job for IT Operations. I am guessing they will have a team and will manage the skill set of the team rather than of individual engineers: multiple engineers with basic knowledge of the whole system and specific knowledge of 1 or 2 Abstraction Layers and 2 or 3 tools. Team level skill set management would be an important aspect here.
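Team level skill set management can be pictured as a simple coverage check: for each layer, how many engineers on the team know it in depth? Below is a minimal Python sketch of that idea. The engineer names, their skills and the threshold of two engineers per layer are all made up for illustration and do not come from the job ad.

```python
# Layers the team as a whole has to cover (taken from the list in this post).
REQUIRED_LAYERS = {"linux", "networking", "storage", "kubernetes", "public cloud"}

# Hypothetical team: each engineer knows 2 or 3 layers in depth.
TEAM = {
    "engineer_a": {"linux", "networking"},
    "engineer_b": {"kubernetes", "public cloud", "linux"},
    "engineer_c": {"storage", "networking"},
}


def coverage(team, required):
    """Count how many engineers cover each required layer."""
    return {
        layer: sum(layer in skills for skills in team.values())
        for layer in sorted(required)
    }


def gaps(team, required, minimum=2):
    """Return layers covered by fewer than `minimum` engineers."""
    return [layer for layer, count in coverage(team, required).items() if count < minimum]


if __name__ == "__main__":
    print(coverage(TEAM, REQUIRED_LAYERS))
    print("thin coverage:", gaps(TEAM, REQUIRED_LAYERS))
```

With this example team, linux and networking are each known by two engineers while kubernetes, public cloud and storage each depend on a single person, which is exactly the kind of gap a shift rota would expose.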
The rest of the job description suggests this is an operations job, as it requires full work week availability and troubleshooting skills as well. So this new hybrid engineer will be tasked with on-shift troubleshooting work, supporting customers, speaking to vendors etc. It is important to note that this is not a Project Deployment or Professional Services job where you are reviewing designs, testing solutions, submitting BOMs, reviewing equipment lists, counting items, and installing and configuring systems from scratch. This is an Ops troubleshooting break-fix role. As such it requires a troubleshooting mindset and sufficient knowledge of the system's functions and its individual components to identify which part of the system is causing a bug or service impact. Once you identify which part is broken (eg networking or virtualisation), you might need to dig a bit deeper and review some logs within that component, up to a certain level. The engineer will then make an intelligent decision, either on actions to fix the component or on whom to contact next. Each individual component will have its own level 3 support structure and vendor. The Systems Reliability Engineer will identify whether networking, virtualisation, storage etc is broken, attempt a certain level of fix and, if that does not work, consult the right team or vendor.
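The break-fix flow described above (find the broken component, attempt a fix if one is known, otherwise escalate) can be sketched in a few lines of Python. Everything named here is hypothetical: the component list, the check order, the escalation contacts and the health checks are placeholders I made up, and real checks would query monitoring systems or logs rather than return a fixed value.

```python
from typing import Callable, Dict, Optional

# Hypothetical order in which components are checked.
CHECK_ORDER = ["storage", "networking", "virtualisation", "kubernetes"]

# Hypothetical level 3 contacts; a real rota would hold actual teams and vendors.
ESCALATION = {
    "storage": "storage vendor L3",
    "networking": "network team L3",
    "virtualisation": "virtualisation vendor L3",
    "kubernetes": "platform team L3",
}


def first_broken(checks: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Run each health check in order and return the first component that fails."""
    for component in CHECK_ORDER:
        if not checks[component]():
            return component
    return None


def triage(checks: Dict[str, Callable[[], bool]], known_fixes: Dict[str, str]) -> str:
    """Decide between attempting a known fix and escalating to level 3."""
    component = first_broken(checks)
    if component is None:
        return "no fault found in checked components"
    if component in known_fixes:
        return f"attempt fix: {known_fixes[component]}"
    return f"escalate to {ESCALATION[component]}"


if __name__ == "__main__":
    # Simulated checks: everything healthy except networking.
    checks = {name: (lambda: True) for name in CHECK_ORDER}
    checks["networking"] = lambda: False
    print(triage(checks, known_fixes={}))
```

The engineer in this role does not need level 3 depth in every component, only enough knowledge to write (or mentally run) the checks and to know who owns the next step.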
As such, when we look at the multiple skill sets required, it looks very complicated for one person to know all this. From my 13 years (and counting) in IT, we are still in a siloed world where a network engineer with a CCNP is typically progressing towards senior network engineer and CCIE, or perhaps diversifying only with an AWS or Azure skill. A comprehensive, non-siloed, cross abstraction layer engineer with Kubernetes, storage, public cloud, networking, virtualization and Linux knowledge will probably be difficult to find, because from what I see a lot of people are comfortable within their abstraction layer, and such diversity is not necessary for them and is a big headache. Within networking, which is my field, I feel that network engineers are mostly pursuing deeper design knowledge, AWS/Azure diversification or Python network automation as a career path. The same might be true for engineers within the Virtualization / SysAdmin layer, who might be developing inside that abstraction layer. What makes it trickier still is that you need this cross abstraction layer engineer to have an ops and troubleshooting mindset and to be willing to do shifts on weekends. There will be few such people out there. Perhaps some incentives will be needed to find the right diverse engineer willing to work weekends, such as permanent work from home, or accepting candidates from any nearby country who can work the right timezone.
These are the new Hybrid Engineers.
Update: I later came to know that they have mentioned that they require 2 or 3 out of the skill set. So it appears they are dividing the skill set at a team level.