This is a copy of a previous Linkedin Post Dated June 7 2016 which was not present on this Blog.

https://www.linkedin.com/pulse/opnfv-brahmaputra-systems-integration-nfv-vnfs-lutfullah-kakakhel/

OPNFV Brahmaputra is a lab-ready release of OPNFV. One takeaway is that community-driven systems integration really is a hard task to accomplish. This is especially true when the systems being integrated to form a larger system are themselves large open source projects.

To start with OPNFV aims to integrate systems upon which VNFs can be run.

The caption above is heavy. On one side are the standards bodies, which generate requirements, produce specifications and define how the system is to run. On the other side are the development projects, which produce open source code. OPNFV stands in the middle and intends to integrate these individual code projects according to the requirements laid out by the standards bodies and provide a system on top of which VNFs can be run and tested. This task is being run under an umbrella membership-based organization such as OPNFV because it is a repetitive task which every organization would otherwise need to do over and over again whenever new releases of code are made available for the individual projects.

It might be difficult to picture this to start with, but imagine you want to have a lab ready to run and test VNFs. What is the lab composed of? It will have infrastructure on top of which VNFs can be run. What is this infrastructure composed of? It will be composed of hardware, a virtualisation layer with hypervisors, and projects such as OpenDaylight, OpenStack, KVM and Ceph, all running together to provide a block of virtual compute, network and storage infrastructure (an NFVI Point of Presence) on which VNFs can be run.

Every organization which wants to reach the level of testing VNFs will need such a lab. And then what happens when a new version of OpenDaylight is released or a new version of Openstack is released or KVM or Ceph? Everybody needs to update their labs. OPNFV is a Linux Foundation project which intends to be the focal point of these activities and perform them jointly instead of everybody doing them individually.

It also helps make the system work. A patch to OpenDaylight could work well within OpenDaylight but could break things at the system layer when integrated with the rest of the components which make up an NFV lab (to be used to run VNFs). OPNFV aims to be the first systems layer at which such patches can be spotlighted and returned to the project they came from, informing them that at the system level things get disjointed.

OPNFV, according to its initial white paper, aims to build this systems testing environment in line with the NFV architecture reference points Vi-Ha, Vn-Nf, Nf-Vi, Vi-Vnfm and Or-Vi.

After the above is clear, the figure below can be understood to be a larger system composed of individual projects integrated together with the aim of running VNFs. In the figure below OpenDaylight is one piece (in network), KVM is another piece (in compute), Open vSwitch is another piece and OpenStack is also one piece. All these, when put together, provide the infrastructure to run VNFs. Also to be noted is that in the case of OPNFV there are community labs (Pharos Labs) which provide the hardware.

The presence of this combined effort also means that for Network Operators the differentiation in the market is in Service Orchestration: the Virtual Network Functions and the Network Services that run on top of them.

 

References:

http://www.etsi.org/technologies-clusters/technologies/nfv

http://www.slideshare.net/CiscoDevNet/devnet-1162-opnfv-the-foundation-for-running-your-virtual-network-functions

https://www.opnfv.org/brahmaputra

http://www.slideshare.net/OPNFV/opnf-brahmaputra-an-early-look

https://www.opnfv.org/sites/opnfv/files/pages/files/opnfv_whitepaper_092914.pdf

https://www.youtube.com/watch?v=Dh55McgHGQ8

This is a copy of a previous Linkedin Post Dated June 7 2016 which was not present on this Blog.

https://www.linkedin.com/pulse/nfv-mwc-2016-syed-habib-lutfullah-kakakhel/

ETSI showcased a practical implementation of NFV at the Mobile World Congress 2016. They showed the whole NFV Architecture being implemented and run to provide a SIP voice call. An end-to-end communication service of a SIP call was made based on a vIMS platform. This vIMS is an NFV VNF orchestrated by an NFV Orchestrator and run on top of infrastructure controlled by an OpenStack-based VIM. Let’s see the components and how they made the NFV-based SIP voice call.

There are two NFVI PoPs (Points of Presence), i.e. two VIMs. One is controlled by OpenStack and the other by openvim (part of the OpenMANO package). Both are controlled by the OpenMANO NFVO for resource orchestration. Service orchestration is performed by Ubuntu’s Juju. The Rift.io launchpad is used as the triggering mechanism for resource orchestration and service orchestration. 6WIND provides the PEs showcasing corporate VPN interconnectivity. Telefonica provides the traffic generator to test the bandwidth capacity of the PE links, and Metaswitch provides the Clearwater vIMS VNF to be run atop the infrastructure.

The figure below shows details:

A multi-site corporation’s network is shown connected via 3 PEs. One site, which connects to PE 3, has the VNF deployed in VIM 2, which is another data center. One NFVI PoP, labelled VIM 1, hosts the 6WIND PEs while the second NFVI PoP, labelled VIM 2, hosts the VNF. There is inter-DC communication between the two NFVI PoPs. The figure below shows the logical path of the SIP voice call. The IMS SIP signaling is implemented in VIM 2 in the Metaswitch Clearwater vIMS.

More details can be seen here.

ETSI’s new initiative is delivering an open source NFV Management and Orchestration software stack, which is set to take attention away from the MANO and turn it into a given piece of software. This puts more focus on the VNFs. The message could be that Service Orchestration using VNFs is therefore to be the focus of attention for Telco organizations.

References:

https://osm.etsi.org/

http://www.etsi.org/technologies-clusters/technologies/nfv

https://networkbuilders.intel.com/docs/E2E-Service-Instantiation-with-Open-Source-MANO.pdf

https://www.youtube.com/watch?v=JJlxwJStkTk

This is a copy of a previous Linkedin Post Dated June 7 2016 which was not present on this Blog.

https://www.linkedin.com/pulse/nfv-telco-vepc-solutions-syed-habib-lutfullah-kakakhel/

In telecom networks the option to place an LTE vEPC stands out as an exemplary demonstration of NFV’s application. The figure below gives the generic NFV Architecture. It can be divided into 3 main sections:

  1. The Management and Orchestration
    1. Consists of NFV Orchestrator, VNF Manager and Virtualized Infrastructure Manager.
  2. The NFVI – NFV Infrastructure
    1. Consists of Hardware, Virtualization Layer and Compute, Storage, Network Virtualization Software
  3. The VNFs – i.e. the virtual network functions.

The type of function the VNF provides shows what this NFV network delivers. That is, if the NFV network delivers LTE Core end-to-end communication as its Network Service, then EPC functionality will be implemented and provided by the VNF part of the NFV network.

See the figure below from an ETSI Proof of Concept work.

It shows the vendor CYAN providing the NFV Orchestrator (NFVO) and VNF Manager (VNFM). Red Hat and OpenStack provide the Virtualized Infrastructure Manager (VIM). The figure also shows the relevant infrastructure hardware and hypervisor software solutions. Finally it shows the VNF as being Connectem’s vEPC. If the VNF implemented a different functionality, say a vIMS, the rest of the components in the figure could stay the same and the end-to-end Network Service provided by the NFV network would be different. Therefore the function implemented inside the VNF defines what service the NFV network provides, and the work done by the VNF decides whether your NFV network is Telco or Enterprise; LTE or WiMAX; LTE or 3G etc.
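The point that the VNF alone decides the service can be sketched in a few lines of Python. This is only an illustrative model, not an ETSI data model: the class names, the `network_service` method and the placeholder component names such as "example-nfvo" are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Mano:
    """Management and Orchestration: NFVO, VNFM and VIM."""
    nfvo: str
    vnfm: str
    vim: str


@dataclass
class Nfvi:
    """NFV Infrastructure: hardware plus the virtualization layer."""
    hardware: str
    hypervisor: str


@dataclass
class Vnf:
    """A virtual network function and the functional blocks it packages."""
    name: str
    functions: List[str] = field(default_factory=list)


@dataclass
class NfvNetwork:
    mano: Mano
    nfvi: Nfvi
    vnfs: List[Vnf]

    def network_service(self) -> str:
        # The service delivered depends only on which VNFs are deployed.
        return " + ".join(vnf.name for vnf in self.vnfs)


# Hypothetical component names, used for illustration only.
mano = Mano(nfvo="example-nfvo", vnfm="example-vnfm", vim="OpenStack")
nfvi = Nfvi(hardware="x86 servers", hypervisor="KVM")

# Same MANO and NFVI, different VNF, therefore a different Network Service.
epc_network = NfvNetwork(mano, nfvi, [Vnf("vEPC", ["MME", "SGW", "PGW", "HSS", "PCRF"])])
ims_network = NfvNetwork(mano, nfvi, [Vnf("vIMS", ["CSCF", "HSS"])])

print(epc_network.network_service())
print(ims_network.network_service())
```

Both networks share the same `mano` and `nfvi` objects; only the list of VNFs differs.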

For a list of possible Telco VNF’s see the figure below.

Every chunk of functional blocks could be implemented together as a VNF. So what Connectem is doing is implementing the LTE MME, SGW, PGW, HSS and PCRF functionality and packaging it as a VNF which can be run atop virtualized infrastructure. In the ETSI Proof of Concept, Connectem’s solution does this according to the NFV specifications so that the VNF can be managed by a VNFM, its infrastructure can be managed by a VIM, and all of this can be controlled and coordinated by an NFVO, i.e. the NFV Orchestrator. Therefore you get LTE EPC functionality inside the virtualized NFV environment.

The ETSI definition of a VNF is that a VNF is “a Network Function capable of running on NFV Infrastructure (NFVI) and being orchestrated by a NFV Orchestrator (NFVO) and VNF Manager”.

Coming back to our vEPC example the VNF has components. These VNF Components (VNFCs) can logically be pictured as below:

ETSI allows a vendor to implement the components inside the VNF as they wish, as long as the VNF speaks to the other NFV architecture components via the defined VNF interfaces. This means that the different components can use efficient compute, storage and networking procedures instead of the communication methodology defined by the standards bodies. An example is that inside the vEPC software the MME will communicate with the SGW but will use an efficient computational methodology instead of the 3GPP-defined interfaces. If for some reason (say a lab environment) a vendor chooses to implement the 3GPP interfaces inside their vEPC, it won’t be as fast and as efficient but it can be used to showcase 3GPP communications inside NFV.

Good VNFC software design is what will distinguish different providers of vEPC software solutions.

References:

http://www.etsi.org/technologies-clusters/technologies/nfv

https://www.opennetworking.org/images/stories/sdn-solution-showcase/germany2015/CENGN%20-%20NFV-based%20LTE%20Core%20in%20the%20Cloud.pdf

http://nfvwiki.etsi.org/images/NFVPER%2814%29000010r2_NFV_ISG_PoC_Proposal-E2E_vEPC_Orchestration.pdf

This is a copy of a previous Linkedin Post Dated May 16 2016 which was not present on this Blog.

https://www.linkedin.com/pulse/nfv-mano-management-orchestration-syed-habib-lutfullah-kakakhel/

MANO is the brain of the NFV Network. It is the part of the network through which control operations are performed on virtual network functions and virtual network functions infrastructure.

One set of v-eNB, vMME, vSGW, vPGW and vPCRF can be assumed to be a Network Service. Each of these provides a distinct Network Function and, as the v prefix indicates, is deployed as a Virtual Network Function on Virtual Network Functions Infrastructure. The Virtual Network Functions Infrastructure is hardware with a virtual abstraction layer providing virtualization. These are the acronyms.

Multiple virtual network functions are connected together, or chained together, to provide a network service. The physical links are in the infrastructure which is the compute/storage hardware equipment while the logical links are among the VNFs. The endpoint is the Network Service endpoint which is providing service to the end devices.  Between the physical links and the logical links sits the virtualization layer.
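The distinction between logical links (among VNFs) and physical links (in the infrastructure) can be sketched as follows. This is a simplified illustration: the `VnfInstance` class, the two helper functions and the host names are assumptions made for this sketch, and the chain shown is just a user-plane style example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VnfInstance:
    name: str  # the virtual network function, e.g. "vSGW"
    host: str  # the compute node in the infrastructure where it runs


def logical_links(chain: List[VnfInstance]) -> List[Tuple[str, str]]:
    """Logical links connect consecutive VNFs in the chain."""
    return [(a.name, b.name) for a, b in zip(chain, chain[1:])]


def physical_links(chain: List[VnfInstance]) -> List[Tuple[str, str]]:
    """A physical link is needed only where consecutive VNFs sit on different hosts."""
    return [(a.host, b.host) for a, b in zip(chain, chain[1:]) if a.host != b.host]


# Hypothetical placement of three chained VNFs on two servers.
chain = [
    VnfInstance("v-eNB", "server-1"),
    VnfInstance("vSGW", "server-2"),
    VnfInstance("vPGW", "server-2"),
]

print(logical_links(chain))
print(physical_links(chain))
```

The chain has two logical links but only one physical link, because the vSGW and vPGW share a host and the virtualization layer carries their traffic.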

The NFVO i.e. the NFV Orchestrator is the part of the network which controls the deployment and operations of virtual network functions.

This is a copy of a previous Linkedin Post Dated May 24 2016 which was not present on this Blog.

https://www.linkedin.com/pulse/nfv-independance-from-hardware-lock-in-lutfullah-kakakhel/

NFV is simple. Its simplest distinction is that it is the Telecom Operators’ name for hardware independence and software dependence. Hardware is locked in while software is more easily changed (a project manager would say: relative to hardware, that is).

We can try to see what problem NFV seeks to solve.

Telecom operators faced a dilemma about hardware. “To launch a new network service often requires yet another variety (of hardware) and finding the space and power to accommodate these boxes is becoming increasingly difficult; compounded by the increasing costs of energy, capital investment challenges and the rarity of skills necessary to design, integrate and operate increasingly complex hardware-based appliances.”

The sentence starts with “to launch a new network service often requires yet another variety” (of hardware). Remember they want to compete with the WhatsApps and Vibers of tomorrow and need agility of deployment.

NFV seeks to provide that ‘Agility of Deployment’ of new network services to Network Operators by taking away dependency on proprietary and vendor locked in hardware. That is the high level purpose.

The rest is architecture. Hardware can be any compute(r) node with associated storage (types) and an accompanying (inter)network of such devices. What follows is to make the services virtual: Virtual Network Functions on Virtual Network Infrastructure.

To roll out software or new software for a new service is easier than to roll out hardware.

Another primary benefit is elasticity in energy consumption, i.e. energy consumption according to demand. This is made possible by having more control, through software, over the hardware, which is the energy-consuming physical device.

Providing Layer 2 VPN and Layer 3 VPN services has been a requirement of enterprises from Service Providers. Similarly Data Center networks need to provide Layer 2/3 Overlay facility to applications being hosted.

EVPN is a new control plane protocol to achieve the above. This means it coordinates the distribution of IP and MAC addresses of endpoints over another network, and it has its own protocol messages to provide this endpoint address distribution mechanism. In the data plane, traffic will be switched via MPLS label next-hop lookups or IP next-hop lookups.

To provide for a new control plane with new protocol messages providing new features BGP has been used. So it is BGP Update messages which are used as the carrier for EVPN messages. BGP connectivity is first established and messages are exchanged. The messages exchanged will be using BGP and in them EVPN specific information will be exchanged.
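The idea of EVPN information riding inside BGP can be sketched as a data structure. EVPN routes are carried under AFI 25 (L2VPN) and SAFI 70 (EVPN), and route type 2 is the MAC/IP Advertisement route (RFC 7432). Everything else below is a heavily simplified assumption for illustration: the class names, the `advertise` and `install` helpers and the sample addresses are hypothetical, and real BGP encoding is binary rather than Python objects.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Address family identifiers under which BGP carries EVPN routes (RFC 7432).
AFI_L2VPN = 25
SAFI_EVPN = 70
ROUTE_TYPE_MAC_IP_ADVERTISEMENT = 2


@dataclass(frozen=True)
class EvpnMacIpRoute:
    """Simplified view of an EVPN route type 2 (MAC/IP Advertisement)."""
    route_distinguisher: str
    ethernet_segment_id: str
    ethernet_tag: int
    mac: str
    ip: Optional[str]
    label: int
    route_type: int = ROUTE_TYPE_MAC_IP_ADVERTISEMENT


@dataclass
class BgpUpdate:
    """Simplified BGP Update carrying EVPN routes as multiprotocol NLRI."""
    afi: int
    safi: int
    next_hop: str
    routes: List[EvpnMacIpRoute]


def advertise(route: EvpnMacIpRoute, local_pe: str) -> BgpUpdate:
    """The PE that learned the MAC wraps it in a BGP Update."""
    return BgpUpdate(afi=AFI_L2VPN, safi=SAFI_EVPN, next_hop=local_pe, routes=[route])


def install(mac_table: Dict[str, str], update: BgpUpdate) -> None:
    """A remote PE learns MAC reachability from the control plane, not by flooding."""
    if (update.afi, update.safi) != (AFI_L2VPN, SAFI_EVPN):
        return
    for route in update.routes:
        if route.route_type == ROUTE_TYPE_MAC_IP_ADVERTISEMENT:
            mac_table[route.mac] = update.next_hop


# Hypothetical values for illustration.
route = EvpnMacIpRoute("192.0.2.1:100", "00:00:00:00:00:00:00:00:00:00", 0,
                       "00:11:22:33:44:55", "10.1.1.10", 10100)
remote_mac_table: Dict[str, str] = {}
install(remote_mac_table, advertise(route, local_pe="192.0.2.1"))
print(remote_mac_table)
```

After `install`, the remote PE knows that the MAC address is reachable via the advertising PE, which is the address distribution role described above.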

The physical layer topology can be a leaf-spine DC Clos fabric or a simple Distribution/Core setup. The links between the nodes will be Ethernet links.

One aspect of EVPN is that the terms Underlay and Overlay are now used. The Underlay represents the underlying protocols on top of which EVPN runs. These are the IGP (OSPF, IS-IS or BGP) and MPLS (LDP/SR). The underlay also includes the physical Clos or Core/Distribution topology, which has high redundancy built into it using fabric links and LACP/LAGs. The Overlay is the BGP EVPN virtual topology itself, which uses the underlay network to build a virtual network on top. It is the part of the network which relates to providing tenant or VPN endpoint reachability, i.e. MAC address or VPN IP distribution.

It’s a new protocol, and if you look at the previous protocols there is little mechanism to provide all-active multihoming capability. This refers to one CE being connected via two links to two PEs, with both links active and providing a traffic path to the far end via ECMP and multipathing. Two-chassis multichassis LAG has been one option, but it is proprietary per vendor and imposes particular virtual chassis link requirements and limits. Per-flow load balancing from an ingress PE to multiple egress PEs using BGP multipathing is also newly enabled by EVPN.
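Per-flow load balancing across two active egress PEs can be sketched with a hash over the flow’s 5-tuple. This is an illustrative assumption, not how any particular vendor implements it: real devices use hardware hash functions, and the `pick_egress_pe` helper, the use of SHA-256 and the sample addresses are chosen only to keep the sketch deterministic and readable.

```python
import hashlib
from typing import List, Tuple

# src IP, dst IP, IP protocol, src port, dst port
Flow = Tuple[str, str, int, int, int]


def pick_egress_pe(flow: Flow, egress_pes: List[str]) -> str:
    """Hash the flow 5-tuple so every packet of one flow uses the same egress PE."""
    key = "|".join(str(part) for part in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(egress_pes)
    return egress_pes[index]


# The CE is multihomed to two PEs and both are active.
egress_pes = ["PE1", "PE2"]

for src_port in range(40000, 40005):
    flow = ("10.0.0.1", "10.0.1.1", 6, src_port, 443)
    print(flow, "->", pick_egress_pe(flow, egress_pes))
```

Packets of the same flow always hash to the same PE, which avoids reordering, while different flows spread over both active links.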

There is also little mechanism in previous generation protocols to provide efficient fabric bandwidth utilization for tenant/private networks over meshed-style links. Previous protocols provided single-active operation and single paths, and required LDP sessions and tunnels for a full mesh over a fabric. MAC learning in BGP over the underlay provides this in EVPN.

Similarly there is no mechanism to provide workload (VM) placement flexibility and mobility across a fabric. EVPN provides this via Distributed Anycast Gateway.

 

Two edge computing specifications are present.

Facebook OCP’s CG-OpenRack-19 and LinkedIn’s Open19.

They provide for Rack Layouts, Compute, Storage and Networking.

Networking for CG-OpenRack-19 is copied below. The server sleds in the pictures appear to be single homed, as per the colors. It would be interesting to find out which protocol handles the Active-Active state of multi-homed Compute and Storage sleds, if that is present at all.

OpenRack-19

Given here

Open19 gives 100G bandwidth capabilities and some details are on its website.

5G’s ultra-low latency requirements at the edge could require edge solutions, and it will be interesting to see how things play out ahead.

This also brings to mind SD-WAN because these edge racks will be at least connected in a large WAN.

Google’s B4 is its software defined inter data center WAN solution. Google’s Espresso is its peering edge solution. Espresso links into the B4 domain via B2. This link has the details of Espresso as shared by the Google team.

 

Figure: Google Espresso and B4 (Google-Espresso-B4.JPG)

Google is not employing an army of networking engineers to run these because they are software defined and programmed bots will probably be doing operational tasks. To operate this network there are Site Reliability Engineers though.

Here is one public job advertisement that shows what an SRE is expected to be like:

We have reliable infrastructure and can spin up new environments in a couple of hours. Automate everything so there is more time for exploring and learning. Foster the DevOps mindset

What are our goals?
  Internationalisation
  Deploying multiple data centers
  Deploying every 5 minutes
Requirements
  Experience with Java or JavaScript in a Dockerised environment
  Linux Engineering/Administration
  Desire for improving processes
  Have a passion and most importantly, a sense of humour
Tech Stack (you DO NOT need experience in all of these)
  Kubernetes + Docker
  Terraform + Ansible
  Linux
  Kotlin + NodeJS
  ELK stack
  AWS

This is obviously an SRE for the servers side and the application enablement side of things. If there is a large software defined edge network like Espresso and a large Edge-to-DC network like B2 and a large software defined inter-DC network like B4 you will need a different SRE.

Here is Google’s version of a Site Reliability Engineer Job.

Job description
Minimum Qualifications

BS degree in Computer Science or related technical field involving coding (e.g. physics or mathematics), or equivalent practical experience.
3 years of experience working with algorithms, data structures, complexity analysis and software design.
Experience in one or more of the following: C, C++, Java, Python, Go, Perl or Ruby.

Preferred Qualifications

Systematic problem-solving approach, coupled with effective communication skills and a sense of ownership and drive.
Interest in designing, analyzing and troubleshooting large-scale distributed systems.
Ability to debug and optimize code and automate routine tasks.

About The Job

Hope is not a strategy. Engineering solutions to design, build, and maintain efficient large-scale systems is a true strategy, and a good one.

Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google’s services—both our internally critical and our externally-visible systems—have reliability and uptime appropriate to users’ needs and a fast rate of improvement while keeping an ever-watchful eye on capacity and performance.

SRE is also a mindset and a set of engineering approaches to running better production systems—we build our own creative engineering solutions to operations problems. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. As SREs are responsible for the big picture of how our systems relate to each other, we use a breadth of tools and approaches to solve a broad spectrum of problems. Practices such as limiting time spent on operational work, blameless postmortems and proactive identification of potential outages factor into iterative improvement that is key to both product quality and interesting and dynamic day-to-day work.

We can see that Google’s SRE Job Ad is all software along with large scale distributed systems requirements.

Now if we note this extract from the Wikipedia SD-WAN article:

“With a global view of network status, a controller that manages SD-WAN can perform careful and adaptive traffic engineering by assigning new transfer requests according to current usage of resources (links). For example, this can be achieved by performing central calculation of transmission rates at the controller and rate-limiting at the senders (end-points) according to such rates”

and we also note this extract:

“As there is no standard algorithm for SD-WAN controllers, device manufacturers each use their own proprietary algorithm in the transmission of data. These algorithms determine which traffic to direct over which link and when to switch traffic from one link to another. Given the breadth of options available in relation to both software and hardware SD-WAN control solutions, it’s imperative they be tested and validated under real-world conditions within a lab setting prior to deployment.”

We see Algorithms.

It’s clear that there are different algorithms running these software defined networks (Google’s software defined Espresso, B2, B4 and Jupiter). These algorithms automate, kick in and optimize. Google becomes a large scale distributed system with various algorithms here and there. Software Architects and Software Engineers will have developed these algorithmic nodes and programmed them into network devices/servers, but an SRE is the human who will operate the system. Or rather, a team of SREs.
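To make the "central calculation of transmission rates at the controller" from the first Wikipedia extract more concrete, here is a minimal sketch. As the second extract says, there is no standard algorithm, so max-min fairness on a single link is purely an assumption picked for illustration; the function name and the sample demands are hypothetical.

```python
from typing import Dict


def max_min_fair_rates(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Share one link's capacity among senders using max-min fairness."""
    rates: Dict[str, float] = {}
    remaining = capacity
    pending = dict(demands)
    while pending:
        fair_share = remaining / len(pending)
        # Senders asking for no more than the fair share get all they asked for.
        satisfied = {s: d for s, d in pending.items() if d <= fair_share}
        if not satisfied:
            # Everyone left wants more than the fair share: split what is left equally.
            for sender in pending:
                rates[sender] = fair_share
            break
        for sender, demand in satisfied.items():
            rates[sender] = demand
            remaining -= demand
            del pending[sender]
    return rates


# The controller has a global view: a 10 Gbps link and three senders' demands (in Gbps).
rates = max_min_fair_rates(10.0, {"A": 2.0, "B": 4.0, "C": 10.0})
print(rates)
```

The controller would then push each computed rate to the corresponding sender, which rate-limits itself at the end-point.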

One aspect of networking protocols is that they are made for multi-vendor, multi-enterprise and multi-domain environments. They provide simple consensus to connect two or more different network devices.

To take a merchant silicon network device like OCP’s Wedge and OCP-style servers and make one large network like Google’s out of them will require software engineering to remake at least the NOS (Network Operating System) part. There will be at least a Meta-NOS, running somewhat on top of a typical NOS, which would handle the SDN (software defined) algorithms, in addition to the SDN controllers talking to this Meta-NOS. Multiple layers of SDN controllers will be talking to each other, and you can call this a network protocol or an SDN algorithm, but it will be part of a distributed systems software architecture and it will be programmed in place by software engineers.

Large Scale Distributed System on Merchant Silicon Hardware – Software Defined Meta-NOS – SDN Controllers – Hierarchical SDN Controllers – Algorithms.

Sounds like a Program Management task instead of a PMP-scale Engineering Project Management task. You will need Mathematicians to sit with Network Architects, Distributed Systems Architects and Software Architects. The Mathematicians will provide the algorithms. They will be important too.

Fun times.

Terabit scale networking requires better Consensus.

Autonomous Networks and Autonomic Networking can be renamed as solving Consensus Dynamics.

Wikipedia States (Nov’ 2018):

“Consensus dynamics or agreement dynamics is an area of research lying at the intersection of systems theory and graph theory. A major topic of investigation is the agreement or consensus problem in multi-agent systems that concerns processes by which a collection of interacting agents achieve a common goal. ”

To note again it is an ‘intersection of systems theory and graph theory’.

Let’s not forget that mathematically, communications networks are graphs. An OSPF/IS-IS network is a weighted directed graph where the costs and metrics are the weights, the network devices are the vertices and the Ethernet L2 links are the directed edges.
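The graph view can be made concrete with Dijkstra’s algorithm, which is the shortest path first computation that OSPF and IS-IS run over their link-state databases. The sketch below is a plain textbook version; the router names and link costs are made up for illustration.

```python
import heapq
from typing import Dict

# router -> {neighbor: cost of the directed link towards that neighbor}
Graph = Dict[str, Dict[str, int]]


def shortest_path_costs(graph: Graph, source: str) -> Dict[str, int]:
    """Dijkstra's algorithm: lowest total cost from source to every reachable router."""
    costs = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > costs.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, link_cost in graph.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < costs.get(neighbor, float("inf")):
                costs[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return costs


# Hypothetical four-router topology with directed, weighted links.
topology: Graph = {
    "R1": {"R2": 10, "R3": 1},
    "R2": {"R4": 1},
    "R3": {"R2": 1, "R4": 100},
    "R4": {},
}

print(shortest_path_costs(topology, "R1"))
```

From R1, the cheapest way to R2 is via R3 (cost 2) rather than the direct link (cost 10), which is exactly the kind of decision the metrics are there to drive.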

Furthermore, to note again: ‘a collection of interacting agents achieve a common goal’. In networks the common goal can be to enable end-to-end, host-to-host connectivity over a vast network, i.e. TCP and UDP.
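A small worked example of agents reaching agreement may help. The sketch below is the textbook averaging rule from consensus dynamics, not a networking protocol: each agent repeatedly nudges its value towards its neighbours’ values. The graph, the starting values and the step size of 0.25 are assumptions chosen for illustration (the step size is kept below one over the maximum node degree).

```python
from typing import Dict, List


def consensus_step(values: Dict[str, float],
                   neighbors: Dict[str, List[str]],
                   step: float = 0.25) -> Dict[str, float]:
    """One round: every agent moves towards the values held by its neighbours."""
    return {
        node: value + step * sum(values[n] - value for n in neighbors[node])
        for node, value in values.items()
    }


# Four agents connected in a line: A - B - C - D.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
values = {"A": 0.0, "B": 4.0, "C": 8.0, "D": 12.0}

for _ in range(200):
    values = consensus_step(values, neighbors)

print(values)
```

Each agent only talks to its direct neighbours, yet all of them end up holding (approximately) the average of the starting values, which is 6.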

Interesting times ahead for Terabit scale networks. Keeps the fun alive in network engineering.

References:

https://en.wikipedia.org/wiki/Consensus_dynamics

EncyclopAI07.final.pdf

 

 

 

Simple NAPALM use – A Python-based abstraction layer with multivendor support. Part of a screen scraping solution.

Ansible config management / Jinja2 templates – Part of a CLI automation solution, which can be called sophisticated screen scraping, via SSH. No on-device agent or service.

Salt / NAPALM Logs – Event-driven network automation. This is also a CLI automation solution and, from the network device perspective, can be considered part of screen scraping. Events are driven by NAPALM logs.

NETCONF or RESTCONF with YANG – Connectivity is via NETCONF (XML) or RESTCONF (JSON/XML), while configuration follows the YANG data models available on the device (exposed as a service on the device). This is not screen scraping or CLI automation, as YANG is a data modeling language and the service can be used to extract and push state at the device.

SDN based, Cisco ACI like – Northbound REST API on the APIC controller and southbound OpFlex with an OpFlex agent on the device. On-device Policy Element abstraction service.
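The difference between CLI automation and model-driven automation can be sketched side by side. This is only an illustration under stated assumptions: `string.Template` from the standard library stands in for Jinja2, the CLI lines are generic IOS-style syntax, and the interface name and description are made up. The JSON body follows the `ietf-interfaces` YANG model as it would be sent in a RESTCONF request; no device is contacted.

```python
import json
from string import Template

# Option 1: CLI automation. A text template is rendered and the resulting lines
# would be pushed over SSH. string.Template stands in for Jinja2 here so the
# sketch needs only the standard library.
CLI_TEMPLATE = Template(
    "interface $name\n"
    " description $description\n"
    " no shutdown\n"
)


def render_cli(name: str, description: str) -> str:
    """Render device CLI text from variables (the templating approach)."""
    return CLI_TEMPLATE.substitute(name=name, description=description)


# Option 2: model driven. The same intent is expressed as structured data shaped
# by the ietf-interfaces YANG model, to be carried in a RESTCONF request body.
def restconf_body(name: str, description: str) -> str:
    """Build the JSON body for one interface entry (the YANG approach)."""
    payload = {
        "ietf-interfaces:interface": [
            {
                "name": name,
                "description": description,
                "type": "iana-if-type:ethernetCsmacd",
                "enabled": True,
            }
        ]
    }
    return json.dumps(payload, indent=2)


print(render_cli("Ethernet1", "uplink to spine1"))
print(restconf_body("Ethernet1", "uplink to spine1"))
```

In the first case the device still parses CLI text; in the second, the device validates the data against the YANG model, which is why it is not considered screen scraping.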

Good reference links for exploring the above:

Ansible + Jinja2 Option:

https://networkotaku.wordpress.com/2017/10/24/network-configuration-templates-with-ansible-and-jinja2/

Salt + NAPALM Abstraction Option:

17-RIPE76_-Event-driven-network-automation-and-orchestration.pdf

https://mirceaulinic.net/2017-10-19-event-driven-network-automation/

https://my.ipspace.net/bin/list?id=xNetAut181#SALT

Links for Restconf + YANG Option:

https://networkop.co.uk/blog/2017/02/15/restconf-yang/

https://networkop.co.uk/tags/ansible-yang/

https://packetpushers.net/podcast/pq-show-116-practical-yang-network-automation/

Links for SDN Style Cisco ACI – Like:

https://wiki.opendaylight.org/view/OpFlex:Opflex_Architecture

https://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-731302.html

https://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/2-x/rest_cfg/2_1_x/b_Cisco_APIC_REST_API_Configuration_Guide/b_Cisco_APIC_REST_API_Configuration_Guide_chapter_01011.html

 

This post is not in English. 🙂

L2, ETH, M ADD, SRC/DST, M TABLE, ARP TABLE, GARP, VLAN, BCAST DOM, MTU, PMTUD. AppPMTUD, JUM FRAMES, 1500, 9000.

STP, ROOT Port, BPDUs, block, listen, learn, forward, disable, RSTP, MSTP, TCN BPDUs,

Swi, Hub, Rtr, Gway,

IP, SUBNET, ETH TY, SRC ADD/DST ADD, TTL, IP Add(Net:Host), CIDR

MPLS, LDP, LSP, LBL, EXP, PHP, FRR

TE, RSVP, LBL ST, LSP. Backup LSP via MPLS TE.

L3VPN, ROUTE, PE-CE , LBL Stack, VPNV4, MPBGP, AFI/SAFI L2VPN, RD,RT, VRF

VLL,L2VPN, VPLS, ELAN, TunnMTU,

OSPF, NBMA, MCAST, AREA, DR, BDR, Areas, LSDB, Rte Sum/Rte Fil b/w areas, Stub (No AS Ext LSA) , TotStub(No Sum, No Ext) , NSSA (Type 7 Ext Transit) , LSA, NET LSA, RTR LSA, EXT LSA, ASBR, ABR, SUMM LSA, ASBR LSA, ASBR SUMM LSA, SPT, HELLO, DEAD TIME, RTR ID,

BGP, TCP 179, TELN 179, PING, FW 179, NHRI, ROUTE, COMMUNITY, B PATH, LNG PREFIX M, WEIGHT, LPEF, ASPATH, IGP/EGP/LOCAL, PREPEND, BlaHol COMM, NHOP, PIC, PREFIX FIL, AS PATH FILT, COMMUN RPL LPREF Actions,

RR, CLUSTER, IBGP, B PATH, vRR

RIB,FIB, AD DIST,

ACL, DENY, ALLOW,

ASIC, CEF, RP, MEM, LC, CBAR FABRIC, LINE RATE, CUT THROUGH.

DC, CLOS, LEAF, SPINE, IGP, IP, LOOPBACK, BGP, ROUTE, VXLAN, MACinUDP, VRF,
MPBGP, EVPN, MAC-ROUTE, MAC VRF, EVI, ESI, SDN, BGP LS, PCEP, BDR LEAF, L3OUT, VRF, MPBGP, PE-CE, DC L2OUT, BD, VLAN, MTenant, BRI DOM, END POINT, 1-MAC/+IPSubnets.

TMETRY, PULL, GRPC L4, SNMP PUSH.

ANSIBLE, NETCONF/YANG, APIs, YAML, WSPACE, HUMreadable, XML/HTML, JSON, JINJA2, PYTHON.

TCP, SYN, ACK, SYN ACK, ECN echo bit, FIN, RST, SL WDOW, PORT, SEQ, P BIT/ImmSend, MSS, RELIA, Ordered.

UDP, CLESS, SPORT, DPORT, Length, checksum. DNS, DHCP, SNMP, Jitter, latency VoIP, unordered, mcast, bcast.

DNS LB, GSS, GLB, ANYCAST GWAY.

DHCP DORA, IP, BCast, Unicast.

DCI, L2/L3, EoMPLSoGREoIP, L2VPN, VPLS, EVPN, MP-BGP VRF.

SAN, SYN/ASYN, latency, DWDM/CWDM, iSCSI, FCoIP.

Workload Mobility, MobIP, LISP, ProxyIP.

QoS, DiffServ, MPLS EXP, E-LSP, L-LSP, per Class DiffServ Aware MPLS-TE vi RSVP Sig.

Linux, Expect, BASH, Python, AWK, SED, GREP, CRON, VIM, NANO.

Scri, D-Types:No, Stri,List (mutable),Tuple (immutable), Dict (key,value), Variables, Arrays(C), List-Stack (LIFO), List-Queue (FIFO).

Cond Prg, If, Ifelse, ElseIf, NestIf, While (Condition True) , For (Iterations known), Break, Continue, For Else, built-in functions, User-def-functions. Library, Framework, local vari, global var.

 

Layer 1, Layer 2, Layer 3 and Layer 4. Physical, Link Layer/MAC Layer, Network Layer, Transport Layer.

The Physical layer is physics, which keeps improving and allowing higher bandwidth limits.

Layer 2, sitting atop the physical layer, involves link/medium access control mechanisms between devices. It provides data transfer of bits over the physical connectivity and involves payloads and addresses.

Layer 3 involves connecting multiple networks and forming an internetwork which further provides network level end-to-end connectivity.

Layer 4 involves end to end host level connectivity.

Within Layer 2 we have software enabled Virtual LANs, we have loop avoidance via Spanning Tree, we have Link aggregation via LACP etc.

Within Layer 3 we have IGP,EGP,VPN,SP,DC,WAN,TE, QoS and what not.

Within Layer 4 we have TCP,UDP; Connection-oriented/Connection-less; Flow-control, windowing; Reliability, acknowledgements, sequencing; Error control, checksum; Port numbers, etc.
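As one concrete Layer 4 mechanism, the checksum used for error control can be sketched in a few lines. This is the 16-bit one’s complement Internet checksum described in RFC 1071. Note that real TCP and UDP compute it over a pseudo-header plus the segment; the sketch below only shows the arithmetic on a byte string, using the example bytes from the RFC.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement checksum (RFC 1071), as used by IPv4, TCP and UDP."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each 16-bit word
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


segment = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(segment)
print(hex(checksum))

# The receiver recomputes over data plus checksum; a result of 0 means no error detected.
print(internet_checksum(segment + checksum.to_bytes(2, "big")))
```

The sender places the checksum in the header, and the receiver’s recomputation over the received bytes tells it whether the segment was corrupted in transit.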

Layer 2 and Layer 4 are relatively ‘localized’. Layer 2 due to its physical/link level vicinity and layer 4 due to its in-host & between-host proximity. While there is science in these layers it is somewhat local.

Layer 3 involves much geography. It is the domain which deals with providing end-to-end connectivity spanning much space and area. With this comes much management. It entails reachability across multiplexed systems via addressing, reliability via multi-pathing, reachability status communication, path preferencing among multipath options, path avoidance, virtual privacy and isolation across multiplexed systems, time management for timely fault tolerance and fault bypass, geolocation-based path selection and load balancing, etc.

Hence the birth of large-scale Internetworking Protocols.

These are protocols engineered so that some mechanisms are built-in and agreed upon while some options require configuration.

Autonomous Networks which have all the mechanisms built-in and require no configurations are not present at the moment, except perhaps somewhere inside Google et al.

For now we have to sift and select between options and configurations for making data flow.

An automated multi-tenant data center network is an increasingly desired end goal for large and small organizations including providers. Servers that house the CPU, RAM and Hard Disk resources are serving traffic for applications they host. These servers need connectivity among themselves within the data center and also towards the outside world.

At first an organized set of CPU/RAM/HD servers is connected to a network device. This happens in a data center rack and the network device is a ToR, a Top of Rack switch. Another similar set of servers is connected to another Top of Rack network device. Multiple such sets of servers/network device pods are then linked together. The incumbent way to do this is to build a leaf-spine Clos fabric. The layer of network devices connecting the servers is the leaf layer and the layer of network devices connecting these leaf nodes is the spine layer.
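The leaf-spine wiring rule can be sketched as a small topology builder. This is a minimal illustration assuming a single two-tier fabric where every leaf connects to every spine; the function names and the device naming scheme (`leaf1`, `spine1`) are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Link = Tuple[str, str]  # (leaf, spine)


def build_leaf_spine(num_leaves: int, num_spines: int) -> List[Link]:
    """Every leaf connects to every spine; leaves never connect to each other."""
    leaves = [f"leaf{i}" for i in range(1, num_leaves + 1)]
    spines = [f"spine{i}" for i in range(1, num_spines + 1)]
    return list(product(leaves, spines))


def spines_of(links: List[Link], leaf: str) -> set:
    """The spines a given leaf has an uplink to."""
    return {spine for link_leaf, spine in links if link_leaf == leaf}


def leaf_to_leaf_paths(links: List[Link], src: str, dst: str) -> List[Tuple[str, str, str]]:
    """Equal-length paths between two leaves: one per spine that both attach to."""
    shared = spines_of(links, src) & spines_of(links, dst)
    return [(src, spine, dst) for spine in sorted(shared)]


links = build_leaf_spine(num_leaves=4, num_spines=2)
print(len(links), "fabric links")
print(leaf_to_leaf_paths(links, "leaf1", "leaf3"))
```

With 4 leaves and 2 spines there are 8 fabric links, and any two leaves are always exactly two hops apart with one path per spine, which is what makes ECMP across the fabric straightforward.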

Hardware is thus laid out in a 2-stage or 3-stage Clos fabric and then we need to lay out a logical control plane to pass traffic. Applications on the Server CPU/RAM/HD will talk to each other within the DC which is east-west traffic or to the outside world which can be called north-south traffic.

Depending on the type of application east west traffic could be higher but north south traffic is always present.

Moving bits from a server to any other location is the network’s job. These bits could be a compute hosting virtual machine’s bits or a ‘Serverless’ cloud application’s bits but they go somewhere and are moving. They are moved by the network layer regardless of what resides on the servers.

How many layers of protocols and software are required to provide for an automated multi-tenant data center network which can connect servers, host applications and provide east-west/north-south connectivity ?

In the Networking Components blog post some basic networking components were listed out in a different construct: Network Device, Protocols, Protocol Messages, Addresses, Lookup tasks, Identity Tags, Filters & Actions, Network Over Network ( Overlay) Appended Information, Network + Network , Network Inside Network Device, Control and Data Plane.

In the Event-Driven Network Automation blog automation details were described.

The below will make some use of the networking components and event-driven network automation blog posts.

At first you need Addresses appended onto payload bits to ascertain endpoints and exchange traffic. How many layers of addresses will be required to connect the servers to each other over a fabric? In a full mesh structure the networking layer is small/direct and fewer addresses are required. In a Clos Leaf-Spine-Leaf fabric multiple layers of addresses are required.

A packet/frame structured bits data structure is switched across multiple nodes. In terms of Addresses, Ethernet MACs are used for Layer 2 connectivity between server NICs and ToR ports. The server could also have an IP Address of its own and be performing Layer 3 communications.

One server connected with one leaf could send an IP packet to another server connected with another leaf (Server<>Leaf<>Spine<>Leaf<>Server). As part of the Control Plane of laying out the fabric the leaf and spine network devices will have IP addresses of their own which will speak to each other and send Control Plane Protocol Messages. What this implies is that there will be 2 layers of IP communications present: one between the network nodes themselves and one between the servers. This in turn implies the requirement to have an IP address pushed on to another IP address in a tunnel type structure where from one network device to another (e.g. leaf to leaf via spine) the packet is routed based on Outer IP Addresses and the inner address is used by the server. Therefore some packets will require an addressing structure such as IP|Eth|IP|Eth. The IP Tunnel will span from a Leaf to another Leaf via the Spine, therefore the tunnel endpoints are at the Leaf switches. (Server-IP<encapsulation>Leaf-IP<>via Spine <>Leaf-IP<decapsulation>Server-IP)
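
The encapsulation and decapsulation steps can be sketched roughly as below. This is a minimal illustration only: the Packet class, the function names and all addresses are assumptions made for the example, and real fabrics would use an encapsulation such as VXLAN with additional headers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    payload: object  # application bytes, or an inner Packet when tunneled


def encapsulate(inner: Packet, local_leaf_ip: str, remote_leaf_ip: str) -> Packet:
    """Ingress leaf: push an outer IP header addressed leaf-to-leaf."""
    return Packet(src_ip=local_leaf_ip, dst_ip=remote_leaf_ip, payload=inner)


def decapsulate(outer: Packet) -> Packet:
    """Egress leaf: strip the outer header and recover the server's packet."""
    if not isinstance(outer.payload, Packet):
        raise ValueError("packet is not tunneled")
    return outer.payload


# Hypothetical addressing: servers in 10.1.0.0/16, leaf tunnel endpoints in 192.0.2.0/24.
server_packet = Packet("10.1.1.10", "10.1.2.20", b"hello")
tunneled = encapsulate(server_packet, "192.0.2.1", "192.0.2.2")
```

The spines only ever look at the outer addresses (192.0.2.x here), while the servers only ever see the inner ones.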

We have multiple combinations of communications to deal with in multiple layers of the networking stack.  Leaf-Local L2, Leaf-Local L3, Leaf-Spine, Leaf-Spine-Leaf L2, Leaf-Spine-Leaf L3.  All this calls for multiple domains. A ‘Local’ Link Layer Domain, A Local Network Layer Domain, A Distant Network Layer Domain, A relatively distant Link Layer Domain. A link layer domain could be an L2 VLAN/broadcast domain or a bridge domain and a network layer domain could be a local VRF or a wider-spanning IP-in-IP domain level routing instance.

… A routed layer has IP addresses at two endpoints and an Ethernet link has MAC addresses at two end points. A virtual machine of a tenant in a server can have both an IP address and a MAC address. There could also be a single virtual machine having IPs from multiple subnets behind the same MAC address Ethernet link. This virtual machine is an endpoint and this is what the network layer needs to provide connectivity to. Therefore we could say that an endpoint requires at least 2 tables at the network device it is connecting to: an IP Routing table and a MAC table. An ARP table is also required for Inter-Layer discovery. There is also the Leaf-Spine-Leaf IP-in-IP tunnel we spoke about which adds another layer of overlay Routing Table. In addition an outer IP to inner IP socket-style mapping function will be required which is another table (L4 Socket of Outer IP to Inner IP).

Discovering the places of destination-address lookup-actions happening in a network always helps discover the kind of networking happening.

So a Leaf-Local L2 frame (a server sends to another server connected to the same leaf) would be switched locally with the local bridge domain/mac table. A Leaf-Local L3 packet would be routed by the local VRF. A Leaf-Spine-Leaf Packet would be mapped to the relevant far-end leaf tunnel endpoint and a tunnel endpoint IP would be pushed on it; it would then be tunneled/IP routed across the spine to the destination leaf; the destination leaf would then look at the socket-style mapping table of the destination endpoint; it would then pass onto the final destination endpoint.
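
The three lookup cases above can be sketched as a single forwarding decision on a leaf. The table names, ports and addresses are hypothetical and the logic is deliberately simplified (one VRF, no ARP, no MAC learning).

```python
from dataclasses import dataclass, field
import ipaddress


@dataclass
class Leaf:
    name: str
    mac_table: dict = field(default_factory=dict)         # MAC -> local port
    vrf_routes: dict = field(default_factory=dict)        # local subnet -> local port
    remote_endpoints: dict = field(default_factory=dict)  # endpoint IP -> far-end leaf tunnel IP

    def forward(self, dst_mac=None, dst_ip=None):
        # Leaf-Local L2: switch on the local bridge domain / MAC table.
        if dst_mac is not None and dst_mac in self.mac_table:
            return ("switch", self.mac_table[dst_mac])
        if dst_ip is not None:
            addr = ipaddress.ip_address(dst_ip)
            # Leaf-Local L3: route within the local VRF.
            for subnet, port in self.vrf_routes.items():
                if addr in ipaddress.ip_network(subnet):
                    return ("route", port)
            # Leaf-Spine-Leaf: map to the far-end tunnel endpoint and encapsulate.
            if dst_ip in self.remote_endpoints:
                return ("tunnel", self.remote_endpoints[dst_ip])
        return ("drop", None)


leaf1 = Leaf(
    name="leaf1",
    mac_table={"aa:aa:aa:aa:aa:01": "eth1"},
    vrf_routes={"10.1.1.0/24": "eth1"},
    remote_endpoints={"10.1.2.20": "192.0.2.2"},
)
```

Calling leaf1.forward(dst_ip="10.1.2.20") returns the tunnel action with the far-end leaf address, which is the point where the outer IP header would be pushed.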

While the Leaf-Local communications can be handled within the network device by tables, mappings and local lookups, it is obvious that when crossing the spines and reaching for a far end leaf there is a need for a control plane to communicate the far end addresses and mappings. A Protocol to exchange the distant leafs addresses and mappings which establishes the control plane for traffic to be switched and routed between leafs across the spines. There is a spine in the middle and the leafs are not directly connected. A Control Plane to distribute addresses and mappings.

There is a choice here.

For this Leaf-Spine-Leaf addresses exchange & inner/outer mappings population we could use a distributed, nuke-tolerant, internet style packet layer protocol OR instead use an SDN style central controller to do the thinking and push/program the network devices with all the addresses and mappings. The devices need to be populated with far end addresses and mappings and both will achieve this goal.

Our topic is an Automated Multi-Tenant Data Center Network and the automation part of the name is supported by the SDN style.

Why ?

The reason is that any distributed, nuke-tolerant, internet style protocol inherently requires independent configurations on all networking nodes which then enable the devices to start communicating, while an SDN controller is a single configuration point which pushes the configs onto the devices. This means that from an automation standpoint you will be either automating the configurations of hundreds of devices or an SDN controller. Configuring all devices in a large data center fabric independently is difficult to automate while managing the automation of an SDN controller or even levels of SDN controllers is easier.
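
As a rough sketch of the controller option, the example below keeps one central view of which endpoint sits behind which leaf and programs every leaf with the far-end mappings. The Controller and Leaf classes, their method names and the addresses are assumptions for illustration and do not represent any particular SDN product.

```python
class Leaf:
    def __init__(self, name, tunnel_ip):
        self.name = name
        self.tunnel_ip = tunnel_ip
        self.remote_endpoints = {}  # endpoint IP -> far-end leaf tunnel IP

    def program(self, mappings):
        """Accept a full table pushed down by the controller."""
        self.remote_endpoints = dict(mappings)


class Controller:
    def __init__(self, leafs):
        self.leafs = {leaf.name: leaf for leaf in leafs}
        self.endpoints = {}  # endpoint IP -> name of the leaf it attaches to

    def register_endpoint(self, endpoint_ip, leaf_name):
        self.endpoints[endpoint_ip] = leaf_name
        self.push()

    def push(self):
        # Each leaf only needs mappings for endpoints that live behind other leafs.
        for leaf in self.leafs.values():
            remote = {
                ip: self.leafs[owner].tunnel_ip
                for ip, owner in self.endpoints.items()
                if owner != leaf.name
            }
            leaf.program(remote)


leaf1 = Leaf("leaf1", "192.0.2.1")
leaf2 = Leaf("leaf2", "192.0.2.2")
controller = Controller([leaf1, leaf2])
controller.register_endpoint("10.1.1.10", "leaf1")
controller.register_endpoint("10.1.2.20", "leaf2")
```

The automation surface here is the single register_endpoint call on the controller rather than a configuration change on every leaf.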

This Data Center will need to speak to the outside world too.

This means that there will be a border functionality which will provide L3 and L2 reachability to the outside world, i.e. Ethernet L2 connectivity (VLANs or bridge domains) extended from a server in a leaf to a border node leaf and onwards to an outside world L2 construct, say an MPLS L2VPN.

Similarly there will be an L3 VRF extension where a set of routes of an endpoint/server/tenant are stretched onto a border node leaf’s VRF via say an MP-BGP style RD/RT mechanism where they are further extended onto an outside world MPLS L3VPN via a PE-CE routing protocol. (Tenant-Routes|VRF|MP-BGP|VRF| VRF-PE <> CE|Outside World).

Our topic also contains the word Multi-Tenant which means that in the case of L3 Multi-tenancy a legacy MPLS L3VPN style VRF/MP-BGP mechanism will be needed per tenant per VRF.

A similar mechanism is required for connecting two or multiple such large data centers between themselves. So “Endpoint<>Leaf<>Spine<>Border-Leaf<>|Infra-Link|<>Border-Leaf<>Spine<>Leaf<>Endpoint” communications are then possible. For L3 VRFs/MP-BGP can provide separation and ensure multi-tenancy for this Inter-DC comms.

When required some border leafs will obviously connect to routers which speak eBGP to the outside world, i.e. to other Autonomous Systems over Transit and Peering connections. These routers will have the global routing table and will be gateways to the rest of the world. The PE-CE communications mentioned above for VRF stretching can be Static/OSPF/EIGRP/BGP.

To run this Automated Multi-Tenant Data Center let’s not forget an overarching Orchestration software residing atop it providing a GUI mechanism into the wide array of options, tools, cogs and combinations to enable tenants’ Intra-DC, Inter-DC and outside-world communications.

Enabling application endpoints to communicate via a network requires a whole bunch of protocols in the networking layer. Different protocols providing different functionality each providing a brick making a wall which is achieving the end goal of endpoints communication.

Physically after transceivers have delivered ordered bits in a memory location in a network device they are digested. It could be any of a number of control plane or data plane datagrams that the network device needs to digest.

It could be a Layer 2 MAC / Ethernet layer frame aimed at information transfer within the local area. It could be an ARP control plane frame. It could be IP address reachability information like an OSPF or IS-IS control plane packet. It could be a TCP handshake packet or a TCP Payload packet. It could be a UDP packet. It could be a BGP Update message providing next layer (IP) reachability information. It could be an MPLS labeled packet being switched across an IP core network.

It depends.

Regarding Reachability Wikipedia states:

“ In graph theory, reachability refers to the ability to get from one vertex to another within a graph. A vertex s can reach a vertex t (and t is reachable from s) if there exists a sequence of adjacent vertices (i.e. a path) which starts with s and ends with t.

In an undirected graph, reachability between all pairs of vertices can be determined by identifying the connected components of the graph. Any pair of vertices in such a graph can reach each other if and only if they belong to the same connected component. The connected components of an undirected graph can be identified in linear time. The remainder of this article focuses on the more difficult problem of determining pairwise reachability in a directed graph. ”

It’s interesting that mathematically a network is a Graph and a networking device is a Vertex but we’re blogging on networks and not on math.
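
To make the quoted definition concrete, here is a small sketch that labels connected components of an undirected graph with a breadth-first search and then answers reachability by comparing labels. The topology and node names are made up for the example.

```python
from collections import deque


def connected_components(adjacency):
    """Label every vertex of an undirected graph with a component id using BFS."""
    component = {}
    next_id = 0
    for start in adjacency:
        if start in component:
            continue
        component[start] = next_id
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component[neighbor] = next_id
                    queue.append(neighbor)
        next_id += 1
    return component


def reachable(component, s, t):
    """Two vertices can reach each other iff they share a component."""
    return component[s] == component[t]


# Hypothetical topology: two leafs behind one spine, plus one unconnected device.
links = {
    "leaf1": ["spine1"],
    "leaf2": ["spine1"],
    "spine1": ["leaf1", "leaf2"],
    "isolated": [],
}
components = connected_components(links)
```

Each vertex and each link is visited a bounded number of times, which is where the linear time in the quote comes from.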

BGP

BGP neighbors are manually configured to utilize a TCP connection at port 179 to exchange IP address routing information. This is the most common use on the wider Internet where transit providers use BGP to exchange IP routes of connected networks. A large service provider which sells internet transit uses BGP to peer with similar other service provider networks and with server hosting providers.  BGP can also be leveraged to advertise information other than IP e.g. MAC routes in EVPN.

Practically speaking any two routers with an established BGP connection send Update messages to add and withdraw IP Prefixes (routes) and the routes’ attributes (AS Path, Community etc.).

BGP has a full finite state machine in which a session transitions from the Idle state to the Established state. The states are Idle, Connect, Active, OpenSent, OpenConfirm and Established. Starting from Idle the session moves to Connect while the TCP connection is attempted (Active is the state in which the speaker listens for an incoming TCP connection from the peer), then to OpenSent once the Open message has been sent, then to OpenConfirm once the peer’s Open message has been received and accepted, and finally to Established once a Keepalive message is received from the peer. Thereafter Keepalive messages are exchanged periodically. A Notification message is not an acceptance; it reports an error and closes the session. When an Open message is received its contents are checked to see if a BGP session with this peer should be established. The primary information in the Open message includes the BGP version number, the AS number, the hold timer, the BGP router ID and the optional parameters. The optional parameters contain TLVs which negotiate capabilities such as the MP-BGP extensions to be used between the peers.
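
A heavily simplified sketch of the session state machine as a transition table is shown below. The event names are invented for the example, timers are left out entirely, and the Connect to Active transition is the commonly taught simplification (RFC 4271 conditions it on timer state). In this sketch a received Keepalive in OpenConfirm moves the session to Established and a Notification always returns it to Idle; events that are not listed are ignored, whereas a real speaker would typically treat them as errors.

```python
# (current state, event) -> next state; event names are hypothetical labels.
TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_established"): "OpenSent",      # local Open is sent here
    ("Connect", "tcp_failed"): "Active",             # simplified, see lead-in
    ("Active", "tcp_established"): "OpenSent",
    ("OpenSent", "open_received"): "OpenConfirm",    # Keepalive is sent in reply
    ("OpenConfirm", "keepalive_received"): "Established",
}


def step(state, event):
    """Return the next state for one event."""
    if event == "notification_received":
        return "Idle"  # a Notification reports an error and tears the session down
    return TRANSITIONS.get((state, event), state)


def run(events, state="Idle"):
    """Feed a sequence of events through the state machine."""
    for event in events:
        state = step(state, event)
    return state
```

For example, run(["start", "tcp_established", "open_received", "keepalive_received"]) walks the happy path from Idle to Established.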

Once established an Update message is sent with the routing information and route attributes. Every Update message causes the BGP route table to update and the route table version number to increment. An Update message contains withdrawn (unfeasible) routes, path attributes and NLRI which are IP routes. Path attributes such as AS_Path, LocalPref and MED are present in the Update message.

iBGP as opposed to eBGP is used to communicate routes within an Autonomous System. The AS_Path is treated differently in the case of iBGP: a router only adds its own AS number to the path if it is speaking to an eBGP peer and does not add it if it is speaking to an iBGP peer. Otherwise the receiving BGP process would see its own AS in the path and drop the route assuming a loop. Because the AS_Path cannot detect loops inside the AS, routes learned from one iBGP peer are not re-advertised to another iBGP peer. Therefore either a full mesh is required for iBGP so that every router knows every destination, or Route Reflectors can be used to peer with the iBGP speakers and reflect routes. Routes received from a client in an RR setup are reflected to other clients and non-client neighbors.

One of the mechanisms in BGP is the best path selection methodology. If an IP prefix is reachable from multiple paths BGP has a list of if else steps through which it transitions to select one best path and advertise that.

The best path selection criteria are given below.
1) Weight (Cisco locally assigned – higher weight preferred)
2) Local Preference – Prefer path with higher local pref
3) Locally originated (Cisco: prefer a route originated locally via network or aggregate commands)
4) Shortest AS_PATH (prefer path with shorter AS path)
5) Lowest origin type (IGP < EGP < Incomplete)
6) Lowest multi-exit discriminator
7) eBGP over iBGP
8) Lowest IGP metric
9) …
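
The steps listed above can be sketched as a sort key, where each tuple element corresponds to one step and earlier elements win ties. This is an illustration only: the Path fields and defaults are assumptions, the remaining tie-breakers are omitted, and MED is compared unconditionally here although routers by default compare it only between paths from the same neighboring AS.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    next_hop: str
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: tuple = ()
    origin: str = "igp"
    med: int = 0
    ebgp: bool = True
    igp_metric: int = 0


def preference_key(path):
    """Smaller tuple means more preferred; one element per selection step."""
    return (
        -path.weight,                 # 1) higher weight preferred
        -path.local_pref,             # 2) higher local preference preferred
        not path.locally_originated,  # 3) locally originated preferred
        len(path.as_path),            # 4) shorter AS_PATH preferred
        ORIGIN_RANK[path.origin],     # 5) lowest origin type
        path.med,                     # 6) lowest MED
        not path.ebgp,                # 7) eBGP over iBGP
        path.igp_metric,              # 8) lowest IGP metric to the next hop
    )


def best_path(paths):
    return min(paths, key=preference_key)
```

Given two candidate paths for the same prefix, best_path returns the one that wins at the first step where they differ.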

Another aspect of BGP is route filtering and route manipulation via Community attributes. A community attribute is sent in a numbered format e.g. 6939:400 to trigger an impact on the far end neighbor’s path selection. For example if one neighbor sends the 6939:400 community to another neighbor the receiving side will set the Local Pref of the route to 400 based on a previously agreed upon understanding. This is achieved by if-then-else route policies at the receiver end. Commonly used communities include Local_Pref setting communities and blackhole communities.
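
A receiver-side policy of this kind can be sketched as a lookup from community to action. The 6939:400 entry mirrors the example in the text; 65535:666 is the well-known BLACKHOLE community from RFC 7999; the route dictionary layout and the function name are assumptions for illustration.

```python
# community -> attributes to set on the received route
COMMUNITY_ACTIONS = {
    "6939:400": {"local_pref": 400},    # agreed meaning from the example in the text
    "65535:666": {"blackhole": True},   # well-known BLACKHOLE community (RFC 7999)
}


def apply_import_policy(route):
    """Return a copy of the route with the actions of all matching communities applied."""
    result = dict(route)
    for community in route.get("communities", ()):
        result.update(COMMUNITY_ACTIONS.get(community, {}))
    return result


received = {"prefix": "203.0.113.0/24", "local_pref": 100, "communities": ["6939:400"]}
installed = apply_import_policy(received)
```

Communities with no matching entry are simply carried along without changing the route.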

Another aspect of BGP is Multihoming and traffic load balancing. If one autonomous system is multihomed to another autonomous system it will use LPref, Communities and AS Path prepending to influence traffic.

BGP has also been used as an IGP alternative in Massive Scale Data Center deployments using Clos fabrics.

BGP is flexible, scalable, stable and reliable but it is slow in convergence, has limitations in terms of load balancing and requires large CPU/TCAM in case of large routing table sizes.

 

References

https://learningnetwork.cisco.com/blogs/vip-perspectives/2017/12/14/demystifying-bgp-session-establishment

https://www.inetdaemon.com/tutorials/internet/ip/routing/bgp/operation/messages/update/

BRKRST-3320.pdf

https://blog.ipspace.net/2017/11/bgp-as-better-igp-when-and-where.html

http://huzeifabhai.blogspot.com/2011/08/eigrp-ospf-bgp-strengths-weakness.html


Event Driven Network Automation is a term used to describe what large scale NetOps teams are doing to scale, deploy and manage networking infrastructure.

YAML data formatting and Jinja2 templating with Python glueing and executing.
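
The data-plus-template idea can be sketched with the standard library alone. In the example below a plain dictionary stands in for data that would normally be loaded from a YAML file, and string.Template stands in for a Jinja2 template; the configuration syntax, hostnames and addresses are made up.

```python
from string import Template

# Stand-in for a Jinja2 template; the CLI syntax is generic and hypothetical.
INTERFACE_TEMPLATE = Template(
    "interface $name\n"
    " description $description\n"
    " ip address $address\n"
)

# Stand-in for the structure a YAML file would be parsed into.
device_data = {
    "hostname": "leaf1",
    "interfaces": [
        {"name": "Ethernet1", "description": "to spine1", "address": "192.0.2.1/31"},
        {"name": "Ethernet2", "description": "to spine2", "address": "192.0.2.3/31"},
    ],
}


def render_config(data):
    """Glue step: combine structured data with the template into device config text."""
    parts = [f"hostname {data['hostname']}\n"]
    parts += [INTERFACE_TEMPLATE.substitute(intf) for intf in data["interfaces"]]
    return "".join(parts)


config = render_config(device_data)
```

The same template can then be rendered for every device by swapping only the data.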

Ansible/YAML and Netconf/API for configuration, execution operations.

Event Generation using SNMP/Telemetry/BGPMon.

BGPMon looks like it could be used to check up on changes in a Routed Core with BGP based Leaf-Spine Clos Fabric.

Zero Touch Provisioning – ZTP is best suited for quickly bringing up new devices.

An Orchestration-style GUI layer custom made for every domain in the network would definitely be required as well for various aspects of NetOps.

There can be human driven network automation but there can also be event driven network automation which can be termed as ‘closed loop’ with rule based actions defined by humans.

The events driven, closed loop, rule-based-actions execution layer would then be managed by humans. This layer would be evolving and to manage it there would be a requirement of necessary data structuring and scripting skills in addition to being mindful of what the impact is on the network layer (DC or WAN, both).
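
A closed loop with human-defined rules can be sketched as a list of (condition, action) pairs evaluated against each incoming event. The event fields, thresholds and action names below are hypothetical, and the execute callback stands in for whatever would actually push a change (for example an Ansible run or a Netconf call).

```python
# Human-defined rules: (predicate over the event, name of the action to run).
RULES = [
    (lambda e: e["type"] == "bgp_session_down", "open_ticket_and_drain_peer"),
    (lambda e: e["type"] == "interface_errors" and e["value"] > 1000, "shut_interface"),
]


def handle(event, execute):
    """Run every action whose rule matches the event; return the actions fired."""
    fired = []
    for predicate, action in RULES:
        if predicate(event):
            execute(action, event)
            fired.append(action)
    return fired


# Example wiring: record actions in a list instead of touching a real device.
audit_log = []


def record(action, event):
    audit_log.append((action, event["device"]))


result = handle({"type": "interface_errors", "device": "leaf1", "value": 5000}, record)
```

Managing this layer then means maintaining the rule list and being mindful of what each action does to the network.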

References:

https://mirceaulinic.net/2017-10-19-event-driven-network-automation/

17-RIPE76_-Event-driven-network-automation-and-orchestration.pdf

Network Automation: Template Configurations with Jinja2 and YAML

https://packetpushers.net/back-journey-network-automation-introduction/

https://packetpushers.net/back-journey-network-automation-part-1-zero-touch-provisioning/

https://packetpushers.net/back-journey-network-automation-part-2-ansible/

https://www.ipspace.net/Building_Network_Automation_Solutions

I attended the Amazon Network Development Engineer tech talk held in Sydney yesterday. While fishing for future Network Development Engineers Amazon gave a short presentation on their network from a DC and DCI/WAN perspective.

It was a good talk and the interaction with the Network Development Engineers afterwards was insightful. A lot of their work is circling around Automation and Scripting. This is also obvious from the Job title and the Job Descriptions for the role advertisements.

The difference between the shortest path in a network and the path that traffic between two points actually takes is defined as network stretch. It can also refer to the difference between the shortest physical path and the shortest logical path a packet being forwarded must travel. There could be a difference between the shortest physical path and logical path if the link cost between a set of hops is higher resulting in logical path being different.

Network stretch can be calculated based on comparing hop counts through a network, the metric along two paths and/or the delay along two paths among other things.
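
As a worked sketch, stretch can be computed as the cost of the path actually taken minus the cost of the shortest path between the same two points. The example uses Dijkstra's algorithm over a made-up five node topology with every link costing one hop; swapping the link costs for metrics or delays gives the other variants mentioned above.

```python
import heapq


def shortest_cost(graph, src, dst):
    """Dijkstra over {node: {neighbor: cost}}; returns the lowest total cost."""
    best = {src: 0}
    heap = [(0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    raise ValueError("destination unreachable")


def path_cost(graph, path):
    """Sum the link costs along an explicit list of nodes."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))


def stretch(graph, actual_path):
    """Cost of the actual path minus the cost of the shortest path."""
    return path_cost(graph, actual_path) - shortest_cost(graph, actual_path[0], actual_path[-1])


# Hypothetical topology, all links cost 1 so the result is a hop-count stretch.
hops = {
    "A": {"B": 1, "C": 1},
    "B": {"A": 1, "D": 1},
    "C": {"A": 1, "E": 1},
    "E": {"C": 1, "D": 1},
    "D": {"B": 1, "E": 1},
}
```

Here traffic steered over A, C, E, D has a stretch of one hop compared with the shortest path A, B, D.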

Stretch is not always bad. Increasing stretch via Policy Based Routing or Traffic Engineering to push traffic off the shortest physical path onto a desired logical path can be a desired outcome. In this case the post TE network stretch, the difference between the physical path and the logical path, is desirable, required and is a policy decision.

Defining and calculating network stretch can aid in finding the complexity of a network.

References: Computer Networking Problems and Solutions (2017)


To gain an understanding of components that make up networks we’ll start by stating that a network is a combination of tools working together to provide connectivity to endpoints.

Let’s list the tools.

Network Device (Switch and Router and others) – This is a device which terminates multiple cables into itself with the other end of the cable being other devices. The network device interconnects multiple endpoints via its ports on which cables terminate.

Protocols – These are tools which provide for a coordination mechanism. This coordination mechanism is an exchange of information which makes possible the exchange of traffic.

Protocol Messages – These are messages exchanged between Protocols while they coordinate the laying of the network foundations for exchange of traffic.

Addresses – These come in many flavours and are intended to identify the source and destination of a data payload which is traversing the network.  They can be layered/structured for aggregation and division pools.

Lookup – This is done on the various addresses to find the next hop, i.e. the next point to which the data payload is sent so that it reaches its ultimate destination after traversing the network.

Appended Information – This is a general term which encompasses information traversing the network which is other than payload and addresses. These are information and tools which are put into packets for protocol operations. This is information inside headers other than the addresses.

Identity Tags – This is a specific class of Appended Information which provides for identity functionality during a lookup and for identification and separation of protocol functions.

Filters & Actions – These are deployed on the network devices to provide intelligent selection and resulting actions over the traversing data payload. They utilize the addresses and appended information inside the data payloads and also the headers.

Network Over Network – This is a general term for a network on top of a network for provision of separate connectivity. A combination of another layer of protocols and addresses results in a network over a network.

Network + Network – This is a term identifying the interconnection of 2 or more separate networks resulting in a larger network. Also called internetwork it signifies one domain interconnected to another domain.

Control and Data Plane – Control Plane is the network protocols laying the network foundations and data plane is the traffic traversing the network. Control Plane enables Data Plane.

Network Inside Network Device – This is a term signifying the division of a network device to facilitate a software separation in networks. It creates separate networks inside a network device via operating system software constructs.
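
As an example of the Addresses and Lookup tools working together, the sketch below performs a longest prefix match on IP addresses, which is how the layered/aggregated structure of addresses gets used to pick a next hop. The routing table contents and next hop names are hypothetical, and a linear scan is used for clarity where real devices use tries or TCAM.

```python
import ipaddress


def longest_prefix_match(routing_table, destination):
    """Return the next hop of the most specific route covering the destination."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (network, next_hop)
        for network, next_hop in routing_table
        if addr in network
    ]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Hypothetical table: a default route, an aggregate and a more specific subnet.
routing_table = [
    (ipaddress.ip_network("0.0.0.0/0"), "border-leaf"),
    (ipaddress.ip_network("10.1.0.0/16"), "spine1"),
    (ipaddress.ip_network("10.1.2.0/24"), "leaf2"),
]
```

A destination inside 10.1.2.0/24 matches all three entries and the /24 wins because it is the longest prefix.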

We can put brands on these:

OSPF/ISIS/BGP are Protocols to lay the Control Plane for IP addresses

LDP is the Protocol to lay the control plane for MPLS addresses (labels)

MAC Address / IP Address / MPLS Labels are addresses and Lookups are done on them during Data Plane operation

MPLS L2 VPN / MPLS L3 VPN are a Network Over Network function based on labels.

MP-BGP is a protocol to lay Control Plane for Network over Network (MPLS L2VPN & EVPN)

AS to AS BGP connectivity is a Network + Network function

Route Maps / Prefix Lists / AS Path Lists are part of Filters and Actions

OSPF Areas and ISIS Levels are a Network domain + Network domain layering type function

QoS Diffserv and CoS are appended information for actions and functionalities

EVPN, OTV & VXLAN are Network over a Network options. These provide a network over a network Control Plane and network over a network Data Plane.

VXLAN VNID / VLAN TAG / Route Target / Route Distinguishers / BGP Communities are Identity tags for protocol operations where they aid the control plane or data plane.

VDC / VRF / EVPN EVI are Network inside Network Device features primarily being operating system software constructs.

This is a rough approach with much simplification but is intended to view the various network components as tools providing functionality working in unison for connectivity provision. This view aids looking at the components from a Design perspective.

Whether it is a Service Provider, Enterprise or Data Center / Cloud IaaS network the components interact and provide functionality.

This post focuses on the trend of Microservices and the related terminologies. In the end it lists the brands in their categories.

An application is software. It is composed of different components. These are the application components. Together they make up the application. The difference between one application software component and another application software component is one of separation of concerns. This is simply dividing a computer program (the application) into different sections. If the different components are somewhat independent of each other they are termed loosely coupled.

The different components of an application communicate with each other. When they need to interact with each other they do it via interfaces. A client component does not need to know the inner workings of the other application software component and uses only the interface.

This is where the word service comes into play where what one application software component provides to another software component is called a service.

Now this application may be placed on a distributed system where its different components are located on networked computers. Thereafter in terms of an application running on a distributed system, SOA or Service Oriented Architecture is where services are provided to other software components over a communications protocol over a network.  This is due to the underlying hardware being networked and distributed in nature and the application software on them being distributed on it.

In the terminology of Distributed Systems, when one component communicates with another component it does this via messages. We can say that in a distributed system, an application’s software component sends a message to another software component to utilise its service via an interface and that interface is also utilising a network protocol.

We now know about an Application which is a software program, its components and that services are provided by its components. We now know about Distributed Systems, its components networked together and messages being passed between them over a network. We know about applications running on distributed systems where application software components are running on components of the distributed system. We know the application software components communicate with each other via a network.

In Microservices a distributed system’s component is running an application’s software component and is providing a service. It’s a process now in execution mode. So one software component is placed and is running on one distributed system component and is providing a service from there to other similar independent components.

A normal process is a running software program in execution mode. Inter Process communications are IPCs in terms of processes. In Microservices IPCs will be network messages.
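
The idea of inter process communication becoming a network message can be sketched with two components exchanging JSON over a socket. A local socket pair stands in for a real network connection, a thread stands in for the remote process, and the service name, message fields and the returned price are all made up.

```python
import json
import socket
import threading


def price_service(conn):
    """Hypothetical service component: answers one JSON request, then exits."""
    with conn:
        request = json.loads(conn.recv(4096).decode())
        reply = {"item": request["item"], "price": 42}  # made-up value
        conn.sendall(json.dumps(reply).encode())


def call_service(conn, item):
    """Client component: it only knows the message interface, not the internals."""
    conn.sendall(json.dumps({"item": item}).encode())
    return json.loads(conn.recv(4096).decode())


# socketpair() stands in for a TCP connection between two networked components.
client_end, service_end = socket.socketpair()
worker = threading.Thread(target=price_service, args=(service_end,))
worker.start()
response = call_service(client_end, "widget")
worker.join()
client_end.close()
```

Either side could be rewritten in another language as long as the message format stays the same.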

What we discussed above is the application software architecture and its transition into the distributed systems environment. When you say that each independent software component is now a running process, that it is running on a distributed system’s component and that the Inter Process Communications are over a network, you have Microservices. These Microservices form an Application.

Furthermore, in Microservices there is a bare minimum of centralized management of different services and they may be written in different programming languages and use different data storage technologies. So we can have one software component written in Go, and another in NodeJS and they will provide each other services. These services will also be over a network. So a Go software component can be running on one distributed system component and a NodeJS software component can be running on another distributed system component and they will interact via the network composing the distributed system. Multiple such distributed software components providing services to each other make up a Microservices Application.

A container provides an environment to run a microservice component. A container is a distributed system object which can be termed loosely as a distributed system hardware+software components service.

In terms of branding:

Amazon AWS is a Distributed Systems Provider.

EC2 is Amazon AWS’s product to provide a distributed system compute component online.

S3 is Simple Storage Service, a product for simple storage of files by Amazon AWS online.

DynamoDB is Amazon AWS’s NoSQL Database product which is available as a product online.

Golang is a programming language and NodeJS is a JavaScript runtime; backend server side software components are written with them.

React is a JavaScript library in which frontend user side application software components are written.

Docker is software which provides for individual container management. One container provides the environment where a software component can be executed on a distributed system.

Kubernetes and Docker Swarm manage multiple (lots of) containers deployed on distributed systems for running a distributed application. They are for container management.

RabbitMQ and Kafka work as message brokers for passing messages between microservices.

RESTful HTTP APIs are also a means for inter-microservice communication.

Protocol Buffers and gRPC are means of faster inter-microservice communication messaging.

MongoDB and Couchbase are NoSQL databases which can be run in containers and be utilised by application software components for Database purposes.

Git is an application software component version control system.

Prometheus is an application (software) to be run (can be in containers) built specifically for the purpose of monitoring microservices software component health (metrics).

Grafana is an application (software) to be run (can be in containers) for the purpose of visualizing metrics/health of microservices.

The ELK stack, which is Elasticsearch, Logstash and Kibana, is software which provides for logging of events and their search and visualization.

https://en.wikipedia.org/wiki/Component-based_software_engineering

https://en.wikipedia.org/wiki/Event-driven_architecture

https://en.wikipedia.org/wiki/Service-oriented_architecture

http://www.d-net.research-infrastructures.eu/node/34

https://martinfowler.com/articles/microservices.html

https://en.wikipedia.org/wiki/Process_(computing)

https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45406.pdf

 

Similar trends in multiple industries are apparent.

  • Telecommunications Provider e.g. AT&T
  • Networking/Internet IP,MPLS Service Providers
  • Cloud Native IaaS, PaaS, SaaS industry

Let’s see what the trend is that they have in common:

  • Telecom – AT&T ONAP’s DCAE – Data Collection, Analytics and Events
  • Networking/Internet Service Provider – Cisco/Juniper Telemetry
  • Cloud Native – Kafka and streaming events data from Microservices architectures

What they have in common is events & data production and thereafter streaming of the data and thereafter analytics on these events/data resulting in near real time decision making.

The naming is different, the products are different and the industries are different but the production of data, its streaming and analytics is common.

The Telecom industry is moving from PNF (Physical Network Functions) to VNF (Virtual Network Functions), which is a move from tightly coupled hardware/software devices to a more software driven architecture.

The ISP industry is still shifting around IP Packets but they are now looking for more streaming style analytics of their devices and the traffic flows which they are calling Telemetry.

The Cloud Native industry is in the pack with its Microservices based software centric application architectures.

They all have event generation in common and want to process the data and then use it in real time. Real Time Data Streaming and Processing.
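
The common pattern of event production, streaming and near real time analytics can be sketched as a generator that consumes events as they arrive and keeps a sliding window per device. The event fields, window size and threshold are assumptions for illustration; in practice the events would arrive via something like Kafka or a telemetry collector rather than a Python list.

```python
from collections import deque


class SlidingWindowAverage:
    """Average over the most recent `size` values."""

    def __init__(self, size):
        self.window = deque(maxlen=size)

    def update(self, value):
        self.window.append(value)
        return sum(self.window) / len(self.window)


def detect(events, size=3, threshold=80.0):
    """Yield (device, average) whenever a device's windowed average exceeds the threshold."""
    windows = {}
    for event in events:
        window = windows.setdefault(event["device"], SlidingWindowAverage(size))
        average = window.update(event["cpu"])
        if average > threshold:
            yield event["device"], average


# Hypothetical stream of CPU readings from one router.
stream = [{"device": "r1", "cpu": cpu} for cpu in (50, 90, 95, 99)]
alerts = list(detect(stream))
```

Because detect is a generator it processes each event as it arrives, which is what allows decisions to be made while the stream is still flowing.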

Lets now see dig a little deeper and start correlating the terminologies. From the products category we will take Telecom’s ONAP, Networking’s Cisco DNA, Cloud Native’s Prometheus and Kafka and Information Security’s Splunk from the industries to analyse.

ONAPs VNF Event Stream or VES is the stream event producer. ONAPs logging section utilises the same ELK, Elastricsearch Logstash and Kibana dashboarding available in AWS cloud.

Juniper’s Telemetry streaming utilises Google’s Protocol Buffers (gpb) structured messages are relayed to a performance management application. Cisco’s Model Driven Telemetry utilises the same Google Protocol Buffers for streaming data from its devices.

Cloud Native applications are Microservices based which has Event Sourcing and CQRS and are requiring Rabbit MQ/Kafka style message brokers in addition to stream processors and analytics such as the same ELK stack mentioned earlier.

Large organisations such as LinkedIn faced the problem of data deluge, in terms of handling, processing and analysing data in real time, earlier than the rest of the world. This has resulted in products such as Kafka.

 

 

Links:

https://wiki.opnfv.org/display/PROJ/VNF+Event+Stream

https://wiki.onap.org/display/DW/Logging+User+Guide

 

 

What does a Site Reliability Engineer do?

 

Site Reliability Engineer is a term for the operations and administration of complex computer systems involving:

 

  • Networking,
  • Virtualized operating system environments including VM/Containers,
  • The orchestration tools for Networking/Virtualized infrastructure,
  • Applications,
  • The interactions of the above in a multi-site/multi-pop environment,
  • And utilising the above to deliver a service/product and ensuring it is working well.

 

It basically appears to be an operations role but within a complex environment where multiple technology silos interact heavily to deliver the product to the end user. Google hires for and assigns the role of Site Reliability Engineer, and these engineers operate such a complex environment, delivering Google.com/Gmail.com/Youtube.com etc. Facebook does the same.

 

Looking at a particular set of Site Reliability Engineer job advertisements they appear to have one thing in common for diverse roles within the SRE domain:

 

  • ‘You have an ‘infrastructure as code’ approach to managing infrastructure’

 

So from the Site Reliability Engineer title we arrive at the term Infrastructure as Code approach.

 

Infrastructure as Code ‘tools’ sort of sit on top of Configuration Management tools like Puppet, Chef and Ansible and provide increased functionality. Terraform and AWS CloudFormation are two Infrastructure as Code tools, but what the Job Ads are asking for is the approach.

 

Coding is common:

  • One thing that is apparent is that an Ansible Playbook (a YAML .yml file), a Terraform Configuration (.tf file), a Chef Recipe (.rb file), a Puppet Manifest (.pp file) and an AWS CloudFormation Template all look code-like. In fact, they are all code, but at a plane where the code is not intended to use the processor, memory and hard disk of a single machine, the way a setup.exe style file deployed on a single computer would. They are code or code-like data expressions which translate into the deployment, configuration and orchestration of more complicated computing systems. For example, they can deploy AWS products which are themselves ‘infrastructure as a service’ public cloud systems.
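
As a rough illustration of why such files are code-like data expressions, the sketch below treats a declared infrastructure as plain data and computes the actions needed to reach it. It is not how Terraform, Ansible or CloudFormation are implemented; the resource names, attributes and the `plan` function are hypothetical and only show the declare-then-converge idea.

```python
def plan(desired: dict, actual: dict) -> list:
    """Compare declared resources with what exists and list the actions
    needed to converge, without performing them."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical declared state: the kind of data a .tf or .yml file expresses.
desired_state = {
    "web-server": {"type": "vm", "size": "small", "count": 3},
    "app-db": {"type": "database", "engine": "postgres"},
}
# Hypothetical state that is currently running.
actual_state = {
    "web-server": {"type": "vm", "size": "small", "count": 2},
    "old-cache": {"type": "cache"},
}

actions = plan(desired_state, actual_state)
print(actions)
```

The author of the declaration describes what should exist, not the step by step commands; running the same declaration against an already converged system yields no actions.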

 

From this it appears that Infrastructure as Code is a term signifying another layer of abstraction.

There are levels of abstraction. The levels can be:

  • solid state physics, silicon/CPU, memory, hard disk hardware.
  • then 0’s and 1’s & bits on top of these at the next level
  • then integers & strings utilizing the above layer
  • then arithmetic operations and string manipulations on top of above
  • then programs and software applications running on computers/devices,
  • then interconnected computers
  • then distributed systems composed of interconnected computers/devices
  • then Public Cloud Infrastructure as a Service
  • then an Application running on this silicon, CPU, memory, hard disk, 0’s, 1’s, bits, integers, strings, arithmetic operations, string manipulations, individual programs/software, interconnected infrastructure/computers/servers/routers/switches and public cloud.

By inference, the Infrastructure as Code approach presents/preserves information that is relevant to our end application plane and environment and abstracts away information that is not relevant in our Application’s environment.

The internet is a good example of multiple systems on top of other systems.

It looks like even Google and Facebook still need humans to operate their systems. They will not be flying through them like an aeroplane in clouds. They will know the layers and the systems in place. They will navigate from symptoms to root cause and then codify/rectify & adjust for continual optimal service.

Moving on, three job ads for Site Reliability Engineer are given below.

Infrastructure as Code is common. Let’s see the rest.

Job Ad:

  1. Site Reliability Engineer | Data Stores | Redis & Kafka

 

Our Tech Stack across Site Reliability as a whole:

• Data Analytics software including Kafka and Redis
• Open Source technologies (We constantly look to innovate and adopt)
• Amazon Web Services – AWS, and a load of services
• Coding with React, NodeJS and Python
• Couchbase, Kubernetes, ElasticSearch & Microservices Infrastructure
• Linux Operating systems, we look for passion
• Infrastructure as Code & Automate everything are a couple of our mottos

Job Ad:

  1. Site Reliability Engineers | Multiple Roles | Golang | AWS | React

The Tech Stack you will be getting your hands dirty with:

    • Open Source technologies (We constantly look to innovate and adopt)
    • Amazon Web Services – AWS, and a load of services
    • Coding with React, NodeJS and Python
    • Couchbase, Kubernetes, ElasticSearch & Microservices Infrastructure
    • Linux Operating systems, we look for passion
    • Infrastructure as Code & Automate everything are a couple of our mottos

 

Job Ad:

  1. Site Reliability Engineer | Edge Computing | AWS | Networking

 

Our Tech Stack across Site Reliability as a whole:

• Networking – Load balancers, Proxies, Routing, DC, AWS
• Open Source technologies (We constantly look to innovate and adopt)
• Amazon Web Services – AWS, and a load of services
• Coding with React, NodeJS and Python
• Couchbase, Kubernetes, ElasticSearch & Microservices Infrastructure
• Linux Operating systems, we look for passion
• Infrastructure as Code & Automate everything are a couple of our mottos

 

The three SRE roles are diverse and are geared towards different parts of the stack which runs the end application: one SRE role tilts towards Networking, one towards Data/Stream Processing and one towards Development (Front-End/Back-End).

 

The below are common to all three:

 

  • React, NodeJS and Python
  • Couchbase, Kubernetes, ElasticSearch & Microservices Infrastructure
  • Linux/AWS

 

The below varies amongst them:

 

  • SRE Networking tilted role – Edge Computing, Load balancers, Proxies, Routing, DC, AWS
  • SRE Data/Stream Processing tilted role – Kafka, Redis
  • SRE Dev tilted role – Golang

 

And so what is this system achieving and what is it composed of? How do its parts interact and what do the multiple SREs do?

 

In terms of programming languages, we have Golang, React, NodeJS and Python.

In terms of hardware we have AWS and Edge Computing PoPs/nodes/devices.

In terms of data store and streaming we have Kafka and Redis.

In terms of container management there is Kubernetes.

In terms of data retrieval/search and possibly analytics there is ElasticSearch.

In terms of database there is Couchbase.

 

An SRE is not an end-application software developer, so the tools listed above are part of the system to be run. This will be done with an Infrastructure as Code approach, codifying operations so that the system runs optimally.

 

So let’s now try to put the clues in the Job Descriptions together.

 

  • React & NodeJS are both JavaScript based, with React being the User Interface/FrontEnd library (used by the Facebook UI) and NodeJS being the Server/BackEnd runtime for scalable data I/O. Python can be used for programming services at various locations. Golang is also used on the BackEnd server side, where its concurrency features suit applications/services.
  • Redis can be used to store application state information. It is an in-memory key-value store that is fast, scalable and distributed, and can serve as a cache for application state.
  • Kafka is a distributed data streaming platform and can be stated to be in the middle. Producers producing data and consumers using data and stream processors processing it are connected to Kafka clusters. It can be used for event streaming/aggregation.
  • With no other stream processing engine present in the Job description Kafka with Kafka Streams can be stated to provide for stream processing as well.
  • ElasticSearch can be used for indexing and search. Data can be copied in via Kafka connector APIs and then indexed. Kibana is not listed, but the ad might simply have skipped mentioning it; it can be used for visualization and dashboarding.
  • Couchbase can be used as a NoSQL JSON-style distributed database as an external store for storage of logs/events (documents). It can take in data and deliver it via its Kafka connector.
  • Kubernetes manages the containers furnishing the application environment.
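
The "Kafka in the middle" role from the list above can be sketched with a toy in-memory log. This is not the Kafka client API: the `Topic` class, the group names and the sample records are hypothetical and only illustrate producers appending to a log while independent consumers read it at their own pace.

```python
from collections import defaultdict


class Topic:
    """A toy stand-in for one topic partition: an append-only list of records."""

    def __init__(self) -> None:
        self.records: list = []
        # Next offset to read, tracked separately per consumer group.
        self.offsets: dict = defaultdict(int)

    def produce(self, record: dict) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def consume(self, group: str, max_records: int = 10) -> list:
        start = self.offsets[group]
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch


events = Topic()
for status in (200, 500, 200, 500, 500):
    events.produce({"service": "checkout", "status": status})

# Consumer group 1: a stream processor that aggregates error counts.
error_count = sum(1 for r in events.consume("error-counter") if r["status"] >= 500)

# Consumer group 2: an indexer that reads the same records independently.
indexed = events.consume("search-indexer", max_records=2)

print(error_count, indexed)
```

Because each group keeps its own offset, a stream processor, an ElasticSearch indexer and a Couchbase sink can all read the same events without interfering with each other.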

 

It looks to be a full Cloud Native environment which needs to be kept up and running optimally with continued service.

 

Part of this environment is the networking aspect. This includes the listed edge computing component, which means this high performance cloud native application also has near-user-location edge devices within its architecture.

 

Geolocation Routing and CDNs are the tools used to decrease application latency times. AWS Availability Zones can be considered as multi-site replicated PoPs. Edge networking nodes will also branch off as required and can be mini PoPs. Depending on the size of the user base being serviced by the Edge PoP node it might scale into being a small DC.
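
The idea behind Geolocation Routing can be sketched as "send the user to the nearest PoP". The PoP names and coordinates below are hypothetical, and real services typically work from DNS resolver location and measured latency rather than pure distance, so treat this as an illustration of the principle only.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical PoP locations as (latitude, longitude) in degrees.
POPS = {
    "london": (51.5, -0.1),
    "frankfurt": (50.1, 8.7),
    "singapore": (1.35, 103.8),
}

EARTH_RADIUS_KM = 6371


def distance_km(a: tuple, b: tuple) -> float:
    """Great-circle distance between two (lat, lon) points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def nearest_pop(user_location: tuple) -> str:
    """Pick the PoP with the smallest distance to the user."""
    return min(POPS, key=lambda name: distance_km(user_location, POPS[name]))


print(nearest_pop((53.3, -6.3)))   # a user near Dublin
print(nearest_pop((3.1, 101.7)))   # a user near Kuala Lumpur
```

Adding an Edge PoP is then just another entry in the table, and users close to it start being served from it.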

 

Branching within the networking domain is the use of Proxies. Forward + Reverse + Side-car if required.

 

One scenario is an increase in application demand resulting in container scaling, which in turn can require on-demand load balancing and proxying. One tool for this is the F5 Application Services Proxy, which from a networking perspective is a proxy but which integrates with Kubernetes and can be used in an Infrastructure as Code deployment. F5’s Application Services Proxy is itself a Node.js application but is middleware here.
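
The scaling scenario can be sketched as a proxy whose backend pool is refreshed whenever the set of containers changes. This is a generic round-robin illustration, not the F5 Application Services Proxy or any Kubernetes API; the class, method and pod names are hypothetical.

```python
class RoundRobinProxy:
    """Distributes requests across whatever backends are currently registered."""

    def __init__(self) -> None:
        self.backends: list = []
        self._next = 0

    def set_backends(self, backends: list) -> None:
        # Called when the orchestrator reports that containers were
        # added or removed; the rotation restarts from the first backend.
        self.backends = list(backends)
        self._next = 0

    def route(self) -> str:
        if not self.backends:
            raise RuntimeError("no backends available")
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1
        return backend


proxy = RoundRobinProxy()

# Two containers are running under normal load.
proxy.set_backends(["pod-1", "pod-2"])
before = [proxy.route() for _ in range(4)]

# Demand increases, a third container is started and the pool is refreshed.
proxy.set_backends(["pod-1", "pod-2", "pod-3"])
after = [proxy.route() for _ in range(3)]

print(before, after)
```

In a real deployment the pool update would be driven by the orchestrator rather than called by hand, which is what makes the load balancing "on demand".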