
Monthly Archives: December 2018

This is a copy of a previous LinkedIn post, dated June 7, 2016, which was not previously published on this blog.

https://www.linkedin.com/pulse/opnfv-brahmaputra-systems-integration-nfv-vnfs-lutfullah-kakakhel/

OPNFV Brahmaputra is a lab-ready release of OPNFV. One takeaway is that community-driven systems integration really is a hard task to accomplish. This becomes especially true when the systems being integrated into a larger system are themselves multiple large open source projects.

To start with, OPNFV aims to integrate the systems upon which VNFs can be run.

The statement above carries a lot of weight. On one side there is the block of requirements-generating standards bodies, which produce specifications and define how the system is to run. On the other side there are the code-producing open source development projects. OPNFV stands in the middle: it intends to integrate these individual code projects according to the requirements laid out by the standards bodies and to provide a system on top of which VNFs can be run and tested. The reason this task is run under an umbrella, membership-based organization such as OPNFV is that it is a repetitive task which every organization would otherwise need to redo each time new releases of the individual projects become available.

It might be difficult to picture this at first, but imagine you want a lab ready to run and test VNFs. What is the lab composed of? It will have infrastructure on top of which VNFs can be run. What is this infrastructure composed of? Hardware, a virtualization layer with hypervisors, and projects such as OpenDaylight, OpenStack, KVM and Ceph all running together to provide a block of infrastructure with virtual compute, network and storage (an NFVI Point of Presence) on which VNFs can be run.

Every organization that wants to reach the level of testing VNFs will need such a lab. And then what happens when a new version of OpenDaylight is released, or a new version of OpenStack, or KVM, or Ceph? Everybody needs to update their labs. OPNFV is a Linux Foundation project which intends to be the focal point of these activities and perform them jointly, instead of everybody doing them individually.

It also helps make the system work. A patch to OpenDaylight could work well within OpenDaylight but break things at the system layer when integrated with the rest of the components that make up an NFV lab (to be used to run VNFs). OPNFV aims to be the first systems layer at which such patches are spotlighted and returned to the project they came from, informing it that things come apart at the system level.
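As a rough illustration of that system-level gate, here is a minimal Python sketch. It is not OPNFV's actual CI code; the component versions and the `passes_system_tests` placeholder are hypothetical, only meant to show the idea of pinning a known-good stack and reporting breakage back upstream.

```python
# Hypothetical sketch of a system-level integration gate, not OPNFV's real CI.

# The pinned stack that the lab currently deploys together (versions illustrative).
KNOWN_GOOD_STACK = {
    "openstack": "mitaka",
    "opendaylight": "beryllium",
    "kvm": "4.4",
    "ceph": "jewel",
}

def passes_system_tests(stack: dict) -> bool:
    """Placeholder for deploying the whole stack and running smoke/functional tests."""
    # In reality this would deploy on lab hardware and run full test suites.
    return True

def evaluate_candidate(component: str, new_version: str) -> str:
    """Try one upstream component bump against the rest of the pinned stack."""
    candidate = dict(KNOWN_GOOD_STACK, **{component: new_version})
    if passes_system_tests(candidate):
        KNOWN_GOOD_STACK.update(candidate)
        return f"{component} {new_version}: integrated into the known-good stack"
    # The patch may be fine inside its own project but break at the system layer,
    # so the result is reported back upstream instead of being merged here.
    return f"{component} {new_version}: rejected, report back to the upstream project"

print(evaluate_candidate("opendaylight", "boron"))
```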

OPNFV, according to its initial white paper, aims to build this systems testing environment around the NFV architecture reference points Vi-Ha, Vn-Nf, Nf-Vi, Vi-Vnfm and Or-Vi.

With the above clear, the figure below can be understood as a larger system composed of individual projects integrated together with the aim of running VNFs. In the figure, OpenDaylight is one piece (in networking), KVM is another (in compute), Open vSwitch is another, and OpenStack is another. Put together, they provide the infrastructure to run VNFs. Also to be noted is that in the case of OPNFV there are community labs (Pharos labs) which provide the hardware.

The presence of this combined effort also means that, for network operators, market differentiation moves to service orchestration and to the Virtual Network Functions and Network Services that run on top of this shared infrastructure.

 

References:

http://www.etsi.org/technologies-clusters/technologies/nfv

http://www.slideshare.net/CiscoDevNet/devnet-1162-opnfv-the-foundation-for-running-your-virtual-network-functions

https://www.opnfv.org/brahmaputra

http://www.slideshare.net/OPNFV/opnf-brahmaputra-an-early-look

https://www.opnfv.org/sites/opnfv/files/pages/files/opnfv_whitepaper_092914.pdf

https://www.youtube.com/watch?v=Dh55McgHGQ8

This is a copy of a previous LinkedIn post, dated June 7, 2016, which was not previously published on this blog.

https://www.linkedin.com/pulse/nfv-mwc-2016-syed-habib-lutfullah-kakakhel/

ETSI showcased a practical implementation of NFV at Mobile World Congress 2016. The whole NFV architecture was implemented and run to provide a SIP voice call: an end-to-end communication service built on a vIMS platform. The vIMS is a VNF, orchestrated by an NFV Orchestrator and run on top of infrastructure controlled by an OpenStack-based VIM. Let's look at the components and how they delivered the NFV-based SIP voice call.

There are two NFVI PoPs (Points of Presence), i.e. two VIMs. One is controlled by OpenStack and the other by openvim (part of the OpenMANO package). Both are controlled by the OpenMANO NFVO for resource orchestration. Service orchestration is performed by Ubuntu's Juju. Rift.io's Launchpad is used as the triggering mechanism for resource and service orchestration. 6WIND provides the PEs showcasing corporate VPN interconnectivity, Telefonica provides the traffic generator to test the bandwidth capacity of the PE links, and Metaswitch provides the vIMS VNF, Clearwater, to run on top of the infrastructure.
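To keep the moving parts straight, here is a minimal Python sketch of that demo as plain data. It is purely illustrative; the field names are mine, not taken from the actual OSM/OpenMANO descriptors used in the demo.

```python
# Illustrative data model of the MWC 2016 demo, not an actual OSM/OpenMANO descriptor.
demo = {
    "service_orchestrator": "Juju",
    "trigger": "Rift.io Launchpad",
    "nfvo": "OpenMANO",
    "vims": {
        "VIM1": {"controller": "openvim", "hosts": ["6WIND PE1", "6WIND PE2", "6WIND PE3"]},
        "VIM2": {"controller": "OpenStack", "hosts": ["Metaswitch Clearwater vIMS"]},
    },
    "network_service": "SIP voice call over vIMS",
}

# The NFVO performs resource orchestration across both VIMs (NFVI PoPs).
for name, vim in demo["vims"].items():
    print(f'{name} ({vim["controller"]}): {", ".join(vim["hosts"])}')
```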

The figure below shows details:

A multi-site corporation's network is shown running connected via three PEs. One site, connecting to PE 3, has the VNF deployed in VIM 2, which is another data center. One NFVI PoP, labelled VIM 1, hosts the 6WIND PEs, while the second NFVI PoP, labelled VIM 2, hosts the VNF. Inter-DC communication runs between the two NFVI PoPs. The figure below shows the logical communication path of the SIP voice call; the IMS SIP signaling is implemented in VIM 2, in the Metaswitch Clearwater vIMS.

More details can be seen in the references below.

ETSI's new initiative, Open Source MANO, is delivering an open source NFV Management and Orchestration software stack which is set to take attention away from the MANO and turn it into a given piece of software. This puts more focus on the VNFs. The message could be that service orchestration using VNFs is therefore to be the focus of attention for telco organizations.

References:

https://osm.etsi.org/

http://www.etsi.org/technologies-clusters/technologies/nfv

https://networkbuilders.intel.com/docs/E2E-Service-Instantiation-with-Open-Source-MANO.pdf

https://www.youtube.com/watch?v=JJlxwJStkTk

This is a copy of a previous LinkedIn post, dated June 7, 2016, which was not previously published on this blog.

https://www.linkedin.com/pulse/nfv-telco-vepc-solutions-syed-habib-lutfullah-kakakhel/

In telecom networks, placing an LTE vEPC stands out as an exemplary demonstration of NFV's application. The figure below gives the generic NFV architecture. It can be divided into three main sections (see the sketch after the list):

  1. The Management and Orchestration (MANO)
     - Consists of the NFV Orchestrator, VNF Manager and Virtualized Infrastructure Manager.
  2. The NFVI – NFV Infrastructure
     - Consists of hardware, the virtualization layer, and the virtual compute, storage and network software.
  3. The VNFs – i.e. the virtual network functions.
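A minimal Python sketch of that three-block split follows; the labels are illustrative shorthand, not normative ETSI identifiers.

```python
# Illustrative shorthand for the ETSI NFV architectural blocks, not normative naming.
NFV_ARCHITECTURE = {
    "MANO": ["NFV Orchestrator (NFVO)", "VNF Manager (VNFM)",
             "Virtualized Infrastructure Manager (VIM)"],
    "NFVI": ["Hardware (compute/storage/network)", "Virtualization layer",
             "Virtual compute/storage/network"],
    "VNFs": ["Virtual network functions (e.g. vEPC, vIMS)"],
}

for block, parts in NFV_ARCHITECTURE.items():
    print(block, "->", "; ".join(parts))
```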

The type of function the VNF provides defines what the NFV network delivers. That is, if the NFV network is to deliver end-to-end LTE core communication as a Network Service, then EPC functionality must be implemented and provided by the VNF part of the NFV network.

See the figure below from an ETSI Proof of Concept work.

It shows the vendor Cyan providing the NFV Orchestrator (NFVO) and VNF Manager (VNFM), with Red Hat and OpenStack providing the Virtualized Infrastructure Manager (VIM). The figure also shows the relevant infrastructure hardware and hypervisor software solutions. Finally, it shows the VNF: Connectem's vEPC. If the VNF implemented a different function, say a vIMS, then the rest of the components in the figure could stay the same and the end-to-end Network Service provided by the NFV network would be different. The function implemented inside the VNF therefore defines what service the NFV network provides, and the work done by the VNF decides whether your NFV network is telco or enterprise, LTE or WiMAX, LTE or 3G, and so on.

For a list of possible telco VNFs, see the figure below.

Every chunk of functional blocks could be implemented together as a VNF. What Connectem is doing is implementing the LTE MME, SGW, PGW, HSS and PCRF functionality and packaging it as a VNF which can be run atop virtualized infrastructure. In the ETSI Proof of Concept, Connectem's solution does this according to the NFV specifications: the VNF can be managed by a VNFM, its infrastructure can be managed by a VIM, and all of it can be controlled and coordinated by an NFVO, i.e. the NFV Orchestrator. The result is LTE EPC functionality inside the virtualized NFV environment.

ETSI's definition of a VNF is "a Network Function capable of running on NFV Infrastructure (NFVI) and being orchestrated by a NFV Orchestrator (NFVO) and VNF Manager".

Coming back to our vEPC example, the VNF has components. These VNF Components (VNFCs) can logically be pictured as below:
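The original figure is not reproduced here; as a stand-in, here is a minimal Python sketch of those components. The role descriptions are illustrative shorthand, following the functions listed above, not a formal decomposition.

```python
# Illustrative decomposition of a vEPC VNF into VNF Components (VNFCs).
VEPC_VNFCS = {
    "MME":  "mobility management and signaling",
    "SGW":  "serving gateway, user-plane anchoring",
    "PGW":  "packet gateway, external PDN access",
    "HSS":  "subscriber database",
    "PCRF": "policy and charging rules",
}

for vnfc, role in VEPC_VNFCS.items():
    print(f"{vnfc}: {role}")
```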

ETSI allows a vendor to implement components inside the VNF however it wishes, as long as they speak to the other NFV architecture components over the defined VNF interfaces. This means the different components can use efficient compute, storage and networking procedures instead of the standards-body-defined communication methodology. For example, inside the vEPC software the MME will communicate with the SGW, but using an efficient computational mechanism instead of the 3GPP-defined interfaces. If for some reason (say, a lab environment) a vendor chooses to implement the 3GPP interfaces inside its vEPC, it won't be as fast or as efficient, but it can be used to showcase 3GPP communications inside NFV.
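A toy Python sketch of that design choice follows. It is hypothetical and only illustrates the trade-off (real vEPC software is far more involved): the MME component reaches the SGW either through a fast in-process call or through a 3GPP-style message exchange.

```python
# Toy illustration of internal vs. standards-style communication inside a vEPC VNF.

class SGW:
    def create_session(self, imsi: str) -> str:
        return f"session created for {imsi}"

class MME:
    def __init__(self, sgw: SGW, use_3gpp_interface: bool = False):
        self.sgw = sgw
        self.use_3gpp_interface = use_3gpp_interface

    def attach(self, imsi: str) -> str:
        if self.use_3gpp_interface:
            # Lab-style path: build a GTP-C-like request and hand it over,
            # slower but demonstrates 3GPP-style interfaces inside the VNF.
            request = {"msg": "CreateSessionRequest", "imsi": imsi}
            return self.sgw.create_session(request["imsi"])
        # Production-style path: efficient direct call between co-packaged VNFCs.
        return self.sgw.create_session(imsi)

print(MME(SGW()).attach("001010123456789"))
```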

Good VNFC software design is what will distinguish different providers of vEPC software solutions.

References:

http://www.etsi.org/technologies-clusters/technologies/nfv

https://www.opennetworking.org/images/stories/sdn-solution-showcase/germany2015/CENGN%20-%20NFV-based%20LTE%20Core%20in%20the%20Cloud.pdf

http://nfvwiki.etsi.org/images/NFVPER%2814%29000010r2_NFV_ISG_PoC_Proposal-E2E_vEPC_Orchestration.pdf

This is a copy of a previous LinkedIn post, dated May 16, 2016, which was not previously published on this blog.

https://www.linkedin.com/pulse/nfv-mano-management-orchestration-syed-habib-lutfullah-kakakhel/

MANO is the brain of the NFV Network. It is the part of the network through which control operations are performed on virtual network functions and virtual network functions infrastructure.

One set of v-eNB, vMME, vSGW, vPGW and vPCRF can be taken as a Network Service. Each of these provides a distinct Network Function which, with the "v", is deployed as a Virtual Network Function on Virtual Network Function Infrastructure. The Virtual Network Function Infrastructure is hardware with a virtual abstraction layer providing virtualization. That is how the acronyms fit together.

Multiple virtual network functions are connected, or chained, together to provide a network service. The physical links are in the infrastructure, i.e. the compute/storage hardware, while the logical links are between the VNFs. The endpoint is the Network Service endpoint, which provides service to the end devices. Between the physical links and the logical links sits the virtualization layer.
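A minimal Python sketch of such a chain follows; it is a simplified linear chain for illustration, not a real network service descriptor.

```python
# Illustrative VNF chain making up one Network Service (simplified, not a real NSD).
network_service = {
    "name": "virtual LTE core",
    "chain": ["v-eNB", "vMME", "vSGW", "vPGW", "vPCRF"],  # VNFs joined by logical links
    "endpoint": "Network Service endpoint towards end devices",
}

# Logical links between consecutive VNFs; the physical links live in the NFVI underneath.
for a, b in zip(network_service["chain"], network_service["chain"][1:]):
    print(f"{a} <-> {b}")
```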

The NFVO, i.e. the NFV Orchestrator, is the part of the network which controls the deployment and operation of virtual network functions.

This is a copy of a previous LinkedIn post, dated May 24, 2016, which was not previously published on this blog.

https://www.linkedin.com/pulse/nfv-independance-from-hardware-lock-in-lutfullah-kakakhel/

NFV is simple. Its most simplistic characterization is that it is the telecom operators' name for hardware independence and software dependence. Hardware is locked in, while software is more easily changed (a project manager would say: relative to hardware, that is).

We can try to see what problem NFV seeks to solve.

Telecom operators faced a dilemma about hardware. “To launch a new network service often requires yet another variety (of hardware) and finding the space and power to accommodate these boxes is becoming increasingly difficult; compounded by the increasing costs of energy, capital investment challenges and the rarity of skills necessary to design, integrate and operate increasingly complex hardware-based appliances.”

The sentence starts with "to launch a new network service often requires yet another variety" (of hardware). Remember, operators want to compete with the WhatsApps and Vibers of tomorrow and need agility of deployment.

NFV seeks to provide that ‘Agility of Deployment’ of new network services to Network Operators by taking away dependency on proprietary and vendor locked in hardware. That is the high level purpose.

The rest is architecture. Hardware can be any compute node with associated storage (of various types) and an accompanying (inter)network of such devices. From there it follows to build virtual services: Virtual Network Functions on Virtual Network Function Infrastructure.

To roll out software or new software for a new service is easier than to roll out hardware.

Another primary benefit is elasticity in energy consumption: consuming energy according to demand. With more control over the hardware, the energy-consuming physical device, exercised through software, this becomes possible.

Providing Layer 2 VPN and Layer 3 VPN services has long been a requirement that enterprises place on Service Providers. Similarly, data center networks need to provide a Layer 2/3 overlay facility to the applications they host.

EVPN is a new control plane protocol that achieves the above. It coordinates the distribution of the IP and MAC addresses of endpoints over another network, which means it has its own protocol messages providing an endpoint address distribution mechanism. In the data plane, traffic is switched via MPLS label next-hop lookups or IP next-hop lookups.

To provide a new control plane with new protocol messages and new features, BGP has been used: BGP UPDATE messages are the carrier for EVPN information. BGP connectivity is established first, and EVPN-specific information is then exchanged inside the BGP messages.
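As a rough illustration of what gets carried, here is a minimal Python sketch of an EVPN MAC/IP advertisement (route type 2) as it might be represented before being encoded into a BGP UPDATE. The field names and values are simplified for illustration, not the exact wire format defined in RFC 7432.

```python
# Simplified representation of an EVPN MAC/IP Advertisement (route type 2);
# the real encoding rides in the BGP UPDATE's MP_REACH_NLRI, per RFC 7432.
evpn_type2_route = {
    "route_type": 2,                      # MAC/IP Advertisement
    "route_distinguisher": "65000:100",
    "esi": 0,                             # Ethernet Segment Identifier (0 = single-homed)
    "ethernet_tag": 0,
    "mac_address": "00:11:22:33:44:55",
    "ip_address": "10.0.0.10",
    "mpls_label": 30001,                  # or a VNI when the data plane is VXLAN
}

# The BGP session is established first; the EVPN NLRI then rides inside UPDATE messages.
bgp_update = {"afi_safi": "l2vpn/evpn", "nlri": [evpn_type2_route]}
print(bgp_update)
```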

The physical topology can be a leaf-spine DC Clos fabric or a simple Distribution/Core setup. The links between the nodes are Ethernet links.

One aspect of EVPN is that the terms underlay and overlay are now used. The underlay represents the underlying protocols on top of which EVPN runs: the IGP (OSPF, IS-IS or BGP) and MPLS (LDP/SR). The underlay also includes the physical Clos or Core/Distribution topology, which has high redundancy built into it using fabric links and LACP/LAGs. The overlay is the BGP EVPN virtual topology itself, which uses the underlay network to build a virtual network on top. It is the part of the network that provides tenant or VPN endpoint reachability, i.e. MAC address or VPN IP distribution.

EVPN is a new protocol, and previous protocols offered little mechanism for all-active multihoming. This refers to one CE being connected via two links to two PEs, with both links active and providing traffic paths to the far end via ECMP and multipathing. Multi-chassis LAG across two chassis has been one option for this, but it is proprietary per vendor and imposes particular virtual-chassis link requirements and limits. Per-flow load balancing from an ingress PE to multiple egress PEs using BGP multipathing is also newly enabled by EVPN.

There was also little mechanism in previous-generation protocols to provide efficient fabric bandwidth utilization for tenant/private networks over meshed links. Previous protocols provided single-active, single paths and required full meshes of LDP sessions and tunnels over a fabric. In EVPN, MAC learning in BGP over the underlay provides this.

Similarly, there was no mechanism to provide workload (VM) placement flexibility and mobility across a fabric. EVPN provides this via the Distributed Anycast Gateway.

 

Two edge computing specifications are currently available.

Facebook OCP’s CG-OpenRack-19 and LinkedIn’s Open19.

They provide for Rack Layouts, Compute, Storage and Networking.

The networking for CG-OpenRack-19 is copied below. The server sleds in the picture appear to be single-homed, judging by the colors. It would be interesting to find out which protocol handles the active-active state of multi-homed compute and storage sleds, if that is present at all.

(Figure: CG-OpenRack-19 network layout, from the published specification.)

Open19 gives 100G bandwidth capabilities and some details are on its website.

5G's ultra-low-latency edge requirements could require edge solutions like these, and it will be interesting to see how things play out.

This also brings SD-WAN to mind, because these edge racks will at the very least be connected across a large WAN.

Google's B4 is its software-defined inter-data-center WAN solution. Google's Espresso is its peering edge solution. Espresso links into the B4 domain via B2. This link has the details of Espresso as shared by the Google team.

 

(Figure: Google's Espresso, B2 and B4 domains.)

Google is not employing an army of network engineers to run these networks: they are software defined, and programmed bots will probably be doing the operational tasks. To operate this network there are Site Reliability Engineers, though.

Here is one public job advertisement that relates to what an SRE is expected to be like:

We have reliable infrastructure and can spin up new environments in a couple of hours. Automate everything so there is more time for exploring and learning. Foster the DevOps mindset

What are our goals?
  Internationalisation
  Deploying multiple data centers
  Deploying every 5 minutes
Requirements
  Experience with Java or JavaScript in a Dockerised environment
  Linux Engineering/Administration
  Desire for improving processes
  Have a passion and most importantly, a sense of humour
Tech Stack (you DO NOT need experience in all of these)
  Kubernetes + Docker
  Terraform + Ansible
  Linux
  Kotlin + NodeJS
  ELK stack
  AWS

This is obviously an SRE for the server side and the application-enablement side of things. If there is a large software-defined edge network like Espresso, a large edge-to-DC network like B2 and a large software-defined inter-DC network like B4, you will need a different kind of SRE.

Here is Google’s version of a Site Reliability Engineer Job.

Job description
Minimum Qualifications

BS degree in Computer Science or related technical field involving coding (e.g. physics or mathematics), or equivalent practical experience.
3 years of experience working with algorithms, data structures, complexity analysis and software design.
Experience in one or more of the following: C, C++, Java, Python, Go, Perl or Ruby.

Preferred Qualifications

Systematic problem-solving approach, coupled with effective communication skills and a sense of ownership and drive.
Interest in designing, analyzing and troubleshooting large-scale distributed systems.
Ability to debug and optimize code and automate routine tasks.

About The Job

Hope is not a strategy. Engineering solutions to design, build, and maintain efficient large-scale systems is a true strategy, and a good one.

Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google’s services—both our internally critical and our externally-visible systems—have reliability and uptime appropriate to users’ needs and a fast rate of improvement while keeping an ever-watchful eye on capacity and performance.

SRE is also a mindset and a set of engineering approaches to running better production systems—we build our own creative engineering solutions to operations problems. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. As SREs are responsible for the big picture of how our systems relate to each other, we use a breadth of tools and approaches to solve a broad spectrum of problems. Practices such as limiting time spent on operational work, blameless postmortems and proactive identification of potential outages factor into iterative improvement that is key to both product quality and interesting and dynamic day-to-day work.

We can see that Google's SRE job ad is all software, combined with large-scale distributed systems requirements.

Now if we note this extract from the Wikipedia SD-WAN article:

“With a global view of network status, a controller that manages SD-WAN can perform careful and adaptive traffic engineering by assigning new transfer requests according to current usage of resources (links). For example, this can be achieved by performing central calculation of transmission rates at the controller and rate-limiting at the senders (end-points) according to such rates”

and we also note this extract:

“As there is no standard algorithm for SD-WAN controllers, device manufacturers each use their own proprietary algorithm in the transmission of data. These algorithms determine which traffic to direct over which link and when to switch traffic from one link to another. Given the breadth of options available in relation to both software and hardware SD-WAN control solutions, it’s imperative they be tested and validated under real-world conditions within a lab setting prior to deployment.”

We see Algorithms.

It's clear that there are different algorithms running these software-defined networks (Google's Espresso, B2, B4 and Jupiter). These algorithms automate, kick in and optimize. Google becomes a large-scale distributed system with various algorithms here and there. While software architects and software engineers will have developed these algorithms and programmed them into network devices and servers, an SRE is the human who will operate the system. A team of SREs, in fact.
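As one concrete, hedged example of what such an algorithm can look like, here is a textbook max-min fair share computation in Python. It is not any vendor's or Google's actual code; it only illustrates the Wikipedia extract's idea of a controller with a global view centrally assigning transmission rates to senders.

```python
# Illustrative max-min fair rate assignment for senders sharing one link;
# not any specific SD-WAN controller's algorithm.
def max_min_fair_rates(link_capacity: float, demands: dict) -> dict:
    rates = {}
    remaining = dict(demands)
    capacity = link_capacity
    while remaining:
        share = capacity / len(remaining)
        # Senders demanding less than the equal share get their full demand;
        # the leftover capacity is redistributed among the rest.
        satisfied = {s: d for s, d in remaining.items() if d <= share}
        if not satisfied:
            rates.update({s: share for s in remaining})
            break
        for sender, demand in satisfied.items():
            rates[sender] = demand
            capacity -= demand
            del remaining[sender]
    return rates

# Example: a 100 Mbps link shared by three transfer requests.
print(max_min_fair_rates(100.0, {"flow_a": 20.0, "flow_b": 50.0, "flow_c": 60.0}))
# -> flow_a gets 20, flow_b and flow_c each get 40
```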

One aspect of networking protocols is that they are built for multi-vendor, multi-enterprise and multi-domain environments. They provide a simple consensus for connecting two or more different network devices.

To take a merchant-silicon network device like OCP's Wedge and OCP-style servers and make one large network like Google's out of them will require software engineering to remake at least the NOS (Network Operating System) part. There will be at least a meta-NOS of sorts running on top of a typical NOS to handle the software-defined algorithms, in addition to the SDN controllers talking to this meta-NOS. Multiple layers of SDN controllers will be talking to each other; you can call this a network protocol or an SDN algorithm, but it will be part of a distributed systems software architecture, and it will be programmed in place by software engineers.
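Purely as a thought sketch of that layering (every class name here is hypothetical, and none of this is a real NOS or controller API), the composition might look like this in Python:

```python
# Hypothetical layering sketch: merchant-silicon NOS, a "meta-NOS" agent on top,
# and hierarchical SDN controllers above it. Not any real product's API.

class NOS:
    """Network operating system running on the merchant-silicon box."""
    def program_forwarding(self, prefix: str, next_hop: str) -> None:
        print(f"NOS: {prefix} -> {next_hop}")

class MetaNOS:
    """Agent running on top of the NOS, exposing it to SDN controllers."""
    def __init__(self, nos: NOS):
        self.nos = nos
    def apply_intent(self, intent: dict) -> None:
        self.nos.program_forwarding(intent["prefix"], intent["next_hop"])

class SiteController:
    """Per-site controller; a higher, global controller would hand it intents."""
    def __init__(self, devices: list):
        self.devices = devices
    def push(self, intent: dict) -> None:
        for device in self.devices:
            device.apply_intent(intent)

site = SiteController([MetaNOS(NOS()), MetaNOS(NOS())])
site.push({"prefix": "10.1.0.0/16", "next_hop": "192.0.2.1"})
```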

Large Scale Distributed System on Merchant Silicon Hardware – Software Defined Meta-NOS – SDN Controllers – Hierarchical SDN Controllers – Algorithms.

It sounds like a program management task rather than a PMP-scale engineering project management task. You will need mathematicians to sit with the network architects, distributed systems architects and software architects. The mathematicians will give the algorithms. They will be important too.

Fun times.