What is Mesos

Traditionally, we run a data center (DC) either on bare metal or with virtualization, i.e. by creating virtual machines (VMs) that run on top of a host hypervisor such as VMware vSphere or Xen (AWS). A DC often hosts many applications that rely on shared services such as databases, load balancers, and caching layers. It is common practice to allocate a dedicated host or VM to each such service, e.g. db-1 and db-2 for the database, and so on. This gives us:

  • Separation of concerns: each VM/host does one thing and one thing only, which makes troubleshooting easier.
  • Isolation: each service (DB, application) runs in its own closed environment and cannot interfere with the others.

There are, however, a few drawbacks:

  • Resource utilization is low: This is the biggest problem with static partitioning. Some applications need a large amount of memory but little CPU, and vice versa. Most cloud providers don't let you tweak the CPU/memory ratio, so a lot of CPU or memory ends up allocated but unused.
  • Added overhead: Running on VMs adds overhead to the system, and since we need to create a new VM for every new service, that overhead becomes substantial. We also often need to allocate an IP address for each VM/host, so we risk exhausting the available IP space as the number of services grows.
  • Scaling is hard: Scaling up a VM/host disrupts the hosted service, while adding new servers means either recreating the whole environment from scratch or starting from a pre-baked image. Either way, it adds complexity to the deployment pipeline.
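
To make the utilization drawback concrete, here is a minimal sketch with made-up numbers. The host shape (8 CPUs, 16 GB) and the two workloads are assumptions chosen for illustration, not measurements; `Resources` and `utilization` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpus: float
    mem_gb: float


def utilization(demands, hosts):
    """Return (cpu, memory) utilization as fractions of total host capacity."""
    used_cpus = sum(d.cpus for d in demands)
    used_mem = sum(d.mem_gb for d in demands)
    total_cpus = sum(h.cpus for h in hosts)
    total_mem = sum(h.mem_gb for h in hosts)
    return used_cpus / total_cpus, used_mem / total_mem


# Assumed fixed host shape: the CPU/memory ratio cannot be tweaked.
HOST_SHAPE = Resources(cpus=8, mem_gb=16)

# Assumed workloads: one memory-heavy service, one CPU-heavy service.
demands = [Resources(cpus=1, mem_gb=14), Resources(cpus=7, mem_gb=2)]

# Static partitioning: one dedicated host per service.
static = utilization(demands, [HOST_SHAPE, HOST_SHAPE])

# Pooled: both services share a single host of the same shape.
pooled = utilization(demands, [HOST_SHAPE])

print("static partitioning:", static)
print("shared pool:", pooled)
```

With these assumed numbers, dedicating a host to each service leaves half of the CPU and half of the memory idle, while packing both services onto one host uses it fully. Real workloads rarely fit this neatly, but the direction of the effect is the same.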

Apache Mesos

Mesos is open-source software for managing your data center (DC). The core idea is to treat the DC as a single pool of resources. Resources are handed out to applications when needed and released when they are done, which makes the pool very elastic for applications with variable loads.

Below are the key concepts for Mesos:

  • Mesos Master: controls the Mesos cluster, collects the resources reported by agents, and makes resource offers to frameworks. You need to run multiple masters for high availability, but only one master can be the active leader at a time. All requests to a standby master are redirected to the active one. Leader election is handled by ZooKeeper.
  • Mesos Agent: registers its resources with the master (how much memory/CPU it has) and executes tasks.
  • Framework: an application running on top of Mesos. It consists of 2 components: a scheduler that registers with the Mesos Master to be offered resources (CPU, RAM), and one or more executors that are launched on Mesos Agents to run the tasks.

The picture above depicts how resource offers and task scheduling work in Mesos. Let's go through this step by step:

  1. Agent 1 reports to the master that it has 4 CPUs and 4 GB of memory available.
  2. Mesos Master sends a resource offer <4cpu, 4gb> to framework 1.
  3. Framework 1 has 2 outstanding tasks that require <2cpu, 1gb> and <1cpu, 2gb> respectively. Since the offer from Mesos Master is large enough for both, the framework accepts the offer and passes the tasks to the master to be executed on the agents.
  4. Mesos Master sends the 2 tasks to the agent whose resources were offered (Agent 1 here, since an offer is tied to the agent it came from). The executor on that agent spins up containers to execute the tasks. More on containers later.
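
The arithmetic behind these steps can be sketched in a few lines of Python. This is a toy model, not the Mesos API: `Resources`, `Offer`, and `pick_tasks` are hypothetical names, and the greedy first-fit choice is an assumption about how a simple framework might decide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpus: float
    mem_gb: float

    def fits_in(self, other: "Resources") -> bool:
        """True if this requirement can be satisfied by `other`."""
        return self.cpus <= other.cpus and self.mem_gb <= other.mem_gb

    def minus(self, other: "Resources") -> "Resources":
        return Resources(self.cpus - other.cpus, self.mem_gb - other.mem_gb)


@dataclass(frozen=True)
class Offer:
    agent_id: str
    resources: Resources


def pick_tasks(offer, pending):
    """Greedily select pending tasks that fit in what is left of the offer."""
    remaining = offer.resources
    launched = []
    for name, need in pending:
        if need.fits_in(remaining):
            launched.append(name)
            remaining = remaining.minus(need)
    return launched, remaining


# Steps 1-2: the master offers Agent 1's <4cpu, 4gb> to the framework.
offer = Offer("agent-1", Resources(cpus=4, mem_gb=4))

# Step 3: the framework has two outstanding tasks.
pending = [
    ("task-1", Resources(cpus=2, mem_gb=1)),
    ("task-2", Resources(cpus=1, mem_gb=2)),
]

launched, leftover = pick_tasks(offer, pending)
print(launched, leftover)
```

Both tasks fit, and <1cpu, 1gb> of the offer is left over. In this model the unused remainder simply goes back to the pool, where it can be offered to another framework.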

The framework scheduler talks to the Mesos Master via the Scheduler interface. Through this API it receives resource offers as well as status updates, including task status and executor status. We will look at how to implement our own framework using this interface in Chapter 6.
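
As a rough preview of the shape of such an interface, the sketch below models a scheduler as a set of callbacks. The class and method names (`Scheduler`, `resource_offers`, `status_update`, `TrackingScheduler`) are hypothetical stand-ins written for this illustration, not the actual Mesos bindings. The task state strings follow Mesos naming.

```python
from abc import ABC, abstractmethod


class Scheduler(ABC):
    """Hypothetical stand-in for the callbacks a framework scheduler implements."""

    @abstractmethod
    def resource_offers(self, offers):
        """Called when the master offers resources to this framework."""

    @abstractmethod
    def status_update(self, task_id, state):
        """Called when the state of one of this framework's tasks changes."""


# States after which a task will not run again.
TERMINAL_STATES = {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_LOST"}


class TrackingScheduler(Scheduler):
    """Toy scheduler that only records what it is told."""

    def __init__(self):
        self.offers_seen = 0
        self.active = {}  # task_id -> latest non-terminal state
        self.done = {}    # task_id -> terminal state

    def resource_offers(self, offers):
        self.offers_seen += len(offers)

    def status_update(self, task_id, state):
        if state in TERMINAL_STATES:
            self.active.pop(task_id, None)
            self.done[task_id] = state
        else:
            self.active[task_id] = state


# Simulate the calls the master would make.
sched = TrackingScheduler()
sched.resource_offers([{"agent": "agent-1", "cpus": 4, "mem_gb": 4}])
sched.status_update("task-1", "TASK_RUNNING")
sched.status_update("task-2", "TASK_RUNNING")
sched.status_update("task-1", "TASK_FINISHED")
print(sched.active, sched.done)
```

The point of the callback style is that the framework never polls: the master pushes offers and status changes, and the scheduler reacts to them.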

You may notice that the framework itself decides which resource offers to accept. This is a powerful design, as it gives you full flexibility in how you run your applications. Some examples:

  • A framework can choose where to run based on where the data is stored. This ability is called _data locality_ and is very important to applications such as databases.
  • An application can choose when to run based on certain conditions, e.g. a minimum number of health checks passing. This can be used for zero-downtime rolling upgrades. For more on this topic, see Blue-green deployment on Marathon.
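
The data locality case can be sketched as a simple filter over incoming offers. This is a minimal illustration under assumptions: `Offer`, `split_offers`, the hostnames, and the "accept the first suitable offer" policy are all hypothetical and not part of Mesos.

```python
from typing import NamedTuple


class Offer(NamedTuple):
    offer_id: str
    hostname: str
    cpus: float
    mem_gb: float


def split_offers(offers, data_hosts, need_cpus, need_mem_gb):
    """Accept the first offer that is both data-local and large enough.

    Returns (accepted_offer_or_None, declined_offers).
    """
    accepted = None
    declined = []
    for offer in offers:
        is_local = offer.hostname in data_hosts
        big_enough = offer.cpus >= need_cpus and offer.mem_gb >= need_mem_gb
        if accepted is None and is_local and big_enough:
            accepted = offer
        else:
            declined.append(offer)
    return accepted, declined


offers = [
    Offer("o1", "agent-1", cpus=4, mem_gb=4),  # large, but the data is elsewhere
    Offer("o2", "agent-2", cpus=2, mem_gb=2),  # local and large enough
    Offer("o3", "agent-3", cpus=1, mem_gb=1),  # local, but too small
]

accepted, declined = split_offers(
    offers, data_hosts={"agent-2", "agent-3"}, need_cpus=2, need_mem_gb=2
)
print(accepted, [o.offer_id for o in declined])
```

Note that the largest offer is declined here because it is not on a host that holds the data. A real framework would likely fall back to a non-local offer after waiting for some time, a trade-off this sketch leaves out.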
