Agni OS Overview

Why is Agni Cluster called an "OS"? Our goal is to make distributed programming available to any application type. When you write software that gets installed locally, you don't really care about inter-machine problems. The inter-process/thread/file APIs are provided to you within the machine by the "regular OS", and you don't need to think twice.

Agni OS extends this simple "OS" approach to writing distributed programs. Those programs are distributed only because a single server cannot serve that many users and can crash and burn because of hardware failures. Why not create an operating system for running distributed programs? Provide APIs to developers so that applications can logically scale at the architecture level - this is exactly what Agni OS does.

So what is Agni Cluster OS?

  • A platform/OS for building large systems - systems that may get very large
  • Unifies the concept of a distributed "cluster" for any application type: web, db, service, etc.
  • Start development on a local PC - scale out to 1000s of nodes. Built for unlimited sizes and data centers
  • It is just a library/framework - a set of a few .NET assemblies under 8 MB total
  • Written in plain C# + some JS, CSS and LESS for the UI
  • Free from "heavy" dependencies, such as AD, IIS, DataCenter SQL etc.
  • Uses only the very base stuff: CLR (JIT, GC), BCL: Type, List<>, Dictionary<>, Thread, Task, Socket, Interlocked, + primitives (int, string, decimal etc.)
  • UNISTACK approach to building distributed apps: from serialization and logging to web MVC, APIs, REST and security - done the same way in any app

Structure

The first question that needs to be answered is how to organize data - and metadata is the first kind of data needed to build a massively scalable system. A proper structuring of metadata is required.

A hierarchy is a natural multi-level organization pattern, self-evident in nature, and it has been used in computer science since its dawn. Tree-like structures usually yield O(log(n)) search complexity. Hierarchies are easy to understand, and the managerial role of "supervisors" - the "nodes that are above me" - is used extensively in Actor-model systems like Erlang.

Single-level horizontal scalability is limited because at some point there are so many nodes that they require too many managers. Flat designs are limited.

There are other patterns, like a grid, which are suitable for some tasks (e.g. peer-to-peer sharing); however, a hierarchy can be thought of as an index (e.g. a B-Tree) superimposed on top of a grid. This way parts of the grid may be managed/addressed more efficiently. In other words, a hierarchical cluster topology does not impose a hierarchical restriction on data flow or functionality when it is not desired, yet in practice, in very many applications we see the opposite - the "large problem" gets mapped and then reduced by subordinate workers.

To summarize, hierarchy is a good choice because:

  • Logical organization of a large system
  • Geo/physical layout of a large system: regions/data centers/zones
  • Chain of command/supervision/governor pattern
  • Natural MAP:REDUCE for many functions, e.g. telemetry/instrumentation data
  • Update deployment and general management

There is one concern that needs to be discussed right away. It may seem that in a hierarchy a node at the higher level poses a single-point-of-failure risk to its subordinate nodes: if one cuts the tree at the trunk, the whole tree falls down. The higher-level nodes aggregate the features of the lower-level nodes, and that includes the failures. In reality, the higher-level nodes do not need to be a single point of failure, as we can keep secondary and tertiary backup copies of the layer; for example, we may have 2 or more zone controllers sitting at the zone governor level. The details of the failure modes are described in the next sections.

Topology

Agni OS organizes the geographical and logical topology of the system as a hierarchical tree of sections that starts at the world level and is sub-split by regions - a structure akin to a file system.

Regions have sub-regions; they also contain Network Operation Centers (NOCs).

NOCs contain zones, which can in turn have sub-zones and hosts (the actual servers).

The following typical Agni OS topology is used as an example:

In this example, "World.r" is the root region that contains two sub-regions, "US.r" and "EU.r", and the global governor NOC "Glob.noc".

Every NOC contains zones, which contain sub-zones and/or hosts. Zones provide logical grouping of hosts - in the example above, the "DB.z" zone has sub-zones representing areas of the database handling User, Financial and Social data stores.

The red boxes represent zone governor processes that control the underlying hosts. It is not required to have a zone gov in every zone.

Topology Navigation

Since the cluster topology is hierarchical and akin to a file system structure, it makes sense to address a particular resource in the system using a logical path.

Agni OS does not use physical addresses or DNS names for host identification, because these are lower-level, network-related concerns. Instead, logical paths are used to traverse the hierarchy and address particular resources. Notice that section name suffixes (e.g. "*.r", "*.z" etc.) may be left out.

The metabase API (discussed below) provides detailed coverage for navigation and for working with logical cluster regional paths (i.e. get the parent or NOC, compare paths, add paths etc.) - see the Metabase API Reference.
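
To visualize the suffix convention only, here is a standalone sketch; this is not the metabase API, and the helper name and behavior are hypothetical:

    using System;
    using System.Linq;

    // Illustration of the path suffix convention only - this is NOT the Agni OS
    // metabase API; the helper below is hypothetical.
    static class RegionalPathDemo
    {
        static readonly string[] Suffixes = { ".r", ".noc", ".z", ".h" };

        // Strips the optional section-type suffixes from every path segment, so
        // "/USA.r/East.r/CLE.noc/Web.z/Www1.h" and "/USA/East/CLE/Web/Www1"
        // refer to the same host.
        public static string Normalize(string path)
        {
            var segments = path
                .Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(seg =>
                {
                    foreach (var suffix in Suffixes)
                        if (seg.EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
                            return seg.Substring(0, seg.Length - suffix.Length);
                    return seg;
                });
            return "/" + string.Join("/", segments);
        }
    }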

The following diagram illustrates the logical naming scheme:

Logical naming is a higher-level abstraction than DNS: it allows a unified scheme for addressing nodes of the system, used not only for communication between nodes but also for locating entities in the whole system tree, regardless of the actual means of communication.

The Metabase

Agni OS makes extensive use of metadata (data about data) to describe its instance. An instance is a distinct installation of Agni OS on a set of physical and/or virtual resources. One may operate an instance from many different data centers running various applications. The logical decision about the instance composition is dictated by a particular business solution.

There are many factors to consider and configure in a distributed system. We took an approach akin to reflection - a set of data about the system with APIs that can programmatically extract that data for various OS functions. This concept is called the "Metabase" - a special kind of database that stores metadata about the Agni OS system.

The metabase is a hierarchical data structure that gets mounted by every Agni OS process via Virtual File System (VFS) access. It can be depicted as a structured system of config files which describe various system entities. In code, the metabase is represented by the Agni.Metabase.Metabank class. Application developers never create an instance of Metabank, as this is done by the BootConfLoader as a part of the IAgniApplication container setup. The VFS is a software component that abstracts access to files on various PaaS and physical layers (e.g. SVN, Amazon S3, Google Drive etc.). The VFS usually supports version control, so any changes to the whole metabase are version-controlled. This is beneficial for keeping track of releases and rolling back changes in case of emergencies: all changes in the cluster setup, packages and configuration are always available.
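
As a minimal orientation sketch, reading the mounted metabase from application code may look roughly like the following; the accessor names used here (AgniSystem.Metabase, CatalogReg, NavigateHost, Name) are assumptions made for illustration - consult the Metabase API Reference for the actual surface:

    using System;
    using Agni.Metabase;

    // A minimal sketch, assuming accessor names (AgniSystem.Metabase, CatalogReg,
    // NavigateHost, Name) - these are illustrative assumptions, not a verbatim
    // API listing; see the Metabase API Reference.
    public static class MetabaseReadSketch
    {
        public static void Run()
        {
            // The Metabank instance is created by BootConfLoader during
            // IAgniApplication container setup; application code only reads it.
            Metabank mb = Agni.AgniSystem.Metabase;

            // Navigate the regional catalog by logical path (suffixes may be omitted)
            var host = mb.CatalogReg.NavigateHost("/USA/East/CLE/Web/Www1");

            Console.WriteLine(host.Name);
        }
    }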

Metabase file system root mapped to a local folder:

The metabase is a read-only resource - no process can write to it (think .NET reflection); applications can only read system data using a rich set of APIs (see the Reference section).

Agni OS is designed to scale into a multi-million node range, consequently its metadata source must be "big" enough to handle many entities. On the other hand, the metabase is loaded by every process, and we cannot afford to preload lots of data structures into every process upon start; therefore the metabase uses lazy-loading patterns with extensive caching on multiple levels. For example, when a program needs to make a service call to some remote host, it first needs to locate this host in the system, check whether the host is dynamic, resolve its address etc. The system may also consider multiple hosts that can perform the service, in which case multiple metabase catalogs and sections may be accessed. If the process repeats, the data gets read from the in-memory cache.
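
The lazy-load-then-cache technique itself is straightforward; the standalone sketch below (not the Metabank implementation) shows the general pattern of resolving an item through a loader on first use and serving subsequent reads from memory:

    using System;
    using System.Collections.Concurrent;

    // Standalone illustration of the lazy-load + cache pattern described above.
    // This is not Metabank code - just the general technique.
    public sealed class LazyCatalogCache<TSection>
    {
        private readonly ConcurrentDictionary<string, Lazy<TSection>> m_Cache =
            new ConcurrentDictionary<string, Lazy<TSection>>(StringComparer.OrdinalIgnoreCase);

        private readonly Func<string, TSection> m_Loader;

        public LazyCatalogCache(Func<string, TSection> loader)
        {
            m_Loader = loader;
        }

        // The first call for a path hits the (e.g. VFS-backed) loader; subsequent
        // calls for the same path are served from the in-memory cache.
        public TSection Get(string path) =>
            m_Cache.GetOrAdd(path, p => new Lazy<TSection>(() => m_Loader(p))).Value;
    }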

Logically, the metabase is organized as a set of Catalogs - top-level folders accessed via the VFS. Catalogs group related functionality into a single addressable unit:

  • /app - Application catalog: lists applications along with their inner composition - packages and configuration. It also declares Roles, which are named kits of applications with startup scripts
  • /reg - Regional catalog: defines the hierarchical topology of the system. It contains geo regions at the root, branching into sub-regions, NOCs, zones/sub-zones, and hosts. General or app-specific configuration may be specified at every level for structural override (discussed below)
  • /bin - Binary catalog: contains named binary packages (folders) that get distributed to the destination hosts
  • /inc (or custom) - Include: technically not a catalog, but rather a pattern of re-using include mixins in various configs
  • / - Metabase root: network service registry, platforms, common config mixins

The Application and Regional catalogs consist of sections which represent the corresponding entities both logically and in code. For example, /app/applications/AHGov is an application "AHGov", accessible in code via an instance of the SectionApplication class.


Metabase sections contain multiple config files; the "$" prefix indicates that a file applies to the level where it is declared:

  • $.amb - Metabase data (Agni Metabase) config file. Contains metabase data, not application config
  • $.app.laconf - Provides an application configuration override for any app
  • $.XYZ.app.laconf - Provides an application configuration override for the app called XYZ

The configuration files can be in any of the formats that the NFX library supports: JSON, Laconic or XML, Laconic being the default as it is the most convenient format for configuration (see the NFX specification for a detailed description of the Laconic format). In order to use a different configuration file format, it is sufficient to provide a different extension, e.g. "$.app.xml".

Agni Metabase Manager (AMM) command line tool performs static metabase content analysis and detects various conditions, such as:

  • Syntax errors in config files
  • Duplications and omissions, i.e. items referenced in the system but not declared (e.g. unknown applications, hosts, roles, regions etc.)
  • Various logical errors, such as improperly mapped contracts, duplicate definitions, gaps in key mappings, network config errors etc.

The AMM tool is executed before metabase changes get committed into the version-controlled backend - this prevents the publication of bad metabase data that would cause runtime errors.

Metabase Regional Catalog

The Regional catalog defines the physical and logical topology of the Agni OS instance. It starts with Region sections defined in directories with the "*.r" extension. Every region may have sub-regions and NOCs.

NOC stands for "Network Operation Center"; NOCs are represented by directories with the "*.noc" extension. NOCs are further broken into Zones ("*.z") and sub-zones - zones within zones. Zones contain hosts - "*.h" directories.

Example branch of regional catalog:

  • USA.r
    • East.r
      • CLE.noc
        • Gov.z
          • Zgov1
          • Zgov2
        • Web.z
          • Proxy1
          • Proxy2
          • Www1 — full path example: "USA/East/CLE/Web/Www1"
          • Www2
          • Www3
          • Www4
        • DB.z
          • Orders.z
            • Mongo1
            • Mongo2
            • Mongo3
          • User.z
            • Msql1
            • Msql2
            • Msql3
        • ML.z
          • Worker1
          • Worker2
          • Net1
          • Net2
          • Net3
      • NYC.noc

Metabase Application Container Configuration

Agni OS applications execute in the IAgniApplication container, which is fed configuration content from the metabase. The effective configuration gets calculated by traversing the graph of metabase sections, each subsequent level overriding the result, in the following order:

  • Metabank.RootAppConfig is used as a base (the very root of the metabase - "any application")
  • Role.AnyAppConfig
  • Application[applicationName].AnyAppConfig
  • Regional override: parent sections on the path, from left to right, ending with the host, each contributing:
    • AnyAppConfig
    • GetAppConfig(applicationName)
  • Include GetOSConfNode(this.OS)

The configuration gets computed for a particular applicationName, running on a particular host which has a particular OS.
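
A schematic, standalone sketch of this layered calculation follows, simplified to flat key/value settings; the real process is a configuration tree merge performed by the container, not this code:

    using System.Collections.Generic;

    // Standalone illustration of the override order listed above: later layers win.
    // Simplified to flat key/value pairs - the actual mechanism merges config trees.
    static class EffectiveConfigSketch
    {
        // Layers are supplied root-first: root "any app", role, application,
        // regional path sections (left to right, ending with the host), OS include.
        public static Dictionary<string, string> Compute(IEnumerable<IDictionary<string, string>> layers)
        {
            var effective = new Dictionary<string, string>();
            foreach (var layer in layers)
                foreach (var pair in layer)
                    effective[pair.Key] = pair.Value; // each subsequent level overrides the result
            return effective;
        }
    }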

The hierarchical structure of the config is very useful as it allows specifying a configuration prototype at the very root and cascading it down to thousands of hosts, overriding only the parts where necessary, which is a rare need. This way developers do not need to maintain many files, and the metabase does not have to store them.

A typical application config declared at the metabase root includes basic mixins:

    application
    {
        _override="all"
        trace-disable=true

        _include { name=gv              provider { file="/inc/gv/default.laconf" } }
        _include { name=glue            provider { file="/inc/glue.laconf" } }
        _include { name=instrumentation provider { file="/inc/instrumentation.laconf" } }
        _include { name=object-store    provider { file="/inc/object-store.laconf" } }
        _include { name=security        provider { file="/inc/security/default.laconf" } }
        _include { name=web-manager     provider { file="/inc/web-manager/default.laconf" } }
    }

Mixins represent a library of configuration blocks; they get referenced from the app config via "_include" pragmas. This modular approach simplifies the configuration of various applications across the whole cluster. For example, a mixin for the glue subsystem may look like this:

    glue
    {
        bindings
        {
            binding { name=$(~SysConsts.ASYNC_BINDING) type=$(/gv/types/$glue-async) }
            binding { name=$(~SysConsts.SYNC_BINDING)  type=$(/gv/types/$glue-sync) }
        }
    }

The snippet above demonstrates the use of system constants and environment variables. See Configuration Reference section for more details.