• Platform / operating system for building large systems, including systems that may grow very large
  • Unify the concept of a cluster for any app: web, db, service...
  • Start on a local PC, scale out to 1000s of nodes. Agni is built for unlimited sizes and datacenters
  • Physically, a set of a few .NET assemblies, < 8 MB total
  • Written in plain C# (UI: some JS, CSS, LESS)
  • Free from “heavy” dependencies such as AD, IIS, Windows clustering, datacenter SQL, etc.
  • We use only the base stuff: CLR (JIT, GC), BCL - Type, List<>, Dictionary<>, Socket, Thread, Task, Interlocked, etc.
  • “Unistack“ approach to building distributed apps


  • Hierarchy is a natural multi-level management pattern (think trees: O(log n))
  • Think actor-based “supervisors”, e.g. Akka, Erlang
  • Single-level horizontal scalability is limited: at some point you end up with too many managers
  • Hierarchy is suitable for:
    • Logical organization of a large system
    • Geographical/physical layout of such a system: Regions
    • Chain of command/supervision/governor pattern
    • Natural MAP:REDUCE of many functions, e.g. telemetry/instrumentation/management
    • Update Deployment


Agni Cluster OS organizes the geographical and logical topology of the system as a hierarchical tree of sections that starts at the world level and is sub-split by regions, a structure akin to a file system:
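A minimal sketch of the idea (the types, path shape, and names like "zone-a" are our illustration, not an Agni API): a host is addressed by a file-system-like path, and its chain of ancestor sections runs from the world apex down to the host itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: a cluster address is a file-system-like path through
// the regional tree; the ancestor chain walks from the apex down to the host.
public static class Topology
{
    public static List<string> AncestorChain(string hostPath)
    {
        var parts = hostPath.Trim('/').Split('/');
        return Enumerable.Range(1, parts.Length)
                         .Select(i => "/" + string.Join("/", parts.Take(i)))
                         .ToList();
    }
}
```

For example, `Topology.AncestorChain("/world/us/east/zone-a/web001")` yields "/world", "/world/us", and so on down to the host path itself, which is exactly the chain a governor hierarchy would manage.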

Cluster Topology


  • Every host has a root process
    • It installs the binary packages and runs the apps, as defined by the host’s Role
  • Every host is assigned a Role
  • A role is a named kit of apps
    • Web = ahgov, MyWeb, MyWebHook
    • Worker = ahgov, agdida, MyCore
    • Elastic = ahgov, es
  • An app is a named kit of packages:
    • MyWeb = core, aws, My, MyWebUI, MyWebStatic, MyPolicy
    • MyWebHook = core, aws, My
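The Role → App → Package composition above can be modeled directly; the app and package names below come from the slide's own examples, while the record types are our sketch, not an Agni API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of the composition: a role is a named kit of apps, an app is a
// named kit of packages.
public record AppDef(string Name, string[] Packages);

public record RoleDef(string Name, AppDef[] Apps)
{
    // Everything a host assigned to this role must have installed.
    public ISet<string> AllPackages() =>
        new HashSet<string>(Apps.SelectMany(a => a.Packages));
}
```

A host assigned the role then needs the union of all its apps' packages, which is what the root process would install.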


  • Think Reflection: the Metabase keeps all read-only metadata about the cluster
  • A hierarchical database mounted read-only by every process via a version-controlled file system, so all cluster changes are version-controlled
  • Lazy-load only what you need: the Metabase can store millions of entries
  • Metabase backends (a VFS): SVN, local fs, RDBMS, Git, etc.
  • No SPOF: the Metabase source is a read-only service
  • Logical sections:
    • OS and Platform Registry
    • App catalog - roles, apps, app packages/deps, root app configs, launch scripts
    • Binary Catalog - binary packages per OS/Platform
    • Regional Catalog - topology of the whole cluster
    • Network/Svc Registry - resolves net names and services into physical addresses
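The lazy-load behavior can be sketched as a read-through cache over the versioned backend (the class and its members are our illustration, assuming a backend that reads a section by path): a section is fetched only on first access, then served read-only from memory, so a process touching a handful of entries never pays for the millions it does not need.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sketch of lazy metabase access: fetch a section from the
// versioned backend on first use, then cache it read-only.
public sealed class LazyMetabase
{
    private readonly Func<string, string> _fetch; // backend read, e.g. from a VFS
    private readonly ConcurrentDictionary<string, string> _cache = new();
    private int _backendReads;

    public LazyMetabase(Func<string, string> fetch) => _fetch = fetch;

    public int BackendReads => _backendReads;

    public string Get(string path) =>
        _cache.GetOrAdd(path, p =>
        {
            Interlocked.Increment(ref _backendReads); // count real backend hits
            return _fetch(p);
        });
}
```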


Distributed apps must be purposed for deployment, execution, and control on an unlimited number of hosts

  • App container - unifies console/web/service apps; all are just a process
  • An app can be managed from a command line (e.g. >gc)
  • An app consists of components; components may be managed (e.g. set the log severity threshold at runtime, turn on detailed instrumentation, etc.)
  • Configuration - especially important when you have 10,000s of nodes
  • Data Access - where is the datastore? Is it scalable? Do I use CQRS?
  • BigMemory - cache business domain objects without caring how; removes hot-spots
  • Glue - an app is distributed only because of physical limitations; Glue simplifies distributed programming by providing transparent IPC mechanisms
  • Instrumentation / Telemetry - including domain-oriented


App CLI Management

Web Management

A web management portal is built into every app (web server or not),
much like a home Wi-Fi router has a built-in management portal


Every process has the concepts of instrumentation and logging built in; this actually comes from NFX. The pyramidal structure of Agni Cluster OS is convenient for the telemetry REDUCE function: zone governors collect telemetry from subordinate nodes and emit the data further up the hierarchy, so you can view telemetry by host, by zone, or, at the apex, for the whole system.
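The REDUCE step can be sketched in a few lines (our illustration of the idea, not the NFX instrumentation API): each level rolls subordinate readings up into one value and emits it to its parent, so the apex sees a cluster-wide total after O(log n) hops.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the pyramidal telemetry REDUCE: a governor sums the
// readings of its subordinates and emits one rollup upward.
public record Reading(string Source, long Value);

public static class TelemetryReduce
{
    public static Reading Rollup(string level, IEnumerable<Reading> subordinates) =>
        new(level, subordinates.Sum(r => r.Value));
}
```

Rolling hosts into zones and zones into the apex gives the by-host / by-zone / whole-cluster views the text describes.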


The app container unifies different app types (console, service, web, etc.) and views an app as a set of components with properties that can change at runtime via the console or Web UI


The Todos get dispatched and posted into hybrid queues/pools; they are hybrid because a Todo instance can be re-queued for re-execution right away or at a point in the future. Todos are optionally parallelized by key, in case the processing sequence is important. Todos are used for cluster auto-reset events by means of correlation keys, which allow for structural merges by key.
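A minimal sketch of those two properties (our types, not the Agni Todo API): each item carries a sequence key, so work for the same key stays in order, and a start time, so a Todo can be re-queued for execution now or in the future.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of a Todo queue with key-based sequencing and
// time-based (re-)queueing.
public record Todo(string SequenceKey, DateTime StartUtc, string Payload);

public sealed class TodoQueue
{
    private readonly List<Todo> _items = new();

    public void Enqueue(Todo t) => _items.Add(t);

    // Due todos grouped by sequence key; each group preserves enqueue order,
    // while different keys may be processed in parallel.
    public Dictionary<string, List<Todo>> DueByKey(DateTime nowUtc) =>
        _items.Where(t => t.StartUtc <= nowUtc)
              .GroupBy(t => t.SequenceKey)
              .ToDictionary(g => g.Key, g => g.ToList());
}
```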


Dynamic IaaS runtime

Ability to spawn nodes on various IaaS


Global Distributed IDs - monotonically increasing - great for DB insertion
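A simplified sketch of why such IDs stay monotonic per issuer (the bit layout here is our assumption for illustration; the real NFX GDID carries more structure, such as an era and an issuing authority alongside the counter): fold a small authority number and an atomically incremented counter into one integer.

```csharp
using System.Threading;

// Simplified sketch of a monotonic distributed ID: a 4-bit authority in the
// top bits plus a 60-bit atomic counter, so ids from one authority always
// increase - friendly to clustered DB indexes on insertion.
public sealed class IdAuthority
{
    private readonly ulong _authority; // 0..15: which authority issued the id
    private long _counter;

    public IdAuthority(int authority) => _authority = (ulong)(authority & 0xF);

    public ulong Next()
    {
        long c = Interlocked.Increment(ref _counter); // thread-safe increment
        return (_authority << 60) | ((ulong)c & 0x0FFF_FFFF_FFFF_FFFF);
    }
}
```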


Async Tasks + Queue/Pool - à la TPL, only cluster-wide


Global context for Lambda execution with signaling for control and tracking

Locking / Coordination

Workset load allocation. Business mutexes. Complex atomic interdependencies - execute 100% serializable transactions
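The "business mutex" idea can be sketched as a named lock table (an in-process stand-in of our own, not the cluster locking service): each lock name is granted to at most one session at a time, which is the primitive that serializable, interdependent operations are built on.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of named business mutexes: a coordination table that
// grants each lock name to at most one session; re-acquiring by the owner
// succeeds, anyone else is refused until release.
public sealed class LockTable
{
    private readonly Dictionary<string, string> _owners = new();
    private readonly object _sync = new();

    public bool TryAcquire(string name, string session)
    {
        lock (_sync)
        {
            if (_owners.TryGetValue(name, out var owner)) return owner == session;
            _owners[name] = session;
            return true;
        }
    }

    public void Release(string name, string session)
    {
        lock (_sync)
            if (_owners.TryGetValue(name, out var owner) && owner == session)
                _owners.Remove(name);
    }
}
```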

Key-Value DB

Built-in key/value DB with sharding, key TTL, and auto-rebalancing; uses injectable implementations
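The TTL semantics can be sketched like this (our illustration; the type and its members are assumptions, and time is passed in explicitly for testability): a read past an entry's time-to-live behaves as a miss and drops the entry.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of per-key TTL in a key/value store.
public sealed class TtlStore<TKey, TValue> where TKey : notnull
{
    private readonly Dictionary<TKey, (TValue Value, DateTime ExpiresUtc)> _map = new();

    public void Put(TKey key, TValue value, TimeSpan ttl, DateTime nowUtc) =>
        _map[key] = (value, nowUtc + ttl);

    public bool TryGet(TKey key, DateTime nowUtc, out TValue value)
    {
        if (_map.TryGetValue(key, out var e) && e.ExpiresUtc > nowUtc)
        {
            value = e.Value;
            return true;
        }
        _map.Remove(key); // drop the expired (or absent) entry
        value = default!;
        return false;
    }
}
```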


Hybrid Sharding Router for RDBMS, NoSQL, Services, and other data sources
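The core routing step can be sketched as follows (our illustration, assuming simple modulo placement; a production router would also handle rebalancing): a stable hash of the sharding key deterministically picks one backend, whether that backend is an RDBMS shard, a NoSQL node, or a service endpoint.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a sharding router: hash the sharding key, pick a
// backend by modulo.
public static class ShardRouter
{
    public static string Route(string shardKey, IReadOnlyList<string> backends)
    {
        // FNV-1a: stable across runs and machines, unlike string.GetHashCode()
        // which is randomized per process on modern .NET.
        uint h = 2166136261;
        foreach (char c in shardKey)
        {
            h ^= c;
            h *= 16777619;
        }
        return backends[(int)(h % (uint)backends.Count)];
    }
}
```

Using a stable hash rather than the runtime's default is what lets every node in the cluster agree on where a given key lives.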


Document based database with geo-replication

File System

À la Google Docs - supports versioning, transactions, and user permissions/ownership, so one could easily build a Google Docs-like system in no time