The target audience of this book is anyone who is interested in distributed applications. Although we concentrate on our implementation called Agni Cluster OS, the general principles and techniques described in here are really applicable to other systems as well.

The reader is expected to be acquainted with general programming concepts, OOD (object-oriented design) and must have general understanding of services (i.e. SOAP, REST etc.). The knowledge of C# is beneficial but not required, as Java developers can as-easily grasp the concept as .NET ones.


We hear many cool names daily: "Kafka", "Akka", "Hadoop", "Zookeeper", "Google Spanner" etc.. What do all of these names have in common - the distributed architecture that makes them "hot" on the market - ability to process so much data that a single computer, no matter how fast, can never accomplish on it own.

We have selected .NET, wait, not that .NET that you think about in terms of IIS/AD and Fortune 500 companies (J2EE anyone?), but modern cloud-ready *nix-friendly system. Our writing is going to concentrate around C#, but one can use any CLR language.

We are going to focus on the way to the infinite scalability on all tiers of the system: your mind, architecture, design, code, support, etc. One can scale hardware (i.e. IaaS) but if the app is not scalable - alas! - the solution is not really as "hot" as advertised.


Why did we write this?
Let's start with a trite, yet very important topic: the customers of my business do not care about how I get my services provided to them. It is my problem. They don't care about technology and tools that I use (unless I decide to become a software technology company). And here comes the problem - fragmentation and too much choice (which is sometime equivalent to having no choice at all) for me, as a business owner. Honestly, all I need is a good stable solution that I don’t have to pour millions into every year just to update 3rd party libs to keep the thing afloat.

Matters get worse at scale. High-scalability adds additional dimension of complexity. The developers are harder to find, the various PaaS offerings lock-in for years. The architecture has not 100% crystallized - there are microservices, REST (but someone argues for RPC), and now serverless computing altogether!

The cost of distributed cloud systems development has skyrocketed. Many companies decided to use simple languages like PHP and Ruby that facilitate quick Web site construction but are not suitable for high-computational loads facing big problems down the road (i.e. PHP@Facebook). To this day there is no unified way for organizing/operating clusters of server nodes which would have been as standardized as C APIs for IO tasks.

We believe that the construction of distributed systems should be easier and we think that currently there is no set application-level approach to building such systems. Many great technologies have emerged - primarily targeting some aspect of what we (as a business) need to accomplish, yet many projects re-do the most tedious and cumbersome parts over and over.

All of the popular distributed projects (Akka, Kafka, Spark etc.) have a concept of a "cluster" - to spread to workload or storage load or both. The problem is, as we see it, every project implements it in a "private" way - differently. For example, can web server farm be hosted by Hadoop Yarn? Or maybe Zookeeper? Probably it could, but neither Yarn, no Zookeeper were meant to be used this way. What we posit with Agni is that the concept of "clustering"/distributed nodes must be provided to ANY kind of application which runs on it, just like a regular OS provides processes/threads to ANY kind of application that can run on top.

We chose .NET, or more specifically CLR(Common Language Runtime) for its performance being close if not equal (if used properly) to native languages like C++, and the high-level of abstraction along with rich reflection/metadata which is very important for making run-time decisions/services. The abundance of developers who understand C# and simple base class library concepts like logging, collections and DB access really cuts the cost. Microsoft is moving very aggressively towards making .NET as a whole a first-class citizen of a large scale world, a market that was dominated by Java/C++/Erlang /NodeJs technologies.

Why Many Companies Face Troubles 2 Years Down the Road?

Companies/startups elect languages like Python/Ruby or PHP because those languages have a convenient set of libraries allowing them to put some working sites in production very quickly. The problem, however, is that later, when the number of users increases and business requirements start to demand more intense logic, those solutions fail for number of reasons:

  • Usually, a lack of proper architecture of the system as a whole. No consideration may have been given from day one to concerns like user geo affinity, distributed caching, session state management, common configuration formats, logging, instrumentation, management of large installations with many nodes. Developers usually do not consider:

    • That any process (be it web server, app server, tool etc.) need to be remotely controlled in cluster environment so it can be stopped/queried/signaled
    • All tools must be command-line enabled (not UI only), so they can be scripted and executed in an unattended fashion
    • There may be 100s of computers to configure, instead of 1. Are we ready to maintain 100s of configuration files?
    • Time zones in distributed systems, cluster groups, NOCs. Where is time obtained from? What time zone? What time shows in global reports?
    • Any UI element on the screen may be protected by permission (i.e. "Post Charges" button may not show for users who do not have access)
    • Row-based security. Security checks may span not only screens/functions but also some data entities such as rows
    • Web session state may not reside locally (i.e. local disk/memcache) if user reconnects to a different server in the cluster
    • Pushing messages to subscribers/topics. Using appropriate protocols (i.e. UDP). Not thought about when the whole system runs from one web server.

  • Many systems use one central database instance (which is convenient to code against), and have big troubles when they need to split databases so they can scale, because all code depends on central data locations (e.g. one connect string used in 100s of classes)
  • The scripting languages (e.g. PHP, Python, Ruby) used for web site implementation are not performant enough for solving general programming problems (try to build a PHP compiler in PHP) involved in high-throughput processing. It is slow for such tasks and was never meant to be used that way. What happens next, is that developers start to use C++ or C where the development paradigm is absolutely incompatible with the one in PHP, complexity keeps increasing as the number of standards internal to the system is increased. You need more personnel to develop this way
  • Security functionality is usually overlooked as well as most applications do not have security beyond user login and conditional menu display on the homepage which depends on 5-10 fixed role checks. Later, businesses need to start protecting individual screens/UI elements with permissions. This usually creates mess in code and eventually precipitates a major re-write. The inter-service security in the backend is usually either completely overlooked so any node can call any other node bypassing all checks, or just checks for one of two roles via a simple token.
  • The ALM (Application LifeCycle Management) is usually not really though about. "We will deploy and manage changes somehow when we come to it"

Our Vision Background

We have been dealing with these problems for the past 20 years. We came to realization as far back as 1996 that there is a need for a "Business Operating System". The main idea is to have a lego-like kit of micro solutions that are very configurable and allow developers to assemble complex systems in no time as the majority of system/architecture/ALM-specific challenges will be solved for you.

When you use Linux or Windows you do not need to understand how files are written to disk. You don’t need to know how video card works. OSes do a great job, but when it comes to business/data-centric apps - there is no similar approach.

These days systems are very much distributed and back-end/cloud based. Inherently, there are many nodes/servers to run your system on, hence we came up with our Agni Cluster OS to unify our knowledge and the code base.

In a nutshell, Agni OS is a software system that promotes multi-layered/tiered architecture with central coordinator service bus and accompanying set of libraries (less than 8 mbytes total) that unify/provide the solutions to problems that we had to address in many projects in the past:

  • IoT Fleet Tracking - had to process vehicle coordinates and sensor data supplied via a cellular network, make real-time decisions, generate alerts (i.e. geo-fence violations etc..)
  • Social - traverse chains of hundreds of millions of user messages streaming from social networks
  • Portfolio Management/HFT - quick reaction to market changes - processing big stream of data from multiple exchanges (100Ks+/sec)
  • Medical billing MCR/MCD; Clinical EMR/LTC/MDS multi-tenant SaaS

Our Philosophy + Unistack

We want to reduce complexity but without feature loss. The way to achieve this effect is this - reduce the number of standards used in the system. The less standards we need to support/keep in mind/remember - the simpler the whole system becomes. For example: instead of supporting 4 different logging frameworks where this one uses one kind of configuration and that one uses a completely different set of configuration - we use just one. Once developers read logging tutorial they can easily understand how logging works in any tier of the whole system.

Another big choice to make - select a runtime/language. This will probably start a holy war. Before reading further, please answer the following questions:

  • What primary language/s are Windows and Linux (and others) written in?
  • What language are you web browsers written in?
  • What about databases? Oracle, MySQL, MsSQL, DB2?
  • What about major desktop apps: Photoshop, Office, various Audio and Video editing tools?
  • What about compilers/Interpreters for: C, C++, JavaScript, Java, C#, Ruby, PHP?

For some reason, none of the mission critical software like OS kernels and DB servers, compilers and web browsers are written in PHP or Ruby. It is not because of historical reasons. Has anyone written a new web browser in Erlang, Ruby or Python yet? Of course not.

This is because Erlang, Ruby, Python, PHP (and 20 others) are specialized tools that simplify some particular aspects of the system architecture/coding, but they are all inapt in different ways when it comes to system programming. PHP is not a good tool for database server programming. Ruby was not meant to be used for writing high-performance compilers. So, to build a large sophisticated system one would use 25 different languages for different things. Because all of those disparate components require their own "gurus", configuration/operation standards, we decided against those tools/languages.

That did it for us - we had to select a language which is really a general-purpose one, very suitable for system programming and creation of large-scale systems. When I say "system programming", I do not mean "device drivers" and "Linux kernel modules". What is meant:

  • Ability to implement custom data structures efficiently (trees, maps) with execution speed close or equal to C/C++
  • 64 bit support - a must. Must be able to allocate 128Gb ram per process - with no sweat
  • Process model that supports large ram and long execution times (months) without restarts (efficient GC)
  • Process model that supports globals, context globals, threads and lightweight concurrency models
  • Efficient background/concurrent GC
  • Ability to minimize GC load by using instance pools (or stack-allocated structs in CLR)
  • Fast integer and floating point math. Efficient CPU-register Bit operations
  • Good code modularization
  • Good support for working with strings, especially Unicode compliance
  • Good network stack support - TCP/IP/Sockets

Both JVM and CLR definitely qualify.

Of course no one can touch C and C++ when it comes to efficiency, custom memory allocators and pointer manipulation, but for one tiny problem. Those bare-bones languages are very low-level for the business/data-centric app creation. The lack of good reflection in C++ really kills it for our purposes. Had we taken C++, we would have needed to create a C++ frontend language that would have supported reflection, GC. So basically we would have had to create our own JVM+Java or CLR+C# which is not practical. Instead we decided to use existing CLR/C# or JVM/Java in such way that our code does not depend on particular features of library bloat that surrounds those platforms, rather we have rewritten all base service ourselves, thus bringing a "Unistack" to life.

UNI-STACK = a Unifed Stack of software to get your job done. Use one library instead of 10s of 3rd parties that all increase complexity. In Unistack everything is named, configured, and operates the same way, thus reducing complexity 10-fold. Unistack was purposely coded to facilitate distributed apps creation, yet (unlike many PaaS) it can be used to write basic client-server apps that all run on one machine without any bloat.

Transaction processing share nothing, scale horizontally
Configuration/management share everything or as much as possible
Unify patterns, languages, components
Avoid 3rd parties as much as possible - direct and transitive dependencies
Watch For Edge cases - indicate that something is wrong (i.e. many "IF"s)
Appearance of 3rd party - must be justified
Introduction of new compilers/languages (need to justify the purpose)
Priorities First Reuse, Second: build using Agni, Third: use open source, Fourth: buy proprietary

Points To Consider

There are a few important points that are usually overlooked.

Why do we need distributed applications at all? If computers were infinitely scalable then would that have not sufficed for serving millions of users from a single box? It probably would, and the app dev paradigm could have stayed in the "client/server" age. But computational power of single units is finite, therefore we need to devise new ways of writing applications - that are architected in order to scale on many computers.

Now, let’s talk about IaaS. Many think that they are "in the cloud" because they install their XYZ software on AWS or Azure. While that may qualify you for "cloud", the software by itself is not scalable if it is not architected/implemented for scalability. The IaaS does not scale (i.e. your MS DOS programs) by itself, although you could spawn in on 100s of EC instances. This is not how Google is written. It is a time to scale your app architecture and then IaaS will naturally support it.

Consider PaaS - it does make the particular service of interest scalable, but only that part of the service which is offered as a PaaS (for example a SQL database). An application, as a whole, is still not scalable completely if only some of its pieces are. Another problem is coupling. Once you start using some PaaS feature, you are now dependent on this service, and if you have not provided an abstraction - then you will not be able to replace that service provider at all. So the apps that rely on various PaaS offerings are better to be architected properly not to hard-link (even logically) with particular provider. Of course this may not be a problem if you purposely want to go with, say Amazon, Google or Microsoft forever.