Monitoring metrics with Netflix Servo

Developing a metrics monitoring component seems simple, but that simplicity is deceptive.

While adding a simple counter is easy enough, the more advanced functionality (polling, filtering, exporting the results…) is trickier to code. And since the metrics must be read and written in parallel with the core application logic, the code will be multithreaded, which adds another challenge: concurrency is hard.

So an in-house solution is doable, but will probably take at least a few days to develop, and that’s assuming it’s bug-free.

The alternative is to reuse an existing library which neatly solves this problem by providing monitoring components such as counters, gauges, and timers, plus the ability to export the data via JMX. For instance:

  • Netflix Servo
  • Dropwizard Metrics

Example: the Java code below calculates as many prime numbers as possible within a specified period of time, and uses Servo’s counter and peak rate counter to track the total number of primes discovered and the maximum number of primes discovered per second.
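The original listing is not reproduced here, but its structure can be sketched in plain Java, with an AtomicLong standing in for the Servo monitors (the comments show where Servo’s BasicCounter and PeakRateCounter would plug in — class and field names below are mine, not from the original post):

```java
import java.util.concurrent.atomic.AtomicLong;

public class PrimeMetrics {

    // With Servo, these would be monitors registered for JMX export, e.g.:
    //   Counter totalPrimes = new BasicCounter(MonitorConfig.builder("totalPrimes").build());
    //   PeakRateCounter peakRate = new PeakRateCounter(MonitorConfig.builder("peakRate").build());
    static final AtomicLong totalPrimes = new AtomicLong();

    // naive trial-division primality test
    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long i = 2; i * i <= n; i++) {
            if (n % i == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // search for primes for 200 ms, counting each one found
        long deadline = System.currentTimeMillis() + 200;
        for (long candidate = 2; System.currentTimeMillis() < deadline; candidate++) {
            if (isPrime(candidate)) {
                totalPrimes.incrementAndGet(); // Servo equivalent: totalPrimes.increment()
            }
        }
        System.out.println("primes found: " + totalPrimes.get());
    }
}
```

The Servo monitors would replace the AtomicLong one for one: the counter is incremented on each prime found, and the peak rate counter automatically tracks the highest per-second increment rate.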

The metrics are automatically exposed via JMX, so Java Mission Control can pick them up:

[Screenshot: the Servo MBeans as seen in Java Mission Control]

…and chart the resulting data.

[Screenshot: Java Mission Control charting the Servo metrics]


Migrating from Play to the Ninja Framework

The Play Framework has been my go-to framework for several months. It stands out in the crowded space of JVM web frameworks because it is (relatively) lightweight, has a short learning curve, and focuses on ease of testing at both the unit and functional level.

So far so good. However… The core logic of the framework (version 2 and up) is mostly written in Scala, and it shows:

  • view templates have to be scripted in Scala
  • the build tooling uses SBT, instead of the Maven/Gradle alternatives favored by the majority of Java projects
  • a Scala/SBT plugin is required to compile the code in IntelliJ and Eclipse
  • build times are long, courtesy of the Scala compiler

These annoyances (from the standpoint of a Java programmer who is not particularly interested in writing Scala code) do add up, so much so that at times they negate the productivity gains promised by the framework.

The Ninja Framework


The Ninja framework is heavily inspired by Play: the same focus on simplicity and performance, and the same basic constructs (routers, controllers, filters…). The API is also very similar.

Above all, it is a pure Java (no Scala) framework, so it removes every one of the concerns highlighted above. All in all, it’s an easy trade-off.

Migrating from Play to Ninja

This should be fairly simple because the APIs of the two frameworks are nearly identical. The salient points:

  1. create a pom.xml with the Ninja framework dependencies at the root of your Play project (examples are available on the Ninja Framework website)
  2. set up the project as a Maven project. In IntelliJ, right-click the pom.xml above and select “Add as a Maven project”
  3. replace all of the play.* imports with their ninja.* equivalents
  4. create a Java class to host the route configuration (instead of Play’s routes file)
  5. some of the properties in application.conf differ between Play and Ninja – no shortcut here, they have to be analysed case by case
  6. (the best part) remove the Scala cruft: the Scala plugin, build.scala, plugins.sbt, activators…
  7. watch your compilation time plummet. Enjoy.
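For step 4, the route class might look like the sketch below (ApplicationController is a hypothetical controller of yours; the assets route follows the pattern from the Ninja documentation). In Ninja, routes are plain Java code rather than Play’s text-based routes file:

```java
import ninja.AssetsController;
import ninja.Router;
import ninja.application.ApplicationRoutes;

public class Routes implements ApplicationRoutes {

    @Override
    public void init(Router router) {
        // equivalent of Play's "GET  /  controllers.Application.index()"
        router.GET().route("/").with(ApplicationController.class, "index");

        // serve static assets, as Play's assets route does
        router.GET().route("/assets/{fileName: .*}").with(AssetsController.class, "serveStatic");
    }
}
```

Being plain Java, the route table is type-checked by the compiler and trivially navigable in the IDE.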

Testing shell scripts

This story highlights the potentially dire consequences of undefined variables in shell scripts.

In summary: entire directories were lost because of a buggy shell script running this command:

rm -rf "$VAR/"*

… which is intended to delete all files under the $VAR directory. However, if $VAR is left undefined it expands to the empty string, and what is executed instead is:

rm -rf /*

… which will try to delete every file on the system that the user has permission to remove. Oops.

It’s somewhat paradoxical that shell scripts are frequently used to drive mission-critical activities such as starting/stopping processes and copying, moving and deleting files… and yet these scripts are error-prone, often hard to read, and always hard to test.


Mitigations

Being at the mercy of a buggy shell script is not fun. Thankfully there are ways to prevent disaster from happening.

  • Add “set -eu” at the top of the script.
    “-e” causes the script to terminate as soon as any command fails. “-u” causes the script to terminate when it encounters an unbound variable. One additional line of code that will save a lot of trouble… This should be mandatory at the top of every script.
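A quick way to see “-u” in action (child shells are used so the unset variable stays contained; $VAR is assumed not to be exported in your environment):

```shell
#!/bin/sh
# VAR is deliberately left unset in both child shells below.

# Without -u, the unset variable silently expands to the empty string:
sh -c 'echo "target: $VAR/"'

# With -u, the same expansion is a fatal error and the command never runs:
sh -uc 'echo "target: $VAR/"' 2>/dev/null || echo "aborted: unbound variable"
```

The first line prints “target: /” — exactly the silent expansion that caused the disaster above — while the second aborts before anything dangerous can execute.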

  • Check that the variables are set before use
    [[ -n "${VAR:-}" ]] && rm -rf "$VAR"/*
    Note that the glob must stay outside the quotes, otherwise rm looks for a file literally named *. This is not foolproof though, as it won’t prevent failures caused by typos in the variable name.
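The guard can be sketched as a tiny script ($VAR is deliberately left unset here; the ${VAR:-} default makes the test safe even under “set -u”):

```shell
#!/bin/sh
# Hypothetical cleanup: refuse to delete anything when VAR is unset or empty.
# ${VAR:-} substitutes an empty default, so this check also works under "set -u".

if [ -n "${VAR:-}" ]; then
    echo "would run: rm -rf \"$VAR\"/*"
else
    echo "VAR is unset or empty; refusing to delete"
fi
```

With $VAR unset, the script refuses to delete instead of silently expanding to a catastrophic command.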

  • Use a unit-test framework
    Yes, shell scripts have unit-test frameworks too – see Roundup or shUnit2.
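Even without installing a framework, the idea can be sketched with a hand-rolled check function in the spirit of Roundup/shUnit2 (join_path is a made-up function under test, not from either framework):

```shell
#!/bin/sh
# Hand-rolled smoke test for a shell function.

join_path() { printf '%s/%s' "${1%/}" "$2"; }

fails=0
check() {  # check <expected> <actual>
    [ "$1" = "$2" ] || { echo "FAIL: expected '$1', got '$2'"; fails=$((fails + 1)); }
}

check "/tmp/a" "$(join_path /tmp a)"
check "/tmp/a" "$(join_path /tmp/ a)"   # trailing slash is normalised

[ "$fails" -eq 0 ] && echo "all tests passed"
```

Frameworks like shUnit2 add the same pattern plus setup/teardown hooks and nicer reporting, but even this much is a large improvement over running a deletion script directly in production.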

  • Use a (real) programming language
    bash/zsh were never meant for complex programming tasks. Replace them with Python, Perl, Groovy, or Java. Programming languages have much better support for functions, variable scoping, conditionals, and string handling than shell scripts do. The idea is to avoid shell scripts for all but the simplest tasks, and to use a programming language for any kind of elaborate control logic (anything more complex than a pair of if/else statements).
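Since Java is on that list, here is a sketch of the same deletion task in plain Java (class and method names are mine): the guard the shell version lacked becomes an explicit, testable check, and an empty path raises an exception instead of silently expanding.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;

public class SafeDelete {

    // Delete a directory tree, but only when the path is explicitly provided.
    static void deleteTree(String dir) throws IOException {
        if (dir == null || dir.isBlank()) {
            throw new IllegalArgumentException("refusing to delete: empty path");
        }
        Path root = Paths.get(dir);
        if (!Files.exists(root)) {
            return; // nothing to do
        }
        // walk depth-first (deepest paths first) so files are removed
        // before the directories that contain them
        try (var paths = Files.walk(root)) {
            paths.sorted(Comparator.reverseOrder())
                 .forEach(p -> p.toFile().delete());
        }
    }
}
```

Unlike the shell one-liner, the failure mode of a missing value here is a stack trace, not a wiped filesystem.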