Multithreading and Concurrency

1. What do we understand by the term concurrency?

Concurrency is the ability of a program to execute several computations simultaneously. This can be achieved by distributing the computations over the available CPU cores of a machine or even over different machines within the same network.

2. What is the difference between processes and threads?

A process is an execution environment provided by the operating system that has its own set of private resources (e.g. memory, open files, etc.). Threads, in contrast to processes, live within a process and share its resources (memory, open files, etc.) with the other threads of the process. The ability to share resources between threads makes threads more suitable for tasks where performance is a significant requirement.

3. In Java, what is a process and a thread?

In Java, processes correspond to a running Java Virtual Machine (JVM) whereas threads live within the JVM and can be created and stopped by the Java application dynamically at runtime.

4. What is a scheduler?

A scheduler is the implementation of a scheduling algorithm that manages access of processes and threads to some limited resource like the processor or some I/O channel. The goal of most scheduling algorithms is to provide some kind of load balancing for the available processes/threads that guarantees that each process/thread gets an appropriate time frame to access the requested resource exclusively.

5. How many threads does a Java program have at least?

Each Java program is executed within the main thread; hence each Java application has at least one thread.

6. How can a Java application access the current thread?

The current thread can be accessed by calling the static method currentThread() of the JDK class java.lang.Thread:

	public class MainThread {

		public static void main(String[] args) {
			long id = Thread.currentThread().getId();
			String name = Thread.currentThread().getName();
			System.out.println("Executing thread with id=" + id + " and name=" + name);
		}
	}

7. What properties does each Java thread have?

Each Java thread has the following properties:

  • an identifier of type long that is unique within the JVM
  • a name of type String
  • a priority of type int
  • a state of type java.lang.Thread.State
  • a thread group the thread belongs to

8. What is the purpose of thread groups?

Each thread belongs to a group of threads. The JDK class java.lang.ThreadGroup provides methods to handle a whole group of threads. With these methods we can, for example, interrupt all threads of a group or set their maximum priority.
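
As a small sketch of this (the class, group, and thread names are made up for illustration), all threads of a group can be interrupted with a single call:

```java
// Illustrative sketch: interrupting all threads of a group at once.
public class ThreadGroupExample {

	public static int interruptSleepers() throws InterruptedException {
		ThreadGroup group = new ThreadGroup("workers");
		Runnable sleeper = () -> {
			try {
				Thread.sleep(60_000);
			} catch (InterruptedException e) {
				// interrupted as expected, thread terminates
			}
		};
		Thread t1 = new Thread(group, sleeper, "worker-1");
		Thread t2 = new Thread(group, sleeper, "worker-2");
		t1.start();
		t2.start();
		group.interrupt(); // interrupts worker-1 and worker-2 in one call
		t1.join();
		t2.join();
		return group.activeCount(); // both threads have terminated
	}

	public static void main(String[] args) throws InterruptedException {
		System.out.println("Active threads after interrupt: " + interruptSleepers());
	}
}
```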

9. What states can a thread have and what is the meaning of each state?

  • NEW: A thread that has not yet started is in this state.
  • RUNNABLE: A thread executing in the Java virtual machine is in this state.
  • BLOCKED: A thread that is blocked waiting for a monitor lock is in this state.
  • WAITING: A thread that is waiting indefinitely for another thread to perform a particular action is in this state.
  • TIMED_WAITING: A thread that is waiting for another thread to perform an action for up to a specified waiting time is in this state.
  • TERMINATED: A thread that has exited is in this state.
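
Two of these states can be observed directly in a minimal sketch (class and method names are illustrative): before start() a thread is NEW, and after join() returns it is TERMINATED:

```java
// Illustrative sketch: observing thread states via getState().
public class ThreadStateExample {

	public static Thread.State[] observeStates() throws InterruptedException {
		Thread t = new Thread(() -> { /* finishes immediately */ });
		Thread.State before = t.getState(); // NEW: not yet started
		t.start();
		t.join();
		Thread.State after = t.getState();  // TERMINATED: has exited
		return new Thread.State[] { before, after };
	}

	public static void main(String[] args) throws InterruptedException {
		for (Thread.State s : observeStates()) {
			System.out.println(s);
		}
	}
}
```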

10. How do we set the priority of a thread?

The priority of a thread is set by using the method setPriority(int). To set the priority to the maximum or minimum value, we use the constants Thread.MAX_PRIORITY and Thread.MIN_PRIORITY, as the concrete values may differ between JVM implementations.

11. How is a thread created in Java?

Basically, there are two ways to create a thread in Java.

The first one is to write a class that extends the JDK class java.lang.Thread and call its method start():

	public class MyThread extends Thread {

		public MyThread(String name) {
			super(name);
		}

		public void run() {
			System.out.println("Executing thread "+Thread.currentThread().getName());
		}

		public static void main(String[] args) throws InterruptedException {
			MyThread myThread = new MyThread("myThread");
			myThread.start();
			myThread.join();
		}
	}

The second way is to implement the interface java.lang.Runnable and pass this implementation as a parameter to the constructor of java.lang.Thread:

	public class MyRunnable implements Runnable {

		public void run() {
			System.out.println("Executing thread "+Thread.currentThread().getName());
		}

		public static void main(String[] args) throws InterruptedException {
			Thread myThread = new Thread(new MyRunnable(), "myRunnable");
			myThread.start();
			myThread.join();
		}
	}

12. How do we stop a thread in Java?

To stop a thread one can use a volatile reference pointing to the current thread that can be set to null by other threads to indicate the current thread should stop its execution:

	private static class MyStopThread extends Thread {
		private volatile Thread stopIndicator;

		public void start() {
			stopIndicator = new Thread(this);
			stopIndicator.start();
		}

		public void stopThread() {
			stopIndicator = null;
		}

		public void run() {
			Thread thisThread = Thread.currentThread();
			while (thisThread == stopIndicator) {
				try {
					Thread.sleep(1000);
				} catch (InterruptedException e) {
				}
			}
		}
	}

13. Why should a thread not be stopped by calling its method stop()?

A thread should not be stopped by using the deprecated method stop() of java.lang.Thread, as a call of this method causes the thread to unlock all monitors it has acquired. If any object protected by one of the released locks was in an inconsistent state, this state becomes visible to all other threads. This can cause arbitrary behavior when other threads work with this inconsistent object.

14. Is it possible to start a thread twice?

No, after having started a thread by invoking its start() method, a second invocation of start() will throw an IllegalThreadStateException.
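
This can be demonstrated with a short sketch (class name is illustrative):

```java
// Illustrative sketch: the second start() on the same instance throws.
public class StartTwice {

	public static boolean startTwiceThrows() throws InterruptedException {
		Thread t = new Thread(() -> { });
		t.start();
		t.join();
		try {
			t.start(); // second invocation of start()
			return false;
		} catch (IllegalThreadStateException e) {
			return true;
		}
	}

	public static void main(String[] args) throws InterruptedException {
		System.out.println("Second start threw: " + startTwiceThrows());
	}
}
```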

15. What is the output of the following code?

	public class MultiThreading {

		private static class MyThread extends Thread {

			public MyThread(String name) {
				super(name);
			}

			public void run() {
				System.out.println(Thread.currentThread().getName());
			}
		}

		public static void main(String[] args) {
			MyThread myThread = new MyThread("myThread");
			myThread.run();
		}
	}

The code above produces the output “main” and not “myThread”. As can be seen in line two of the main() method, we mistakenly invoke the method run() instead of start(). Hence no new thread is started; instead the method run() gets executed within the main thread.

16. What is a daemon thread?

A daemon thread is a thread whose execution state is not evaluated when the JVM decides whether it should stop or not. The JVM stops when all user threads (in contrast to daemon threads) have terminated. Hence daemon threads can be used, for example, to implement monitoring functionality, as such a thread is stopped by the JVM as soon as all user threads have terminated:

	public class Example {

		private static class MyDaemonThread extends Thread {

			public MyDaemonThread() {
				setDaemon(true);
			}

			public void run() {
				while (true) {
					try {
						Thread.sleep(1);
					} catch (InterruptedException e) {
					}
				}
			}
		}

		public static void main(String[] args) throws InterruptedException {
			Thread thread = new MyDaemonThread();
			thread.start();
		}
	}

The example application above terminates even though the daemon thread is still running in its endless while loop.

17. Is it possible to convert a normal user thread into a daemon thread after it has been started?

A user thread cannot be converted into a daemon thread once it has been started. Invoking the method thread.setDaemon(true) on an already running thread instance causes an IllegalThreadStateException.

18. What do we understand by busy waiting?

Busy waiting refers to implementations that wait for an event by performing active computations that keep the thread occupying the processor, although the scheduler could remove it from the CPU. An example of busy waiting would be to spend the waiting time in a loop that determines the current time again and again until a certain point in time is reached:

	Thread thread = new Thread(new Runnable() {
		public void run() {
			long millisToStop = System.currentTimeMillis() + 5000;
			long currentTimeMillis = System.currentTimeMillis();
			while (millisToStop > currentTimeMillis) {
				currentTimeMillis = System.currentTimeMillis();
			}
		}
	});
	thread.start();

19. How can we prevent busy waiting?

One way to prevent busy waiting is to put the current thread to sleep for a given amount of time. This can be done by calling the method java.lang.Thread.sleep(long) by passing the number of milliseconds to sleep as an argument.
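
A minimal sketch of this approach (class and method names are illustrative) replaces the spinning loop with a single sleep call:

```java
// Illustrative sketch: sleeping frees the processor instead of spinning.
public class SleepInsteadOfSpin {

	public static long waitMillis(long millis) throws InterruptedException {
		long start = System.currentTimeMillis();
		Thread.sleep(millis); // the scheduler can run other threads meanwhile
		return System.currentTimeMillis() - start;
	}

	public static void main(String[] args) throws InterruptedException {
		System.out.println("Waited roughly " + waitMillis(100) + " ms");
	}
}
```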

20. Can we use Thread.sleep() for real-time processing?

The number of milliseconds passed to an invocation of Thread.sleep(long) is only an indication to the scheduler of how long the current thread does not need to be executed. The scheduler may let the thread execute again a few milliseconds earlier or later, depending on the actual implementation. Hence an invocation of Thread.sleep() should not be used for real-time processing.

21. How can a thread be woken up that has been put to sleep before using Thread.sleep()?

The method interrupt() of java.lang.Thread interrupts a sleeping thread. The interrupted thread that has been put to sleep by calling Thread.sleep() is woken up by an InterruptedException:

	public class InterruptExample implements Runnable {

		public void run() {
			try {
				Thread.sleep(Long.MAX_VALUE);
			} catch (InterruptedException e) {
				System.out.println("["+Thread.currentThread().getName()+"] Interrupted by exception!");
			}
		}

		public static void main(String[] args) throws InterruptedException {
			Thread myThread = new Thread(new InterruptExample(), "myThread");
			myThread.start();

			System.out.println("["+Thread.currentThread().getName()+"] Sleeping in main thread for 5s...");
			Thread.sleep(5000);

			System.out.println("["+Thread.currentThread().getName()+"] Interrupting myThread");
			myThread.interrupt();
		}
	}

22. How can a thread query if it has been interrupted?

If the thread is not within a method like Thread.sleep() that would throw an InterruptedException, the thread can query if it has been interrupted by calling either the static method Thread.interrupted() or the method isInterrupted() that it has inherited from java.lang.Thread.
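
The two query methods differ in one detail worth knowing: the static Thread.interrupted() clears the interrupt flag as a side effect, while isInterrupted() leaves it untouched. A minimal sketch (class and method names are illustrative):

```java
// Illustrative sketch: Thread.interrupted() clears the flag, isInterrupted() does not.
public class InterruptQuery {

	public static boolean[] queryFlags() {
		Thread current = Thread.currentThread();
		current.interrupt();
		boolean viaInstance = current.isInterrupted(); // true, flag untouched
		boolean viaStatic = Thread.interrupted();      // true, but clears the flag
		boolean afterClear = current.isInterrupted();  // false, flag was cleared
		return new boolean[] { viaInstance, viaStatic, afterClear };
	}

	public static void main(String[] args) {
		for (boolean b : queryFlags()) {
			System.out.println(b);
		}
	}
}
```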

23. How should an InterruptedException be handled?

Methods like sleep() and join() throw an InterruptedException to tell the caller that another thread has interrupted this thread. In most cases this is done to tell the current thread to stop its current computation and finish it prematurely. Hence ignoring the exception by catching it and only logging it to the console or some log file is often not the appropriate way to handle it. The problem with this exception is that the method run() of the Runnable interface does not allow run() to throw any checked exceptions, so simply rethrowing it does not help. This means the implementation of run() has to handle the checked exception itself, which often leads to it being caught and ignored.

24. After having started a child thread, how do we wait in the parent thread for the termination of the child thread?

Waiting for a thread’s termination is done by invoking the method join() on the thread’s instance variable:

	Thread thread = new Thread(new Runnable() {
		public void run() {
		}
	});
	thread.start();
	thread.join();

25. What is the output of the following program?

public class MyThreads {

	private static class MyDaemonThread extends Thread {

		public MyDaemonThread() {
			setDaemon(true);
		}

		public void run() {
			try {
				Thread.sleep(1000);
			} catch (InterruptedException e) {
			}
		}
	}

	public static void main(String[] args) throws InterruptedException {
		Thread thread = new MyDaemonThread();
		thread.start();
		thread.join();
		System.out.println(thread.isAlive());
	}
}

The output of the above code is “false”. Although the instance of MyDaemonThread is a daemon thread, the invocation of join() causes the main thread to wait until the execution of the daemon thread has finished. Hence calling isAlive() on the thread instance reveals that the daemon thread is no longer running.

26. What happens when an uncaught exception leaves the run() method?

It can happen that an unchecked exception escapes from the run() method. In this case the thread is stopped by the Java Virtual Machine. It is possible to catch such an exception by registering an instance that implements the interface UncaughtExceptionHandler as an exception handler.

This is done either by invoking the static method Thread.setDefaultUncaughtExceptionHandler(Thread.UncaughtExceptionHandler), which tells the JVM to use the provided handler in case no specific handler was registered on the thread itself, or by invoking setUncaughtExceptionHandler(Thread.UncaughtExceptionHandler) on the thread instance itself.

27. What is a shutdown hook?

A shutdown hook is a thread that gets executed when the JVM shuts down. It can be registered by invoking addShutdownHook(Thread) on the Runtime instance:

	Runtime.getRuntime().addShutdownHook(new Thread() {
		public void run() {
			System.out.println("Shutdown hook executed.");
		}
	});


28. For what purposes is the keyword synchronized used?

When you have to implement exclusive access to a resource, like some static value or some file reference, the code that works with the exclusive resource can be wrapped in a synchronized block:

	synchronized (SynchronizedCounter.class) {
		counter++; // exclusive access to the static counter
	}

29. What intrinsic lock does a synchronized method acquire?

A synchronized method acquires the intrinsic lock for that method’s object and releases it when the method returns. Even if the method throws an exception, the intrinsic lock is released. Hence a synchronized method is equivalent to the following code:

	public void method() {
		synchronized(this) {
			// method body
		}
	}

30. Can a constructor be synchronized?

No, a constructor cannot be synchronized; using the keyword synchronized on a constructor leads to a syntax error. The reasoning is that only the constructing thread should have access to the object being constructed, so no synchronization is needed there.

31. Can primitive values be used for intrinsic locks?

No, primitive values cannot be used for intrinsic locks.

32. Are intrinsic locks reentrant?

Yes, intrinsic locks are reentrant: the same thread can acquire the same lock again and again. Otherwise code that acquires a lock would have to pay attention that it does not accidentally try to acquire a lock it has already acquired.
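
Reentrancy can be seen in a short sketch (class and method names are illustrative): a synchronized method can call another synchronized method of the same object without blocking itself:

```java
// Illustrative sketch: the same thread re-acquires the same intrinsic lock.
public class ReentrantExample {
	private int depth = 0;

	public synchronized int outer() {
		depth++;
		return inner(); // re-acquires the lock on 'this' without blocking
	}

	public synchronized int inner() {
		depth++;
		return depth;
	}

	public static void main(String[] args) {
		System.out.println(new ReentrantExample().outer());
	}
}
```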

33. What do we understand by an atomic operation?

An atomic operation is an operation that is either executed completely or not at all.

34. Is the statement c++ atomic?

No, incrementing an integer variable consists of more than one operation: we have to load the current value of c, increment it, and then store the new value back. The current thread performing this increment may be interrupted between any of these three steps, hence the operation is not atomic.

35. What operations are atomic in Java?

The Java language provides some basic operations that are atomic and that therefore can be used to make sure that concurrent threads always see the same value:

  • Read and write operations to reference variables and primitive variables (except long and double)
  • Read and write operations for all variables declared as volatile

36. Is the following implementation thread-safe?

	public class DoubleCheckedSingleton {
		private static DoubleCheckedSingleton instance = null;

		public static DoubleCheckedSingleton getInstance() {
			if (instance == null) {
				synchronized (DoubleCheckedSingleton.class) {
					if (instance == null) {
						instance = new DoubleCheckedSingleton();
					}
				}
			}
			return instance;
		}
	}

The code above is not thread-safe. Although it checks the value of instance once again within the synchronized block (for performance reasons), the JIT compiler can rearrange the bytecode in a way that the reference to instance is set before the constructor has finished its execution. This means the method getInstance() may return an object that has not been initialized completely. Since Java 5 the code can be made thread-safe by declaring the instance variable volatile: a reference written to a volatile variable only becomes visible to other threads once the constructor of the object has finished its execution completely.

37. What do we understand by a deadlock?

A deadlock is a situation in which two (or more) threads are each waiting on the other thread to free a resource that it has locked, while the thread itself has locked a resource the other thread is waiting on:
Thread 1: locks resource A, waits for resource B
Thread 2: locks resource B, waits for resource A

38. What are the requirements for a deadlock situation?

In general the following requirements for a deadlock can be identified:

  • Mutual exclusion: There is a resource which can be accessed only by one thread at any point in time.
  • Resource holding: While having locked one resource, the thread tries to acquire another lock on some other exclusive resource.
  • No preemption: There is no mechanism, which frees the resource if one thread holds the lock for a specific period of time.
  • Circular wait: During runtime a constellation occurs in which two (or more) threads are each waiting on the other thread to free a resource that it has locked.

39. Is it possible to prevent deadlocks at all?

In order to prevent deadlocks one (or more) of the requirements for a deadlock has to be eliminated:

  • Mutual exclusion: In some situation it is possible to prevent mutual exclusion by using optimistic locking.
  • Resource holding: A thread may release all its exclusive locks, when it does not succeed in obtaining all exclusive locks.
  • No preemption: Using a timeout for an exclusive lock frees the lock after a given amount of time.
  • Circular wait: When all exclusive locks are obtained by all threads in the same sequence, no circular wait occurs.

40. Is it possible to implement a deadlock detection?

When all exclusive locks are monitored and modelled as a directed graph, a deadlock detection system can search for two threads that are each waiting on the other thread to free a resource that it has locked. The waiting threads can then be forced by some kind of exception to release the lock the other thread is waiting on.

41. What is a livelock?

A livelock is a situation in which two or more threads block each other by responding to an action that is caused by another thread. In contrast to a deadlock situation, where two or more threads wait in one specific state, the threads that participate in a livelock change their state in a way that prevents progress on their regular work. An example would be a situation in which two threads each try to acquire two locks, but release a lock they have acquired when they cannot acquire the second one. It may now happen that both threads concurrently try to acquire the first lock. As only one thread succeeds, the other thread may then succeed in acquiring the second lock. Now both threads hold one lock each, but as both want to have both locks, they release their lock and try again from the beginning. This situation may repeat again and again.

42. What do we understand by thread starvation?

Threads with lower priority get less execution time than threads with higher priority. When a thread with lower priority performs a long-running computation, it may happen that it does not get enough processor time to finish the computation in time. It seems to “starve”, as threads with higher priority take away its computation time.

43. Can a synchronized block cause thread starvation?

The order in which threads can enter a synchronized block is not defined. So in theory it may happen that in case many threads are waiting for the entrance to a synchronized block, some threads have to wait longer than other threads. Hence they do not get enough computation time to finish their work in time.

44. What do we understand by the term race condition?

A race condition describes constellations in which the outcome of some multi-threaded implementation depends on the exact timing behavior of the participating threads. In most cases it is not desirable to have such a kind of behavior, hence the term race condition also means that a bug due to missing thread synchronization leads to the differing outcome. A simple example of a race condition is the increment of an integer variable by two concurrent threads. As the operation consists of more than one single and atomic operation, it may happen that both threads read and increment the same value. After this concurrent increment the value of the integer variable is increased not by two but only by one.

45. What do we understand by fair locks?

A fair lock takes the waiting time of the threads into account when choosing the next thread that passes the barrier to some exclusive resource. An example implementation of a fair lock is provided by the Java SDK: java.util.concurrent.locks.ReentrantLock. If the constructor with the boolean flag set to true is used, the ReentrantLock grants access to the longest-waiting thread.
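
A minimal sketch of constructing such a fair lock (class and method names are illustrative):

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch: a ReentrantLock constructed in fair mode.
public class FairLockExample {

	public static boolean isFair() {
		ReentrantLock fairLock = new ReentrantLock(true); // true = fair ordering
		return fairLock.isFair();
	}

	public static void main(String[] args) {
		System.out.println("Fair: " + isFair());
	}
}
```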

46. Which two methods that each object inherits from java.lang.Object can be used to implement a simple producer/consumer scenario?

When a worker thread has finished its current task and the queue for new tasks is empty, it can free the processor by acquiring an intrinsic lock on the queue object and by calling the method wait(). The thread will be woken up by some producer thread that has put a new task into the queue and that again acquires the same intrinsic lock on the queue object and calls notify() on it.
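
The pattern described above can be sketched as follows (class and method names are illustrative; a production queue would also need a capacity bound):

```java
import java.util.LinkedList;
import java.util.Queue;

// Illustrative sketch: a simple producer/consumer queue using wait()/notify().
public class WaitNotifyQueue {
	private final Queue<Integer> queue = new LinkedList<>();

	public void put(int value) {
		synchronized (queue) {
			queue.add(value);
			queue.notify(); // wake up one waiting consumer
		}
	}

	public int take() throws InterruptedException {
		synchronized (queue) {
			while (queue.isEmpty()) {
				queue.wait(); // releases the intrinsic lock while waiting
			}
			return queue.poll();
		}
	}

	public static void main(String[] args) throws InterruptedException {
		WaitNotifyQueue q = new WaitNotifyQueue();
		Thread producer = new Thread(() -> q.put(42));
		producer.start();
		System.out.println(q.take());
		producer.join();
	}
}
```

Note that take() checks the condition in a while loop, not an if, so that a spurious or superfluous wake-up sends the thread back to waiting.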

47. What is the difference between notify() and notifyAll()?

Both methods are used to wake up one or more threads that have put themselves to sleep by calling wait(). While notify() only wakes up one of the waiting threads, notifyAll() wakes up all waiting threads.

48. How it is determined which thread wakes up by calling notify()?

It is not specified which threads will be woken up by calling notify() if more than one thread is waiting. Hence code should not rely on any concrete JVM implementation.

49. Is the following code that retrieves an integer value from some queue implementation correct?

	public Integer getNextInt() {
		Integer retVal = null;
		synchronized (queue) {
			try {
				while (queue.isEmpty()) {
					queue.wait();
				}
			} catch (InterruptedException e) {
				e.printStackTrace();
			}
		}
		synchronized (queue) {
			retVal = queue.poll();
			if (retVal == null) {
				System.err.println("retVal is null");
				throw new IllegalStateException();
			}
		}
		return retVal;
	}

Although the code above uses the queue as object monitor, it does not behave correctly in a multi-threaded environment because it consists of two separate synchronized blocks. When two threads waiting in the call of wait() are woken up by another thread that calls notifyAll(), both threads enter the second synchronized block one after the other. At that point the queue holds only one new value, hence the second thread polls an empty queue and gets null as return value.

50. Is it possible to check whether a thread holds a monitor lock on some given object?

The class java.lang.Thread provides the static method Thread.holdsLock(Object) that returns true if and only if the current thread holds the lock on the object given as argument to the method invocation.

51. What does the method Thread.yield() do?

An invocation of the static method Thread.yield() gives the scheduler a hint that the current thread is willing to free the processor. The scheduler is free to ignore this hint. As it is not defined which thread will get the processor after the invocation of Thread.yield(), it may even happen that the current thread becomes the “next” thread to be executed.

52. What do you have to consider when passing object instances from one thread to another?

When passing objects between threads, you will have to pay attention that these objects are not manipulated by two threads at the same time. An example would be a Map implementation whose key/value pairs are modified by two concurrent threads. In order to avoid problems with concurrent modifications you can design an object to be immutable.

53. Which rules do you have to follow in order to implement an immutable class?

  • All fields should be final and private.
  • There should be no setter methods.
  • The class itself should be declared final in order to prevent subclasses from violating the principle of immutability.
  • If fields are not of a primitive type but a reference to another object:
    • There should not be a getter method that exposes the reference directly to the caller.
    • Don’t change the referenced objects (or at least changes to these objects should not be visible to clients of the object).

54. What is the purpose of the class java.lang.ThreadLocal?

As memory is shared between different threads, ThreadLocal provides a way to store and retrieve values for each thread separately. Implementations of ThreadLocal store and retrieve the values for each thread independently such that when thread A stores the value A1 and thread B stores the value B1 in the same instance of ThreadLocal, thread A later on retrieves value A1 from this ThreadLocal instance and thread B retrieves value B1.
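
A minimal sketch of this per-thread storage (class and value names are illustrative):

```java
// Illustrative sketch: each thread sees only its own value in the ThreadLocal.
public class ThreadLocalExample {
	private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

	public static String storeAndRead(String value) {
		CONTEXT.set(value);
		return CONTEXT.get(); // only this thread sees the value it stored
	}

	public static void main(String[] args) throws InterruptedException {
		Thread a = new Thread(() -> System.out.println(storeAndRead("A1")));
		Thread b = new Thread(() -> System.out.println(storeAndRead("B1")));
		a.start();
		b.start();
		a.join();
		b.join();
	}
}
```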

55. What are possible use cases for java.lang.ThreadLocal?

Instances of ThreadLocal can be used to transport information throughout the application without the need to pass this from method to method. Examples would be the transportation of security/login information within an instance of ThreadLocal such that it is accessible by each method. Another use case would be to transport transaction information or in general objects that should be accessible in all methods without passing them from method to method.

56. Is it possible to improve the performance of an application by the usage of multi-threading? Name some examples.

If we have more than one CPU core available, the performance of an application can be improved by multi-threading if it is possible to parallelize the computations over the available CPU cores. An example would be an application that should scale all images that are stored within a local directory structure. Instead of iterating over all images one after the other, a producer/consumer implementation can use a single thread to scan the directory structure and a bunch of worker threads that perform the actual scaling operation. Another example would be an application that mirrors some web page. Instead of loading one HTML page after the other, a producer thread can parse the first HTML page and issue the links it found into a queue. The worker threads monitor the queue and load the web pages found by the parser. While the worker threads wait for the page to get loaded completely, other threads can use the CPU to parse the already loaded pages and issue new requests.

57. What do we understand by the term scalability?

Scalability means the ability of a program to improve the performance by adding further resources to it.

58. Is it possible to compute the theoretical maximum speed up for an application by using multiple processors?

Amdahl’s law provides a formula to compute the theoretical maximum speedup gained by providing multiple processors to an application. The theoretical speedup is computed as S(n) = 1 / (B + (1-B)/n), where n denotes the number of processors and B the fraction of the program that cannot be executed in parallel. When n converges toward infinity, the term (1-B)/n converges toward zero. Hence the formula reduces in this special case to 1/B. As we can see, the theoretical maximum speedup is the reciprocal of the fraction that has to be executed serially: the lower this fraction is, the more theoretical speedup can be achieved.
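
The formula can be checked with a small computation (class and method names are illustrative). With a serial fraction of B = 0.1 and n = 4 processors, S(4) = 1 / (0.1 + 0.9/4) ≈ 3.08, while the limit for infinitely many processors is 1/0.1 = 10:

```java
// Illustrative sketch: computing the theoretical speedup from Amdahl's law.
public class Amdahl {

	// S(n) = 1 / (B + (1 - B) / n), B = serial fraction, n = number of processors
	public static double speedup(double b, int n) {
		return 1.0 / (b + (1.0 - b) / n);
	}

	public static void main(String[] args) {
		// With 10% serial code, even infinitely many processors
		// cannot exceed a speedup of 1 / 0.1 = 10.
		System.out.println(speedup(0.1, 4));
	}
}
```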

59. What do we understand by lock contention?

Lock contention occurs when two or more threads compete for the acquisition of a lock. The scheduler has to decide whether it puts the waiting thread to sleep and performs a context switch to let another thread occupy the CPU, or whether letting the waiting thread busy-wait is more efficient. Both ways introduce idle time for the waiting thread.

60. Which techniques help to reduce lock contention?

In some cases lock contention can be reduced by applying one of the following techniques:

  • The scope of the lock is reduced.
  • The number of times a certain lock is acquired is reduced (lock splitting).
  • Using hardware supported optimistic locking operations instead of synchronization.
  • Avoid synchronization where possible.
  • Avoid object pooling.

61. Which technique to reduce lock contention can be applied to the following code?

	synchronized (map) {
		UUID randomUUID = UUID.randomUUID();
		Integer value = Integer.valueOf(42);
		String key = randomUUID.toString();
		map.put(key, value);
	}

The code above performs the computation of the random UUID and the conversion of the literal 42 into an Integer object within the synchronized block, although these two lines of code are local to the current thread and do not affect other threads. Hence they can be moved out of the synchronized block:

	UUID randomUUID = UUID.randomUUID();
	Integer value = Integer.valueOf(42);
	String key = randomUUID.toString();
	synchronized (map) {
		map.put(key, value);
	}

62. Explain by an example the technique lock splitting.

Lock splitting may be a way to reduce lock contention when one lock is used to synchronize access to different aspects of the same application. Suppose we have a class that implements the computation of some statistical data of our application. A first version of this class uses the keyword synchronized in each method signature in order to guard the internal state before corruption by multiple concurrent threads. This also means that each method invocation may cause lock contention as other threads may try to acquire the same lock simultaneously. But it may be possible to split the lock on the object instance into a few smaller locks for each type of statistical data within each method. Hence thread T1 that tries to increment the statistical data D1 does not have to wait for the lock while thread T2 simultaneously updates the data D2.
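
The statistics example above can be sketched as follows (class, field, and lock names are made up for illustration); each type of data is guarded by its own lock object:

```java
// Illustrative sketch: two separate locks guard two independent counters,
// so updates to one do not contend with updates to the other.
public class SplitLockStatistics {
	private final Object userLock = new Object();
	private final Object requestLock = new Object();
	private long userCount = 0;
	private long requestCount = 0;

	public void incrementUsers() {
		synchronized (userLock) { // does not block request updates
			userCount++;
		}
	}

	public void incrementRequests() {
		synchronized (requestLock) { // does not block user updates
			requestCount++;
		}
	}

	public long users() {
		synchronized (userLock) {
			return userCount;
		}
	}

	public long requests() {
		synchronized (requestLock) {
			return requestCount;
		}
	}

	public static void main(String[] args) {
		SplitLockStatistics stats = new SplitLockStatistics();
		stats.incrementUsers();
		stats.incrementRequests();
		System.out.println(stats.users() + " user(s), " + stats.requests() + " request(s)");
	}
}
```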

63. What kind of technique for reducing lock contention is used by the SDK class ReadWriteLock?

The SDK class ReadWriteLock uses the fact that concurrent threads do not have to acquire a lock when they want to read a value while no other thread tries to update it. This is implemented by a pair of locks, one for read-only operations and one for writing operations. While the read-only lock may be obtained by more than one thread, the implementation guarantees that all read operations see an updated value once the write lock is released.
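
A minimal sketch using the standard implementation ReentrantReadWriteLock (class and method names are illustrative):

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch: a counter guarded by a read/write lock pair.
public class ReadWriteCounter {
	private final ReadWriteLock lock = new ReentrantReadWriteLock();
	private long value = 0;

	public void increment() {
		lock.writeLock().lock(); // exclusive: blocks readers and writers
		try {
			value++;
		} finally {
			lock.writeLock().unlock();
		}
	}

	public long get() {
		lock.readLock().lock(); // shared: many readers may hold it at once
		try {
			return value;
		} finally {
			lock.readLock().unlock();
		}
	}

	public static void main(String[] args) {
		ReadWriteCounter counter = new ReadWriteCounter();
		counter.increment();
		System.out.println(counter.get());
	}
}
```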

64. What do we understand by lock striping?

In contrast to lock splitting, where we introduce different locks for different aspects of the application, lock striping uses multiple locks to guard different parts of the same data structure. An example for this technique is the class ConcurrentHashMap from JDK’s java.util.concurrent package. The Map implementation uses internally different buckets to store its values. The bucket is chosen by the value’s key. ConcurrentHashMap now uses different locks to guard different hash buckets. Hence one thread that tries to access the first hash bucket can acquire the lock for this bucket, while another thread can simultaneously access a second bucket. In contrast to a synchronized version of HashMap this technique can increase the performance when different threads work on different buckets.

65. What do we understand by a CAS operation?

CAS stands for compare-and-swap and refers to a processor instruction that updates the value of a memory location only if the provided expected value is equal to the current value. CAS operations can be used to avoid synchronization: a thread tries to update a value by providing the current value it has read and the new value to the CAS operation. If another thread has updated the value in the meantime, the expected value no longer equals the current value and the update fails. The thread then reads the new value and tries again. That way blocking synchronization is replaced by optimistic spin-waiting.
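
The read-then-retry loop described above can be sketched with AtomicInteger.compareAndSet (class and method names are illustrative; AtomicInteger also offers incrementAndGet(), which does the same internally):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: an optimistic increment built from a CAS retry loop.
public class CasIncrement {

	public static int increment(AtomicInteger counter) {
		int current;
		do {
			current = counter.get();
			// compareAndSet fails if another thread changed the value meanwhile
		} while (!counter.compareAndSet(current, current + 1));
		return current + 1;
	}

	public static void main(String[] args) {
		AtomicInteger counter = new AtomicInteger(41);
		System.out.println(increment(counter));
	}
}
```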

66. Which Java classes use the CAS operation?

The SDK classes in the package java.util.concurrent.atomic like AtomicInteger or AtomicBoolean use internally the CAS operation to implement concurrent incrementation.

	public class CounterAtomic {
		private AtomicLong counter = new AtomicLong();

		public void increment() {
			counter.incrementAndGet();
		}

		public long get() {
			return counter.get();
		}
	}

67. Provide an example of why performance improvements for single-threaded applications can cause performance degradation for multi-threaded applications.

A prominent example of such an optimization is a List implementation that holds the number of elements as a separate variable. This improves the performance of single-threaded applications, as the size() operation does not have to iterate over all elements but can return the current number of elements directly. Within a multi-threaded application, the additional counter has to be guarded by a lock, as multiple concurrent threads may insert elements into the list. This additional lock can cost performance when there are more updates to the list than invocations of the size() operation.

68. Is object pooling always a performance improvement for multi-threaded applications?

Object pools that try to avoid the construction of new objects by pooling them can improve the performance of single-threaded applications, as the cost of object creation is replaced by the cost of requesting an object from the pool. In multi-threaded applications such an object pool has to synchronize access to the pool, and the additional cost of lock contention may outweigh the saved cost of the construction and garbage collection of new objects. Hence object pooling may not always improve the overall performance of a multi-threaded application.

69. What is the relation between the two interfaces Executor and ExecutorService?

The interface Executor only defines one method: execute(Runnable). Implementations of this interface will execute the given Runnable instance at some time in the future. The ExecutorService interface is an extension of the Executor interface that provides additional methods to shut down the underlying implementation and to await the termination of all submitted tasks, and it allows submitting instances of Callable.

70. What happens when you submit() a new task to an ExecutorService instance whose queue is already full?

As the method signature of submit() indicates, the ExecutorService implementation is supposed to throw a RejectedExecutionException.

71. What is a ScheduledExecutorService?

The interface ScheduledExecutorService extends the interface ExecutorService and adds methods that allow submitting new tasks to the underlying implementation for execution at a given point in time. There are two methods to schedule one-shot tasks and two methods to create and execute periodic tasks.
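A short sketch of both flavors, using the factory class Executors; the delays and task bodies are arbitrary:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SchedulerExample {
    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

        // One-shot task, executed once after a delay of 500 ms.
        scheduler.schedule(() -> System.out.println("one-shot"), 500, TimeUnit.MILLISECONDS);

        // Periodic task: first run immediately, then every 200 ms.
        scheduler.scheduleAtFixedRate(() -> System.out.println("tick"), 0, 200, TimeUnit.MILLISECONDS);

        Thread.sleep(1000);     // let the tasks run for a while
        scheduler.shutdown();
    }
}
```

The fourth method, scheduleWithFixedDelay(), is similar to scheduleAtFixedRate() but measures the delay from the end of one execution to the start of the next.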

72. Do you know an easy way to construct a thread pool with 5 threads that executes some tasks that return a value?

The SDK provides a factory and utility class Executors whose static method newFixedThreadPool(int nThreads) allows the creation of a thread pool with a fixed number of threads (the implementation of MyCallable is omitted):

	public static void main(String[] args) throws InterruptedException, ExecutionException {
		ExecutorService executorService = Executors.newFixedThreadPool(5);
		Future[] futures = new Future[5];
		for (int i = 0; i < futures.length; i++) {
			futures[i] = executorService.submit(new MyCallable());
		}
		for (int i = 0; i < futures.length; i++) {
			Integer retVal = (Integer) futures[i].get();
			System.out.println(retVal);
		}
		executorService.shutdown();
	}

73. What is the difference between the two interfaces Runnable and Callable?

The interface Runnable defines the method run() without any return value, whereas the interface Callable defines the method call(), which returns a value and may throw a checked exception.
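A minimal illustration of the two interfaces side by side:

```java
import java.util.concurrent.Callable;

public class RunnableVsCallable {
    public static void main(String[] args) throws Exception {
        // run() returns nothing and cannot throw checked exceptions.
        Runnable runnable = () -> System.out.println("side effect only");

        // call() returns a value and may throw a checked exception.
        Callable<Integer> callable = () -> 42;

        runnable.run();
        Integer result = callable.call();
        System.out.println(result); // prints 42
    }
}
```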

74. Which are use cases for the class java.util.concurrent.Future?

Instances of the interface java.util.concurrent.Future are used to represent the results of asynchronous computations that are not immediately available. Hence the interface provides methods to check whether the asynchronous computation has finished, to cancel the task, and to retrieve the actual result. The latter can be done with the two get() methods provided: the first get() method takes no parameters and blocks until the result is available, whereas the second get() method takes a timeout parameter that lets the method invocation return if the result does not become available within the given timeframe.

75. What is the difference between HashMap and Hashtable particularly with regard to thread-safety?

The methods of Hashtable are all synchronized. This is not the case for the HashMap implementation. Hence Hashtable is thread-safe whereas HashMap is not thread-safe. For single-threaded applications it is therefore more efficient to use the “newer” HashMap implementation.

76. Is there a simple way to create a synchronized instance of an arbitrary implementation of Collection, List or Map?

The utility class Collections provides the methods synchronizedCollection(Collection), synchronizedList(List) and synchronizedMap(Map) that return a thread-safe collection/list/map that is backed by the given instance.
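For example, wrapping a plain ArrayList. Note that iteration over such a wrapper still has to be guarded manually by the caller, as documented for these wrappers:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SynchronizedWrapper {
    public static void main(String[] args) {
        // All method calls on the returned view are synchronized
        // on the wrapper object; the original list is the backing store.
        List<String> syncList = Collections.synchronizedList(new ArrayList<>());
        syncList.add("a");

        // Iteration is a sequence of calls and must be synchronized by hand:
        synchronized (syncList) {
            for (String s : syncList) {
                System.out.println(s);
            }
        }
    }
}
```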

77. What is a semaphore?

A semaphore is a data structure that maintains a set of permits that have to be acquired by competing threads. Semaphores can therefore be used to control how many threads access a critical section or resource simultaneously. Hence the constructor of java.util.concurrent.Semaphore takes as its first parameter the number of permits the threads compete for. Each invocation of its acquire() methods tries to obtain one of the available permits. The method acquire() without any parameter blocks until the next permit becomes available. Later on, when the thread has finished its work on the critical resource, it can release the permit by invoking the method release() on the Semaphore instance.
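A small sketch with two permits shared by four tasks; the pool size and sleep time are arbitrary:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class SemaphoreExample {
    // At most two threads may be inside the critical section at the same time.
    private static final Semaphore semaphore = new Semaphore(2);

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> {
                try {
                    semaphore.acquire();     // blocks until a permit is free
                    try {
                        System.out.println(Thread.currentThread().getName() + " in critical section");
                        Thread.sleep(100);   // simulate work on the resource
                    } finally {
                        semaphore.release(); // hand the permit back
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
    }
}
```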

78. What is a CountDownLatch?

The SDK class CountDownLatch provides a synchronization aid that can be used to implement scenarios in which threads have to wait until some other threads have reached the same state, such that all threads can start. This is done by providing a synchronized counter that is decremented until it reaches the value zero. Having reached zero, the CountDownLatch instance lets all waiting threads proceed. This can either be used to let all threads start at a given point in time, by using the value 1 for the counter, or to wait until a number of threads has finished. In the latter case the counter is initialized with the number of threads, and each thread that has finished its work counts the latch down by one.
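A sketch of the second use case, waiting for a number of worker threads to finish:

```java
import java.util.concurrent.CountDownLatch;

public class LatchExample {
    public static void main(String[] args) throws InterruptedException {
        int workers = 3;
        CountDownLatch done = new CountDownLatch(workers);

        for (int i = 0; i < workers; i++) {
            new Thread(() -> {
                // ... do some work ...
                done.countDown();  // signal that this worker has finished
            }).start();
        }

        done.await();              // blocks until the counter reaches zero
        System.out.println("all workers finished");
    }
}
```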

79. What is the difference between a CountDownLatch and a CyclicBarrier?

Both SDK classes maintain internally a counter that is decremented by different threads. The threads wait until the internal counter reaches the value zero and proceed from there on. But in contrast to the CountDownLatch the class CyclicBarrier resets the internal value back to the initial value once the value reaches zero. As the name indicates instances of CyclicBarrier can therefore be used to implement use cases where threads have to wait on each other again and again.
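A sketch in which three threads synchronize at the barrier twice; the optional barrier action runs once per completed round, after which the barrier resets itself:

```java
import java.util.concurrent.CyclicBarrier;

public class BarrierExample {
    public static void main(String[] args) {
        // The barrier action runs every time all three parties have arrived;
        // afterwards the barrier is reset and can be reused for the next round.
        CyclicBarrier barrier = new CyclicBarrier(3, () -> System.out.println("round finished"));

        for (int i = 0; i < 3; i++) {
            new Thread(() -> {
                try {
                    for (int round = 0; round < 2; round++) {
                        // ... work for this round ...
                        barrier.await();   // wait for the other two threads
                    }
                } catch (Exception e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }
    }
}
```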

80. What kind of tasks can be solved by using the Fork/Join framework?

The central class of the Fork/Join framework, java.util.concurrent.ForkJoinPool, is basically a thread pool that executes instances of java.util.concurrent.ForkJoinTask. The class ForkJoinTask provides the two methods fork() and join(). While fork() is used to start the asynchronous execution of a task, the method join() is used to await the result of the computation. Hence the Fork/Join framework can be used to implement divide-and-conquer algorithms, where a more complex problem is divided into a number of smaller, easier-to-solve problems.

81. Is it possible to find the smallest number within an array of numbers using the Fork/Join-Framework?

The problem of finding the smallest number within an array of numbers can be solved by using a divide-and-conquer algorithm. The smallest problem that can be solved very easily is an array of two numbers, as we can determine the smaller of the two numbers directly by one comparison. Using a divide-and-conquer approach, the initial array is divided into two parts of equal length and both parts are provided to two instances of RecursiveTask, which extends the class ForkJoinTask. By forking the two tasks they get executed and either solve the problem directly, if their slice of the array has length two, or again recursively divide the array into two parts and fork two new RecursiveTasks. Finally each task instance returns its result (either by having computed it directly or by waiting for the two subtasks). The root task then returns the smallest number in the array.
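The described algorithm can be sketched with RecursiveTask; the class name and the base-case threshold of two elements are illustrative:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class MinTask extends RecursiveTask<Integer> {
    private final int[] numbers;
    private final int from, to;   // half-open range [from, to)

    public MinTask(int[] numbers, int from, int to) {
        this.numbers = numbers;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Integer compute() {
        if (to - from <= 2) {
            // Base case: at most two elements, compare directly.
            int min = numbers[from];
            for (int i = from + 1; i < to; i++) {
                min = Math.min(min, numbers[i]);
            }
            return min;
        }
        // Divide: split the range in half and solve both halves.
        int mid = (from + to) / 2;
        MinTask left = new MinTask(numbers, from, mid);
        MinTask right = new MinTask(numbers, mid, to);
        left.fork();                              // run the left half asynchronously
        int rightMin = right.compute();           // compute the right half in this thread
        return Math.min(left.join(), rightMin);   // await the left half and combine
    }

    public static void main(String[] args) {
        int[] numbers = {7, 3, 9, 1, 8, 5};
        int min = new ForkJoinPool().invoke(new MinTask(numbers, 0, numbers.length));
        System.out.println(min); // prints 1
    }
}
```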

82. What is the difference between the two classes RecursiveTask and RecursiveAction?

In contrast to RecursiveTask the method compute() of RecursiveAction does not have to return a value. Hence RecursiveAction can be used when the action works directly on some data structure without having to return the computed value.

83. Is it possible to perform stream operations in Java 8 with a thread pool?

Collections provide the method parallelStream() to create a stream that is processed by a thread pool. Alternatively you can call the intermediate method parallel() on a given stream to convert a sequential stream to a parallel counterpart.
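For example, both variants computing the same sum:

```java
import java.util.Arrays;
import java.util.List;

public class ParallelStreamExample {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);

        // Parallel stream created directly from the collection:
        int sum = numbers.parallelStream().mapToInt(Integer::intValue).sum();

        // Equivalent: convert an existing sequential stream with parallel():
        int sum2 = numbers.stream().parallel().mapToInt(Integer::intValue).sum();

        System.out.println(sum + " " + sum2); // prints 15 15
    }
}
```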

84. How can we access the thread pool that is used by parallel stream operations?

The thread pool used for parallel stream operations can be accessed by ForkJoinPool.commonPool(). This way we can query its level of parallelism with commonPool.getParallelism(). The level cannot be changed at runtime but it can be configured by providing the following JVM parameter: -Djava.util.concurrent.ForkJoinPool.common.parallelism=5.



Streamdrill compared to other approaches for the Top-K-Problem

Streamdrill solves the top-k problem in real-time, which consists in counting activities of different event types over a certain time interval.

So far, so good, but you might wonder why you couldn’t just do this by yourself. After all, it’s just counting, right? Actually, the problem that streamdrill solves is simple, as long as:

  • the data volume is small
  • the time windows are small compared to the data volume
  • the number of different kinds of events is small and bounded
  • you don’t need the results in real-time

But as soon as you have millions of events per day, wish to aggregate over days, weeks, or even months, and potentially have several million different types of events, the problem gets quite complicated.

You may end up in such a situation faster than you think:

  • Event rates may rise up beyond what you originally envisioned (in particular if you’re successful. 😉)
  • The number of different event types may explode. This might either be because the underlying sets are already large (say, IP addresses, or users in a social network), or because you are tracking combinations such that the sizes multiply.

Let’s dig deeper into this problem to better understand why this problem quickly gets hard.

One thing you need to keep in mind is that the other solutions we will discuss still involve some amount of coding. So if you compare streamdrill against Hadoop, you would need to do a non-trivial amount of coding for Hadoop, because Hadoop is a general purpose framework taking care of the scaling, but doesn’t solve the top-k problem out of the box.

streamdrill Standard SQL Counters Stream Processing
solves top-k out of the box
real-time (✓ $$$)
focusses computation on “hot set”
memory based for high throughput
persistent (✓ ⌚) (✓)
scales to cluster (✓ ⌚)
exact results (✓ ⌚)

✓ = yes, ✕ = no, (✓) = possible, (✓ $$$) = possible, but expensive, (✓ ⌚) = possible, but not yet.

Approach 1: Store and crunch later

Let’s start by using a traditional approach based on some SQL database. To do that, you would create a table with a timestamp column and columns to reflect the fields in your data. Then you would just pipe in each event into the database. To get the counts you would run something like SELECT count(*) FROM events WHERE timestamp > '2012-12-01 00:00' AND timestamp < '2012-12-31 23:59' potentially also grouping to focus on certain types of events, and adding an ORDER BY count(*) clause to get the most active elements.

There are a number of problems with this approach:

  • If you use a normal disk based database, you will be able to add only a few hundred, at best thousands, events per second,
  • As you add more data, your database will become slower over time (and you will be adding a lot of data!). Leo has a number of nice benchmarks on his blog for MongoDB and Cassandra insertion performance.
  • Just adding the data is not enough, you also need to crunch the whole data set to compute the activities. But the longer the time window, the longer the query will take to run.
  • While the query runs, there will be considerable load on your server, making the addition of events even slower.
  • Eventually, you will get so slow that your results will already be a few minutes or even hours old once you get them. Hardly real-time.

What’s more, you’re probably only interested in the top 100 active elements, so most of the computation is spent on data you’re not interested in.

In other words, putting the data into some big database and crunching it later won’t really scale. If you’ve got a lot of money on the side, you can employ some form of clustering using map reduce or a similar approach to bring down the times, but the bottom line is the same: You crunch a lot of data which you don’t really need, and the problem will only become harder if you get more and more data. And “harder” also means a lot of coding and operations work (which you don’t have if you just use streamdrill 😉).

Approach 2: Just keeping the counters

So just storing the data away and crunching it later won’t work. How about doing the counting on the spot? That way, you wouldn’t have to store all those duplicate events. Note that just keeping the counters isn’t sufficient; you also need to maintain the global index such that you can quickly identify the top-k entries.

Using a complex event processing framework like Esper would also be a way to reduce the coding load, as Esper comes with a nice query language which lets you formulate averages over time windows in a compact way.

Let’s assume your data doesn’t fit into memory anymore (otherwise it won’t be Big Data, right?). One option is to again store the counters in a database. However, just as in the previous example this restricts the number of updates you can handle. Also, you will generate a lot of changes on the database and not all databases handle that amount of write throughput gracefully. For example, Cassandra only marks old entries for deletion and cleans up during the compaction phases. In our experience, such compactions will eventually take hours and put significant load on your system, cutting the throughput in half.

And again, most of the time is spent on elements which will never make the top-k entries.

Approach 3: Stream processing frameworks

Instead of keeping counters in a database, you could also try and scale out using a stream processing framework like Twitter’s Storm, or Yahoo’s S4. Such systems let you define the computation tasks in the form of small worker threads which are then distributed over a cluster automatically by the framework, also keeping the counters in memory.

While this looks appealing (and in fact allows you to scale to several hundred thousand events per second), note that this only solves the counting part, not the global index of all activities. Computing that in a way which scales is non-trivial. You can collect the counter updates at a worker thread which then maintains the index, but what if it doesn’t fit into memory? You could partition the index, but then you’d have to aggregate the data to compute queries, and you’d have to do this yourself, so again, a lot of complexity. The above stream processing frameworks also don’t come with easy support for queries, so you’d need to build some infrastructure to collect the results yourself.

And again, you also have a lot of computation for elements which will never show up in your top 100.

In summary

While conceptually simple (in the end, you just count, right?), the top-k problem which streamdrill addresses becomes hard if there are more things to count than fit into memory, and the event rate is higher than what you can write to disk.

Finally, let’s discuss some of the differences:

  • As streamdrill is memory based, all the data is lost when streamdrill crashes or is restarted. However, we already have functionality in the engine to write snapshots of all the data, but those aren’t available yet via the streamdrill API.
  • Right now, streamdrill does not support clustering. We just haven’t found it necessary so far, but it’s something that is possible and will be included soon.
  • Finally, as I’m going to explain in more depth in the next post, streamdrill is based on approximate algorithms which trade exactness versus performance. Again, if exactness is really an issue, you can get it by combining with one of the other technologies. This is possible, but not our top priority for now.



Hazelcast Overview

Hazelcast is an open source In-Memory Data Grid (IMDG). As such it provides elastically scalable distributed In-Memory computing, widely recognized as the fastest and most scalable approach to application performance, and Hazelcast does so in open source. More importantly it makes distributed computing simple by offering distributed implementations of developer friendly interfaces from Java such as Map, Queue, ExecutorService, Lock, JCache and many more. For example, the Map interface provides an In-Memory Key Value store which confers many of the advantages of NoSQL in terms of developer friendliness and developer productivity.

In addition to distributing data In-Memory, Hazelcast provides a convenient set of APIs to access the CPUs in your cluster for maximum processing speed. Hazelcast is designed to be lightweight and easy to use. Since Hazelcast is delivered as a compact library (JAR) and has no external dependencies other than Java, it is easily pluggable into your software solution to provide distributed data structures and distributed computing utilities.

Hazelcast is highly scalable and available. Distributed applications can use Hazelcast for distributed caching, synchronization, clustering, processing, pub/sub messaging, etc. Hazelcast is implemented in Java and has clients for Java, C/C++, .NET as well as REST. Hazelcast can also speak memcache protocol. It also plugs in to Hibernate and can easily be used with any existing database system.

If you are looking for In-Memory speed, elastic scalability and the developer friendliness of NoSQL, Hazelcast is a great choice for you.

Hazelcast is simple

Hazelcast is written in Java with no other dependencies. It exposes the same APIs as the familiar java.util package. Just add hazelcast.jar to your classpath, enjoy JVM clustering in less than a minute and start building scalable applications.

Hazelcast is Peer-to-Peer

Unlike many NoSQL solutions, Hazelcast is peer-to-peer. There is no master and slave; there is no single point of failure. All nodes store an equal amount of data and do an equal amount of processing. Hazelcast can be embedded in your existing application or used in client-server mode, where your application is a client to the Hazelcast nodes.

Hazelcast is scalable

Hazelcast is designed to scale up to hundreds and thousands of nodes. Simply add new nodes: they will automatically discover the cluster and linearly increase both memory and processing capacity. The nodes maintain a TCP connection to each other and all communication is performed through this layer.

Hazelcast is fast

Hazelcast stores everything in-memory. It is designed to perform very fast reads and updates.

Hazelcast is redundant

Hazelcast keeps a backup of each data entry on multiple nodes. On node failure, the data is restored from the backup and the cluster continues to operate without downtime.

Sharding in Hazelcast

Hazelcast shards are called Partitions. By default, Hazelcast has 271 partitions. Given a key, we serialize it, hash the result and take it modulo the number of partitions to find the partition it belongs to. The partitions themselves are distributed equally among the members of the cluster. Hazelcast also creates backups of partitions and distributes them among nodes for redundancy.
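The partition lookup can be sketched in plain Java. This is illustrative only: Hazelcast’s actual serialization format and hash function differ in detail, but the serialize-hash-modulo principle is the same:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PartitionSketch {
    static final int PARTITION_COUNT = 271;   // Hazelcast's default

    // Serialize the key, hash the bytes, and take the result
    // modulo the partition count to pick a partition id.
    static int partitionFor(String key) {
        byte[] serialized = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(serialized);
        return Math.floorMod(hash, PARTITION_COUNT);   // always in [0, 270]
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("some-key")); // a stable value in [0, 270]
    }
}
```

Because the mapping depends only on the key, every member of the cluster computes the same partition id for the same key and can route the operation to the partition’s current owner.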

Partitions in a 1 node Hazelcast cluster.

Partitions in a 2 node cluster.

The black partitions are primaries and the red ones are backups. In the above illustration, the first node has 135 primary partitions (black), and each of these partitions is backed up on the second node (red). At the same time, the first node holds the backup partitions of the second node’s primary partitions.

As you add more nodes, Hazelcast will move some of the primary and backup partitions to the new nodes, one by one, to keep all nodes equally loaded and redundant. Only the minimum number of partitions is moved to scale out Hazelcast.

Hazelcast Topology

If you have an application whose main focal point is asynchronous or high performance computing and lots of task executions, then embedded deployment is the most useful. In this type, nodes include both the application and data, see the below illustration.

You can have a cluster of server nodes that can be independently created and scaled. Your clients communicate with these server nodes to reach to the data on them. Hazelcast provides native clients (Java, .NET and C++), Memcache clients and REST clients. See the below illustration.


Hazelcast MultiMap is a specialized map where you can store multiple values under a single key. Just like any other distributed data structure implementation in Hazelcast, MultiMap is distributed and thread-safe.

Hazelcast MultiMap is not an implementation of java.util.Map due to the difference in method signatures. It supports most features of Hazelcast Map except for indexing, predicates and MapLoader/MapStore. Yet, like Hazelcast Map, entries are almost evenly distributed onto all cluster members. When a new member joins the cluster, the same ownership logic used in the distributed map applies.

Sample MultiMap Code

Let’s write code that puts data into a MultiMap.

public class PutMember {
  public static void main( String[] args ) {
    HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance();
    MultiMap<String, String> map = hazelcastInstance.getMultiMap( "map" );

    map.put( "a", "1" );
    map.put( "a", "2" );
    map.put( "b", "3" );
    System.out.println( "PutMember:Done" );
  }
}

Now let’s print the entries in this MultiMap.

public class PrintMember {
  public static void main( String[] args ) {
    HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance();
    MultiMap<String, String> map = hazelcastInstance.getMultiMap( "map" );
    for ( String key : map.keySet() ) {
      Collection<String> values = map.get( key );
      System.out.printf( "%s -> %s%n", key, values );
    }
  }
}

After you run the first code sample, run the PrintMember sample. You will see the key a has two values, as shown below.

b -> [3]

a -> [2, 1]

Configuring MultiMap

When using MultiMap, the collection type of the values can be either Set or List. You configure the collection type with the valueCollectionType parameter. If you choose Set, duplicate and null values are not allowed in your collection and ordering is irrelevant. If you choose List, ordering is relevant and your collection can include duplicate and null values.

You can also enable statistics for your MultiMap with the statisticsEnabled parameter. If statistics are enabled, they can be retrieved with the getLocalMultiMapStats() method.


Please refer to the MultiMap Configuration section for a full description of Hazelcast Distributed MultiMap configuration.

Core concepts


A NordicAPIs infographic

rest vs soap


origin REST


REST (Representational State Transfer) was created in 2000 by Roy Fielding at UC Irvine. Developed in an academic environment, this architectural style embraces the philosophy of the open Web.

origin SOAP


SOAP (Simple Object Access Protocol) was created in 1998 by Dave Winer et al. in collaboration with Microsoft. Developed by a large software company, this protocol addresses the needs of the enterprise market.

BASIC CONCEPT



REST: Makes data available as resources (nouns), for example “user” or “invoice”



SOAP: Makes data available as services (verb + noun), for example “getUser” or “PayInvoice”


pros REST


  • Follows the philosophy of the Open Web
  • Relatively easy to implement and maintain
  • Clearly separates client and server implementations
  • Communication isn’t controlled by a single entity
  • Information can be stored by the client to prevent multiple calls
  • Can return data in multiple formats (JSON, XML etc)
pros SOAP


  • Follows a formal enterprise approach
  • Works on top of any communication protocol, even asynchronously
  • Information about objects is communicated to clients
  • Security and authorization are part of the protocol
  • Can be fully described using WSDL


cons REST


  • Only works on top of the HTTP protocol
  • Hard to enforce authorization and security on top of it
cons SOAP


  • Spends a lot of bandwidth communicating metadata
  • Hard to implement and is unpopular among Web and mobile developers
  • Uses only XML

WHEN TO USE



when to use REST

  • When clients and servers operate on a Web environment
  • When information about objects doesn’t need to be communicated to the client


when to use SOAP

  • When clients need to have access to objects available on servers
  • When you want to enforce a formal contract between client and server

WHEN NOT TO USE

when not to use REST


  • When you need to enforce a strict contract between client and server
  • When performing transactions that involve multiple calls
when not to use SOAP


  • When you want the majority of developers to easily use your API
  • When your bandwidth is very limited


REST use cases


  • Social Media services
  • Social Networks
  • Web Chat services
  • Mobile Services
SOAP use cases


  • Financial services
  • Payment gateways
  • Telecommunication services



Here are a few reasons you may want to use SOAP.


While SOAP supports SSL (just like REST), it also supports WS-Security, which adds some enterprise security features: it supports identity through intermediaries, not just point-to-point (SSL), and it provides a standard implementation of data integrity and data privacy. Calling it “enterprise” isn’t to say it’s more secure; it simply supports some security tools that typical internet services have no need for. In fact, they are really only needed in a few “enterprise” scenarios.


If you need ACID transactions over a service, you’re going to need SOAP. While REST supports transactions, it isn’t as comprehensive and isn’t ACID-compliant. Fortunately, ACID transactions almost never make sense over the internet. REST is limited by HTTP itself, which can’t provide two-phase commit across distributed transactional resources, but SOAP can. Internet apps generally don’t need this level of transactional reliability; enterprise apps sometimes do.


REST doesn’t have a standard messaging system and expects clients to deal with communication failures by retrying. SOAP has success/retry logic built in and provides end-to-end reliability, even through SOAP intermediaries.

WS-AtomicTransaction Explained

The following figure shows two instances of WebLogic Server interacting within the context of a Web services atomic transaction. For simplicity, two WebLogic Web service applications are shown.

Figure 3-2 Web Services Atomic Transactions in WebLogic Server Environment


Please note the following:

  • Using the local JTA transaction manager, a transaction can be imported to or exported from the local JTA environment as a subordinate transaction, all within the context of a Web service request.
  • Creation and management of the coordination context is handled by the local JTA transaction manager.
  • All transaction integrity management and recovery processing is done by the local JTA transaction manager.

For more information about JTA, see Programming JTA for Oracle WebLogic Server.

The following describes a sample end-to-end Web services atomic transaction interaction, illustrated in Figure 3-2:

  1. Application A begins a transaction on the current thread of control using the JTA transaction manager on Server A.
  2. Application A calls a Web service method in Application B on Server B.
  3. Server A updates its transaction information and creates a SOAP header that contains the coordination context, and identifies the transaction and local coordinator.
  4. Server B receives the request for Application B, detects that the header contains a transaction coordination context, and determines whether it has already registered as a participant in this transaction. If it has, that transaction is resumed; if not, a new transaction is started.

    Application B executes within the context of the imported transaction. All transactional resources with which the application interacts are enlisted with this imported transaction.

  5. Server B enlists itself as a participant in the WS-AtomicTransaction transaction by registering with the registration service indicated in the transaction coordination context.
  6. Server A resumes the transaction.
  7. Application A resumes processing and commits the transaction.


Spring Framework Overview

Spring was built on top of the ideas of dependency injection and inversion of control. In plain words: instead of having a bunch of classes creating each other and passing each other from one place to another, you have a bag of beans. Each bean declares its dependencies (which services do I need to work?) and the Spring container resolves these requirements by automatically (and automagically) wiring everything together.

You have a Service that says (through XML, annotations, or its constructor signature) “I need a DAO interface to work!”, and Spring is kind enough to find some bean that implements that interface, create it first and pass it where it is required.

On that foundation multiple other services are provided (mostly in terms of data access and AOP), but injection is the core concept.

Spring Framework – Dependency Injection

Any application is composed of many objects that collaborate with each other to perform some useful work. Traditionally, each object is responsible for obtaining its own references to the dependent objects (dependencies) it collaborates with. This leads to highly coupled classes and hard-to-test code.

For example, consider a Car object.

A Car depends on Wheels, Engine, Fuel, Battery, etc to run. Traditionally we define the brand of such dependent objects along with the definition of the Car object.

Without Dependency Injection (DI) :

class Car {
  private Wheel wh = new NepaliRubberWheel();
  private Battery bt = new ExcideBattery();
  // ...
}

Here, the Car object is responsible for creating the dependent objects.

What if we want to change the type of its dependent object, say Wheel, after the initial NepaliRubberWheel() punctures? We need to recreate the Car object with its new dependency, say ChineseRubberWheel(), but only the Car manufacturer can do that.

So what does Dependency Injection do for us?

When using Dependency Injection, objects are given their dependencies at run time rather than at compile time (car manufacturing time), so we can now change the Wheel whenever we want. Here, the dependency (Wheel) can be injected into Car at run time.

After using DI:

class Car{
  private Wheel wh = [Inject an Instance of Wheel at runtime];
  private Battery bt = [Inject an Instance of Battery at runtime];
}

Source: understanding-dependency-injection
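To make the idea concrete, here is a minimal, self-contained Java sketch of constructor-based injection. The interface and class names follow the article's Car example; the "container" role is simulated by a main method that wires everything together in one place (in Spring, the container would do this wiring for you, e.g. via an @Autowired constructor).

```java
// Minimal sketch of constructor-based dependency injection (no framework).
// Interface and class names follow the article's Car example.
interface Wheel { String brand(); }
interface Battery { String brand(); }

class ChineseRubberWheel implements Wheel {
    public String brand() { return "ChineseRubberWheel"; }
}

class ExcideBattery implements Battery {
    public String brand() { return "ExcideBattery"; }
}

class Car {
    private final Wheel wheel;
    private final Battery battery;

    // The Car no longer constructs its dependencies; they are handed in.
    Car(Wheel wheel, Battery battery) {
        this.wheel = wheel;
        this.battery = battery;
    }

    String describe() {
        return "Car with " + wheel.brand() + " and " + battery.brand();
    }
}

public class DiDemo {
    public static void main(String[] args) {
        // The "container" role: concrete types are chosen in one place,
        // so swapping wheels never requires touching the Car class.
        Car car = new Car(new ChineseRubberWheel(), new ExcideBattery());
        System.out.println(car.describe());
    }
}
```

Because Car depends only on the Wheel and Battery interfaces, swapping in a different wheel is a one-line change at the wiring site rather than a change to Car itself.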

Spring Framework – Aspect Oriented Programming (AOP)

In layman's terms, let me give you an example. Let's say you have a web application and you need to add error logging / auditing. One implementation would be to go into every public method and add your try/catch blocks etc…

Well, Aspect Oriented Programming says hogwash with that: let me inject my method around your method. So, for example, instead of calling YourClass.UpdateModel(), the system might call LoggingHandler.CallMethod(); this method might then redirect the call to UpdateModel but wrap it in a try/catch block to handle logging errors.

Now the trick is that this redirection happens automagically, through configuration or by applying attributes to methods.

This works, as you said, for cross-cutting concerns: very common programming elements that exist in every domain, such as logging, auditing, transaction management, and authorization.

The idea behind it is to remove all this common plumbing code out of your business / app tier so you can focus on solving the problem not worrying about logging this method call or that method call.
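The "wrap my method around your method" idea can be sketched with a JDK dynamic proxy, which is one of the mechanisms Spring AOP itself uses for interface-based beans. The Model interface, ModelImpl class, and withLogging helper below are illustrative names, not part of any framework:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Sketch of "inject my method around your method" with a JDK dynamic proxy.
interface Model {
    void updateModel();
}

class ModelImpl implements Model {
    public void updateModel() {
        System.out.println("updating model");
    }
}

public class LoggingAspectDemo {
    @SuppressWarnings("unchecked")
    static <T> T withLogging(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, methodArgs) -> {
            try {
                // The cross-cutting concern lives here, not in ModelImpl.
                System.out.println("before " + method.getName());
                return method.invoke(target, methodArgs);
            } catch (Exception e) {
                System.out.println("logged error in " + method.getName());
                throw e;
            }
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[]{iface}, handler);
    }

    public static void main(String[] args) {
        Model model = withLogging(new ModelImpl(), Model.class);
        model.updateModel(); // logging happens around the real call
    }
}
```

The caller still sees a plain Model; the redirection through the proxy is invisible, which is exactly the "automagic" configuration-driven behavior described above.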

Scaling Real-time Apps on Cloud Foundry Using Node.js and Redis

Common applications being built on Node.js like social networking and chat require real-time scaling capabilities across multiple instances. Developers need to deal with sticky sessions, scale-up, scale-down, instance crash/restart, and more. Cloud Foundry PaaS provides a ready platform to achieve this quickly.

The following blog post will walk you through deploying and managing real-time applications on Cloud Foundry using common Node.js examples and Redis key-value store capabilities.

Chat App

The main objective here is to build a simple chat app while tackling the scale requirements. Specifically, we will be building a simple Express, Socket.io, and Redis-based chat app that meets the following objectives:

  1. Chat server should run with multiple instances.
  2. The user login should be saved in a session.
    • User should be logged back in upon browser refresh
    • Socket.io should get user information from the session before sending chat messages.
    • Socket.io should only connect if the user is already logged in.
  3. If the server instance that the user is connected to gets restarted, goes down or is scaled down while the user is chatting, the user should be reconnected to an available instance and recover the session.

Chat app’s Login page:

Chat app’s Chat page:

We will also cover:

  1. How to use Socket.io and Sticky Sessions
  2. How to use Redis as a session store
  3. How to use Redis as a pubsub service
  4. How to use session.socket.io (SessionSockets) to get session info (like user info) from Express sessions
  5. How to configure the Socket.io client and server to properly reconnect after one or more server instances go down (i.e. have been restarted, scaled down or crashed)

Socket.io & Sticky Sessions

Socket.io is one of the earliest and most popular Node.js modules used to build real-time apps like chat, social networking etc. (Note: SockJS is another popular library similar to Socket.io.)

When you run a Socket.io server in a cloud that has a load-balancer, reverse proxy, routers, etc., it has to be configured to work properly, especially when you scale the server to use multiple instances.

One of the constraints Socket.io, SockJS, and similar libraries have is that they need to continuously talk to the same instance of the server. They work perfectly well when there is only one instance of the server.

When you scale your app in a cloud environment, the load balancer (Nginx in the case of Cloud Foundry) will take over, and requests will be sent to different instances, causing Socket.io to break.

For such situations, load balancers have a feature called ‘sticky sessions’, also known as ‘session affinity’. The idea is that if this property is set, all the requests following the first load-balanced request will go to the same server instance.

In Cloud Foundry, cookie-based sticky sessions are enabled for apps that set the cookie jsessionid. Note that jsessionid is the cookie name commonly used to track sessions in Java/Spring applications. Cloud Foundry is simply adopting it as the sticky session cookie for all frameworks.

To make Socket.io work, the apps just need to set a cookie with the name jsessionid.

/**
 * Use cookieParser and session middleware together.
 * By default an Express/Connect app creates a cookie named 'connect.sid'. But to scale the app,
 * make sure to use the cookie name 'jsessionid' (instead of connect.sid) so you can use
 * Cloud Foundry's 'Sticky Session' feature.
 * Without this, Socket.io won't work when you have more than 1 instance.
 * If you are NOT running on Cloud Foundry, having the cookie name 'jsessionid' doesn't hurt - it's just a cookie name.
 */
app.use(express.session({store: sessionStore, key: 'jsessionid', secret: 'your secret here'}));

In the above diagram, when you open the app,

  1. Express sets a session cookie with name jsessionid.
  2. When Socket.io connects, it uses that same cookie and hits the load balancer.
  3. The load balancer always routes it to the same server that the cookie was set in.

Sending session info to Socket.io

Let’s imagine that the user is logging in via Twitter or Facebook, or a regular login screen. We are storing this information in a session after the user has logged in.'/login', function (req, res) {
    //store user info in session after login.
    req.session.user = req.body.user;
});

Once the user has logged in, we connect via Socket.io to allow chatting. However, Socket.io doesn’t know who the user is and whether he or she is actually logged in before sending chat messages to others.

That’s where the session.socket.io library comes in. It’s a very simple library that’s a wrapper around Socket.io. All it does is grab session information during the handshake and then pass it to Socket.io’s connection function.

//With just Socket.io:
io.sockets.on('connection', function (socket) {
    //do pubsub here
});

//But with session.socket.io, you'll get session info too:

// Use SessionSockets so that we can exchange (set/get) user data b/w sockets and http sessions.
// Pass the custom 'jsessionid' cookie name that we are using to make use of Sticky Sessions.
var SessionSockets = require('session.socket.io');
var sessionSockets = new SessionSockets(io, sessionStore, cookieParser, 'jsessionid');

sessionSockets.on('connection', function (err, socket, session) {

    //get info from session
    var user = session.user;

    //Close the socket if the user is not logged in
    if (!user) {
        return socket.disconnect();
    }

    //do pubsub
    socket.emit('chat', {user: user, msg: 'logged in'});
});

Redis as a session store

So far so good, but Express stores these sessions in MemoryStore (by default). MemoryStore is simply a Javascript object – it will be in memory as long as the server is up. If the server goes down, all the session information of all users will be lost!

We need a place to store this outside of our server, but it should also be very fast to retrieve. That’s where Redis as a session store comes in.

Let’s configure our app to use Redis as a session store as below.

// Use Redis as the session store. Redis will keep all Express sessions in it.
var redis = require('redis');
var RedisStore = require('connect-redis')(express);
var rClient = redis.createClient();
var sessionStore = new RedisStore({client:rClient});

//And pass sessionStore to the Express 'session' middleware's 'store' value.
app.use(express.session({store: sessionStore, key: 'jsessionid', secret: 'your secret here'}));

With the above configuration, sessions will now be stored in Redis. Also, if one of the server instances goes down, the session will still be available for other instances to pick up.

Socket.io as a pub-sub server

So far our sessions are taken care of with the above setup, but if we use Socket.io’s default pub-sub mechanism, it will work for only one server instance. That is, if user1 and user2 are both on server instance #1, they can chat with each other. If they are on different server instances, they cannot.

sessionSockets.on('connection', function (err, socket, session) {
    socket.on('chat', function (data) {
        socket.emit('chat', data); //send back to browser
        socket.broadcast.emit('chat', data); // send to others on this instance
    });

    socket.on('join', function (data) {
        socket.emit('chat', {msg: 'user joined'});
        socket.broadcast.emit('chat', {msg: 'user joined'});
    });
});

Redis as a PubSub service

In order to send chat messages to users across servers we will update our server to use Redis as a PubSub service (along with session store). Redis natively supports pub-sub operations. All we need to do is to create a publisher, a subscriber, and a channel.

//We will use Redis to do pub-sub.

// Create two Redis connections: a 'pub' for publishing and a 'sub' for subscribing.
// Subscribe the 'sub' connection to the 'chat' channel.
var sub = redis.createClient();
var pub = redis.createClient();
sub.subscribe('chat');

sessionSockets.on('connection', function (err, socket, session) {
    socket.on('chat', function (data) {
        pub.publish('chat', data);
    });

    socket.on('join', function (data) {
        pub.publish('chat', {msg: 'user joined'});
    });

    // Use Redis' 'sub' (subscriber) client to listen for any message from Redis to the server.
    // When a message arrives, send it back to the browser using Socket.io.
    sub.on('message', function (channel, message) {
        socket.emit(channel, message);
    });
});

The app architecture will now look like this:

Handling server scale-down / crashes / restarts

The app will work fine as long as all the server instances are running. But what happens if the server is restarted, scaled down, or one of the instances crashes? How do we handle that?

Let’s first understand what happens in that situation.

The code below simply connects a browser to the Socket.io server and listens for various events.

// Connect to Socket.io on the server (*** BEFORE FIX ***)
var host = window.location.host.split(':')[0];
var socket = io.connect('http://' + host);

socket.on('connect', function () {
    console.log('connected');
});
socket.on('connecting', function () {
    console.log('connecting');
});
socket.on('disconnect', function () {
    console.log('disconnected');
});
socket.on('connect_failed', function () {
    console.log('connect failed');
});
socket.on('error', function (err) {
    console.log('error: ' + err);
});
socket.on('reconnect_failed', function () {
    console.log('reconnect failed');
});
socket.on('reconnect', function () {
    console.log('reconnected');
});
socket.on('reconnecting', function () {
    console.log('reconnecting');
});

While the user is chatting, if we restart the app on localhost (or on a single host), Socket.io attempts to reconnect multiple times (based on its configuration) to see if it can connect. If the server comes up within that time, it will reconnect, and we see logs like the ones below:

If we restart the server (say using vmc restart redispubsub) and the user is chatting on the same app that’s running on Cloud Foundry AND with multiple instances, we will see the following log:

You can see in the above logs that, after the server comes back up, the Socket.io client (running in the browser) isn’t able to connect to the Socket.io server (running in Node.js on the server).

This is because, once the server is restarted on Cloud Foundry, instances are brought up as if they were brand-new server instances with different IP addresses and different ports, so the jsessionid is no longer valid. That in turn causes the load balancer to load-balance Socket.io’s reconnection requests (i.e. they are sent to different server instances), causing the server to fail the handshake and consequently throw client not handshaken errors!

OK, let’s fix that reconnection issue

First, we will disable Socket.io’s default “reconnect” feature, and then implement our own reconnection logic.

In our custom reconnection function, when the server goes down, we’ll make a dummy HTTP GET call to index.html every 4-5 seconds. If the call succeeds, we know that the (Express) server has already set jsessionid in the response. Then we’ll call Socket.io’s reconnect function. This time, because jsessionid is set, Socket.io’s handshake will succeed and the user can continue to chat without interruption.

// Connect to Socket.io on the server (*** FIX ***)
var host = window.location.host.split(':')[0];

//Disable Socket.io's default "reconnect" feature
var socket = io.connect('http://' + host, {reconnect: false, 'try multiple transports': false});
var intervalID;
var reconnectCount = 0;
socket.on('disconnect', function () {

    //Retry reconnecting every 4 seconds
    intervalID = setInterval(tryReconnect, 4000);
});

// Implement our own reconnection feature.
// When the server goes down we make a dummy HTTP GET call to index.html every 4 seconds.
// If the call succeeds, we know that the (Express) server has set ***jsessionid***, so only then do we try to reconnect.
var tryReconnect = function () {
    ++reconnectCount;
    if (reconnectCount == 5) {
        clearInterval(intervalID);
    }
    console.log('Making a dummy http call to set jsessionid (before we do reconnect)');
    $.ajax('http://' + host)
        .success(function () {
            console.log("http request succeeded");
            //reconnect the socket AFTER we got jsessionid set
            socket.socket.connect();
            clearInterval(intervalID);
        }).error(function (err) {
            console.log("http request failed (probably server not up yet)");
        });
};

In addition, since the jsessionid is invalidated by the load balancer, we can’t create a session with the same jsessionid or else the sticky session will be ignored by the load balancer. So on the server, when the dummy HTTP request comes in, we will regenerate the session to remove the old session and sessionid and ensure everything is afresh before we serve the response.

//Instead of..
exports.index = function (req, res) {
    res.render('index', { title: 'RedisPubSubApp', user: req.session.user});
};

//Use this..
exports.index = function (req, res) {
    //Save user from previous session (if it exists)
    var user = req.session.user;

    //Regenerate new session & store user from previous session (if it exists)
    req.session.regenerate(function (err) {
        req.session.user = user;
        res.render('index', { title: 'RedisPubSubApp', user: req.session.user});
    });
};

Running / Testing it on Cloud Foundry

  • Clone the app to redispubsub folder
  • cd redispubsub
  • npm install and follow the below instructions to push the app to Cloud Foundry
> vmc push redispubsub
Instances> 4       <----- Run 4 instances of the server

1: node
2: other
Framework> node

1: node
2: node06
3: node08
4: other
Runtime> 3  <---- Choose Node.js 0.8v

1: 64M
2: 128M
3: 256M
4: 512M
Memory Limit> 64M

Creating redispubsub... OK

2: none
URL>  <--- URL of the app (choose something unique)

Updating redispubsub... OK

Create services for application?> y

1: blob 0.51
2: mongodb 2.0
3: mysql 5.1
4: postgresql 9.0
5: rabbitmq 2.4
6: redis 2.6
7: redis 2.4
8: redis 2.2
What kind?> 6 <----- Select & Add Redis v2.6 service

Name?> redis-e9771 <-- This is just a random name for Redis service

Creating service redis-e9771... OK
Binding redis-e9771 to redispubsub... OK
Create another service?> n

Bind other services to application?> n

Save configuration?> n

Uploading redispubsub... OK
Starting redispubsub... OK
Checking redispubsub... OK
  • Once the server is up, open up multiple browsers and go to <appname>
  • Start chatting

Test 1

  • Refresh the browser
  • You should automatically be logged in

Test 2

  • Open up JS debugger (in Chrome, do cmd + alt +j)
  • Restart the server by running vmc restart <appname>
  • Once the server restarts, Socket.io should automatically reconnect
  • You should be able to chat after the reconnection

That’s it for this time. Stay tuned for my next blog where I will discuss how RabbitMQ can be leveraged in scalable apps built on Cloud Foundry. The content of this blog has also been covered in a video. Feel free to get in touch with us for questions on the material.

General Notes

  • Get the code right away – Github location:
  • Deploy right away – if you don’t already have a Cloud Foundry account, sign up for it here.
  • Check out Cloud Foundry getting started here and install the vmc Ruby command line tool to push apps.
  • To install the latest alpha or beta vmc tool run: sudo gem install vmc --pre.


Front end UI:

Scalability Best Practices: Lessons from eBay

At eBay, one of the primary architectural forces we contend with every day is scalability. It colors and drives every architectural and design decision we make. With hundreds of millions of users worldwide, over two billion page views a day, and petabytes of data in our systems, this is not a choice – it is a necessity.

In a scalable architecture, resource usage should increase linearly (or better) with load, where load may be measured in user traffic, data volume, etc. Where performance is about the resource usage associated with a single unit of work, scalability is about how resource usage changes as units of work grow in number or size. Said another way, scalability is the shape of the price-performance curve, as opposed to its value at one point in that curve.

There are many facets to scalability – transactional, operational, development effort. In this article, I will outline several of the key best practices we have learned over time to scale the transactional throughput of a web-based system. Most of these best practices will be familiar to you. Some may not. All come from the collective experience of the people who develop and operate the eBay site.

Best Practice #1: Partition by Function

Whether you call it SOA, functional decomposition, or simply good engineering, related pieces of functionality belong together, while unrelated pieces of functionality belong apart. Further, the more decoupled that unrelated functionality can be, the more flexibility you will have to scale them independently of one another.

At the code level, we all do this all the time. JAR files, packages, bundles, etc., are all mechanisms we use to isolate and abstract one set of functionality from another.

At the application tier, eBay segments different functions into separate application pools. Selling functionality is served by one set of application servers, bidding by another, search by yet another. In total, we organize our roughly 16,000 application servers into 220 different pools. This allows us to scale each pool independently of one another, according to the demands and resource consumption of its function. It further allows us to isolate and rationalize resource dependencies – the selling pool only needs to talk to a relatively small subset of backend resources, for example.

At the database tier, we follow much the same approach. There is no single monolithic database at eBay. Instead there is a set of database hosts for user data, a set for item data, a set for purchase data, etc. – 1000 logical databases in all, on 400 physical hosts. Again, this approach allows us to scale the database infrastructure for each type of data independently of the others.

Best Practice #2: Split Horizontally

While functional partitioning gets us part of the way, by itself it is not sufficient for a fully scalable architecture. As decoupled as one function may be from another, the demands of a single functional area can and will outgrow any single system over time. Or, as we like to remind ourselves, “if you can’t split it, you can’t scale it.” Within a particular functional area, then, we need to be able to break the workload down into manageable units, where each individual unit retains good price-performance. Here is where the horizontal split comes in.

At the application tier, where eBay’s interactions are by design stateless, splitting horizontally is trivial. Use a standard load-balancer to route incoming traffic. Because all application servers are created equal and none retains any transactional state, any of them will do. If we need more processing power, we simply add more application servers.

The more challenging problem arises at the database tier, since data is stateful by definition. Here we split (or “shard”) the data horizontally along its primary access path. User data, for example, is currently divided over 20 hosts, with each host containing 1/20 of the users. As our numbers of users grow, and as the data we store for each user grows, we add more hosts, and subdivide the users further. Again, we use the same approach for items, for purchases, for accounts, etc. Different use cases use different schemes for partitioning the data: some are based on a simple modulo of a key (item ids ending in 1 go to one host, those ending in 2 go to the next, etc.), some on a range of ids (0-1M, 1-2M, etc.), some on a lookup table, some on a combination of these strategies. Regardless of the details of the partitioning scheme, though, the general idea is that an infrastructure which supports partitioning and repartitioning of data will be far more scalable than one which does not.
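The modulo-based routing described above can be sketched in a few lines. The class and host names below are illustrative, not eBay's actual routing layer; the point is only that a key's value deterministically selects a host, so the same record always lands on the same partition:

```java
// Sketch of modulo-based horizontal partitioning ("sharding"):
// route a record to one of N database hosts by its id.
class ShardRouter {
    private final String[] hosts;

    ShardRouter(String... hosts) {
        this.hosts = hosts;
    }

    // Primary access path: the same id always maps to the same host,
    // so reads and writes for a record agree on its location.
    String hostFor(long itemId) {
        return hosts[(int) (itemId % hosts.length)];
    }
}
```

Repartitioning (adding a host) changes where keys map, which is why real systems layer lookup tables or range schemes on top of the simple modulo, as the article notes.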

Best Practice #3: Avoid Distributed Transactions

At this point, you may well be wondering how the practices of partitioning data functionally and horizontally jibe with transactional guarantees. After all, almost any interesting operation updates more than one type of entity – users and items come to mind immediately. The orthodox answer is well-known and well-understood – create a distributed transaction across the various resources, using two-phase commit to guarantee that all updates across all resources either occur or do not. Unfortunately, this pessimistic approach comes with substantial costs. Scaling, performance, and latency are adversely affected by the costs of coordination, which worsens geometrically as you increase the number of dependent resources and incoming clients. Availability is similarly limited by the requirement that all dependent resources are available. The pragmatic answer is to relax your transactional guarantees across unrelated systems.

It turns out that you can’t have everything. In particular, guaranteeing immediate consistency across multiple systems or partitions is typically neither required nor possible. The CAP theorem, postulated almost 10 years ago by Inktomi’s Eric Brewer, states that of three highly desirable properties of distributed systems – consistency (C), availability (A), and partition-tolerance (P) – you can only choose two at any one time. For a high-traffic web site, we have to choose partition-tolerance, since it is fundamental to scaling. For a 24×7 web site, we typically choose availability. So immediate consistency has to give way.

At eBay, we allow absolutely no client-side or distributed transactions of any kind – no two-phase commit. In certain well-defined situations, we will combine multiple statements on a single database into a single transactional operation. For the most part, however, individual statements are auto-committed. While this intentional relaxation of orthodox ACID properties does not guarantee immediate consistency everywhere, the reality is that most systems are available the vast majority of the time. Of course, we do employ various techniques to help the system reach eventual consistency: careful ordering of database operations, asynchronous recovery events, and reconciliation or settlement batches. We choose the technique according to the consistency demands of the particular use case.

The key takeaway here for architects and system designers is that consistency should not be viewed as an all or nothing proposition. Most real-world use cases simply do not require immediate consistency. Just as availability is not all or nothing, and we regularly trade it off against cost and other forces, similarly our job becomes tailoring the appropriate level of consistency guarantees to the requirements of a particular operation.

Best Practice #4: Decouple Functions Asynchronously

The next key element to scaling is the aggressive use of asynchrony. If component A calls component B synchronously, A and B are tightly coupled, and that coupled system has a single scalability characteristic — to scale A, you must also scale B. Equally problematic is its effect on availability. Going back to Logic 101, if A implies B, then not-B implies not-A. In other words, if B is down then A is down. By contrast, if A and B integrate asynchronously, whether through a queue, multicast messaging, a batch process, or some other means, each can be scaled independently of the other. Moreover, A and B now have independent availability characteristics – A can continue to move forward even if B is down or distressed.

This principle can and should be applied up and down an infrastructure. Techniques like SEDA (Staged Event-Driven Architecture) can be used for asynchrony inside an individual component while retaining an easy-to-understand programming model. Between components, the principle is the same — avoid synchronous coupling as much as possible. More often than not, the two components have no business talking directly to one another in any event. At every level, decomposing the processing into stages or phases, and connecting them up asynchronously, is critical to scaling.
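The A-to-B decoupling above can be sketched with an in-process queue. In production the boundary would be a message broker rather than a `BlockingQueue`, and all names here are illustrative; the shape is what matters: component A enqueues and moves on, component B drains at its own pace:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of asynchronous decoupling: A hands off work through a queue
// and continues immediately; B consumes independently.
public class AsyncDecoupling {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> events = new ArrayBlockingQueue<>(100);

        // Component B: consumes at its own rate, independent of A.
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String event = events.take();
                    if (event.equals("STOP")) break; // sentinel to end the demo
                    System.out.println("processed: " + event);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Component A: enqueues and moves on without waiting for B.
        events.put("order-created");
        events.put("payment-received");
        events.put("STOP");
        consumer.join();
    }
}
```

If B is slow or briefly down, A's only backpressure is a full queue, which is exactly the independent-availability property the paragraph describes.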

Best Practice #5: Move Processing To Asynchronous Flows

Now that you have decoupled asynchronously, move as much processing as possible to the asynchronous side. In a system where replying rapidly to a request is critical, this can substantially reduce the latency experienced by the requestor. In a web site or trading system, it is worth it to trade off data or execution latency (how quickly we get everything done) for user latency (how quickly the user gets a response). Activity tracking, billing, settlement, and reporting are obvious examples of processing that belongs in the background. But often significant steps in processing of the primary use case can themselves be broken out to run asynchronously. Anything that can wait should wait.

Equally as important, but less often appreciated, is the fact that asynchrony can substantially reduce infrastructure cost. Performing operations synchronously forces you to scale your infrastructure for the peak load – it needs to handle the worst second of the worst day at that exact second. Moving expensive processing to asynchronous flows, though, allows you to scale your infrastructure for the average load instead of the peak. Instead of needing to process all requests immediately, the queue spreads the processing over time, and thereby dampens the peaks. The more spiky or variable the load on your system, the greater this advantage becomes.

Best Practice #6: Virtualize At All Levels

Virtualization and abstraction are everywhere, following the old computer science aphorism that the solution to every problem is another level of indirection. The operating system abstracts the hardware. The virtual machine in many modern languages abstracts the operating system. Object-relational mapping layers abstract the database. Load-balancers and virtual IPs abstract network endpoints. As we scale our infrastructure through partitioning by function and data, an additional level of virtualization of those partitions becomes critical.

At eBay, for example, we virtualize the database. Applications interact with a logical representation of a database, which is then mapped onto a particular physical machine and instance through configuration. Applications are similarly abstracted from the split routing logic, which assigns a particular record (say, that of user XYZ) to a particular partition. Both of these abstractions are implemented in our home-grown O/R layer. This allows the operations team to rebalance logical hosts between physical hosts, by separating them, consolidating them, or moving them — all without touching application code.

We similarly virtualize the search engine. To retrieve search results, an aggregator component parallelizes queries over multiple partitions, and makes a highly partitioned search grid appear to clients as one logical index.
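The aggregator idea can be sketched as a scatter-gather: fan the query out to every partition in parallel, then merge the partial results so callers see one logical index. The in-memory "partitions" and the substring match below are stand-ins for real index shards and a real query:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of a search aggregator: parallelize a query over partitions
// and merge the results into one logical answer.
class SearchAggregator {
    private final List<List<String>> partitions; // each inner list stands in for one index partition

    SearchAggregator(List<List<String>> partitions) {
        this.partitions = partitions;
    }

    List<String> search(String term) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (List<String> partition : partitions) {
                tasks.add(() -> {
                    List<String> hits = new ArrayList<>();
                    for (String doc : partition) {
                        if (doc.contains(term)) hits.add(doc);
                    }
                    return hits;
                });
            }
            // invokeAll runs the per-partition queries in parallel and waits for all.
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> f : pool.invokeAll(tasks)) {
                merged.addAll(f.get());
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```

Callers never learn how many partitions exist, so partitions can be added or rebalanced without touching client code, which is the operational flexibility the next paragraph emphasizes.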

The motivation here is not only programmer convenience, but also operational flexibility. Hardware and software systems fail, and requests need to be re-routed. Components, machines, and partitions are added, moved, and removed. With judicious use of virtualization, higher levels of your infrastructure are blissfully unaware of these changes, and you are therefore free to make them. Virtualization makes scaling the infrastructure possible because it makes scaling manageable.

Best Practice #7: Cache Appropriately

The last component of scaling is the judicious use of caching. The specific recommendations here are less universal, because they tend to be highly dependent on the details of the use case. At the end of the day, the goal of an efficient caching system is to maximize your cache hit ratio within your storage constraints, your requirements for availability, and your tolerance for staleness. It turns out that this balance can be surprisingly difficult to strike. Once struck, our experience has shown that it is also quite likely to change over time.

The most obvious opportunities for caching come with slow-changing, read-mostly data – metadata, configuration, and static data, for example. At eBay, we cache this type of data aggressively, and use a combination of pull and push approaches to keep the system reasonably in sync in the face of updates. Reducing repeated requests for the same data can and does make a substantial impact. More challenging is rapidly-changing, read-write data. For the most part, we intentionally sidestep these challenges at eBay. We have traditionally not done any caching of transient session data between requests. We similarly do not cache shared business objects, like item or user data, in the application layer. We are explicitly trading off the potential benefits of caching this data against availability and correctness. It should be noted that other sites do take different approaches, make different tradeoffs, and are also successful.

Not surprisingly, it is quite possible to have too much of a good thing. The more memory you allocate for caching, the less you have available to service individual requests. In an application layer which is often memory-constrained, this is a very real tradeoff. More importantly, though, once you have come to rely on your cache, and have taken the extremely tempting steps of downsizing the primary systems to handle just the cache misses, your infrastructure literally may not be able to survive without it. Once your primary systems can no longer directly handle your load, your site’s availability now depends on 100% uptime of the cache – a potentially dangerous situation. Even something as routine as rebalancing, moving, or cold-starting the cache becomes problematic.

Done properly, a good caching system can bend your scaling curve below linear – subsequent requests retrieve data cheaply from cache rather than the relatively more expensive primary store. On the other hand, caching done poorly introduces substantial additional overhead and availability challenges. I have yet to see a system where there are not significant opportunities for caching. The key point, though, is to make sure your caching strategy is appropriate for your situation.
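As a small illustration of "caching within your storage constraints", here is a bounded LRU cache built on `LinkedHashMap`. The class name and capacity are illustrative; the point is that the memory ceiling is an explicit, deliberate parameter rather than an accident:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a bounded least-recently-used cache: capacity is explicit,
// so the memory-vs-hit-ratio tradeoff is a conscious choice.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true yields LRU iteration order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least-recently-used entry beyond capacity
    }
}
```

A structure like this suits the slow-changing, read-mostly data the article recommends caching (metadata, configuration); it deliberately says nothing about invalidation, which is where the hard tradeoffs discussed above live.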


Scalability is sometimes called a “non-functional requirement,” implying that it is unrelated to functionality, and strongly implying that it is less important. Nothing could be further from the truth. Rather, I would say, scalability is a prerequisite to functionality – a “priority-0” requirement, if ever there was one.

I hope that you find the descriptions of these best practices useful, and that they help you to think in a new way about your own systems, whatever their scale.