
Programming inception (or understanding the Node.js event loop)

    One of the main problems in web development today is the cost of I/O. I mean, everybody is talking about it. Every web app nowadays, even the smallest ones like blogs or personal pages, follows the internet trend of being interconnected. So they always present links to connect to Facebook, Twitter, LinkedIn and whatnot. Many of them also connect to databases and cloud services. Either way, they do a lot of I/O. And that’s where the problem lies.

But I don’t want to play with my threads anymore ma!

    Imagine a web app that receives millions of requests per minute and communicates with databases (waiting for queries and updates) and other services around the web. Now take a look at this table to see why I/O is such a big problem for current programming technologies:

[Image: io-cost — table comparing the cost of I/O operations]

    Notice that most databases nowadays write to disk, which means they are slow. When the application needs to communicate with another service over the network, the waste is even bigger. How to deal with many requests at a time? Well, today there are two fundamental ways to deal with concurrency in web servers: thread-per-request and process-per-request (using Unix’s fork()).

Thread-per-request: each request makes the server spawn a new thread to deal with it. A thread doesn’t consume as much memory as a process, of course. But, needless to say, thousands of threads still eat a lot of the machine’s memory anyway. Take a look at this graph to see the difference between a server that uses this model (Apache with mpm_event) and one that uses a single-threaded approach (Nginx):

[Image: nginx-apache-memory — graph comparing memory usage of Apache and Nginx]

Ooh! And don’t forget that when working with threads, things can get “real messy real quickly”.

Process-per-request: each request starts a new process on the server. A process is heavier on the machine than a thread: it consumes even more memory, and the overhead to start one is higher. But some servers found a workaround: Unix forking. With fork() we create a copy of the main process, and instead of allocating whole new memory for the child, the operating system lets it share the parent’s memory, copying pages only when one of them writes to them (copy-on-write). Servers like Passenger and Unicorn use this feature. Still, with hundreds or even thousands of connections, fork() will be called until we have thousands of processes, which is not the way to go.
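As it happens, Node itself (which we’ll get to in a moment) exposes this pre-forking style through its cluster module. A minimal sketch, with an arbitrary worker count of 4:

var cluster = require("cluster");
var http = require("http");

if (cluster.isMaster) {
  // The master forks a few worker processes; each child is a
  // copy of this program, and they all accept on the same port
  for (var i = 0; i < 4; i++) {
    cluster.fork();
  }
} else {
  // Each worker is a separate process, Unicorn-style
  http.createServer(function (req, res) {
    res.end("served by pid " + process.pid + "\n");
  }).listen(8000);
}

Note that this is process-per-worker rather than process-per-request, which is exactly the compromise servers like Unicorn make to avoid forking once per connection.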

Enter Node.js

    The powerful concept behind Node is its single-threaded event loop that leverages asynchronous calls for doing many things, even (take a guess) I/O! In Java, for instance, if you send a query to the database, the code has to wait for the database to complete it before moving on. Not Node.js: it will execute the lines of code after the database command and then come back to your callback when the result is ready. It really is only one thread running, which means that if you put a “sleep” in your code, it will block the server for the amount of time stipulated in the sleep call. The idea is pretty cool, but let’s see an example of it. Imagine you have an array of objects you need to save to the database and afterwards calculate some stats about the saved records:


// Loop through the items
items.forEach(function (item) {
  // Write each of these items to the disk (I/O happens here)
  item.saveAsync();
});

// This function is meant to be called once all the async
// calls above are done, but we don't know if/when they are,
// and therein lies the problem with this approach
calculateStatsAfterSaving();

It’s not about Node.js, it’s about the Reactor Pattern

    To better understand how the Reactor pattern works, let me help you with an analogy. Imagine you own a coffee shop on one of the most crowded corners in NYC. People are ordering all the time and you serve the tables yourself. Sometimes the order is something previously prepared, like a glass of milk or water. But once in a while some smarty pants will order spaghetti. If you stop to make the spaghetti, people will be waiting a while, since you are the only one who serves the tables. But you’re awesome, so you decide to delegate the task of preparing meals to your butlers: Jeeves and Alfred. You tell them: “Jeeves, you make the pasta and Alfred makes the sauce. When you guys finish, put it on the counter and notify me.” This way it becomes pretty simple for you to handle more orders quickly. If more complicated orders like this show up, well, just imagine Jeeves and Alfred are so competent that they can handle concurrent requests, no problem. That’s the idea behind the Reactor pattern.
    The single thread keeps receiving requests and responding all the time, but it cannot block on the more “complex” ones, like blocking I/O (database queries, connecting to Facebook, writing to a file and so on), otherwise all the other requests would have to wait to be served. Internally, the single thread delegates that work: in Node’s case, the libuv library performs network I/O through non-blocking system calls and hands file I/O to a small pool of worker threads, which run in parallel and notify the event loop when they finish. Either way, the main thread never sits waiting on the I/O itself. See the picture to understand better.

[Image: node_loop — diagram of the Node.js event loop delegating I/O work]
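To make the delegation concrete, here is a minimal sketch of the non-blocking style (the file path is just an example):

var fs = require("fs");

// Ask for the file; the read is delegated behind the scenes,
// so this call returns immediately instead of blocking
fs.readFile("/etc/hostname", "utf8", function (err, data) {
  if (err) throw err;
  // This callback runs later, when the event loop is told
  // that the read has finished
  console.log("file contents:", data);
});

// This line runs before the file has been read: the single
// thread moved on instead of waiting for the disk
console.log("still taking orders...");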

    Another important detail to mention is that Node.js expects to return quickly to the client. So if a callback performs a very long computation, the single thread will be busy with it, which can stop the event loop completely. The advantage comes when you have a lot of blocking I/O. Disaster comes when you have CPU-intensive work.
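A hypothetical toy server makes that disaster easy to see (the /block URL and the loop bound are made up for illustration):

var http = require("http");

http.createServer(function (req, res) {
  if (req.url === "/block") {
    // CPU-intensive work: this loop hogs the single thread,
    // so every other request waits until it finishes
    var sum = 0;
    for (var i = 0; i < 1e9; i++) {
      sum += i;
    }
    res.end("done: " + sum);
  } else {
    // Normally instant; while /block is running, even this
    // trivial response has to wait, because there is only
    // one thread
    res.end("hello");
  }
}).listen(8000);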

Is Node.js really “the king’s new clothes”?

    Callbacks can be hard to understand. For all of us used to programming in a synchronous way, async can get real messy, starting with the code. Imagine that: for every function, you pass another function with the commands to execute when the first one finishes. What if the callback has a callback itself? See the point? Node.js is generally not as easy to grasp as, say, Ruby or Java. The strategy changes. The way of thinking changes. It can be a lot of different things at once. But no worries: there are solutions. Just to mention some of them, node-async and node-fibers try to solve the problems of deeply nested callbacks and of exception handling. First, the nesting problem in its raw form, in the sketch below.
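Here is a hypothetical pyramid of nested callbacks; the ./db module, the query and saveAsync are all made up for illustration:

// A made-up module and query, just to show the shape of the code
var db = require("./db");

db.connect(function (err, conn) {
  conn.query("SELECT * FROM items", function (err, rows) {
    rows.forEach(function (row) {
      row.saveAsync(function (err) {
        // three callbacks deep already, and the error handling
        // has to be repeated at every single level
      });
    });
  });
});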

Going back to the problem some lines above, node-async has some functions that help us write more readable code:
var async = require("async");

// 1st argument: the array of items
async.each(items,
  // 2nd: the function each item is passed into
  function (item, callback) {
    // Call the asynchronous save
    item.saveAsync(function (err) {
      // When done, notify async.each (passing any error along)
      callback(err);
    });
  },
  // 3rd: the function called once everything is done (or as
  // soon as one of the saves fails)
  function (err) {
    if (err) throw err;
    // All saves finished; now it's safe to compute the stats
    calculateStatsAfterSaving();
  }
);

    Yeah, yeah, I agree with you that, either way, synchronous code beats asynchronous code in readability. So remember that before choosing Node.js for your next project. Then again, threads are a pain to work with: you need to be aware of which classes/libraries are thread-safe, be careful to avoid deadlocks and race conditions, not to mention the whole paraphernalia of tools involved: mutexes, semaphores, locks, synchronized blocks etc. On the other hand, the Node.js concurrency model is very easy to learn, even though its code will not always present an elegant approach. So either way there will be trouble. The thing is: concurrency is not a simple problem. Therefore solving this complex issue can get messy, one way or another.
