arquivo | en RSS para esta seção

Boost.Http has a new parser

I usually don’t blog about updates on the Boost.Http project because I want as much info as possible in code and documentation (or even git history), not here. However, I got a stimulus to change this habit. A new parser I’ve been writing replaced the NodeJS parser in Boost.Http and here is the most appropriate place to inform about the change. This will be useful info also if you’re interested in using NodeJS parser, any HTTP parser or even designing a parser with a stable API unrelated to HTTP.

EDIT (2016/08/07): I tried to clarify the text. Now I try to make it clear whether I’m refering to the new parser (the new parser I wrote) or the old parser (NodeJS parser) everywhere in the text. I’ll also refer to Boost.Http with new parser as new Boost.Http and Boost.Http with old parser as old Boost.Http.

What’s wrong with NodeJS parser?

I started developing a new parser because several users wanted a header-only library and the parser was the main barrier for that in my library. I took the opportunity to also provide a better interface which isn’t limited to C language (inconvenient and lots of unsafe type casts) and uses a style that doesn’t own the control flow (easier to deal with HTTP upgrade and doesn’t require lots of jump’n’back among callbacks).

NodeJS parser argues you can pause it at any time, but its API doesn’t seem that reliable. You need to resort to ugly hacks[1][2] if you want to properly support HTTP pipelining with an algorithm that doesn’t “go back”. If you decide to not stop the parser, you need to store all intermediate results while NodeJS parser refuses to give control back to you, which forces you to allocate (even if NodeJS parser don’t).

NodeJS parser is hard to use. Not only the callback model forces me to go back and forth here and there, it’ll also force me to resort to ugly hacks full of unsafe casts which also increase object size to provide a generic templated interface[1][2][3][4]. Did I mention that it’ll always consume current data and I need to keep appending data everywhere (this is good and bad, the new parser I implemented does a good job at that)? The logic to handle field names is even more complex because this append stuff[1][2][3]. It’s even more complex because NodeJS won’t always decode the tokens (matching and decoding are separate steps) and you need to decode them yourself (and you need to know a lot of HTTP details).

The old parser is so hard to use that I wouldn’t dare to use the same tricks I’ve used in the new Boost.Http to avoid allocations on the old Boost.Http. So the NodeJS parser doesn’t allocate, but dealing with it (old Boost.Http) is so hard that you don’t want to reuse the buffer to keep incomplete tokens at all (forcing allocation or a big-enough secondary buffer to hold them in old Boost.Http).

HTTP upgrade is also very tricky and the lack of documentation for the NodeJS parser is depressing. So I only trust my own code as an usage reference for NodeJS parser.

However, I’ve hid all this complexity from my users. My users wanted a different parser because they wanted a header-only library. I personally only wanted to change the parser because the NodeJS parser only accepts a limited set of HTTP methods and  it was tricky to properly not perform any allocation. The new parser even makes it easier to reject an HTTP element before decoding it (e.g. a URL too long will exhaust the buffer and then the new Boost.Http can just check the `expected_token` function to know it should reply with 414 status code instead concatenating a lot of URL pieces until it detect the limit was reached).

If you aren’t familiar enough with HTTP details, you cannot assume the NodeJS parser will abstract HTTP framing. Your code will get the wrong result and it’ll go silent for a long time before you know it.

The new parser

EDIT(2016/08/09): The new parser is almost ready. It can be used to parse request messages (it’ll be able to parse response messages soon). It’s written in C++03. It’s header-only. It only depends on boost::string_ref, boost::asio::const_buffer and a few others that I may be missing from memory right now. The new parser doesn’t allocate data and returns control to the user as soon as one token is ready or an error is reached. You can mutate the buffer while the parser maintains a reference to it. And the parser will decode the tokens, so you do not need ugly hacks as NodeJS parser requires (removing OWS from the end of header field values).

I want to tell you that the new parser was NOT designed to Boost.Http needs. I wanted to make a general parser and the design started. Then I wanted to replace NodeJS parser within Boost.Http and parts have fit nicely. The only part that didn’t fit perfectly at the time to integrate pieces was a missing end_of_body token that was easy to add in the new parser code. This was the only time that I, as the author of Boost.Http and as a user of the new parser, used my power, as the author of the parser itself, to push my needs on everybody else. And this token was a nice addition anyway (using NodeJS API you’d use http_body_is_final).

My mentor Bjorn Reese had the vision to use an incremental parser much earlier than me. I’ve only been convinced to the power of incremental parsers when I’ve saw a CommonMark parser implemented in Rust. It convinced me immediately. It was very effective. Then I’ve borrowed several conventions on how to represent tokens in C++ from a Bjorn’s experiment.

There is also the “parser combinators” part of this project (still not ready) that I’ve only understood once I’ve watched a talk from Scott Wlaschin. Initially I was having a lot of trouble because I wanted stateful miniparsers to avoid “reparsing” certain parts, but you rarely read 1-sized chunks and I was only complicating things. The combinators part is tricky to deliver, because the next expected token will depend on the value (semantic, not syntax) of current token and this is hard to represent using expressions like Boost.Spirit abstractions. Therefore, I’m only going to deliver the mini-parsers, not the combinators. Feel free to give me feedback/ideas if you want to.

Needless to say the new parser should have the same great features from NodeJS parser like no allocations or syscals behind the scenes. But it was actually easier to avoid and decrease allocations on Boost.Http thanks to the parser’s design of not forcing the user to accumulate values on separate buffers and making offsets easy to obtain.

I probably could achieve the same effect of decreased buffers in Boost.Http with NodeJS parser, but it was quite hard to work with NodeJS parser (read section above). And you should know that the old Boost.Http related to the parser was almost 3 times bigger (it’d be almost 4 times bigger, but I had to add code to detect keep alive property because the new parser only care about message framing) than the new Boost.Http code related to the parser.

On the topic of performance, the new Boost.Http tests consume 7% more time to finish (using a CMake Release build with GCC under my machine). I haven’t spent time trying to improve performance and I think I’ll only try to improve memory usage anyway (the size of the parser structure).

A drawback (is it?) is that the new parser only cares about structuring the HTTP stream. It doesn’t care about connection state (exception: receiving http 1.0 response body/connection close event). Therefore, you need to implement the keep-alive yourself (which the Boost.Http higher-level layers do).

I want to emphasize that the authors of the NodeJS parser have done a wonderful job with what they had in hands: C!

Migrating code to use the new parser

First, I haven’t added the code to parse the status line yet, so the parser is limited to HTTP requests. It shouldn’t take long (a few weeks until I finish this and several other tasks).

When you’re ready to upgrade, use the history of the Boost.Http project (files include/boost/http/socket-inl.hpp and include/boost/http/socket.hpp) as a guide. If you’ve been using NodeJS parser improperly, it’s very likely your code didn’t have as much lines as Boost.Http had. And your code probably isn’t as templated as Boost.Http anyway, so it’s very likely you didn’t need as much tricks with http_parser_settings as Boost.Http needed.

Tufão project has been using NodeJS parser improperly for ages and it’d be hard to fix that. Therefore, I’ll replace “Tufão’s parser” with this new shiny one in the next Tufão release Tufão 1.4.0 has been refactored to use this new parser. It’ll finally gain It finally received support for HTTP pipelining and plenty of bugfixes that nobody noticed will land landed. Unfortunately I got the semantics for HTTP upgrade within Tufão wrong and it kind of has “forced HTTP upgrade” (this is something I got right in Boost.Http thanks to RFC7230 clarification).

Next steps

I may have convinced you to prefer Boost.Http parser over NodeJS parser when it comes to C++ projects. However, I hope to land a few improvements before calling it ready.

API/design-wise I hope to finish miniparsers for individual HTTP ABNF rules.

Test wise I can already tell you more than 80% of all the code written for this parser are tests (like 4 lines of test for each 1 line of implementation). However I haven’t run the tests in combination with sanitizers (yet!) and there a few more areas where tests can be improved (include coverage, allocate buffer chunks separately so sanitizers can detect invalid access attempts, fuzzers…) and I’ll work on them as well.

I can add some code to deduce the optimum size for indexes and return a parser with a little less overhead memory-wise.

EDIT (2016/08/11)

I’ve added a link that should be here since the first version of this post: https://gist.github.com/vinipsmaker/4998ccfacb971a0dc1bd

It adds a lot of background on the project. This is the proposal I’ve sent to get some funding to work on the project.

Boost.Http and Beast.HTTP joining forces

The newest update to the Boost.Http is that I had a long meeting with Vinnie Falco about a possible collaboration and a few changes are going to happen.

The official announcement was sent to the Boost mailing list and is available in the gmane archives.

What this means:

  • A lot of work making changes so projects can hopefully be merged in the future.
  • API will break again.
  • The thing I wasn’t caring about, “HTTP/1.1 oriented interface”, will be provided on a higher-level than simply an HTTP parser. This is what Beast.HTTP already provides.
  • Beast.HTTP already provides WebSocket support and HTTP client support.
  • We’ll be using a new mailing list to coordinate further development and I invite you to join the mailing list if you’re interested in the future of any of these libraries or HTTP APIs in general.

Boost.Http parser project

I’ll develop a HTTP pull parser for Boost.Http during this summer.

The story starts with Boost not being selected for Google Summer of Code this year. I wanted more funding to spend time on Boost.Http and this was unfortunate.

However, funding from other sources was announced for three projects and Boost.Http was one of the selected projects.

I’d rather work on the request router, but I don’t have a strong design for a request router right now because I’m still experimenting. A weak design would translate on a weak proposal and I decided to propose a HTTP parser.

Misc

An interesting HTTP library that carries some similarities with Boost.Http was announced on the Boost mailing list: Beast.

Uisleandro is working on a request router focused on ReST services for Boost.Http: https://github.com/BoostGSoC14/boost.http/compare/master…uisleandro:router1?expand=1.

Multitasking styles, event loops and asynchronous programming

There was a text that I’ve previously published here in this blog about asynchronous programming, but it was written in Portuguese. Now has come the time that I think this text should be available also in English. Translation follows below. Actually is not a “translation” per si, as I adapted the text to be more pleasant when written in English.

One of the subjects that interest me the most in programming is asynchronous programming. This is a subject that I got in touch since I started to play with Qt around 2010, but I slowly moved to new experiences and “paradigms” of asynchronous programming. Node.js and Boost.Asio were other important experiences worth mentioning. The subject caught me a lot and it was the stimulus for me to study a little of computer architecture and operating systems.

Motivation

Several times we stumble upon with problems that demand continuously handling several jobs (e.g. handling network events to mutate local files). Intuitively we may be tempted to use threads, as there are several “parallel” jobs. However, not all jobs executed by the computer are done so exclusively in the CPU.

There are other components beside the CPU. Components that are not programmable and that do not execute your algorithm. Components that are usually slower and do other jobs (e.g. converting a digital signal into an analog one). Also, the communication with these components usually happen serially, through the fetch-execute-check-interrupt cycle. There is a simplification in this argument, but the fact that you don’t read two different files from the same hard drive in parallel remains. Summarizing, using threads isn’t a “natural” abstraction to the problem, as it doesn’t “fit” and/or design the same characteristics. Using threads can add a complex and unnecessary overhead.

“If all you have is a hammer, everything looks like a nail”

Another reason to avoid threads as an answer to the problem is that soon you’ll have more threads than CPU/cores and will face the C10K problem. Even if it didn’t add an absurd overhead, just the fact that you need more threads than available CPUs will make your solution more restrict, as it won’t work on bare-metal environments (i.e. that lacks a modern operating system or a scheduler).

A great performance problem that threads add comes from the fact that they demand a kernel context-switch. Of course this isn’t the only problem, because there is also the cost involved in creating the thread, which might have a short lifetime and spend most of its lifetime sleeping. The own process of creating a thread isn’t completely scalable, because it requires the stack allocation, but the memory is a global resource and right here we have a point of contention.

The performance problem of a kernel context-switch reminds the performance problem of the function calling conventions faced by compilers, but worse.

Functions are isolated and encapsulated units and as such they should behave. When a function is called, the current function doesn’t know which registers will be used by the new function. The current function doesn’t hold the information of which of the CPU registers will be overridden during the new function lifetime. Therefore, the function calling conventions add two new points to do extra processing. One point to save state into the stack and one to restore. That’s why some programmers are so obsessed into function inlining.

The context-switch problem is worse, because the kernel must save the values of all registers and there is also the overhead of the scheduler and the context-switch itself. Processes would be even worse, as there would be the need to reconfigure the MMU. This multitasking style is given the name of preemptive multitasking. I won’t go into details, but you can always dig more into computer architecture and operating systems books (and beyond).

It’s possible to obtain concurrency, which is the property to execute several jobs in the same period of time, without real parallelism. When we do this tasks that are more IO-oriented, it can be interesting to abandon parallelism to achieve more scalability, avoiding the C10K problem. And if a new design is required, we could catch the opportunity to also take into account cooperative multitasking and obtain a result that is even better than the initially planned.

The event loop

One approach that I see, and I see used more in games than anywhere else, is the event loop approach. It’s this approach that we’ll see first.

There is this library, the SDL low level library, whose purpose is to be just a multimedia layer abstraction, to supply what isn’t already available in the C standard library, focusing on the game developer. The SDL library makes use of an event system to handle communication between the process and the external world (keyboard, mouse, windows…), which is usually used in some loop that the programmer prepares. This same structure is used in other places, including Allegro, which was the biggest competitor of SDL in the past.

The idea is to have a set of functions that make the bridge of communication between the process and the external world. In the SDL world, events are described through the non-extensible SDL_Event type. Then you use functions like SDL_PollEvent to receive events and dedicated functions to initiate operations that act on the external world. The support for asynchronous programming in the SDL library is weak, but this same event loop principle could be used in a library that would provide stronger support for asynchronous programming. Below you can find a sample that makes use of the SDL events:

#include <string>
#include <iostream>
#include <SDL.h>
#include <SDL_image.h>
#include "res_path.h"
#include "cleanup.h"
/*
* Lesson 4: Handling Events
*/
//Screen attributes
const int SCREEN_WIDTH = 640;
const int SCREEN_HEIGHT = 480;
/*
* Log an SDL error with some error message to the output stream of our choice
* @param os The output stream to write the message too
* @param msg The error message to write, format will be msg error: SDL_GetError()
*/
void logSDLError(std::ostream &os, const std::string &msg){
os << msg << " error: " << SDL_GetError() << std::endl;
}
/*
* Loads an image into a texture on the rendering device
* @param file The image file to load
* @param ren The renderer to load the texture onto
* @return the loaded texture, or nullptr if something went wrong.
*/
SDL_Texture* loadTexture(const std::string &file, SDL_Renderer *ren){
SDL_Texture *texture = IMG_LoadTexture(ren, file.c_str());
if (texture == nullptr){
logSDLError(std::cout, "LoadTexture");
}
return texture;
}
/*
* Draw an SDL_Texture to an SDL_Renderer at position x, y, with some desired
* width and height
* @param tex The source texture we want to draw
* @param rend The renderer we want to draw too
* @param x The x coordinate to draw too
* @param y The y coordinate to draw too
* @param w The width of the texture to draw
* @param h The height of the texture to draw
*/
void renderTexture(SDL_Texture *tex, SDL_Renderer *ren, int x, int y, int w, int h){
//Setup the destination rectangle to be at the position we want
SDL_Rect dst;
dst.x = x;
dst.y = y;
dst.w = w;
dst.h = h;
SDL_RenderCopy(ren, tex, NULL, &dst);
}
/*
* Draw an SDL_Texture to an SDL_Renderer at position x, y, preserving
* the texture's width and height
* @param tex The source texture we want to draw
* @param rend The renderer we want to draw too
* @param x The x coordinate to draw too
* @param y The y coordinate to draw too
*/
void renderTexture(SDL_Texture *tex, SDL_Renderer *ren, int x, int y){
int w, h;
SDL_QueryTexture(tex, NULL, NULL, &w, &h);
renderTexture(tex, ren, x, y, w, h);
}
int main(int, char**){
//Start up SDL and make sure it went ok
if (SDL_Init(SDL_INIT_VIDEO) != 0){
logSDLError(std::cout, "SDL_Init");
return 1;
}
//Setup our window and renderer, this time let's put our window in the center
//of the screen
SDL_Window *window = SDL_CreateWindow("Lesson 4", SDL_WINDOWPOS_CENTERED,
SDL_WINDOWPOS_CENTERED, SCREEN_WIDTH, SCREEN_HEIGHT, SDL_WINDOW_SHOWN);
if (window == nullptr){
logSDLError(std::cout, "CreateWindow");
SDL_Quit();
return 1;
}
SDL_Renderer *renderer = SDL_CreateRenderer(window, -1, SDL_RENDERER_ACCELERATED | SDL_RENDERER_PRESENTVSYNC);
if (renderer == nullptr){
logSDLError(std::cout, "CreateRenderer");
cleanup(window);
SDL_Quit();
return 1;
}
//The texture we'll be using
const std::string resPath = getResourcePath("Lesson4");
SDL_Texture *image = loadTexture(resPath + "image.png", renderer);
if (image == nullptr){
cleanup(image, renderer, window);
IMG_Quit();
SDL_Quit();
return 1;
}
//Our texture size won't change, so we can get it here
//instead of constantly allocating/deleting ints in the loop
int iW, iH;
SDL_QueryTexture(image, NULL, NULL, &iW, &iH);
int x = SCREEN_WIDTH / 2 - iW / 2;
int y = SCREEN_HEIGHT / 2 - iH / 2;
//Our event structure
SDL_Event e;
//For tracking if we want to quit
bool quit = false;
while (!quit){
//Read any events that occured, for now we'll just quit if any event occurs
while (SDL_PollEvent(&e)){
//If user closes the window
if (e.type == SDL_QUIT){
quit = true;
}
//If user presses any key
if (e.type == SDL_KEYDOWN){
quit = true;
}
//If user clicks the mouse
if (e.type == SDL_MOUSEBUTTONDOWN){
quit = true;
}
}
//Rendering
SDL_RenderClear(renderer);
//Draw the image
renderTexture(image, renderer, x, y);
//Update the screen
SDL_RenderPresent(renderer);
}
//Destroy the various items
cleanup(image, renderer, window);
IMG_Quit();
SDL_Quit();
return 0;
}
view raw main.cpp hosted with ❤ by GitHub

There are the GUI libraries like GTK+, EFL and Qt which take this idea one step further, abstracting into an object the event loop that were previously written and rewritten by you. The Boost.Asio library, which focuses on asynchronous operations and not on GUIs, has a class of similar purpose, the io_service class.

To remove your need to write boilerplate code to route events to specific actions, the previously mentioned classes will handle this task for you, possibly using callbacks, which are an old abstraction. The idea is that you should bind events to functions that might be interested in handling those events. Now the logic to route these events belong to the “event loop object” instead a “real”/raw event loop. This style of asynchronous programming is a passive style, because you only register the callbacks and transfer the control to the framework.

Now that we’re one level of abstraction higher than event loops, let’s stop the discussion about event loops. And these objects that we referring to as “event loop objects” will be mentioned as executors from now on.

The executor

The executor is an object that can execute encapsulated units of work. Using only C++11, we can implement an executor that schedules operations related to waiting some duration of time. There is the global resource RTC and, instead of approaching the problem by creating several threads that start blocking operations like the sleep_for operation, we’ll use an executor. The executor will schedule and manage all events that are required. You can find a simple implementation for such executor following:

#include <thread>
#include <chrono>
#include <functional>
#include <vector>
#include <iostream>
#include <algorithm>
class executor
{
public:
void add_sleep_for_callback(std::chrono::milliseconds ms,
std::function<void()> cb);
void run();
private:
struct internal_callback
{
std::chrono::milliseconds ms;
std::function<void()> cb;
};
std::vector<internal_callback> callbacks;
std::chrono::milliseconds passed{0};
};
void executor::add_sleep_for_callback(std::chrono::milliseconds ms,
std::function<void()> cb)
{
callbacks.insert(std::upper_bound(callbacks.begin(), callbacks.end(),
[&]() {
internal_callback ret;
ret.ms = ms;
return ret;
}(),
[](const internal_callback &l,
const internal_callback &r) {
return l.ms < r.ms;
}),
[&]() {
internal_callback ret;
ret.ms = ms;
ret.cb = cb;
return ret;
}());
}
void executor::run()
{
while (callbacks.size()) {
auto cur = callbacks.front().ms - passed;
passed += cur;
std::this_thread::sleep_for(cur);
callbacks.front().cb();
callbacks.erase(callbacks.begin());
}
passed = 0;
}
int main(int /*argc*/, char */*argv*/[])
{
typedef std::chrono::milliseconds milli;
executor ex;
ex.add_sleep_for_callback(milli{5000}, []() {
std::cout << "5000ms passed" << std::endl;
});
ex.add_sleep_for_callback(milli{1000}, []() {
std::cout << "1000ms passed" << std::endl;
});
ex.add_sleep_for_callback(milli{3000}, []() {
std::cout << "3000ms passed" << std::endl;
});
ex.run();
return 0;
}
view raw executor.cpp hosted with ❤ by GitHub

This code reminds me of sleepsort.

In the example, without using threads, it was possible to execute the several concurrent jobs related to waiting time. To do such thing, we gave the executor the responsibility to share the RTC global resource. Because the CPU is faster than the requested tasks, only one thread was enough and, even so, there was a period of time for which the CPU was idle.

There are some concepts to be extracted from this example. First, let’s consider that the executor is an standard abstraction, provided in some interoperable way among all the code pieces that makes use of asynchronous operations. When a program wants to do the wait asynchronous operation, the program request the operation start to some abstraction — in this case it’s the executor itself, but it’s more common to find these operations into “I/O objects” — through a function. The control is passed to the executor — through the method run — which will check for notification of finished tasks. When there are no more tasks on the queue, the executor will give back the control of the thread.

Because there is only one thread of execution, but several tasks to execute, we have the resource sharing problem. In this case, the resource to be shared is the CPU/thread itself. The control should go and come between some abstraction and the user code. This is the core of cooperative multitasking. There are customization points to affect behaviour among the algorithms that execute the tasks, making them give CPU time to the execution of other tasks.

One advantage of the cooperative multitasking style of multitasking is that all “switch” points are well defined. You know exactly when the control goes from one task to another. So that context switch overhead that we saw earlier don’t exist — where all register values need to be saved… A solution that is more elegant, efficient and green.

The object that we passed as the last argument to the function add_sleep_for_callback is a callback — also know as completion handler. Think about what would happen if a new wait operation was requested within one of the completion handlers that we registered. There is an improved version of the previous executor following:

#include <thread>
#include <chrono>
#include <functional>
#include <vector>
#include <iostream>
#include <algorithm>
class executor
{
public:
void add_sleep_for_callback(std::chrono::milliseconds ms,
std::function<void()> cb);
void run();
private:
struct internal_callback
{
std::chrono::milliseconds ms;
std::function<void()> cb;
};
std::vector<internal_callback> callbacks;
std::chrono::milliseconds passed{0};
};
void executor::add_sleep_for_callback(std::chrono::milliseconds ms,
std::function<void()> cb)
{
ms += passed; // NEW!!!
callbacks.insert(std::upper_bound(callbacks.begin(), callbacks.end(),
[&]() {
internal_callback ret;
ret.ms = ms;
return ret;
}(),
[](const internal_callback &l,
const internal_callback &r) {
return l.ms < r.ms;
}),
[&]() {
internal_callback ret;
ret.ms = ms;
ret.cb = cb;
return ret;
}());
}
void executor::run()
{
while (callbacks.size()) {
auto cur = callbacks.front().ms - passed;
passed += cur;
std::this_thread::sleep_for(cur);
callbacks.front().cb();
callbacks.erase(callbacks.begin());
}
passed = 0;
}
int main(int /*argc*/, char */*argv*/[])
{
typedef std::chrono::milliseconds milli;
executor ex;
ex.add_sleep_for_callback(milli{5000}, []() {
std::cout << "5000ms passed" << std::endl;
});
ex.add_sleep_for_callback(milli{1000}, [&ex]() {
std::cout << "1000ms passed" << std::endl;
ex.add_sleep_for_callback(milli{2000}, []() {
std::cout << "3000ms passed" << std::endl;
});
});
ex.run();
return 0;
}
view raw executor.cpp hosted with ❤ by GitHub

This implementation detail reminds me of the SHLVL SHELL variable.

An interesting case is the case from the JavaScript language. JavaScript has a kind of “implicit executor”, which is triggered when the VM reaches the end of your code. In this case, you don’t need to write codes like “while (true) executor.run_one()” or “executor.run()“. You only would need to register the callbacks and make sure there are no infinite loops until the executor gets the chance to be in control.

With the motivation introduced, the text started to decrease the use of mentions to I/O for reasons of simplicity and focus, but keep in mind that we use asynchronous operation mostly to interact with the external world. Therefore many operation are scheduled conditionally in response to the notification of the completion of a previous task (e.g. if protocol invalid, close socket else schedule another read operation). Proposals like N3785 and N4046 define executors also to schedule thread pools, not only timeouts within a thread. Lastly it’s possible to implement executors that schedule I/O operations within the same thread.

Asynchronous algorithms represented in synchronous manners

The problem with the callback approach is that we no longer have a code that is clean and readable. Previously, the code could be read sequentially because this is what the code was, a sequence of instructions. However, now we need to spread the logic among lots of lots of callbacks. Now you have blocks of code that are related far apart from each other. Lambdas can help a little, but it’s not enough. The problem is know as callback/nesting hell and it’s similar to the spaghetti code problem. Not being bad enough, the execution flow became controverted because the asynchronous nature of the operations itself and constructs like branching and repetition control structures and even error handling assume a representation that are far from ideal, obscures and difficult to read.

One abstraction of procedures very important to asynchronous programming is coroutines. There is the procedure abstraction that we refer to under the name of “function”. This so called “function” models the concept of subroutine. And we have the coroutine, which is a generalization of a subroutine. The coroutine has two more operations than subroutine, suspend and resume.

When your “function” is a coroutine — possible when the language provides support to coroutines — it’s possible to suspend the function before it reaches the end of execution, possibly giving a value during the suspend event. One example where coroutines are useful is on a hypothetical fibonacci generating function and a program that uses this function to print the first 10 numbers of this infinite sequence. The following Python code demonstrate an implementation of such example, where you can get an insight of the elegance, readability and and reuse that the concept of coroutines allows when we have the problem of cooperative multitasking:

from itertools import islice
def fibonacci():
a,b = 0,1
yield(a)
while True:
a,b = b,a+b
yield(a)
print(list(islice(fibonacci(), 10)))
view raw fibonacci.py hosted with ❤ by GitHub

This code reminds me of the setjmp/longjmp functions.

One characteristic that we must give attention in case you aren’t familiarized with the concept is that the values of the local variables are preserved among the several “calls”. More accurately, when the function is resumed, it has the same execution stack that it had when it was suspended. This is a “full” implementation of the coroutine concept — sometimes mentioned as stackful coroutine. There are also stackless coroutines where only the “line under execution” is remembered/restored.

The N4286 proposal introduces a new keyword, await, to identify a suspension point for a coroutine. Making use of such functionality, we can construct the following example, which elegantly defines an asynchronous algorithm described in a very much “synchronous manner”. It also makes use of the several language constructs that we’re used to — branching, repetition and others:

std::future<void> tcp_reader(int total)
{
char buf[64 * 1024];
auto conn = await Tcp::Connect("127.0.0.1", 1337);
do
{
auto bytesRead = await conn.read(buf, sizeof(buf));
total -= bytesRead;
}
while (total > 0);
}
int main() { tcp_reader(1000 * 1000 * 1000).get(); }
view raw await.cpp hosted with ❤ by GitHub

Coroutines solve the complexity problem that asynchronous algorithms demand. However, there are several coroutines proposals and none of them was standardized yet. An interesting case is the Asio library that implemented a mechanism similar to Duff’s Device using macros to provide stackless coroutines. For C++, I hope that the committee continue to follow the “you only pay for what you use” principle and that we get implementations with high performance.

While you wait for the standardization, we can opt for library-level solutions. If the language is low-level and gives the programmer control enough, these solutions will exist — even if it’s going to be not portable. C++ has fibers and Rust has mioco.

Another option to use while you wait for coroutines it not to use them. It’s still possible to achieve high performance implementation without them. The big problem will be the highly convoluted control flow you might get.

Completion tokens

While there is no standardized approach for asynchronous operations in the C++ language, the Boost.Asio library, since the version 1.54, adopted an interesting concept. The Boost.Asio implemented an extensible solution. Such solution is well documented in the N4045 proposal and you’ll only find a summary here.

The proposal is assumes that the callback model is not always interesting and can be even confusing sometimes. So it should be evolved to support other models. Now, instead receiving a completion handler (the callback), the functions should receive a completion token, which adds the necessary customization point to support other asynchronous models.

The N4045 document uses a top-down approach, first showing how the proposal is used to then proceed to low-level implementation details. You can find a sample code from the document following:

string // task<string>
read(string file, string suffix, // read(string file, string suffix)
yield_context yield) { // __async {
istream fi = open(file, yield); // istream fi = __await open(file);
string ret, chunk;
while((chunk = fi.read(yield)).size()) // while((chunk = __await fi.read()).size())
ret += chunk + suffix;
return ret;
}

In the code that you just saw, every time the variable yield is passed to some asynchronous operation (e.g. open and read), the function is suspended until the operation is completed. When the operation completes, the function is resumed in the point where it was suspended and the function that started the asynchronous operation returns the result of the operation. The Fiber library, shortly mentioned previously, provides a yield_context to the Asio extensible model, the boost::fibers::asio::yield. It’s asynchronous code written in a synchronous manner. However, an extensible model is adopted because we don’t know which model will be the standard for asynchronous operations and therefore we cannot force a single model to rule them all.

To build an extensible model, the return type of the function needs to be deduced (using the token) and the value of the return also needs to be deduced (also using the token). The return type is deduced using the token type and the returned value is created using the token passed as argument. And you still have the handler, which must be called when the operation completes. The handler is extracted from the token. The completion tokens model makes use of type traits to extract all information. If the traits aren’t specialized, the default behaviour is to treat the token as a handler, turning the approach compatible with the callback model.

Several examples are given in the N4045 document:

  • use_future
  • boost::fibers::asio::use_future
  • boost::fibers::asio::yield
  • block

The std::future approach have meaningful impact on performance and this is not cool, like explained in the N4045 document. This is the reason why I don’t mention it in this text.

Signals and slots

One alternative that was proposed to the model of callbacks is the signals and slots approach. This approach is implemented in libgsigc++, Boost, Qt and a few other libraries.

This proposal introduces the concept of a signal, which is used to notify an event, but abstracts the delivery of the notification and the process of registering functions that are interested in the event. The code that notifies events just need to worry about emitting the signal every time the event happens because the signal itself will take care of handling the set of registered slots and stuff.

This approach usually allows a very decoupled architecture, in opposition of the very verbose approach largely used in Java. An interesting effect, depending on the implementation, is the possibility to connect one signal to another signal. It’s also possible to have multiple signals connected to one slot or one signal connected to multiple slots.

The signal usually is related to one object, and when the object is destroyed, the connections are destroyed too. Just like it’s also possible to have slot objects that automatically disconnect from connected signals once destroyed, so you have a safer abstraction.

Given signals are independently implemented abstractions and usable as soon as they are exposed, it’s naturally intuitive to remove the callback argument from the operations that initiate asynchronous operations to avoid duplication of efforts. If you go further in this direction, you’ll even remove the own function to do asynchronous operations, exposing just the signal used to receive notifications, your framework you’ll be following the passive style instead active style. Examples of such style are the Qt’s socket, which doesn’t have an explicit function to request the start of the read operation, and the POCO library, which doesn’t have a function to request that receiving of a HTTP request.

Another detail which we have in signals and slots approach is the idea of access control. In the Qt case, signals are implemented in a way that demands the cooperation of one preprocessor, the own Qt executor and the QObject class. In the Qt case, the control access rules for emitting a signal follow the same rules for protected methods from C++ (i.e. all children classes can emit signals defined in parent classes). The operation of connecting a signal to another signal or a slot follows the same rules of public members of C++ (i.e. anyone can execute the operation).

In the case of libraries that implement the signal concept as a type, it’s common to observe a type that encapsulate both, the operation to emit the signal and the operation to connect the signal to some slot (different from what we see in the futures and promises proposal, where each one can have different control access).

The signals and slots approach is cool, but it doesn’t solve the problem of complexity that is solved with coroutines. I only mentioned this approach to discuss better the difference between the active style and the passive style.

Active model vs passive model

In the passive model, you don’t schedule the start of the operations. It’s what we commonly find in “productive” frameworks, but there are many questions that this style doesn’t answer quite well.

Making a quick comparison between the libraries Qt and Boost.Asio. In both libraries, you find classes to abstract the socket concept, but using Qt you handle the event readyRead using the readAll method to receive the buffer with the data. In contrast, using Boost.Asio, you start the operation async_read_some and pass the buffer as argument. Qt uses the passive style and Boost.Asio uses the active style.

The readyRead event, from Qt, acts independently from the user and requires a buffer allocation every time it occurs. Then some questions arise. “How can I customize the buffer allocation algorithm?”, “how can I customize the application to allow recycling buffers?”, “how do I use a buffer that is allocated on the stack?” and more. The passive model doesn’t answer questions like these, then you need to fatten the socket abstraction with more customization points to allow behaviours like the ones described. It’s a combinatorial explosion to every abstraction that deals with asynchronous operations. In the active model, these customizations are very natural. If there is any resource demanded by the operation that the developer is about to start, the developer just need to pass the resource as an argument. And it’s not only about resource acquisition. Another example of question that the passive model doesn’t answer well is “how do I decide whether I’m gonna accept a new connection or postpone to when the server is less busy?”. It’s a great power to applications that are seriously concerned about performance and need fine adjustments.

Besides performance and fine tuning, the passive model is also causes trouble to debugging and tests thanks to the inversion of control.

I must admit that the passive model is quite good to rapid prototyping and increasing productive. Fortunately, we can implement the passive model on top of the active model, but the opposite is not so easy to do.

Bonus references

If you like this subject and want to dig more, I suggest the following references:

Boost.Http rejected

Previously I informed that my library was about to be reviewed by the Boost community. And some time ago, this review happened and a result was published.

The outcome of the Boost review process was that my library should not be included in Boost. I think I reacted to the review very well and that I defended the library design with good arguments.

There was valuable feedback that I gained through the review process. Feedback that I can use to improve the library. And improvements to the library shouldn’t stop once the library is accepted into Boost, so I was expecting to spend more time in the library even after the review, as suggested by the presence of a roadmap chapter on the library documentation.

The biggest complaint about the library now was completeness. The library “hasn’t proven” that the API is right. The lack of higher-level building blocks was important to contribute to the lack of trust in current API. Also, if such library was to enter in Boost, it should be complete, so new users would have a satisfying first impression and continue to use the library after the initial contact. I was worried about delivering a megazord library to be reviewed in just one review, but that’s what will happen next time I submit the library to review. At least I introduced several concepts to the readers already.

Things that I planned were forgotten when I created the documentation and I’ll have to improve documentation once again to ensure guarantees that I had planned already. Also, some neat ideas were given to improve library design and further documentation updates will be required. Documentation was also lacking in the area of tutorial. Truth be told, I’m not very skilled in writing tutorials. Hopefully, the higher-level API will help me to introduce the library to newbies. Also, I can include several tutorials in the library to improve its status.

There was an idea about parser/generator (like 3-level instead 2-level indirection) idea that will require me to think even more about the design. Even now, I haven’t thought enough about this design yet. One thing for sure is that I’ll have to expose an HTTP parser because that’s the only thing that matters for some users.

A few other minor complaints were raised that can be addressed easily.

If you are willing to discuss more about the library, I have recently created a gitter channel where discussions can happen.

And for now, I need to re-read all messages given for the review and register associated issues in the project’s issue tracker. I’d like to have something ready by January, but it’ll probably take 6 months to 1 year before I submit the library again. Also, the HTTP client library is something that will possibly delay the library a lot, as I’ll research power users like Firefox and Chromium to make sure that the library is feature-ready for everybody.

So much work that maybe I’ll submit the library as a project on GSoC again next year to gather some more funding.

Also, I’d like to use this space to spread two efforts that I intend to make once the library is accepted into Boost:

  • A Rust “port” of the library. Actually, it won’t be an 1:1 port, as I intend to use Rust’s unique expressiveness to develop a library that feels like a library that was born with Rust in mind.
  • An enhanced, non-slow and not resource-hungry implementation (maybe Rust, maybe C++) of the trsst project.

EDIT (2016/03/20):

And for now, I need to re-read all messages given for the review and register associated issues in the project’s issue tracker.

That part is done: https://github.com/boostgsoc14/boost.http/issues?utf8=%E2%9C%93&q=is%3Aissue+label%3Aboost-review+

I’d like to have something ready by January, but it’ll probably take 6 months to 1 year before I submit the library again.

I think I was being too optimistic when I commented “6 months”. It’d only be possible to complete it within 6 months if Boost.Http was the only project I was developing. Of course I have a job and university and the splitted focus wouldn’t allow me to finish the library in just this small amount of time.

Boost.Http scheduled for review

February last year, I was writing an email to the Boost user mailing list, asking for feedback on an HTTP server library proposal. Later that year, I got the news that I was to receive funds through Google to work on such proposal. And I had delivered a library by the asked timeframe. However, getting the library into Boost is another game. You must pass through the review process if you want to integrate some library into Boost. And I’d like to announce that my library is finally going to be reviewed starting on Friday.

Some links:

After working on this project for more than a year, I’m pretty glad it finally reached this milestone. And I’m very confident about the review.

I had written about this project so much here and there that when the time to write about it in my own blog comes, I don’t have many words left. It’s a good thing to leave as much info on the documentation itself and don’t spread info everywhere or try to lure users into my blog by providing little information on the documentation.

Suggestion: 8+1 languages worth learning

Introduction

Every now and then I have contact with a few programming languages and this is the subset that I believe it would give me a very close insight to the sum of the all languages that I’ve had contact with. Also, this subset is not only based on the choice of ideas that each language aggregate, but also on their usefulness and importance for the general programmer’s toolbox.

Regex

Just about the most awesome way to describe and manipulate words from regular languages. No matter if it’s used as communication purposes within some specification or if it’s used to crawl certain patterns within a large collection of texts. It’s useful even within the programming environment itself. And to contribute to its awesomeness, it’s one of the easiest and fastest things to learn. It’s useful even for non-programmers (think about that time when you want to rename all files from a folder to have better consistency).

You can visualize regex patterns using Regexper or any of its competitors.

MarkDown/CommonMark

Started as a simple tool to pretify common syntax used in text-based email. But now just about almost every major site visited by programmers (e.g. StackOverflow, Reddit, GitHub, Doxygen-generated ones) has some support for MarkDown. Or its recent attempt for a smart standardization to spread good common practices and inspire better interoperability among supporting tools.

You can think of MarkDown as a simple way to describe which parts of the text will be bold or will be the tittle for a subsection and so on. MarkDown is simple! MarkDown is simple enough to be accepted in non-programmer targeted products like blogging platforms (even WordPress) or discussion platforms.

C

A language that appeared in 1972 that is still interesting and it’s still important. Being the “portable Assembly”, operating system’s kernels are still written in C. Pieces of software dealing with low-level are still written in C. Embedded projects are still written in C.

C is not a popular language out of merits. C is just the right abstraction to forget about Assembly, but still have no overhead between your software and the machine. Compilers will do a fantastic job in no time for you.

C is an easy language to learn, adding just a few handful abstractions like subroutines and structures to learn. Of course, C is very low-level and you’re expected to face manual memory management (and memory leaks), bit by bit serialization, pointer to functions (no closures here), architecture and operating system differences and maybe things like varargs, setjmp and mmap. You should be able to understand the implications on performance some decision has. This insight is something C has been a great language at and will hardly be acquired learning another language.

Haskell

Haskell is one of the languages I learnt this year. It’s a typed purely functional language. It’s a great language. It has great concepts to decrease the total number of lines of code you should write (like list comprehensions and pattern matching), a clever syntax and some great concepts you could learn (higher-order functions, currying, lazy evaluation…).

Not all about Haskell was new to me, as I had already learn functional programming through Scheme some years ago, but Haskell does a much better job. I hate Lisp naming conventions (car for the head of the list, seriously) and excessive number of parentheses. You shouldn’t have to follow my path. You should be introduced to functional programming with Haskell.

Also, look at how elegant this QuickSort is:

sort :: (Ord a) => [a] -> [a]
sort [] = []
sort (pivot:rest) = sort [y | y <- rest, y < pivot]
++ [pivot] ++
sort [y | y <- rest, y >= pivot]
view raw qs.hs hosted with ❤ by GitHub

Ruby

Ruby is another of these languages I learnt this year. It’s a purely object-oriented language. Some cleverness was invested around its syntax and I very much appreciate this. It’s a very dynamic language where every class is open and even things like attr_reader are methods.

Object-oriented programming is one of these must-have skills for a programmer and I think Ruby, being purely object-oriented, is a great language to learn this paradigm. Hide and encapsulate!

I choose to learn Ruby looking for a scripting language to empower a possible game engine that I might code. Ruby really impressed me. Ruby is so dynamic that even if I design a wrong class hierarchy or something, Ruby probably has a way to fix it. I don’t intend to design bad hierarchies, but I don’t know who will use my possible future game engine and this concern then becomes undeniably important.

JavaScript

One of the worst languages I’ve ever seen. But also one of the best languages I’ve ever seen (yep, out there you can find programming languages that would surprise you in the bad way). This language is not what I’d like to be the most popular, but it’s just enough to not be hated. Also, it runs on about every web browser, which is like… everywhere. Importance and interoperability. It’s like you really need to know JavaScript.

JavaScript is like the assembly for the web. You’ll find many tools that translate source code from some language into JavaScript just to enable execution within the browser. Once developed to the browser, JavaScript has grow since and now it’s popular even on the server-side. JavaScript also conquered the smart-side.

Not knowing anything about JavaScript is almost like not knowing how to read in the programming industry. It’s a terrible language full of bad decisions, but it’s the common denominator of the web development.

Learning JavaScript also may help to solidify concepts you should know like asynchronous APIs, JSON and some others.

XML/HTML

Responsible for most of the web traffic, this is a pretty important and simple language to understand how web documents are structured. If you think I’m overestimating web, it’s because it’s one of the greatest things we have. But XML is not only about web, it’s about interoperable documents and protocols and it is used as such. You can find XML in use within vector formats, formats for office applications and even chat protocols. I think learning the basics of XML is a big deal.

LaTeX

I personally think that the LaTeX tools aren’t among the most polished tools. Just look at the Makefile generated by Doxygen to see the run-until-no-more-differences-found loop to work around inconveniences in the LaTeX tools. Or just look at the terrible error messages. Also, the syntax isn’t surprisingly pleasant.

But when you want to focus on the content, forget about the accumulated little formatting details and produce beautiful scientific papers, a book with consistently in-between linked references or even just a few math formulas, LaTeX is probably what you should, at least, consider.

Bonus: bash

Capable to automate the most surprising tasks in a computer, if you are using an Unix variant system. You could automate builds, customize software startup sequences and manage your system. But if you’re using an Unix variant system, you already may be aware of that.

Notes

No Java, C++ or Python in this list. Maybe I’ll do a part 2 of this article containing languages with a lesser chance to be used like SQL, MongoDB, OpenGL, R, GStreamer or some Assembly. Actually, I think Java, C++ and Python have a better chance to be used than Haskell, but if you learn every language in this list, C++, Java and Python will be easy to catch up and the lines of code you write will be more elegant.

I don’t feel the will to trust on logic all the time

If the last two air-planes from company A crashed, it doesn’t mean that their next air-plane is going to trash. It’s not a logical consequence. It might be a statistics problem, but then I won’t trust the company for awhile anyway.

If the situation “who will you gonna save?” happens, then you don’t have a logical decision. I don’t use logic’s help on all my decisions.

Anyway, since I started computer science, I feel like I’ve seen less events where people use illogical arguments to take decisions.

“one ring to rule them all”