It’s been two months already since my last post on this blog. All this time (and more), I’ve been (among other tasks) working on my 2014 GSoC project, an HTTP server proposal to Boost. I’ve finally reached a point where I feel it’s ready for major review/feedback.
If you’re a C++ programmer, a native speaker or an HTTP researcher (or just a little of everything) and you want to help, I’d like to ask you to review the project (interface-wise) and give me feedback.
This isn’t my first time on GSoC, but the experience was very different. The communities, development model, targeted audience, knowledge domain, to name a few, were completely different. Also, this year’s project wasn’t as challenging as the last one (although this is very subjective).
I improved as a programmer, but this is limiting. Once you become able to write algorithms for turing-machine compatible devices, there isn’t much room for improvement and you need to hunt other skills to continue improving (e.g. security, parallelism…). Coding for the sake of coding solves no problems. I decided to take a break (not exactly now, but in a few weeks) to make a shift and start to increase the efforts I dedicate to my academic skills.
I aim to integrate this library into Boost, so I still want to invest effort in this project.
I created a new category on this blog to track the progress, so you’ll be able to have a separate rss feed for these posts. The new category URL is http://vinipsmaker.wordpress.com/category/computacao/gsoc2014-boost/.
Looks like I’m super creative lately. I’ve had another idea to improve libdepixelize.
The idea is to use the knowledge I got from the compilers course on the Coursera MOOC platform to make libdepixelize implement an abstract machine that can be programmed through image metadata gathered from PNG files to execute arbitrary code to turn libdepixelize into a generic cg framework.
Such extension will allow libdepixelize transform images like the followin one …
… into the following other one:
In fact, I’m so excited about this idea that I’ll start to work right now and I’ll try to deliver this feature at the end of this very week and you can wait for some extensions in the next week.
In the list of planned extensions, there is support for examining your computer usage, then we can gather general info about our citizens and help the police chase criminals. But don’t worry, we’ll only use this info for the good.
I’ve invested some effort to improve libdepixelize response to evil patterns. Because libdepixelize won’t discard color information, the connections of the similarity graph aren’t a transitivity relation and extra topological patterns can happen. I refer to the extra patterns as evil patterns. The name comes from the fact that I enjoy to play Zelda and defeat the evil from the Hyrule land. Same happened in the last libdepixelize’s commits, where I overcame some patterns to improve the output quality. Quality can be something subjective sometimes, then I was a little conservative and limited my changes to things that I could reason about. The effort end up in new samples to the libdepixelize documentation, new rules for the algorithm and its related documentation and lines of code in libdepixelize itself.
The new rules are added as an extra step on the process. They are not treated like the Kopf-Lischinski’s heuristics to resolve crossing connections. I think maybe would be possible to get some of the new rules and describe versions that are more general and can be added to the “container” that holds the old heuristics. To make this happen, I’d need to define the concept of “similar color” as a set and operations on top of the set, the notion of interval and other things and logical reasoning on top of all that. A bit of mathematical work to improve the quality a little more, but I wanna to investigate the use of La*b* colors (an old suggestion by Nathan) to replace the current concept of “similar colors”. I didn’t replaced the current space color until now, because YUV and La*b* behave differently and I couldn’t just convert the YUV constants that define the boundary between dissimilar colors to La*b* to achieve better results. The current YUV constants were taken from HQx filter and I need to define a methodology to find new constants.
The advantage of a rule that is more general and unifies behaviour is just the beauty of a simpler rule that handles the real problem, as opposed to several branches that are consequence of the real problem. It’d abstract the nature of the problem better. It’d make the code simpler. It’d handle more branches. Being a single rule that affect more branches, it’d be easier to test and better at convincing me that the improvement is real and there will be no loss of quality in other images.
It’d be interesting to investigate the range of voting in all heuristics and try to come up with “fair” multipliers/strength.
New idea to represent splines
Previously I used a technique to carefully insert new nodes to obey the technique “adjust splines” from the Kopf-Lischinski paper while being limited by the old splines representation. This technique has caused troubles to several later steps.
The first problem was to remove the extra invisible nodes that are present in the output SVG. These extra node not only make the output larger and make rendering consume more CPU time, but also can be troublesome for artists wanting to manurally edit the generated image. I’ve made several unsuccessful attempts to remove these nodes and I’m sure it’s possible, but I’ll try to move away from this problem trying a completely different approach to the problem.
The second problem is similar, in essence, to the first one, described in the paragraph above. When I originally disabled the optimization step in the Inkscape GUI and marked it as experimental, one of the reasons was because it required “extra preprocessing steps” (it was a pretty vague explanation, but I should try to improve my communication skills). With no extra invisible points, the optimization step will be way simpler. The step related to optimization that avoid overlapping shapes and holes will partly go away and the approach I mentioned previously (“A new idea to keep the shape of optimized splines correct“) will be affected.
The idea is to split splines. I was previously using a list of points. The new idea is to use a list of splines, where spline itself is a list of points. I hope that the new representation will allow a representation closer to “arbitrary”, just enough to apply the operation “adjust splines”. The new representation should do without extra points and easy the last processing steps (in terms of processing power and my productivity). Of course this change requires a lot of refactoring and will take a bit of time to be finished. Also, the “hope” word used previously means that I haven’t thought about all implications and I’m sharing this idea very early. I’m trying to improve my communication skills and this means that you’ll see some “flood” on this blog. The “flood” might include ideas that eventually prove to be bad at a later point and doesn’t hit the libdepixelize code.
First, the software is free software, then anyone can use it (even the development versions). So, if you find a crash or something that obviously is a bug, fill a bug report and I’ll fix it.
Second, I still want to improve the quality of the algorithm, then a good pixel art database can help me a lot. Currently the algorithm behaves bad at images with too many gradients, but this doesn’t mean that a lot of images with gradients will help me to improve the quality of the algorithm. I need to publish a page explaining what kind of image can help me to improve the quality of the algorithm.
Third, you can help me with profiling info to improve the performance of the algorithm. I’ll probably send a message to the Inkscape mailing list with the title “request for beta testers/profiling data on libdepixelize”. I’ll probably (re-)share the message through Google+, this blog and maybe more. If you follow any of these communication channels, you’ll know when I need help.
Thanks for the support.
The project doesn’t have a logo and uses an ugly icon in the Inkscape GUI dialog. I was thinking about use the character introduced by Jabier on the Inkscape mailing list to represent libdepixelize project:
This image already is highlighted on the algorithmic documentation report anyway and is unique enough.
I’ll have a test at the end of the week and later I’ll share more of my time to play with POWER and LLVM. Then libdepixelize will have to wait a little until I can do more commits.
I’m aiming to deliver all the improvements related to the original Kopf-Lischinski algorithm before Inkscape 0.91. Later I’ll try to improve performance. “Beyond Kopf-Lischinski” extensions have no timeline and depend on creativity.
Lots of months without updates, I thought it was about time to post something.
Interestingly, this is the post that document the slowest progress, but it was the post that consumed more from my time.
The frogatto game database
I found an awesome pixel art database distributed with the Frogatto game and from now on I’ll use it on my research and tests. The license is kind of confuse (you don’t know precisely all the things you can do) and I’d prefer a Creative Commons license, but it’s safer (for me) than use copyrighted graphics from Nintendo or other companies. Not only the license is better, but also the beauty of these graphics is outstanding.
Most of the images don’t have an Alpha channel and use a placeholder color as the removable background, but there are some images where real use of Alpha channel (not only on-off) is there.
I want also to add that I liked the Frogatto game so much that I was thinking about the possibility to join forces with their developers to provide a hi-rez version of the game using the Kopf-Lischinski algorithm. Maybe I can borrow some processing power from some cluster at my university to generate the SVG files.
The Alpha channel heuristic
If you’re unfamiliar with Kopf-Lischinski heuristics, I suggest you to read the “trace pixelart” manual bundled with the very (bzr) latest version of Inkscape.
The frogatto game pixel art database made me realize one simple fact: The alpha channel heuristic to resolve similarity graph ambiguities that I was planning to develop is pointless for most of the cases, because the extra color patch will work against the heuristic. The extra color patch will see different colors and will create square-like pixels. The below image from the Frogatto database (with a red background inserted) sumarize this problem:
Do you know how a good a conversion from current libdepixelize would be? Well, it would generate a similar image as output, but a an alpha channel heuristic wouldn’t help. Also, there aren’t really any cross-connections to resolve here (because the alpha channel + extra color patch turn each one of the white pixels into different colors/shapes). Even if there was an ambiguity in the similarity graph, I don’t see how an alpha channel heuristic would help (mainly because the lack of problem hurts my imagination).
There is, although, one case where I see an improvement/extra-safety-guarantee that could be achieved through an alpha channel heuristic. Look at the magnified image below:
The image is ugly and the reason is because I created it. It’s so ugly that maybe you don’t understand what drawing I tried to achive here, then I need to explain. The drawing is a part of a chain. There were chain images on the Frogatto database, but they weren’t affected by the issue I wanted to mention.
The image squares are pixels and the image was magnified 12x. The Alpha channel information is still there. The other heuristhics (long curve and sparse pixels) will likely vote to keep the chain shape, then the result is already good and there is no need for an Alpha channel heuristic. Also, it’s possible (but very unlikely) that the transparent color has random bits that will create no connections and won’t affect the chain shape.
Well, pretty much an Alpha channel heuristic is useless, but kind of nice to have as an extra safety over the other heuristics. But I won’t create this extra safety guarantee, because the above image is artificial (you won’t find it in any game) and I haven’t find a good Alpha channel database affected by this issue yet. If I do find such database, I can reconsider this issue, because even if the heuristic go wrong for some images, the libdepixelize design allow you to disable or invert any heuristic (neat feature).
Just to be clear for once… I won’t waste my time looking for a pixel art database making good use of alpha channels affected by this issue. I don’t even bother anymore. But… if you do find such database, just share it with me and I’ll see what I can do. In fact, this is one of the changes in libdepixelize where I don’t need to invest much effort or thought (apart from the pixel art database research).
A new idea to keep the shape of optimized splines correct
One of the things that I’m really failing to achieve is to keep the shape of optimized splines correct. I had a new idea that I want to test soon and I’ll describe in the text below.
So, the idea is to create an index of all points that share the same position before the optimization begins. Every time a point is optimized, the position for all points sharing that position change. The problem is, among others, to keep invisible lines invisible.
The approach is simple, but it’s a bunch of code to do. Stay tuned.
A new competitive filter
The post is getting kind of big, then I’ll leave analysis of one possibly interesting algorithm for later.
Correctness (C++ memory-wise) tests
This was a task originally suggested by Nathan. I think he meant to use Valgrind, but I went on a different direction and I went on LLVM suite (clang sanitizers) instead.
First test is simple and it didn’t show much. This was done using clang static analyzer tool and theoretically can be as good as the compiler warnings. The output was “No bugs found.”.
Second test was the combo address sanitizer+leak detector, also using LLVM tools. I fixed some bugs regarding the use of the popt command line arguments parsing library and suppressed another warning (source outside of libdepixelize cannot be fixed within libdepixelize). Looks like the use of RAII helped to keep libdepixelize free of memory leaks and the use of standard containers instead self-made structures helped to keep libdepixelize free of wrong-address errors (although I do some weird pointer arithmetic at some places to try to improve the performance, but the analyzer hasn’t spotted errors).
Third test was the memory sanitizer, that would help to spot the use of non-initialized variables, but this sanitizer aborts on the first error and due to an error within popt I couldn’t go farther. I tried to use the suggested attributes and also the blacklist file, but it didn’t help. In short, I’d need to rewrite the binary without popt to inspect libdepixelize code through the memory sanitizer, then I’m postponing this task. At least the next planned big refactor is more sensible to addresses, which has a sanitizer working fine.
I tested all output modes (voronoi, grouped voronoi, non-smooth and default). The set of images used was chosen to increase the test of code paths executed, including the images that are used as examples in the original paper to describe the heuristics and smoothing techniques and some big images taken from real-world games just to stress the code. Something better would be some unit tests. This set of image was used for all tests described in this post.
This was also a task originally suggested by Nathan. The task was not to find the slowest element in the processing pipeline, because maybe the element is the slowest, because indeed the task is tough and it may be doing the best possible. The task was to find which processing element wasn’t scaling linearly with the size of the input and improve it.
I haven’t measured memory consumption like old Rasterman told us, but memory consumption isn’t my focus with libdepixelize. I use a templated argument to customize the precision and you can “freely” (actually it’s limited by 2geom) use any type (float, double, long double, …) you think fits your application the best. I try to avoid cache misuse and too many allocations and memory usage is just a consequence of this design principle. Thus, don’t expect any changes focused on memory consumption. Memory usage may receive some love, but just as a mean to achieve better performance. Also, this isn’t the kind of application you hope to keep around and you just want to see it finishing fast. A last thought is that structuring packing could decrease memory usage without affecting the algorithm, which can improve the use of the cache and the performance.
You can find the result below (libdepixelize profiling output + zsh + python tricks) and the method at the end of this section. The image was generated using R.
And below, you have a plot of the same data using a logarithmic scale to move away the curves from 0.
I’m sorry for being that n00b, incapable of producing a legend that don’t cover parts of the curves.
Now it’s clear what processing units I need to investigate to improve the performance.
Improving cache usage
One of the ideas to improve the performance was to use a cache oblivious algorithm. This means that I should use a different memory layout to make better use of cache. I wanna know what kind of improvement I should expect and I did a small test converting a large image to a single row and did some measurements (you can find a code snippet below as a proof), but the libdepixelize’s code take different paths and the comparassion is very unfair (to not say useless).
I will do a second approach mentioned in this blog post to measure cache misuse. I’ll have to implement the new memory layout and compare the results to be sure about the improvement. Before comparing the results, I’ll have to rerun the old tests, because my computer will be using newer libraries and newer compilers and the comparassion wouldn’t be fair.
Let’s define a run as the execution of the script below (given that libdepixelize was configured with the option OUTPUT_PROFILE_INFO enabled).
I did 3 runs and aggregated all common values per image (even equal steps in different output modes, like “Tracer::Kopf2011::_disconnect_neighbors_with_dissimilar_colors(Tracer::PixelGraph)”, with exception of “Tracer::Splines construction”, which would be unfair). I discarded the largest (the slowest) value for each field per image, because I was considering it as cold cache effect and it’s more difficult to measure/replicate this condition (I’d need to learn how to clear the cache without rebooting the PC), then I’m more interested in hot cache condition. Then, I computed the arithmetic mean. Now I have a table of values to print for a set of images. I’d like to compare the time taken by some processing element (this is how to interpret each value) as the “difficult” increases. The difficult was related to each image and to rank them I just sorted them by number of pixels.
And in case you want aggregate data like I did, I’m sorry, but I cannot help you. I used at least 3 technologies to do interactive exploration (ZSH + BASH + Python) and I cannot provide scripts that don’t exist to you. And about the other tools used, they were used just because I know how to use them (accidental vs planned) and the communication among them was rather unusual (clipboard text + simple files + json formatted files + MongoDB + CSV). The description of the method given above is just better than the tools by themself. Accept my apologies in the form of an AA cow:
I ran all the tests on this machine and I kept myself from using this machine while the tests were being executed. Yes, not a very “scientific” approach, but I think it’s on the right track and I need to reserve some time to development. The purpose is just to identify the “worst” processing element anyway.
This post is a direct reply to this Miguel de Icaza’s post, then I don’t hope it will make sense as an independent text. This post was initially an email reply, but I wanted to share it with a specific person, then I decided to migrate the text content to a blog post.
One of the most interesting parts from the Icaza’s text was the one about code reuse, but he seems to “forget” many of the problems that the solution creates:
- Service activation: Even by creating components using the proposed technique, the application still needs to face the problem to start the service, but then the application needs to know the service and the abstraction no longer works as initially desired (separation of interface and implementation).
- Inactive services: It’s an unnecessary overhead and decreases battery life to maintain inactive services running.
- Resource management: He mentions reference counting, but what if the application hangs before decrementing the reference?
- Security: This thing about global services… imagine someone creating the service “open-window-to-ask-for-password” and then having all your passwords being stolen.
- All components would need to be scalable: Imagine that instead having a function being executed 10 times by different proccesses that live in different places, now you have a daemon that executes only at one place and needs to stand against the bottleneck of 10 simultaneous clients.
There must be more points, but I don’t remember now. This part of the text reminded me about D-Bus, the systemd team (that is cooperating with the kernel to make D-Bus faster), and ROS (a robotics framework/meta-os that also provides this “messaging” system). It also reminds me about Android where there are several services (SMS provider, …) that work together with the applications, but I think (I’m not sure) that Android adds some isolation to enhance security (which is nice when you don’t trust the applications, but we usually don’t have with an easy-to-use GUI on GNU/Linux distros).
But leaving criticism apart, if he really solves all the problems, the only work left to reach the top (I suppose) would be divulgation and the creation of tools (tools/bindings for about 10 or 20 programming languages) to ease the life of developers. By “top” I mean that his technology would compete with Android/Sailfish/…, maybe would win servers’ love and so on.
And to finish the topic I don’t trust much on GNOME developers. Do something in C with minimal OO or something in C++ with no-to-only-where-necessary-overhead OO. But GObject… seriously… a bad joke.
About the problem of configuration, I think it’s less problematic now that systemd is conquering the user-land in all distros. And systemd also do some clever things like separation of default configuration (on /usr) and your costumizations that can even override only some values of the default config.
And still about the problem of configuration, I think the proposed solution to configuration is more realistic than the proposed solution to code reuse. But he still needs to solve the security problem.
Following the “7 bash tricks“, I’m now using ZSH as my main shell. The differences that I noticed from bash so far:
- Way smarter completion system.
- A transient rprompt showing information that isn’t important for scrollback and otherwise would only waste space.
- Zsh is more responsive/fast for me. I think the problem in my bash configuration was the git completion/info stuff, but Zsh provides a non-slow git completion/info alternative, then I’ll cast a vote for zsh. Although there are some users installing git modules more expensive in their zsh shells.
- A prompt char whose background color will turn red if the last command returned non-zero. And the exit status will also be printed on the output if non-zero.
- There are more supported Emacs key bindings. And they are easily customizable, then I can mimic some bash and Emacs behaviours.
Regarding the usage, Zsh and Bash aren’t too different and it’d be difficult to tell which shell is which. I type the same commands and I see (mostly) the same output. The svn to git change was way more significant to me than the bash to zsh change. Another recent change of habit was the Solarized’s colorscheme adoption (not shown in this page’s outdated gifs).
There are lots of users adopting some configuration framework like oh-my-zsh, zshuery or prezto. From the usual “look my oh-my-zsh’s based zsh” posts that I’ve read… I think these users like to configure tons of plugins… but I don’t and I don’t see much benefit of using one of these configuration frameworks. In fact, I like an initially minimalist and clean configuration and this is how I maintain my ArchLinux setup. But the main benefit of the framework-free setup is the self-contained nature. The only dependency is zsh itself.
My 239-lines zshrc config file is available online on my git repo and there isn’t much magic there. In this post I’ll only explain the small amount of magic there.
The bash’s Ctrl+C behaviour
When you press Ctrl + C in bash, the shell “abort” the edition of current line and let you try again from the beginning. The difference of behaviour in bash and zsh is that bash will also print a nice “^C” string, then you can know if the previous lines were actually executed or not. I wanted this behaviour in zsh.
I found out that zsh handle stty in a different way and you’d need to caught SIGINT to print the “^C” string. But zsh is customizable and it allow you to easily trap the INT signal.
This is trivial and far from the “magic” concept I wrote at the beginning of the post, but it’ll get complicated soon. Moving on…
The reddish background of the prompt char
After seeing so many esoteric zsh prompts, I started to ponder how I’d want my own prompt. I decided I wanted something minimalist, showing only important information and only when it is interesting. This is why I use relative paths on the prompt and the hostname is omitted if I’m not acessing the terminal from a ssh connection. But all those prompt demos from other zsh users made me wonder how useful a reddish color alert would be. And it would be possible to put the red color on the background of the prompt char, then I wouldn’t waste any extra space.
A simple way to set the prompt the way I want would be like this:
And it works, but there is one small problem. If I hit Ctrl+C, a non-zero status will be returned and the red background will be triggered, tricking me to think something bad happened. To “fix” the behaviour, I decided to add a “_trapped” variable that would would work like a masking, allowing or banning the red background. Then I need to change the previously created TRAPINT function to ban the red background and add a hook before the execution of new commands to allow the red background. The configuration ended up like this:
This change also has the nice side effect of turning the red background off easily by pressing Ctrl + C. The “updatemyprompt” function needs to be called from “TRAPINT”, because the “precmd” function isn’t always called before the prompt is redraw (see manpages). I’d prefer to control the output of the prompt var through something like the “prompt_subst” option, but I haven’t figured out a pretty way to conditionally show a string using zsh’s command and parameter expansion based on the “_trapped” value (yet!). And when I learn a prettier way to control the “_trapped” mask, I’ll still need the “precmd” function, because this is required for the “vcs_info” module.
The above code/config wasn’t the final one (vcs_info’s config is missing) and let’s stop here, because there is no more magic to show. The code end up being ugly, with responsibilities not well defined and error-prone communication system (a single global variable caused such a big damage here). The upside is that this is my shell, it does what I want and therefore I won’t change it. Also, in fact, my real zsh config uses anonymous function to leak less variables and avoid unnecessary naming clash.
Custom keys detection
It turns out that terminals are hard. There are escape sequences that were created to interact with the terminal. The interaction would allow applications move the cursor around and detect certain key combinations done by the user. Unfortunately, there are several protocols and there is no universal way to access these features. To register some fancy key bindings, zsh wiki and several online pages suggest you to use terminfo or zkbd.
The zkbd method requires an annoying manual intervention on the first start-up and it would be desired as a fallback mode only.
The terminfo mode will only work under application mode and the zsh wiki suggest you to activate the application mode on the beginning of the line edition and deactivate it on the end, before the execution of other applications. The problem appears when the terminal doesn’t support application mode. Fortunately, there is a a way to check if the underlying terminal supports application mode, then you can you can even add a nicely integrated zkbd fallback mode. Below there is an initial snippet to configure zsh accordingly.
Note that I initially avoided the terminfo all together because I thought the lack of info around “go to application mode” too scary, resembling some “error-prone thing that will break”. I needed to understand that zsh would leave application mode before execute other applications and it would only enter in application mode if supported. I learned all these bits thanks to comments in this prezto’s pull request.