libdepixelize update

It's been many months without updates, so I thought it was about time to post something.

Interestingly, this is the post that documents the slowest progress, yet it's the one that consumed the most of my time.

The Frogatto game database

frogatto-icon

I found an awesome pixel art database distributed with the Frogatto game, and from now on I'll use it in my research and tests. The license is kind of confusing (you don't know precisely everything you're allowed to do) and I'd prefer a Creative Commons license, but it's safer (for me) than using copyrighted graphics from Nintendo or other companies. Not only is the license better, the beauty of these graphics is also outstanding.

Most of the images don't have an alpha channel and use a placeholder color as the removable background, but there are some images that make real use of the alpha channel (not just on-off transparency).
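For instance, a quick way to tell which of these cases an image falls into is to scan its alpha values. The helper below is a hypothetical illustration (using gdkmm, like the snippet later in this post), not libdepixelize code:

// Hypothetical helper: classify how an RGBA image uses its alpha
// channel. "Binary" means only fully opaque/fully transparent pixels
// (on-off); "Gradual" means intermediate alpha values are present.
#include <gdkmm/pixbuf.h>

enum class AlphaUsage { None, Binary, Gradual };

AlphaUsage classify_alpha(const Glib::RefPtr<Gdk::Pixbuf> &img)
{
    if ( !img->get_has_alpha() )
        return AlphaUsage::None;

    // Pixbufs with alpha always have 4 channels; rows may be padded,
    // so advance by rowstride between rows.
    const guint8 *row = img->get_pixels();
    bool transparency_seen = false;
    for ( int i = 0 ; i != img->get_height() ; ++i ) {
        const guint8 *pixel = row;
        for ( int j = 0 ; j != img->get_width() ; ++j, pixel += 4 ) {
            const guint8 alpha = pixel[3];
            if ( alpha != 0 && alpha != 255 )
                return AlphaUsage::Gradual;
            if ( alpha != 255 )
                transparency_seen = true;
        }
        row += img->get_rowstride();
    }
    return transparency_seen ? AlphaUsage::Binary : AlphaUsage::None;
}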

I also want to add that I liked the Frogatto game so much that I was thinking about the possibility of joining forces with its developers to provide a hi-res version of the game using the Kopf-Lischinski algorithm. Maybe I can borrow some processing power from a cluster at my university to generate the SVG files.

The Alpha channel heuristic

If you're unfamiliar with the Kopf-Lischinski heuristics, I suggest you read the "trace pixelart" manual bundled with the very latest (bzr) version of Inkscape.

The Frogatto pixel art database made me realize one simple fact: the alpha channel heuristic to resolve similarity graph ambiguities that I was planning to develop is pointless in most cases, because the extra color patch works against the heuristic. The extra color patch will see different colors and will create square-like pixels. The image below, taken from the Frogatto database (with a red background inserted), summarizes this problem:

glow

Do you know how good a conversion from the current libdepixelize would be? Well, it would generate a similar image as output, but an alpha channel heuristic wouldn't help. Also, there aren't really any cross-connections to resolve here (because the alpha channel plus the extra color patch turn each one of the white pixels into different colors/shapes). Even if there was an ambiguity in the similarity graph, I don't see how an alpha channel heuristic would help (mainly because the lack of a problem hurts my imagination).

There is, however, one case where I see an improvement/extra-safety-guarantee that could be achieved through an alpha channel heuristic. Look at the magnified image below:

chain

The image is ugly because I created it myself. It's so ugly that maybe you can't tell what drawing I tried to achieve, so let me explain: the drawing is part of a chain. There were chain images in the Frogatto database, but they weren't affected by the issue I wanted to mention.

Each square of the image is a pixel, and the image was magnified 12x. The alpha channel information is still there. The other heuristics (long curve and sparse pixels) will likely vote to keep the chain shape, so the result is already good and there is no need for an alpha channel heuristic. Also, it's possible (but very unlikely) that the transparent color has random bits that create no connections and don't affect the chain shape.

So an alpha channel heuristic is pretty much useless, but kind of nice to have as an extra safety net on top of the other heuristics. Still, I won't create this extra safety guarantee, because the above image is artificial (you won't find it in any game) and I haven't found a good alpha channel database affected by this issue yet. If I do find such a database, I can reconsider, because even if the heuristic goes wrong for some images, the libdepixelize design allows you to disable or invert any heuristic (neat feature).

Just to be clear once and for all… I won't waste my time looking for a pixel art database that makes good use of alpha channels and is affected by this issue. I don't bother anymore. But… if you do find such a database, just share it with me and I'll see what I can do. In fact, this is one of the changes to libdepixelize where I don't need to invest much effort or thought (apart from the pixel art database research).

A new idea to keep the shape of optimized splines correct

One of the things that I'm really failing to achieve is keeping the shape of optimized splines correct. I had a new idea that I want to test soon, and I'll describe it below.

So, the idea is to create an index of all points that share the same position before the optimization begins. Every time a point is optimized, the position of all points sharing that position changes along with it. The problem is, among other things, to keep invisible lines invisible.
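A minimal sketch of the idea, with hypothetical names (the real implementation will have to work on libdepixelize's actual spline representation):

// Sketch: index every point by its position before the optimization
// begins, so that moving one point moves all points that share that
// position, and coincident (invisible) lines stay coincident.
#include <map>
#include <utility>
#include <vector>

typedef std::pair<double, double> Point;

class PositionIndex
{
public:
    // Call once per point before the optimization begins.
    void add(Point *p)
    {
        index_[*p].push_back(p);
    }

    // Move every point that shares the old position to the new one.
    void move(const Point &from, const Point &to)
    {
        std::vector<Point*> &group = index_[from];
        for ( std::size_t i = 0 ; i != group.size() ; ++i )
            *group[i] = to;

        // Re-key the group under its new position.
        std::vector<Point*> moved;
        moved.swap(group);
        index_.erase(from);
        std::vector<Point*> &dest = index_[to];
        dest.insert(dest.end(), moved.begin(), moved.end());
    }

private:
    std::map<Point, std::vector<Point*> > index_;
};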

The approach is simple, but it's a bunch of code to write. Stay tuned.

A new competitive filter

The post is getting kind of big, so I'll leave the analysis of one possibly interesting algorithm for later.

Correctness (C++ memory-wise) tests

This was a task originally suggested by Nathan. I think he meant using Valgrind, but I went in a different direction and used the LLVM suite (clang sanitizers) instead.

The first test was simple and didn't show much. It was done using the clang static analyzer, which in theory can be as good as the compiler warnings. The output was "No bugs found.".

The second test was the combination of the address sanitizer and the leak detector, also using LLVM tools. I fixed some bugs regarding the use of the popt command-line argument parsing library and suppressed another warning (code outside of libdepixelize cannot be fixed within libdepixelize). It looks like the use of RAII helped keep libdepixelize free of memory leaks, and the use of standard containers instead of self-made structures helped keep it free of wrong-address errors (although I do some weird pointer arithmetic in some places to try to improve performance, the sanitizer hasn't spotted errors there).

The third test was the memory sanitizer, which would help spot the use of uninitialized variables, but this sanitizer aborts on the first error, and due to an error within popt I couldn't go further. I tried to use the suggested attributes and also the blacklist file, but it didn't help. In short, I'd need to rewrite the binary without popt to inspect libdepixelize's code through the memory sanitizer, so I'm postponing this task. At least the next planned big refactor is more sensitive to address errors, and the address sanitizer is working fine.
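For reference, these are the kinds of workarounds I mean, illustrative only (the function name is made up), and they didn't solve the popt problem for me:

// Attribute from the MemorySanitizer docs: tell the sanitizer not to
// instrument a given function (e.g. a wrapper around the popt calls).
__attribute__((no_sanitize_memory))
void parse_arguments(int argc, char const *argv[]);

// The blacklist file alternative is passed to clang with
// -fsanitize-blacklist=blacklist.txt, the file containing e.g.:
//     fun:popt*
//     src:*/popt.h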

I tested all output modes (Voronoi, grouped Voronoi, non-smooth and default). The set of images was chosen to increase the coverage of executed code paths, including the images used as examples in the original paper to describe the heuristics and smoothing techniques, plus some big images taken from real-world games just to stress the code. Something better would be proper unit tests. This set of images was used for all tests described in this post.

Performance tests

This was also a task originally suggested by Nathan. The task was not to find the slowest element of the processing pipeline, because an element may be the slowest simply because its job is tough and it may already be doing the best possible. The task was to find which processing element wasn't scaling linearly with the size of the input, and improve it.

I haven't measured memory consumption like old Rasterman told us to, but memory consumption isn't my focus with libdepixelize. I use a template argument to customize the precision, and you can "freely" (actually it's limited by 2geom) use any type (float, double, long double, …) you think fits your application best. I try to avoid cache misuse and too many allocations, and memory usage is just a consequence of this design principle. Thus, don't expect any changes focused on memory consumption. Memory usage may receive some love, but just as a means to achieve better performance. Also, this isn't the kind of application you want to keep around; you just want to see it finish fast. A last thought is that structure packing could decrease memory usage without affecting the algorithm, which can improve cache usage and thus performance.
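To illustrate what I mean by a template argument for the precision, here is a hypothetical shape (not the exact libdepixelize interface):

// The floating-point type used for coordinates is a template
// parameter, so the caller chooses the precision/footprint trade-off.
#include <vector>

template<typename T>
struct Point
{
    T x, y;
};

template<typename T>
class Splines
{
    // ...
    std::vector< Point<T> > points_;
};

int main()
{
    // float is smaller (friendlier to the cache), long double is more
    // precise; both work as long as 2geom accepts the type.
    Splines<float> cheap;
    Splines<long double> precise;
    (void) cheap; (void) precise;
}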

You can find the results below (libdepixelize profiling output + zsh + Python tricks) and the method at the end of this section. The images were generated using R.

A plot from the measurements taken

And below, you have a plot of the same data using a logarithmic scale to move the curves away from 0.

A second plot from the measurements taken (logarithmic scale)

I'm sorry for being such a n00b, incapable of producing a legend that doesn't cover parts of the curves.

Now it's clear which processing units I need to investigate to improve the performance.

Improving cache usage

One of the ideas to improve performance was to use a cache-oblivious algorithm. This means I should use a different memory layout to make better use of the cache. I wanted to know what kind of improvement I should expect, so I did a small test converting a large image to a single row and took some measurements (you can find the code snippet below as a proof of concept), but libdepixelize's code takes different paths and the comparison is very unfair (not to say useless).

// This software assumes that all input files have 4 channels (RGB + Alpha)
// and the output.png file only has one line and the total number of pixels in
// both files is equal.
// Use GIMP to create such files.
#include <gdkmm/pixbuf.h>
#include <glibmm/init.h>
#include <gdkmm/wrap_init.h>
#include <cstdlib>

int main(int, char const *[])
{
    Glib::init();
    Gdk::wrap_init();

    Glib::RefPtr<Gdk::Pixbuf> input
        = Gdk::Pixbuf::create_from_file("/home/vinipsmaker/Downloads/input.png");
    Glib::RefPtr<Gdk::Pixbuf> output
        = Gdk::Pixbuf::create_from_file("/home/vinipsmaker/Downloads/output.png");

    guint8 *input_iterator = input->get_pixels();
    guint8 *output_iterator = output->get_pixels();

    const int input_rowpadding
        = input->get_rowstride() - input->get_width() * 4;

    for ( int i = 0 ; i != input->get_height() ; ++i ) {
        for ( int j = 0 ; j != input->get_width() ; ++j ) {
            for ( int k = 0 ; k != 4 ; ++k )
                output_iterator[k] = input_iterator[k];
            input_iterator += 4;
            output_iterator += 4;
        }
        input_iterator += input_rowpadding;
    }

    output->save("/home/vinipsmaker/Downloads/output2.png", "png");

    return EXIT_SUCCESS;
}

I will try a second approach, mentioned in this blog post, to measure cache misuse. I'll have to implement the new memory layout and compare the results to be sure about the improvement. Before comparing the results, I'll have to rerun the old tests, because my computer will be using newer libraries and newer compilers by then and the comparison wouldn't be fair.

Method

Let's define a run as the execution of the script below (given that libdepixelize was configured with the OUTPUT_PROFILE_INFO option enabled).

#!/usr/bin/env zsh

BIN="/home/vinicius/Projetos/libdepixelize/build/src/depixelize-kopf2011/depixelize-kopf2011"
ARGS=("-o" "/dev/null")

# The first run at each step only warms the cache
for file in *.png; do
    out="${file/.png}.txt"

    echo ${file} > ${out}

    echo >> ${out}
    echo Voronoi: >> ${out}
    ${BIN} ${ARGS} ${file} -v 2> /dev/null
    ${BIN} ${ARGS} ${file} -v 2>> ${out}

    echo >> ${out}
    echo Grouped Voronoi: >> ${out}
    ${BIN} ${ARGS} ${file} -g 2> /dev/null
    ${BIN} ${ARGS} ${file} -g 2>> ${out}

    echo >> ${out}
    echo Non-smooth: >> ${out}
    ${BIN} ${ARGS} ${file} -n 2> /dev/null
    ${BIN} ${ARGS} ${file} -n 2>> ${out}

    echo >> ${out}
    echo Smooth: >> ${out}
    ${BIN} ${ARGS} ${file} 2> /dev/null
    ${BIN} ${ARGS} ${file} 2>> ${out}
done

I did 3 runs and aggregated all common values per image (even equal steps in different output modes, like “Tracer::Kopf2011::_disconnect_neighbors_with_dissimilar_colors(Tracer::PixelGraph)”, with the exception of “Tracer::Splines construction”, which would be unfair to merge). I discarded the largest (slowest) value of each field per image, because I considered it a cold-cache effect, and that condition is more difficult to measure/replicate (I'd need to learn how to clear the cache without rebooting the PC), so I'm more interested in the hot-cache condition. Then I computed the arithmetic mean. Now I have a table of values to plot for a set of images. I'd like to compare the time taken by each processing element (this is how to interpret each value) as the “difficulty” increases. The difficulty is a property of each image, and to rank the images I simply sorted them by number of pixels.
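For the aggregation itself, here is a minimal sketch of the per-field computation (my actual exploration was ad hoc ZSH/Python, so this is just an illustration): drop the largest sample, assumed to be a cold-cache outlier, and take the arithmetic mean of the rest.

#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

double hot_cache_mean(std::vector<double> samples)
{
    assert(samples.size() > 1);
    // Discard the largest (slowest) sample: cold-cache outlier.
    samples.erase(std::max_element(samples.begin(), samples.end()));
    // Arithmetic mean of the remaining (hot-cache) samples.
    return std::accumulate(samples.begin(), samples.end(), 0.0)
        / samples.size();
}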

And in case you want to aggregate data like I did, I'm sorry, but I cannot help you. I used at least 3 technologies for the interactive exploration (ZSH + BASH + Python) and I cannot provide scripts that don't exist. As for the other tools, they were used just because I already know how to use them (accidental vs. planned) and the communication among them was rather unusual (clipboard text + simple files + JSON-formatted files + MongoDB + CSV). The description of the method given above is just better than the tools themselves. Accept my apologies in the form of an ASCII cow:

 _______________________________________
/ ZSH + BASH + C++ + Python + MongoDB + \
\ CMake + csv + R + SVG                 /
 ---------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

I ran all the tests on this machine and I kept myself from using the machine while the tests were executing. Yes, not a very “scientific” approach, but I think it's on the right track and I need to reserve some time for development. The purpose is just to identify the “worst” processing element anyway.


3 Responses to “libdepixelize update”

  1. William Cohen says:

    If you are looking at packing issues for data structures, you might look at the pahole program in the dwarves package available in fedora. It can show whether the data structure layout could be improved. http://lwn.net/Articles/255364/

    • Vinipsmaker says:

      Your link has a lot more than a simple reference to the pahole software. Thanks for sharing it. I'll need a bit of time to “eat” it, but I think it'll be worth it.

