Channel: AMD Developer Forums: Message List

Re: Re: Cuckoo hashing in OpenCL


himanshu.gautam wrote:

 

Looks like your sample needs boost to compile. Any more surprises, in case i try to compile it?

The code anyways looks complex, as i have no idea what cuckoo is. Maybe you can create a smaller testcase, which is easy to compile and run by other developers.

I decided to use Boost (1.46.1) because it is cross-platform and makes strings and random numbers easy to handle. It would be difficult to replace it with something simpler that is still cross-platform.

I believe there won't be any more surprises.

The code is pretty much the small testcase already, so I'll try to explain it a bit.

 

What we want

We have an array of (key, value) pairs and we want to store them in a way that makes retrieving a specific pair fast.

In this piece of code we focus on building the hash table, not on retrieval.

 

What we do

We use a two-level hashing scheme, with the first level shuffling the input pairs and the second implementing the cuckoo hashing. The first 3 kernels make up the first part; the last kernel is the second part.

In the first part, all threads work together, so if one fails, all of them restart. In the second part, each workgroup takes one part of the shuffled data and works independently, so if one thread fails, all threads in that workgroup restart (this will become clearer below).

Right now, we will ignore the first part and focus on the cuckoo hashing.

 

Theory

Cuckoo hashing is a dynamic hashing procedure, which means that the position of each key in the hash table is not based on one fixed deterministic function, but on a randomly chosen one.

The whole hash table is broken into a number of subtables, 3 in our case (SUBTABLES in my code).

In the serial version of the algorithm, each key draws a random number and, based on that, tries to enter its value in the first subtable. If another key has already entered its value at this position, it draws another random number and tries again in the next subtable.

The procedure continues until all pairs have been written to an empty location.

If a pair hasn't managed to get into any subtable, the hash table is destroyed and the procedure restarts with different seeds for the random numbers. Hopefully, after a number of attempts, the table will be built...

Note that the hash table is bigger than the input, in order to minimize the conflicts. For example, if we have 100 pairs, the hash table will have room for 100(1+gamma) pairs, where 0<gamma<1. That means that in the end the table will have some empty slots.

In order to retrieve a pair, we also need to store the random numbers used when building the table.
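To make the serial procedure concrete, here is a small Python sketch of it as I described it above (first empty slot wins, restart with new seeds on failure). It is only an illustration, not the code from my sample: the names build, lookup and slot, the value of PRIME and the defaults for gamma and max_attempts are all assumptions, and keys are assumed to be unique.

```python
import math
import random

PRIME = 2147483647   # 2**31 - 1; assumed value, the post does not state PRIME
SUBTABLES = 3


def slot(key, seed, subtable_size):
    """Index of key inside one subtable for the given (a, b) seed."""
    a, b = seed
    return ((a + b * key) % PRIME) % subtable_size


def _place(pair, tables, seeds):
    """Put pair into the first subtable whose slot is still empty."""
    for table, seed in zip(tables, seeds):
        i = slot(pair[0], seed, len(table))
        if table[i] is None:
            table[i] = pair
            return True
    return False


def build(pairs, gamma=0.5, max_attempts=100, rng=None):
    """Try to place every pair; restart with fresh seeds on failure."""
    rng = rng or random.Random(0)
    # Total capacity is len(pairs) * (1 + gamma), split over the subtables.
    subtable_size = math.ceil(len(pairs) * (1 + gamma) / SUBTABLES)
    for _ in range(max_attempts):
        seeds = [(rng.randrange(PRIME), rng.randrange(1, PRIME))
                 for _ in range(SUBTABLES)]
        tables = [[None] * subtable_size for _ in range(SUBTABLES)]
        if all(_place(pair, tables, seeds) for pair in pairs):
            return tables, seeds      # the seeds must be kept for lookups
    raise RuntimeError("no valid table found within max_attempts")


def lookup(key, tables, seeds):
    """Recompute the candidate slots from the stored seeds."""
    for table, seed in zip(tables, seeds):
        entry = table[slot(key, seed, len(table))]
        if entry is not None and entry[0] == key:
            return entry[1]
    return None
```

A call such as build([(k, 10 * k) for k in range(8)], gamma=0.9) returns the subtables together with the seeds, and lookup(3, tables, seeds) then only has to probe one slot per subtable. Since this variant never moves a pair once it is placed, a tight table (small gamma, many pairs) may run out of attempts.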

 

GPU Cuckoo

In the parallel GPU version, the cuckoo hashing is performed inside a workgroup. Each workgroup initializes a hash table (hltables) in local memory with (key, value) = (MAX_UINT, MAX_UINT), where MAX_UINT = 0xffffffff.

Instead of having 3 separate subtables, I keep a single array (hltables) and reach the right index by calculating the offset of each subtable:

cur_ofst = tries * subtable_size; // Offset of the current subtable inside hltables

Then I calculate the index inside the subtable based on the function given in the algorithm and add it to cur_ofst:

bucket_id = ((newRandoms[tries].x + newRandoms[tries].y * value_in) % PRIME) % subtable_size;

cur_ofst += bucket_id; // Update the offset with the index in the subtable
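The index calculation can be tried on the host with a few lines of Python. This is a sketch under assumptions: table_index and seeds are illustrative names (seeds[tries] stands in for newRandoms[tries].x and .y), and the mask assumes the kernel works with 32-bit unsigned values, so that the sum wraps at 2^32 before the modulo is taken.

```python
MASK32 = 0xFFFFFFFF  # assumes 32-bit unsigned (uint) arithmetic in the kernel


def table_index(key, tries, seeds, subtable_size, prime):
    """Flat index into the single array that holds all subtables."""
    a, b = seeds[tries]                 # stands in for newRandoms[tries]
    cur_ofst = tries * subtable_size    # start of subtable number `tries`
    # Python integers never overflow, so the wraparound is applied by hand.
    bucket_id = (((a + b * key) & MASK32) % prime) % subtable_size
    return cur_ofst + bucket_id
```

With seeds = [(1, 2), (3, 4), (5, 6)], subtable_size = 10 and prime = 97, key 7 maps to index 5 in the first subtable, 11 in the second and 27 in the third, so every try stays inside its own third of the array.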

After that, each thread writes its value into the subtable, waits at the local barrier and then checks whether its value managed to remain in the table. If not, it moves on to the next subtable:

hltables[cur_ofst].x = value_in; // Insert your value and hope no one else overwrites it!

barrier(CLK_LOCAL_MEM_FENCE); // Synchronize all the workgroup threads so that the following read makes sense

value_out = hltables[cur_ofst].x; // Now check if your value was actually inserted
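The write, barrier, read-back pattern can be imitated serially by doing all writes first and all reads afterwards. The following Python sketch only illustrates the idea; write_then_check and the (key, index) list are assumed names, and the order of the list stands in for whichever write happens to land last on the GPU.

```python
EMPTY = 0xFFFFFFFF  # same sentinel as MAX_UINT in the post


def write_then_check(hltables, targets):
    """Simulate one insertion round for a list of (key, index) pairs.

    Returns the keys whose write did not survive.
    """
    # Phase 1: every thread writes. When two threads target the same
    # slot, the write that lands last wins (here: list order).
    for key, index in targets:
        hltables[index] = key
    # On the GPU the barrier sits between the two phases.
    # Phase 2: every thread reads its slot back and compares.
    return [key for key, index in targets if hltables[index] != key]
```

For example, with an empty table of 4 slots and targets [(10, 0), (11, 2), (12, 0)], keys 10 and 12 clash on slot 0, key 12 stays there and the function returns [10], which is the thread that has to move on to the next subtable.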

If all goes well, every thread has entered its pair in one of the subtables. If not, the threads that failed signal the rest through the variable alert and the cuckoo hashing restarts:

for (attempts = 0; attempts < MAX_ATTEMPTS; attempts++) {
    ...
    barrier(CLK_LOCAL_MEM_FENCE);
    if (!alert) break; // if nobody has alerted failure, break and save the hashtable
}
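Putting the pieces together, the work of one workgroup can be simulated serially as below. This is a hedged sketch and not the kernel itself: build_workgroup_table, the values of PRIME and MAX_ATTEMPTS and the way seeds are drawn are assumptions, keys are assumed unique and different from EMPTY, and plain Python integers are used, so no 32-bit wraparound is modelled.

```python
import random

EMPTY = 0xFFFFFFFF      # MAX_UINT in the post
SUBTABLES = 3
PRIME = 2147483647      # 2**31 - 1, assumed; the post does not give PRIME
MAX_ATTEMPTS = 50       # assumed value


def build_workgroup_table(keys, subtable_size, rng):
    """Return (hltables, seeds) on success, or None after MAX_ATTEMPTS."""
    for _ in range(MAX_ATTEMPTS):
        # Every attempt starts with fresh seeds and a cleared table.
        seeds = [(rng.randrange(PRIME), rng.randrange(1, PRIME))
                 for _ in range(SUBTABLES)]
        hltables = [EMPTY] * (SUBTABLES * subtable_size)
        pending = list(keys)            # threads that still need a slot
        for tries in range(SUBTABLES):
            a, b = seeds[tries]
            slot = {k: tries * subtable_size
                       + ((a + b * k) % PRIME) % subtable_size
                    for k in pending}
            for k in pending:           # all writes happen first ...
                hltables[slot[k]] = k
            # ... then the barrier, then all reads
            pending = [k for k in pending if hltables[slot[k]] != k]
        alert = bool(pending)           # some thread never found a slot
        if not alert:
            return hltables, seeds
    return None
```

Note that the sketch clears the table and recomputes the failure flag at the start of every attempt. A table left over from a failed attempt would still contain the keys written during that attempt.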

In the end, each workgroup copies the hash table it built from local memory to global memory and saves the seeds for the random numbers that built the table.

 

Problem

I write the cuckoo hash table to a file for checking.

This table should contain unique values (except for those that are still 0xffffffff) in the range [0, num_uniqs-1], but it doesn't. It looks like some of them are written in contiguous positions and some are overwritten.
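The check itself is easy to automate. Below is a minimal Python sketch, assuming the values from the file have already been read into a list of integers (the file format is not covered here); check_table and its return layout are illustrative names, not part of my sample.

```python
from collections import Counter

EMPTY = 0xFFFFFFFF  # slots that were never written


def check_table(values, num_uniqs):
    """Return (duplicates, out_of_range, missing) for a dumped table."""
    stored = [v for v in values if v != EMPTY]
    counts = Counter(stored)
    # Values that occupy more than one slot.
    duplicates = sorted(v for v, c in counts.items() if c > 1)
    # Values outside [0, num_uniqs - 1].
    out_of_range = sorted(v for v in counts if not 0 <= v < num_uniqs)
    # Expected values that were overwritten or never stored.
    missing = sorted(set(range(num_uniqs)) - set(counts))
    return duplicates, out_of_range, missing
```

A correct table gives three empty lists. The missing list is only meaningful when the list holds the complete table, not the part built by a single workgroup.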

 

References

On the page I linked in my first post, you will find a dissertation and a paper.

In the dissertation, there is an implementation for CUDA on pages 68-69.

In the paper, there is a description of the algorithm on page 4 under the paragraph Phase 2.

 

 

Thank you for your interest,

Andrew Paschos

