Parallel Graph Processing Frameworks + How DRAM Works

Previous | Next --- Slide 47 of 81

gmudel

Is it common practice to re-order vertex ids if they appear frequently in our edge list? I.e. if vertex 10000 has degree 100 and vertex 1 has degree 1, can we give vertex 10000 an "alias id" of 1 and give vertex 1 an "alias id" of 10000 to save space in our compression?

abraoliv

Clearly this scheme only compresses the edge list compared to using the same size integer type for each vertex index. Are there more complex / mathematically interesting compression schemes commonly in use, like one derived from number or information theory? In the end, we could ignore what makes graph data unique and compress it with regular data compression algorithms, but we lose domain-specific optimizations.

rahulahoop

Seems like this encoding is highly dependent on your node ordering. Are there optimal node ordering schemes, or do you typically assign numbers to nodes via BFS?

rahulahoop

Also how do you encode -27 in just 1 byte? Would you use something besides two's complement?

kai

It seems like some encoding span multiple cache lines which could decrease the arithmetic intensity of our code .Is there some way that we can ensure headers and encodings stay on the same cache line?

sirej

@rahulahoop I don't think it depends on node ordering. The adjacency list is stored essentially as a sequence of [header, data], where data holds the deltas of the sorted edges. This is sorted and the deltas here are optimally small. At the top of the slide it mentions that this is the adjacency list for just node 32. I assume that to store the entire list, this process would be repeated and thus reordering nodes in this list would not change the size.

And with 1 byte (int8), you can represent -128 to 127 with two's complement, so that would be sufficient to represent -27.

Please log in to leave a comment.