AceInfinity
Code:
[NO-PARSE]/*
* Author: AceInfinity
* Date: 2015-01-29 16:47:45
* File: main.c
* Last Modified time: 2015-01-29 17:46:40
*/
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
struct weissman_t
{
    double ratio;            // compression ratio
    double time_to_compress; // time to finish compression
};
double /* weissman score */
calc_weissman_score(const double a,
                    const struct weissman_t *target,
                    const struct weissman_t *universal)
{
    return a * (target->ratio / universal->ratio)
             * (log10(universal->time_to_compress) / log10(target->time_to_compress));
}
void
print_weissman(const struct weissman_t *w)
{
    printf("Compression Ratio: %f\n"
           "Time-to-Compress: %f seconds\n",
           w->ratio, w->time_to_compress);
}
int main(void)
{
    /*
     *          r1     log T2
     * W = a • ---- • --------
     *          r2     log T1
     *
     * r1 and T1 refer to the compression ratio and time-to-compress
     * for the target algorithm. r2 and T2 refer to the same quantities
     * for the standard universal compressor (e.g. gzip or FLAC), and a
     * is a scaling constant. By normalizing against the performance
     * of a standard compressor, we take away variation in compressive
     * performance between types of data.
     */
    double a = 1;
    struct weissman_t target = { .ratio = 0.20, .time_to_compress = 1.2 };
    struct weissman_t universal = { .ratio = 0.95, .time_to_compress = 20.0 };

    printf("Target Weissman Data:\n");
    print_weissman(&target);
    printf("\n");

    printf("Universal Weissman Data:\n");
    print_weissman(&universal);
    printf("\n");

    double result = calc_weissman_score(a, &target, &universal);
    printf("Result Score: %f\n", result);
}
[/NO-PARSE]
If you've seen Silicon Valley, you'll know what the Weissman score is. Supposedly this number was created solely for the show, to more easily demonstrate the power of a compression algorithm to viewers who know absolutely nothing about data compression. It was never (fundamentally) a standard metric, but now that it has reached the academic world of computer science, it just might become one. The principle behind the formula is actually quite nice; it was developed by a PhD student and a professor, and that same PhD student has since graduated and is headed to IBM to work on the Watson project.
Obviously I just threw in test numbers here. To do it properly, you would run a gzip compression algorithm over some original data, recording both the time-to-compress and the compression ratio, and compare those against a third-party compression algorithm for which the same measurements (compression time and compression ratio) are known. A rough sketch of what that measurement could look like follows.
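Here's a sketch of my own of one way to take that measurement; it is not part of the program above, just an illustration. It uses zlib's compress() routine (plain DEFLATE, standing in for gzip; link with -lz) and clock() to time the call, then fills in the same weissman_t struct. The helper name measure_compressor() and the input buffer are placeholders I made up, and the ratio is computed as compressed size divided by original size so it matches the fractional test values used above.
Code:
[NO-PARSE]#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <zlib.h> /* link with -lz */

struct weissman_t
{
    double ratio;            // compression ratio
    double time_to_compress; // time to finish compression
};

/*
 * Compress `len` bytes of `data` with zlib's compress() and fill `out`
 * with the measured ratio and the CPU time spent compressing.
 * Returns 0 on success, -1 on failure.
 */
static int
measure_compressor(const unsigned char *data, unsigned long len,
                   struct weissman_t *out)
{
    unsigned long bound = compressBound(len);
    unsigned char *dest = malloc(bound);
    if (dest == NULL)
        return -1;

    unsigned long dest_len = bound;
    clock_t start = clock();
    int rc = compress(dest, &dest_len, data, len);
    clock_t end = clock();
    free(dest);
    if (rc != Z_OK)
        return -1;

    // compressed size / original size, matching the fractional
    // test values used in main() above
    out->ratio = (double)dest_len / (double)len;
    out->time_to_compress = (double)(end - start) / CLOCKS_PER_SEC;
    return 0;
}

int main(void)
{
    // Placeholder input: highly repetitive, so it compresses well.
    enum { LEN = 1 << 20 };
    unsigned char *data = malloc(LEN);
    if (data == NULL)
        return 1;
    for (unsigned long i = 0; i < LEN; ++i)
        data[i] = (unsigned char)(i % 16);

    struct weissman_t universal;
    if (measure_compressor(data, LEN, &universal) == 0)
        printf("Measured ratio: %f, time: %f seconds\n",
               universal.ratio, universal.time_to_compress);

    free(data);
    return 0;
}[/NO-PARSE]
You would run the same measurement with the target compressor in place of compress(), then feed both weissman_t structs into calc_weissman_score() from the first program.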