Class Histogram

java.lang.Object
net.algart.arrays.Histogram
Direct Known Subclasses:
SummingHistogram

public abstract class Histogram extends Object

Histogram: an array of non-negative integer numbers b[v], 0≤v<M, where every element b[v] represents the number of occurrence of the value v in some source array A, consisting of integer elements in 0..M−1 range. The integer values v in the source array are 31-bit: 0≤M<231 (int Java type). The elements (bars) of the histogram b[v] and also the total number of elements in the source array N are 63-bit: 0≤N<263 (long Java type). The source array A is always supposed to be sorted in increasing order: A[0]≤A[1]≤...≤A[N−1], where N is the number of elements in A (obviously, N is equal to the sum of all bars b[0]+b[1]+...+b[M−1]).

This class is designed for solving 2 basic tasks:

  1. to find the value v=A[r], if we know its index r in the sorted array A: this value is called the percentile #r (well-known special case is the median, when r=N/2);
  2. to find the index r in the sorted array A, if we know the corresponding value v=A[r]: this value is called the rank of the value v.

It's important that this class does not store and does not try to sort the source array A, it stores only the histogram b and solves both tasks on the base of it.

According the simple definition above, the rank and the percentile are integer numbers. However, this class works not with integer, but with real precision, using some generalization of the integer rank and percentile to the floating-point case. Below is the formal definition of the real rank and percentile.

Definition of floating-point percentile v(r) and rank r(v)

Let b[0..M−1] be an array of non-negative integer numbers, called the histogram (and stored by this class), and let N be the sum of all these elements (the length of the supposed, but not stored source sorted array A). In addition, we suppose that there is an additional "virtual" element b[M], which is always zero. The elements b[k] are called the bars of the histogram.

This class operates with the function v(r), called the percentile, and its inverse function v(r), called the rank. Both variables r and v are real (floating-point) numbers in the following ranges: 0.0≤rN, 0.0≤vM. The definition of both functions depends on so called histogram model. This class supports two following models.

Simple histogram model

In this case, r(v) is a simple polyline generalization of the integer function: while v is increasing from some integer v0 to the next integer v0+1 (some bar of the histogram), the rank r(v) is either the constant if b[v0]=0, or uniformly increased from some r0 to r0+b[v0] if b[v0]>0. More precisely:

  1. v(r) = v0 + (rr0)/b[v0], where v0 is the minimal integer value so that b[0]+b[1]+...+b[v0]>⌊r⌋ (here and below ⌊x⌋ means the integer part of x or (long)x for our non-negative numbers), and r0 = this sum−b[v0] = b[0]+b[1]+...+b[v0−1]. In the only special case r=N this formula is not correct, because the sum of histogram bars b[k] cannot be greater than the sum of all bars N, and we consider, by definition, that v(N) = limxN+1v(x) = ⌊v(N−1)⌋+1.
    Note: according this definition, v(0) = min (kZ: b[k]>0) and v(N) = ⌊v(N−1)⌋+1 = max (kZ: b[k]>0)+1.
    In a case N=0 (empty histogram), v(r) functions is considered to be undefined.
     
  2. r(v) = r0 + (vv0)*b[v0], where v0=⌊v⌋ and r0 = b[0]+b[1]+...+b[v0−1].
    Note: according this definition, r(v)=0 when v<v(0) and r(v)=N when v>v(N). In particular, r(0)=0 and r(M)=N (remember that b[M]=0 by definition).
    Unlike v(r), r(v) function is defined also if N=0: in this case it is the zero constant.

It's easy to see that the r(v) function is continuous, but, unfortunately, the v(r) function is discontinuous if some bars b[k] are zero. This behaviour is not too good for calculating percentiles on sparse histograms, when a lot of bars are zero. In this case, this generalization to floating-point almost does not improve the precision of the calculated percentile.

To better understand the sense of this model, please also read the comments to value(long[], double) method.

Precise histogram model

Here v(r) and r(v) are also polyline functions, and they are identical to these function of the simple model in all points where the rank is integer (rZ). However, if the rank is not integer, we consider, by definition, that v(r) is uniformly increased from v(r0) to v(r0+1), where r0=⌊r. In other words, we interpolate the percentile v(r0) between integer ranks. More precisely:

  1. v(r) =
    • v0 + (rr0)/b, where v0 is the minimal integer value so that b[0]+b[1]+...+b[v0]>⌊r, r0 = b[0]+b[1]+...+b[v0−1], b = b[v0] — if b>1 and r0rr0+b−1 = r0';
    • v0' + (rr0')*(v1v0'), where v0' = v0+(b−1)/b = v(r0') and v1 = min (kZ: k>v0 & b[k]>0) = v(r0'+1) is the next non-zero histogram bar — if r0'≤rr0'+1=r1. In a case r>N−1, there is no the next non-zero bar and we consider, by definition, that v1 = v0+1.
    As in the simple model, in the special case r=N we consider, by definition, that v(N) = limxN+1v(x) = ⌊v(N−1)⌋+1.
    As in the simple model, here v(0) = min (kZ: b[k]>0) and v(N) = ⌊v(N−1)⌋+1 = max (kZ: b[k]>0)+1.
    In a case N=0 (empty histogram), v(r) functions is considered to be undefined.
     
  2. r(v) is defined as the inverse function for v(r) if v(0)≤vv(N); outside this range, we consider r(v)=0 when v<v(0) and r(v)=N when v>v(N). As in the simple model, in the special case N=0 we consider, by definition, that r(v)=0 for any v.

In this model both functions v(r) and r(v) are increasing and continuous. But calculations are more complicated. The difference appears if there are empty bars (b[k]=0); namely, in each non-empty bar b[v0]=b, followed by empty bars, there is new salient point of the polyline r(v): v = v0' = v0+(b−1)/b, and after it the rank r(v) is increasing until the next rank r(v0)+b during all zero bars following after b[v0]. This feature does not appear at the right end of the histogram, if all bars following after b[v0] are zero.

Note: the traditional definition of the median of the source array A with even length N=2n is (A[n−1]+A[n])/2 (if we suppose A[0]≤A[1]≤...≤A[N−1]). This definition is identical to the percentile v(n−0.5) in the precise histogram model, if all bars contains 0 or 1 elements (all b[k]≤1).

To better understand the sense of this model, please also read the comments to preciseValue(long[], double) method.

This class is optimized for the case, when we already know some corresponding pair r (rank) and v (percentile), and we need to slightly change the situation: add or remove several A elements, then, maybe, little increase or decrease the known rank r or the value v and, after this, quickly find (recalculate) the second element of the pair: correspondingly the percentile v or the rank r. To do this, this class provides methods include and exlcude, allowing to increment or decrement elements of b array (this action corresponds to adding or removing elements to/from the source array A), and supports the concept of the current rank and current value (percentile) with a necessary set of methods for setting and getting them. Usually all these method work quickly (O(1) operations), so every time when you correct the histogram (add/remove elements) or change the current rank or the value, you can immediately read the second element of the pair (correspondingly the percentile or the rank).

More precisely, in every object of this class, at any moment, there are:

The methods include and exlcude do not change the current value, but can change the current simple and precise ranks.

This class guarantees that for any v we have rS=r(v) in terms of the simple histogram model and rP=r(v) in terms of the precise histogram model. In particular, in the case N=0 both ranks are always zero: rS=rP=0.

This class guarantees that for any rS we have v=v(rS) in terms of the simple histogram model, excepting extreme values rS=0 and rS=N and also rS values where v(rS) function is discontinuous in this model. If such situation takes place after moveToValue(double) call, the current value is equal to the argument of this method. If such situation takes place after moveToRank(double) call, the current value is equal to the v(rS) according the definition of v(r) function in the simple histogram model. In particular, after the call moveToRank(0) the current value v will be equal to v(0) and after the call moveToRank(N) the current value v will be equal to v(N). The case N=0 is processed in a special way: see comments to moveToRank(double) method.

This class guarantees that for any rP we have v=v(rP) in terms of the precise histogram model, excepting extreme values rP=0 and rP=N. If rP=0, the corresponding value v can be any number in range 0..v(0) if you set it by moveToValue(double) method; but after the call moveToPreciseRank(0) the current value v will be equal to v(0). If rP=N, the corresponding value v can be any number in range v(N)..M if you set it by moveToValue(double) method; but after the call moveToPreciseRank(N) the current value v will be equal to v(N). The case N=0 is processed in a special way: see comments to moveToPreciseRank(double) method.

There are additional methods for setting integer values and ranks: moveToIValue(int) and moveToIRank(long). They are strictly equivalent to floating-point methods moveToValue(double) and moveToRank(double), called with the same argument, but work little faster. There is no special integer version of moveToPreciseRank(double) method, because for integer ranks the simple and the precise histogram models are identical.

There are additional methods for getting integer values and ranks: currentIValue() and currentIRank(). The currentIValue() method returns the current value, rounded to an integer. The rules of rounding are complicated enough and described in the comments to this method; depending on the situation, it can work as rounding to the nearest integer (Math.round) or as truncating (Math.floor). In any case, there is a guarantee that currentIValue()−0.5≤v<currentIValue()+1. The currentIRank() method return the rank, corresponding to w=currentIValue(): it is equal to r(⌊w⌋) (this rank is integer and does not depend on the histogram model).

You can create an instance of this class by the following methods:

After creation, the only way to change the bars b[k] is using include(int), exclude(int), include(int...) and exclude(int...) methods. After creation, the current value v and both current ranks rS and rP are zero.

Sometimes you need calculating (and supporting) not one, but several pairs "current value + current rank" in the same histogram. If is possible to use a single object of this class and move from one pair to another, but if the necessary values are not close to each other, the moveTo... methods work relatively slowly. In this case, you can share the histogram b[k] between several instance of this class by share() method. The sharing instances work with the same b[k] array: any modification by include / exclude methods are immediately reflected in all shared instances. But each sharing instance has an independent set of current value, current simple rank and current precise rank. You can get all sharing instances by nextSharing() method.

This class also provides static methods for calculating v(r) function: value(long[] histogram, double rank), value(int[] histogram, double rank) for the simple histogram model, preciseValue(long[] histogram, double rank), preciseValue(int[] histogram, double rank) for the precise histogram model. These methods can be useful if you need to process the given histogram only once.

There is no guarantee that the same results, got by different ways (for example, by static methods and by creating an instance of this class and using its methods) are absolutely identical: little mismatches in the last digits after the decimal point are possible.

This class does not implement own equals and hashCode methods. So, this class does not provide a mechanism for comparing different histograms.

This class is not thread-safe, but is thread-compatible and can be synchronized manually, if multithread access is necessary.

Author:
Daniel Alievsky
  • Method Summary

    Modifier and Type
    Method
    Description
    abstract long
    bar(int value)
    Returns the bar #value of the histogram: b[value].
    abstract long[]
    Returns all bars of the histogram.
    static void
    clear(int[] histogram, int sumOfColumns)
    Fills all non-zero elements of histogram by 0.
    abstract long
    Returns the rank r(w) of the value w=currentIValue().
    final int
    Returns the current value v, rounded to an integer number.
    final double
    Returns the current precise rank: rP=r(v) in terms of the precise histogram model.
    final double
    Returns the current simple rank: rS=r(v) in terms of the simple histogram model.
    final double
    Returns the current value v.
    abstract void
    exclude(int value)
    Decrements the bar #value of the histogram by 1: b[value]--.
    abstract void
    exclude(int... values)
    Equivalent to a simple loop of calls of exclude(int) method for all passed values.
    abstract void
    include(int value)
    Increments the bar #value of the histogram by 1: b[value]++.
    abstract void
    include(int... values)
    Equivalent to a simple loop of calls of include(int) method for all passed values.
    static int
    iPreciseValue(int[] histogram, double rank)
    Precise equivalent of iPreciseValue(long[], double) for a case of int[] type of the histogram.
    static int
    iPreciseValue(long[] histogram, double rank)
    "Interpolated" version of iValue(long[], long), rounded to the "best" integer result.
    static int
    iValue(int[] histogram, long rank)
    Precise equivalent of iValue(long[], long) for a case of int[] type of the histogram.
    static int
    iValue(long[] histogram, long rank)
    Returns the element with the given index rank in the sorted array of integer numbers 0..histogram.length-1, corresponding to this histogram.
    final boolean
    Returns true if and only if the current bar is zero: b[⌊v⌋]=0 (v is the current value) and all bars from the left are also zero: b[k]=0 for all k<⌊v⌋.
    final boolean
    Returns true if and only all bars from the left of the current bar are zero: b[k]=0 for all k<⌊v⌋ (v is the current value).
    final int
    Returns the number M of bars of the histogram.
    abstract Histogram
    moveToIRank(long rank)
    Sets the current simple rank rS and precise rank rP to be equal of the rank argument.
    abstract Histogram
    moveToIValue(int value)
    Sets the current value v to be equal of the value argument.
    moveToPreciseRank(double rank)
    Sets the current precise rank rP to be equal of the rank argument.
    abstract Histogram
    moveToRank(double rank)
    Sets the current simple rank rS to be equal of the rank argument.
    moveToValue(double value)
    Sets the current value v to be equal of the value argument.
    static Histogram
    newIntHistogram(int[] histogram, int... bitLevelsOfPyramid)
    Creates new 32-bit histogram, consisting of M=histogram.length bars, equal to elements of the given array.
    static Histogram
    newIntHistogram(int histogramLength, int... bitLevelsOfPyramid)
    Creates new 32-bit histogram, consisting of M=histogramLength empty bars.
    static Histogram
    newLongHistogram(int histogramLength, int... bitLevelsOfPyramid)
    Creates new histogram, consisting of M=histogramLength empty bars.
    static Histogram
    newLongHistogram(long[] histogram, int... bitLevelsOfPyramid)
    Creates new histogram, consisting of M=histogram.length bars, equal to elements of the given array.
    abstract Histogram
    Returns the next instance of this class, sharing the histogram array b[k] with this instance.
    final boolean
    Returns true if and only if the current bar is zero: b[⌊v⌋]=0 (v is the current value) and either all bars from the left, or all bars from the right are also zero.
    static double
    percentile(int[] histogram, long sumOfColumns, double percentileLevel)
    Equivalent to preciseValue(histogram,percentileLevel*(sumOfColumns-1)), if sumOfColumns>0.
    static double
    percentile(long[] histogram, long sumOfColumns, double percentileLevel)
    Equivalent to preciseValue(histogram,percentileLevel*(sumOfColumns-1)), if sumOfColumns>0.
    static double
    preciseValue(int[] histogram, double rank)
    Precise equivalent of preciseValue(long[], double) for a case of int[] type of the histogram.
    static double
    preciseValue(long[] histogram, double rank)
    "Interpolated" version of iValue(long[], long).
    final boolean
    Returns true if and only if the current bar is zero: b[⌊v⌋]=0 (v is the current value) and all bars from the right are also zero: b[k]=0 for all k>⌊v⌋.
    final boolean
    Returns true if and only all bars from the right of the current bar are zero: b[k]=0 for all k>⌊v⌋ (v is the current value).
    abstract Histogram
    Creates new instance of this class, which uses the same arrays of bars b[k].
    abstract long
    Returns the number of instances of this class, sharing the histogram array b[k] with this instance.
    static int
    sumOf(int[] histogram)
    Returns the sum of all elements of the passed histogram array.
    static long
    sumOf(long[] histogram)
    Returns the sum of all elements of the passed histogram array.
    abstract long
    Returns the sum N of all bars b[k] of the histogram.
    static double
    value(int[] histogram, double rank)
    Precise equivalent of value(long[], double) for a case of int[] type of the histogram.
    static double
    value(long[] histogram, double rank)
    Floating-point version of iValue(long[], long).

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Method Details

    • newLongHistogram

      public static Histogram newLongHistogram(int histogramLength, int... bitLevelsOfPyramid)
      Creates new histogram, consisting of M=histogramLength empty bars. In other words, all bars b[k]=0 at first; but they can be increased by include(int) method.

      The bitLevelsOfPyramid argument is used for optimization of large histograms, consisting of thousands or millions bars. Namely, this class automatically builds and supports a pyramid of histograms: m=bitLevelsOfPyramid.length additional arrays b1[k], b2[k], ..., bm[k], where

      bq[k] = Σ 2s*kj < min(2s*(k+1), M) b[j], s=bitLevelsOfPyramid[q−1];
      bq.length = ⌊M/2s⌋ = histogramLength >> bitLevelsOfPyramid[q−1].

      In other words, every "sub-histogram" bq consists of "wide" bars, where the width of bars is 2s=bitLevelsOfPyramid[q−1]: it is a sum of 2s bars b[j] of the base histogram, excepting the last "wide" bar which can be a sum of less number of bars b[j]. The elements of bitLevelsOfPyramid array must be in 1..31 range and must be listed in increasing order: 0<bitLevelsOfPyramid[0]<...<bitLevelsOfPyramid[m−1]≤31.

      Supporting this pyramid little slows down include(int) and exclude(int) methods: they require O(m) operations to correct m arrays bq. But it can essentially optimize all methods which set the current value and rank: in the worst case they require

      O (bm.length + Σ q=1,2,...,m−12bitLevelsOfPyramid[q−1])

      and sequential settings of the current value and rank work much faster if the current value changes slightly.

      Without the pyramid of histograms (m=bitLevelsOfPyramid.length=0), the required time of setting the current value or rank is O(M) in the worst case, and sequential settings of the current value and rank also work much faster if the current value changes slightly. If the histogram length M is not large, for example, 256 or less, it is possible that this class will work faster without bitLevelsOfPyramid arguments.

      The passed bitLevelsOfPyramid argument is cloned by this method: no references to it are maintained by the created object.

      If you are sure that the sum of all bars b[k] (the total length of the supposed source array A) will never exceed Integer.MAX_VALUE, you can use newIntHistogram(int, int...) method instead of this one: the created object will probably work little faster and will occupy less memory.

      Parameters:
      histogramLength - the number M of bars of the new histogram.
      bitLevelsOfPyramid - the bit levels: binary logarithms of widths of bars in the sub-histograms in the "histogram pyramid"; can be empty, then will be ignored (the histogram pyramid will not be used).
      Returns:
      the new histogram with zero (empty) bars b[k]=0.
      Throws:
      NullPointerException - if bitLevelsOfPyramid argument is null.
      IllegalArgumentException - if histogramLength<0, or if bitLevelsOfPyramid.length>30, or if some of elements bitLevelsOfPyramid is not in 1..31 range, or if bitLevelsOfPyramid[k] >= bitLevelsOfPyramid[k+1] for some k.
    • newLongHistogram

      public static Histogram newLongHistogram(long[] histogram, int... bitLevelsOfPyramid)
      Creates new histogram, consisting of M=histogram.length bars, equal to elements of the given array. In other words, the bars b[k]=histogram[k] at first.

      The bitLevelsOfPyramid argument is used for optimization of large histograms, consisting of thousands or millions bars. Namely, this class automatically builds and supports a pyramid of histograms: m=bitLevelsOfPyramid.length additional arrays b1[k], b2[k], ..., bm[k], where

      bq[k] = Σ 2s*kj < min(2s*(k+1), M) b[j], s=bitLevelsOfPyramid[q−1];
      bq.length = ⌊M/2s⌋ = histogram.length >> bitLevelsOfPyramid[q−1].

      In other words, every "sub-histogram" bq consists of "wide" bars, where the width of bars is 2s=bitLevelsOfPyramid[q−1]: it is a sum of 2s bars b[j] of the base histogram, excepting the last "wide" bar which can be a sum of less number of bars b[j]. The elements of bitLevelsOfPyramid array must be in 1..31 range and must be listed in increasing order: 0<bitLevelsOfPyramid[0]<...<bitLevelsOfPyramid[m−1]≤31.

      Supporting this pyramid little slows down include(int) and exclude(int) methods: they require O(m) operations to correct m arrays bq. But it can essentially optimize all methods which set the current value and rank: in the worst case they require

      O (bm.length + Σ q=1,2,...,m−12bitLevelsOfPyramid[q−1])

      and sequential settings of the current value and rank work much faster if the current value changes slightly.

      Without the pyramid of histograms (m=bitLevelsOfPyramid.length=0), the required time of setting the current value or rank is O(M) in the worst case, and sequential settings of the current value and rank also work much faster if the current value changes slightly. If the histogram length M is not large, for example, 256 or less, it is possible that this class will work faster without bitLevelsOfPyramid arguments.

      The passed histogram and bitLevelsOfPyramid arguments are cloned by this method: no references to them are maintained by the created object.

      If you are sure that the sum of all bars b[k] (the total length of the supposed source array A) will never exceed Integer.MAX_VALUE, you can use newIntHistogram(int[], int...) method instead of this one: the created object will probably work little faster and will occupy less memory.

      Parameters:
      histogram - initial values of the bars b[k] of the histogram.
      bitLevelsOfPyramid - the bit levels: binary logarithms of widths of bars in the sub-histograms in the "histogram pyramid"; can be empty, then will be ignored (the histogram pyramid will not be used).
      Returns:
      the new histogram with bars b[k]=histogram[k].
      Throws:
      NullPointerException - if histogram or bitLevelsOfPyramid argument is null.
      IllegalArgumentException - if some of histogram elements are negative (<0), or if sum of all bars (elements of histogram array) is greater than Long.MAX_VALUE, or if bitLevelsOfPyramid.length>30, or if some of elements bitLevelsOfPyramid is not in 1..31 range, or if bitLevelsOfPyramid[k] >= bitLevelsOfPyramid[k+1] for some k.
    • newIntHistogram

      public static Histogram newIntHistogram(int histogramLength, int... bitLevelsOfPyramid)
      Creates new 32-bit histogram, consisting of M=histogramLength empty bars. In other words, all bars b[k]=0 at first; but they can be increased by include(int) method. "32-bit" means that the sum of all bars b[k] (the total length of the supposed source array A) will not be able to exceed Integer.MAX_VALUE. If you need to process greater numbers, please use newLongHistogram(int, int...) method instead of this one.

      The bitLevelsOfPyramid argument is used for optimization of large histograms, consisting of thousands or millions bars. Namely, this class automatically builds and supports a pyramid of histograms: m=bitLevelsOfPyramid.length additional arrays b1[k], b2[k], ..., bm[k], where

      bq[k] = Σ 2s*kj < min(2s*(k+1), M) b[j], s=bitLevelsOfPyramid[q−1];
      bq.length = ⌊M/2s⌋ = histogramLength >> bitLevelsOfPyramid[q−1].

      In other words, every "sub-histogram" bq consists of "wide" bars, where the width of bars is 2s=bitLevelsOfPyramid[q−1]: it is a sum of 2s bars b[j] of the base histogram, excepting the last "wide" bar which can be a sum of less number of bars b[j]. The elements of bitLevelsOfPyramid array must be in 1..31 range and must be listed in increasing order: 0<bitLevelsOfPyramid[0]<...<bitLevelsOfPyramid[m−1]≤31.

      Supporting this pyramid little slows down include(int) and exclude(int) methods: they require O(m) operations to correct m arrays bq. But it can essentially optimize all methods which set the current value and rank: in the worst case they require

      O (bm.length + Σ q=1,2,...,m−12bitLevelsOfPyramid[q−1])

      and sequential settings of the current value and rank work much faster if the current value changes slightly.

      Without the pyramid of histograms (m=bitLevelsOfPyramid.length=0), the required time of setting the current value or rank is O(M) in the worst case, and sequential settings of the current value and rank also work much faster if the current value changes slightly. If the histogram length M is not large, for example, 256 or less, it is possible that this class will work faster without bitLevelsOfPyramid arguments.

      The passed bitLevelsOfPyramid argument is cloned by this method: no references to it are maintained by the created object.

      Parameters:
      histogramLength - the number M of bars of the new histogram.
      bitLevelsOfPyramid - the bit levels: binary logarithms of widths of bars in the sub-histograms in the "histogram pyramid"; can be empty, then will be ignored (the histogram pyramid will not be used).
      Returns:
      the new histogram with zero (empty) bars b[k]=0.
      Throws:
      NullPointerException - if bitLevelsOfPyramid argument is null.
      IllegalArgumentException - if histogramLength<0, or if bitLevelsOfPyramid.length>30, or if some of elements bitLevelsOfPyramid is not in 1..31 range, or if bitLevelsOfPyramid[k] >= bitLevelsOfPyramid[k+1] for some k.
    • newIntHistogram

      public static Histogram newIntHistogram(int[] histogram, int... bitLevelsOfPyramid)
      Creates new 32-bit histogram, consisting of M=histogram.length bars, equal to elements of the given array. In other words, the bars b[k]=histogram[k] at first. "32-bit" means that the sum of all bars b[k] (the total length of the supposed source array A) cannot exceed and will not be able to exceed Integer.MAX_VALUE. If you need to process greater numbers, please use newLongHistogram(long[], int...) method instead of this one.

      The bitLevelsOfPyramid argument is used for optimization of large histograms, consisting of thousands or millions bars. Namely, this class automatically builds and supports a pyramid of histograms: m=bitLevelsOfPyramid.length additional arrays b1[k], b2[k], ..., bm[k], where

      bq[k] = Σ 2s*kj < min(2s*(k+1), M) b[j], s=bitLevelsOfPyramid[q−1];
      bq.length = ⌊M/2s⌋ = histogram.length >> bitLevelsOfPyramid[q−1].

      In other words, every "sub-histogram" bq consists of "wide" bars, where the width of bars is 2s=bitLevelsOfPyramid[q−1]: it is a sum of 2s bars b[j] of the base histogram, excepting the last "wide" bar which can be a sum of less number of bars b[j]. The elements of bitLevelsOfPyramid array must be in 1..31 range and must be listed in increasing order: 0<bitLevelsOfPyramid[0]<...<bitLevelsOfPyramid[m−1]≤31.

      Supporting this pyramid little slows down include(int) and exclude(int) methods: they require O(m) operations to correct m arrays bq. But it can essentially optimize all methods which set the current value and rank: in the worst case they require

      O (bm.length + Σ q=1,2,...,m−12bitLevelsOfPyramid[q−1])

      and sequential settings of the current value and rank work much faster if the current value changes slightly.

      Without the pyramid of histograms (m=bitLevelsOfPyramid.length=0), the required time of setting the current value or rank is O(M) in the worst case, and sequential settings of the current value and rank also work much faster if the current value changes slightly. If the histogram length M is not large, for example, 256 or less, it is possible that this class will work faster without bitLevelsOfPyramid arguments.

      The passed histogram and bitLevelsOfPyramid arguments are cloned by this method: no references to them are maintained by the created object.

      Parameters:
      histogram - initial values of the bars b[k] of the histogram.
      bitLevelsOfPyramid - the bit levels: binary logarithms of widths of bars in the sub-histograms in the "histogram pyramid"; can be empty, then will be ignored (the histogram pyramid will not be used).
      Returns:
      the new histogram with bars b[k]=histogram[k].
      Throws:
      NullPointerException - if histogram or bitLevelsOfPyramid argument is null.
      IllegalArgumentException - if some of histogram elements are negative (<0), or if sum of all bars (elements of histogram array) is greater than Integer.MAX_VALUE, or if bitLevelsOfPyramid.length>30, or if some of elements bitLevelsOfPyramid is not in 1..31 range, or if bitLevelsOfPyramid[k] >= bitLevelsOfPyramid[k+1] for some k.
    • length

      public final int length()
      Returns the number M of bars of the histogram.

      The result of this method is always non-negative (≥0).

      Returns:
      the number M of bars of the histogram.
    • total

      public abstract long total()
      Returns the sum N of all bars b[k] of the histogram. In other words, it is the current length of the supposed source array A, on the base of which this histogram is built.

      The result of this method is always non-negative (≥0).

      Returns:
      the sum N of all bars b[k] of the histogram.
    • bar

      public abstract long bar(int value)
      Returns the bar #value of the histogram: b[value]. If the index value is negative or ≥M=length(), this method returns 0 and does not throw an exception (unlike include and exclude methods).

      The result of this method is always non-negative (≥0).

      Parameters:
      value - the index of the bar; can be out of 0..M range.
      Returns:
      the bar #value of the histogram.
    • bars

      public abstract long[] bars()
      Returns all bars of the histogram. The length of the returned array is M=length(), and the element #k is equal to b[k].

      The returned array is never a reference to an internal array stored in this object: if necessary, the internal Java array is cloned.

      Returns:
      all bars of the histogram as long[] Java array.
    • include

      public abstract void include(int value)
      Increments the bar #value of the histogram by 1: b[value]++. It can be interpreted as adding one element value to the source array A, on the base of which this histogram is built.

      If this histogram is 32-bit, that is created by newIntHistogram(int, int...) or newIntHistogram(int[], int...) method, this method throws IllegalStateException if the current sum of all bars total() is Integer.MAX_VALUE (231−1). In other cases, this method throws IllegalStateException if the current sum of all bars is Long.MAX_VALUE (263−1). In a case of throwing an exception, this method does not change the histogram.

      This method does not change the current value, but can change the current simple and precise ranks. If there are some sharing instances, this method change ranks in all sharing instances.

      Parameters:
      value - the index of the increased histogram bar.
      Throws:
      IndexOutOfBoundsException - if value<0 or value>=M=length().
      IllegalStateException - if total()==Long.MAX_VALUE (or Integer.MAX_VALUE for 32-bit histogram).
      See Also:
    • exclude

      public abstract void exclude(int value)
      Decrements the bar #value of the histogram by 1: b[value]--. It can be interpreted as removing one element, equal to value, from the source array A, on the base of which this histogram is built.

      If the bar #value of the histogram is zero (b[value]=0), this method throws IllegalStateException and does not change the histogram. So, the bars of the histogram cannot become negative.

      This method does not change the current value, but can change the current simple and precise ranks. If there are some sharing instances, this method change ranks in all sharing instances.

      Parameters:
      value - the index of the increased histogram bar.
      Throws:
      IndexOutOfBoundsException - if value<0 or value>=M=length().
      IllegalStateException - if b[value]=0.
      See Also:
    • include

      public abstract void include(int... values)
      Equivalent to a simple loop of calls of include(int) method for all passed values.

      The only possible difference from this loop is that the exception, if it occur, can be thrown at the very beginning (but can also be thrown only while increasing the corresponding bar). As well as the simple loop of include(int), this method can be non-atomic regarding this failure: it can increase some histogram bars and then throw an exception.

      Parameters:
      values - the indexes of the increased histogram bars. If some index is repeated several times in this array, the corresponding histogram bar will be increased several times.
      Throws:
      NullPointerException - if values array is null.
      IndexOutOfBoundsException - if some values[k]<0 or values[k]>=M=length().
      IllegalStateException - if total()>Long.MAX_VALUE-values.length (or Integer.MAX_VALUE-values.length for 32-bit histogram).
      See Also:
    • exclude

      public abstract void exclude(int... values)
      Equivalent to a simple loop of calls of exclude(int) method for all passed values.

      The only possible difference from this loop is that the exception, if it occur, can be thrown at the very beginning (but can also be thrown only while decreasing the corresponding bar). As well as the simple loop of exclude(int), this method can be non-atomic regarding this failure: it can decrease some histogram bars and then throw an exception.

      Parameters:
      values - the indexes of the decreased histogram bars. If some index is repeated several times in this array, the corresponding histogram bar will be decreased several times.
      Throws:
      NullPointerException - if values array is null.
      IndexOutOfBoundsException - if some values[k]<0 or values[k]>=M=length().
      IllegalStateException - if b[values[k]]=0 for some k.
      See Also:
    • currentIRank

      public abstract long currentIRank()
      Returns the rank r(w) of the value w=currentIValue(). (This rank is the same in both histogram models.) In other words, it is the sum of all histogram bars b[k] from the left of the bar #w: Σk<w b[k]. See comments to Histogram class for more details.
      Returns:
      the rank of the integer part of the current value.
      See Also:
    • currentRank

      public final double currentRank()
      Returns the current simple rank: rS=r(v) in terms of the simple histogram model. See comments to Histogram class for more details.
      Returns:
      the current simple rank.
      See Also:
    • currentPreciseRank

      public final double currentPreciseRank()
      Returns the current precise rank: rP=r(v) in terms of the precise histogram model. See comments to Histogram class for more details.
      Returns:
      the current precise rank.
      See Also:
    • currentIValue

      public final int currentIValue()
      Returns the current value v, rounded to an integer number.

      The rules of rounding depend on the order of previous calls of 5 methods, changing the current value and ranks: moveToValue(double), moveToRank(double), moveToPreciseRank(double), moveToIValue(int), moveToIRank(long). (Calls of any other methods, in particular, include and exclude, do not affect to the result of this method.)

      If the last from those methods was moveToValue, moveToRank, moveToIValue or moveToIRank, then the result of this method is just the integer part of the current value: ⌊v⌋. But after moveToPreciseRank method the returned value will be equal to iPreciseValue(histogram, rank), where histogram is this histogram (the result of bars() method) and rank is the argument of moveToPreciseRank(double).

      The special case N=total()=0 is an exception from the last rule: in this case, all 3 methods moveToPreciseRank, moveToRank and moveToIRank are equivalent, but the concrete results of currentValue() method (i.e. v) and this method are not documented — unlike the result of iPreciseValue static function, which returns 0 in this case. However, in this case, as after any call of moveToRank / moveToIRank, there is a guarantee that the result of this method is equal to the integer part of the current value ⌊v⌋.

      Immediately after creating a new histogram this method always returns 0 (like currentValue()).

      This result of this method always lies between (int)v=⌊v⌋ and Math.round(v).

      Returns:
      the current value (percentile), rounded to an integer number.
      See Also:
    • currentValue

      public final double currentValue()
      Returns the current value v. It is equal to the percentile of this histogram, found either in terms of the simple histogram model, if before this we called moveToRank(double) method, or in terms of the precise histogram model, if before this we called moveToPreciseRank(double) method. See comments to Histogram class for more details.
      Returns:
      the current value (percentile).
      See Also:
    • outsideNonZeroPart

      public final boolean outsideNonZeroPart()
      Returns true if and only if the current bar is zero: b[⌊v⌋]=0 (v is the current value) and either all bars from the left, or all bars from the right are also zero. Equivalent to leftFromNonZeroPart() || rightFromNonZeroPart().
      Returns:
      true if the current bar (containing the current value) and all bars rightward or leftward from it are zero.
    • leftFromNonZeroPart

      public final boolean leftFromNonZeroPart()
      Returns true if and only if the current bar is zero: b[⌊v⌋]=0 (v is the current value) and all bars from the left are also zero: b[k]=0 for all k<⌊v⌋.
      Returns:
      true if the current bar (containing the current value) and all bars leftward from it are zero.
    • leftFromOrAtBoundOfNonZeroPart

      public final boolean leftFromOrAtBoundOfNonZeroPart()
      Returns true if and only all bars from the left of the current bar are zero: b[k]=0 for all k<⌊v⌋ (v is the current value).
      Returns:
      true if all bars leftward from the current value are zero.
    • rightFromNonZeroPart

      public final boolean rightFromNonZeroPart()
      Returns true if and only if the current bar is zero: b[⌊v⌋]=0 (v is the current value) and all bars from the right are also zero: b[k]=0 for all k>⌊v⌋.
      Returns:
      true if the current bar (containing the current value) and all bars rightward from it are zero.
    • rightFromOrAtBoundOfNonZeroPart

      public final boolean rightFromOrAtBoundOfNonZeroPart()
      Returns true if and only all bars from the right of the current bar are zero: b[k]=0 for all k>⌊v⌋ (v is the current value).
      Returns:
      true if all bars rightward from the current value are zero.
    • moveToIRank

      public abstract Histogram moveToIRank(long rank)
      Sets the current simple rank rS and precise rank rP to be equal of the rank argument. (Because the argument is integer, both rS and rP ranks are the same.) If the rank argument is negative, it is replaced with 0 (minimal possible rank); if rank>N=total(), it is replaced with N (maximal possible rank). The current value v automatically changes in accordance to the new rank. See comments to Histogram class for more details.

      In the special case N=0 (all bars of the histograms are zero), both simple and precise ranks are always zero, not depending on calls of this method: rS=rP=0 (because r(v) is a zero constant by definition). But v(r) function (unlike r(v)) is not defined in this case, so, if N=0, the current value v after calling this method is not documented — there is the only guarantee that 0≤vM.

      This method works little faster than equivalent calls moveToRank(rank) and moveToPreciseRank(rank).

      Parameters:
      rank - new rank rS=rP.
      Returns:
      the reference to this object.
      See Also:
    • moveToRank

      public abstract Histogram moveToRank(double rank)
      Sets the current simple rank rS to be equal of the rank argument. If the rank argument is negative, it is replaced with 0 (minimal possible rank); if rank>N=total(), it is replaced with N (maximal possible rank). The current precise rank rP and the current value v automatically change in accordance to the new simple rank. See comments to Histogram class for more details.

      In the special case N=0 (all bars of the histograms are zero), both simple and precise ranks are always zero, not depending on calls of this method: rS=rP=0 (because r(v) is a zero constant by definition). But v(r) function (unlike r(v)) is not defined in this case, so, if N=0, the current value v after calling this method is not documented — there is the only guarantee that 0≤vM.

      Parameters:
      rank - new simple rank rS.
      Returns:
      the reference to this object.
      Throws:
      IllegalArgumentException - if Double.isNaN(rank).
      See Also:
    • moveToPreciseRank

      public Histogram moveToPreciseRank(double rank)
      Sets the current precise rank rP to be equal of the rank argument. If the rank argument is negative, it is replaced with 0 (minimal possible rank); if rank>N=total(), it is replaced with N (maximal possible rank). The current simple rank rS and the current value v automatically change in accordance to the new precise rank. See comments to Histogram class for more details.

      In the special case N=0 (all bars of the histograms are zero), both simple and precise ranks are always zero, not depending on calls of this method: rS=rP=0 (because r(v) is a zero constant by definition). But v(r) function (unlike r(v)) is not defined in this case, so, if N=0, the current value v after calling this method is not documented — there is the only guarantee that 0≤vM.

      Parameters:
      rank - new precise rank rP.
      Returns:
      the reference to this object.
      Throws:
      IllegalArgumentException - if Double.isNaN(rank).
      See Also:
    • moveToIValue

      public abstract Histogram moveToIValue(int value)
      Sets the current value v to be equal of the value argument. If the value argument is negative, it is replaced with 0 (minimal possible value); if value>M=length(), it is replaced with M (maximal possible value). The current simple rank rS and the current precise rank rP automatically change in accordance to the new value. See comments to Histogram class for more details.

      This methods works little faster than the equivalent call moveToValue(value).

      Parameters:
      value - new current value (percentile).
      Returns:
      the reference to this object.
      See Also:
    • moveToValue

      public Histogram moveToValue(double value)
      Sets the current value v to be equal of the value argument. If the value argument is negative, it is replaced with 0 (minimal possible value); if value>M=length(), it is replaced with M (maximal possible value). The current simple rank rS and the current precise rank rP automatically change in accordance to the new value. See comments to Histogram class for more details.
      Parameters:
      value - new current value (percentile).
      Returns:
      the reference to this object.
      Throws:
      IllegalArgumentException - if Double.isNaN(value).
      See Also:
    • shareCount

      public abstract long shareCount()
      Returns the number of instances of this class, sharing the histogram array b[k] with this instance. In other words, it returns the length of the circular list returned by nextSharing() method. Returns 1 if share() method was not used.

      It is obvious that this method always returns the same value for all instances sharing the same histogram array.

      Returns:
      the number of instances of this class, sharing the histogram array b[k] with this one.
    • nextSharing

      public abstract Histogram nextSharing()
      Returns the next instance of this class, sharing the histogram array b[k] with this instance.

      All instances, created by share() method, are connected into a circular list, and this method returns the next element in this list. For example, if the instance h1 was created by newLongHistogram(int, int...) method and, after this, the instance h2 was created as h2=h1.share(), then this method in h1 object returns h2 and in h2 object returns h1. If there are no sharing instances, this method returns the reference to this instance.

      You can get all instances, sharing the same array b[k] with the given histogram hist, by the following loop:

       Histogram h = hist.nextSharing();
       do {
           // some processing h instance
           h = h.nextSharing();
       } while (h != hist);
       

      See comments to Histogram class for more details.

      Returns:
      the next instance sharing the histogram array b[k] with this instance, or the reference to this instance if you did not use share() method.
      See Also:
    • share

      public abstract Histogram share()
      Creates new instance of this class, which uses the same arrays of bars b[k]. The returned instance will share the histogram b[k] with this instance: any modification of b array by include / exclude methods will be immediately reflected in all shared instances. However, the created sharing instance has an independent set of current value, current simple rank and current precise rank, and they are initially set to 0.

      The returned instance has the absolutely same behaviour as this one: it uses the same set of bit levels (see comments to newLongHistogram(int, int...) method), it is 32-bit if and only if this instance is 32-bit, it is a SummingHistogram if and only if this instance is a SummingHistogram, etc.

      See comments to Histogram class for more details.

      Returns:
      newly created instance sharing the histogram array b[k] with this instance.
      See Also:
    • iValue

      public static int iValue(long[] histogram, long rank)
      Returns the element with the given index rank in the sorted array of integer numbers 0..histogram.length-1, corresponding to this histogram.

      More precisely, returns minimal integer value v so that b[0]+b[1]+...+b[v]>rank, b[k]=histogram[k]. If rank≤0, this method returns min (kZ: b[k]>0) (the minimal element in the source array). If rank≥(sum of all b[k]), it returns max (kZ: b[k]>0)+1 (the maximal element plus 1). If all columns b[k] are zero (no elements), this method returns 0.

      Parameters:
      histogram - histogram[k] is the number of elements in some source array that are equal to k. All histogram[k] must be non-negative; in other case, IllegalArgumentException can be thrown (but also can be not thrown).
      rank - the index in the source array.
      Returns:
      value of the found element (percentile).
      Throws:
      NullPointerException - if histogram argument is null.
      See Also:
    • iValue

      public static int iValue(int[] histogram, long rank)
      Precise equivalent of iValue(long[], long) for a case of int[] type of the histogram.
      Parameters:
      histogram - histogram[k] is the number of elements in the source array that are equal to k. All histogram[k] must be non-negative; in other case, IllegalArgumentException can be thrown (but also can be not thrown).
      rank - the index in the source array.
      Returns:
      value of the found element (percentile).
      Throws:
      NullPointerException - if histogram argument is null.
      See Also:
    • value

      public static double value(long[] histogram, double rank)
      Floating-point version of iValue(long[], long). Alike preciseValue(long[], double), this function supposes that the histogram is built on an array of floating-point values after truncating them to an integer value (long)value, but it doesn't try to interpolate value between different bars of the histogram.

      More precisely, we suppose that if b[k]==b (here and below b[k]=histogram[k]), it means that the source floating-point array contains b values k+j/b, j=0,1,...,b−1. With this suggestion, this method finds the element of the source array v1 with the index #r1=⌊rank⌋=(long)rank. Obviously, v1 = v0+(r1-r0)/b[v0], where v0 is the minimal integer value so that b[0]+b[1]+...+b[v0]>r1 and r0=b[0]+b[1]+...+b[v0−1]. Then this method returns v1+(rankr1)/b[v0] (this value is equal to v0+(rankr0)/b[v0]). Please compare: unlike preciseValue(long[], double), we do not find the next element v2 in the following bars of the histogram, but just interpolate between v1 and v2=v1+1/b[v0].

      As iValue(long[], long), if rank<0, this method returns min (kZ: b[k]>0) (the minimal element in the source array), and if rank≥(sum of all b[k]), it returns max (kZ: b[k]>0)+1 (for floating-point array, it means the maximal element plus 1/b[k]). If all columns b[k] are zero (no elements), this method returns 0.

      The result of this method is equal to the percentile v(r) for the passed r=rank in terms of the simple histogram model: see comments to Histogram class for more details.

      Parameters:
      histogram - histogram[k] is the number of elements in the source floating-point array that are "almost equal" to k. All histogram[k] must be non-negative; in other case, IllegalArgumentException can be thrown (but also can be not thrown).
      rank - the index in the source array (if non-integer, this method returns a real value, which is little greater than the element #r1=(long)rank, but is less than the next element #r1+1 and has the same integer part).
      Returns:
      the found value (percentile).
      Throws:
      NullPointerException - if histogram argument is null.
      IllegalArgumentException - if Double.isNaN(rank).
      See Also:
    • value

      public static double value(int[] histogram, double rank)
      Precise equivalent of value(long[], double) for a case of int[] type of the histogram.
      Parameters:
      histogram - histogram[k] is the number of elements in the source floating-point array that are "almost equal" to k. All histogram[k] must be non-negative; in other case, IllegalArgumentException can be thrown (but also can be not thrown).
      rank - the index in the source array (if non-integer, this method returns a real value, which is little greater than the element #r1=(long)rank, but is less than the next element #r1+1 and has the same integer part).
      Returns:
      the found value (percentile).
      Throws:
      NullPointerException - if histogram argument is null.
      IllegalArgumentException - if Double.isNaN(rank).
      See Also:
    • iPreciseValue

      public static int iPreciseValue(long[] histogram, double rank)
      "Interpolated" version of iValue(long[], long), rounded to the "best" integer result. Alike preciseValue(long[], double), this function supposes that the histogram is built on an array of floating-point values after truncating them to an integer value (long)value. In addition to preciseValue, this function tries to approximate the real result by some nearest integer value.

      More precisely, we suppose that if b[k]==b (here and below b[k]=histogram[k]), it means that the source floating-point array contains b values k+j/b, j=0,1,...,b−1. With this suggestion, this method finds the element of the source array v1 with the index #r1=⌊rank⌋=(long)rank and the element of the source array v2 with the index #r2=r1+1. Here v1 = v0+(r1-r0)/b[v0], where v0 is the minimal integer value so that b[0]+b[1]+...+b[v0]>r1 and r0=b[0]+b[1]+...+b[v0−1], and there is the analogous formula for v2. If rank argument is integer (rank==(long)rank), this method does not try to find v2 and just returns v1. Until this moment, this method works like preciseValue.

      After this, the behaviour of this method is more complicated. If rank is not integer, we calculate v1'=v1+1/b[v1] and v2'=v2+1/b[v2]. Let's consider that the true real values in the source array #r1 (w1) and #r2=r1+1 (w2) are unknown, but lie in ranges v1w1<v1' and v2w2<v2'. (We really don't know the precise real values, we only know that some b[⌊v1⌋] values lie in v1⌋..⌊v1⌋+1 range and some b[⌊v2⌋] values lie in v2⌋..⌊v2⌋+1 range.) Then the value with real "index" rank, interpolated between w1 and w2, lies in range aw<b, where a=v1 + (rankr1) * (v2v1) (the result of preciseValue call with the same arguments) and b=v1' + (rankr1) * (v2'−v1').

      This method finds the integer range v..v+1, which "covers" the range a..b in the best way. Namely, it calculates v=⌊(a+b)/2⌋ and returns v as the result.

      The result of this method always lies between (int)p and Math.round(p), where p=preciseValue(histogram,rank).

      As iValue(long[], long), if rank<0, this method returns min (kZ: b[k]>0) (the minimal element in the source array), and if rank≥(sum of all b[k]), it returns max (kZ: b[k]>0)+1 (for floating-point array, it means the maximal element plus 1/b[k]). If rank>(sum of all b[k])−1, but rank<(sum of all b[k]), then in formulas above there is no element v2 with the index #r2=r1+1; in this case, this method returns max (kZ: b[k]>0). If all columns b[k] are zero (no elements), this method returns 0.

      Parameters:
      histogram - histogram[k] is the number of elements in the source floating-point array that are "almost equal" to k. All histogram[k] must be non-negative; in other case, IllegalArgumentException can be thrown (but also can be not thrown).
      rank - the index in the source array (if non-integer, this method interpolates nearest elements).
      Returns:
      interpolated value of the found element (percentile), rounded to the "best" integer value.
      Throws:
      NullPointerException - if histogram argument is null.
      IllegalArgumentException - if Double.isNaN(rank).
      See Also:
    • iPreciseValue

      public static int iPreciseValue(int[] histogram, double rank)
      Precise equivalent of iPreciseValue(long[], double) for a case of int[] type of the histogram.
      Parameters:
      histogram - histogram[k] is the number of elements in the source floating-point array that are "almost equal" to k. All histogram[k] must be non-negative; in other case, IllegalArgumentException can be thrown (but also can be not thrown).
      rank - the index in the source array (if non-integer, this method interpolates nearest elements).
      Returns:
      interpolated value of the found element (percentile), rounded to the "best" integer value.
      Throws:
      NullPointerException - if histogram argument is null.
      IllegalArgumentException - if Double.isNaN(rank).
      See Also:
    • preciseValue

      public static double preciseValue(long[] histogram, double rank)
      "Interpolated" version of iValue(long[], long). This function supposes that the histogram is built on an array of floating-point values after truncating them to an integer value (long)value.

      More precisely, we suppose that if b[k]==b (here and below b[k]=histogram[k]), it means that the source floating-point array contains b values k+j/b, j=0,1,...,b−1. With this suggestion, this method finds the element of the source array v1 with the index #r1=⌊rank⌋=(long)rank and the element of the source array v2 with the index #r2=r1+1. Obviously, v1 = v0+(r1-r0)/b[v0], where v0 is the minimal integer value so that b[0]+b[1]+...+b[v0]>r1 and r0=b[0]+b[1]+...+b[v0−1], and there is the analogous formula for v2. Note: v2=v1+1/b[v0] if r2<r0+b[v0] or if r2=r0+b[v0] and the next bar is non-zero: b[v0+1]>0; in other case, v2 is the minimal integer >v0 so that b[v2]>0.

      After finding v1 and v2, this method returns the value interpolated between them: v1 + (rankr1) * (v2v1). Note: if rank argument is integer (rank==(long)rank), this method does not try to find v2 and just returns v1.

      As iValue(long[], long), if rank<0, this method returns min (kZ: b[k]>0) (the minimal element in the source array), and if rank≥(sum of all b[k]), it returns max (kZ: b[k]>0)+1 (for floating-point array, it means the maximal element plus 1/b[k]). If rank>(sum of all b[k])−1, but rank<(sum of all b[k]), then in formulas above there is no element v2 with the index #r2=r1+1; in this case, it is supposed v2=v1+1 (the maximal element of the floating-point array plus 1/b[k], k=v1). If all columns b[k] are zero (no elements), this method returns 0.

      Please compare the described behaviour with little more simple behaviour of value(long[], double) method.

      The result of this method is equal to the percentile v(r) for the passed r=rank in terms of the precise histogram model: see comments to Histogram class for more details.

      Parameters:
      histogram - histogram[k] is the number of elements in the source array that are "almost equal" to k. All histogram[k] must be non-negative; in other case, IllegalArgumentException can be thrown (but also can be not thrown).
      rank - the index in the source array (if non-integer, this method interpolates nearest elements).
      Returns:
      interpolated value of the found element (percentile).
      Throws:
      NullPointerException - if histogram argument is null.
      IllegalArgumentException - if Double.isNaN(rank).
      See Also:
    • preciseValue

      public static double preciseValue(int[] histogram, double rank)
      Precise equivalent of preciseValue(long[], double) for a case of int[] type of the histogram.
      Parameters:
      histogram - histogram[k] is the number of elements in the source array that are "almost equal" to k. All histogram[k] must be non-negative; in other case, IllegalArgumentException can be thrown (but also can be not thrown).
      rank - the index in the source array (if non-integer, this method interpolates nearest elements).
      Returns:
      interpolated value of the found element (percentile).
      Throws:
      NullPointerException - if histogram argument is null.
      IllegalArgumentException - if Double.isNaN(rank).
      See Also:
    • percentile

      public static double percentile(long[] histogram, long sumOfColumns, double percentileLevel)
      Equivalent to preciseValue(histogram,percentileLevel*(sumOfColumns-1)), if sumOfColumns>0. If sumOfColumns==0, this method immediately returns 0.0; if sumOfColumns<0, this method throws an exception.

      If is supposed that sumOfColumns is the sum of all histogram elements. If it is true, the returned value is usually called the percentile of the source array with the level specified by percentileLevel argument. If percentileLevel==0.5, this value is also called the median of the source array. But, of course, you can pass any positive value as sumOfColumns: in any case, if sumOfColumns>0, this method returns the result of preciseValue(histogram,percentileLevel*(sumOfColumns-1)).

      Parameters:
      histogram - histogram[k] is the number of elements in the source array that are "almost equal" to k. All histogram[k] must be non-negative; in other case, IllegalArgumentException can be thrown (but also can be not thrown).
      sumOfColumns - should be equal to sum of all histogram elements (in other words, the length of the source array).
      percentileLevel - the percentile level (usually from 0.0 to 1.0).
      Returns:
      interpolated value of the found element (percentile).
      Throws:
      NullPointerException - if histogram argument is null.
      IllegalArgumentException - if Double.isNaN(percentileLevel) or if sumOfColumns<0.
      See Also:
    • percentile

      public static double percentile(int[] histogram, long sumOfColumns, double percentileLevel)
      Equivalent to preciseValue(histogram,percentileLevel*(sumOfColumns-1)), if sumOfColumns>0. If sumOfColumns==0, this method immediately returns 0.0; if sumOfColumns<0, this method throws an exception.

      If is supposed that sumOfColumns is the sum of all histogram elements. If it is true, the returned value is usually called the percentile of the source array with the level specified by percentileLevel argument. If percentileLevel==0.5, this value is also called the median of the source array. But, of course, you can pass any positive value as sumOfColumns: in any case, if sumOfColumns>0, this method returns the result of preciseValue(histogram,percentileLevel*(sumOfColumns-1)).

      Parameters:
      histogram - histogram[k] is the number of elements in the source array that are "almost equal" to k. All histogram[k] must be non-negative; in other case, IllegalArgumentException can be thrown (but also can be not thrown).
      sumOfColumns - should be equal to sum of all histogram elements (in other words, the length of the source array).
      percentileLevel - the percentile level (usually from 0.0 to 1.0).
      Returns:
      interpolated value of the found element (percentile).
      Throws:
      NullPointerException - if histogram argument is null.
      IllegalArgumentException - if Double.isNaN(percentileLevel) or if sumOfColumns<0.
      See Also:
    • sumOf

      public static long sumOf(long[] histogram)
      Returns the sum of all elements of the passed histogram array.
      Parameters:
      histogram - any array (for example, a histogram).
      Returns:
      the sum of all elements of the passed array.
      Throws:
      NullPointerException - if histogram argument is null.
    • sumOf

      public static int sumOf(int[] histogram)
      Returns the sum of all elements of the passed histogram array.
      Parameters:
      histogram - any array (for example, a histogram).
      Returns:
      the sum of all elements of the passed array.
      Throws:
      NullPointerException - if histogram argument is null.
    • clear

      public static void clear(int[] histogram, int sumOfColumns)
      Fills all non-zero elements of histogram by 0.

      Works faster then trivial loop for all elements histogram[0..histogram.length-1], if most of last elements of histogram array already contain zero.

      Parameters:
      histogram - cleared histogram.
      sumOfColumns - should be equal to sum of all histogram elements.