On fractal distribution function estimation and applications

5 Monte Carlo analysis

Before going into the details of the simulation results, we remark that the IFS estimators are fractal objects: they are nowhere differentiable and self-similar. In Figure 2 we have represented the distribution function estimator of an underlying truncated normal distribution. It is evident that the curve is made of replicas of itself. To highlight this fractal nature, we have "zoomed" the curve 4 times; as can be seen, the curve is the same at any scale. Figure 1 shows an application of the density estimator to real data. In particular, we have chosen the classical textbook Old Faithful geyser data rescaled on [0,1]. It is evident that the IFS-Fourier expansion estimator is capable of discriminating the two curves, as the kernel estimator does.

Figure 1: Old Faithful geyser data rescaled on [0,1]. Dotted line is the kernel estimator (bw=0.03, kernel=Gaussian), solid line is the IFS-Fourier expansion estimator (iterated 2 times, 26 Fourier coefficients).
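For readers who want to reproduce the kernel curve of Figure 1 in spirit, the following is a minimal Python sketch of a Gaussian kernel density estimator with bandwidth 0.03 on data rescaled to [0,1]. It is an illustration only: the sample is synthetic and bimodal (it is not the Old Faithful data), the helper names `gaussian_kde` and `rescale_to_unit` are hypothetical, and the bandwidth is interpreted as the standard deviation of the kernel, which is the convention of R's density().

```python
import math
import random


def rescale_to_unit(values):
    """Linearly map the values onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def gaussian_kde(sample, bw):
    """Return x -> kernel density estimate with a Gaussian kernel of s.d. bw."""
    n = len(sample)
    norm = 1.0 / (n * bw * math.sqrt(2.0 * math.pi))

    def f_hat(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bw) ** 2) for xi in sample)

    return f_hat


# Synthetic two-component sample (an assumption, NOT the Old Faithful data).
rng = random.Random(1)
raw = [rng.gauss(2.0, 0.3) for _ in range(100)] + \
      [rng.gauss(4.3, 0.4) for _ in range(170)]
data = rescale_to_unit(raw)

f_hat = gaussian_kde(data, bw=0.03)
grid = [i / 511 for i in range(512)]   # 512 evaluation points on [0, 1]
dens = [f_hat(x) for x in grid]
```

Because the kernel is not corrected at the boundaries, a small part of the estimated mass falls outside [0,1]; this is the endpoint effect that affects density estimators on a compact support.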

As seen in the previous sections, it is rather difficult to establish statistical properties of the IFS-based estimators for small sample sizes, as it is not yet clear to us how to characterize the fixed points of the IFSs. So in this section we show some numerical results for both distribution function and density estimation. We have chosen the Beta family of random variables as it allows for compact support, existence of moments, different shapes and well-tested pseudo-random number generators. As criteria for evaluating the performance of the estimators we consider the average mean square error (AMSE) and the sup-norm distance. We consider small sample sizes, n = 10, 20, 30, 50, 75, 100, since asymptotically the IFS estimators converge to the e.d.f.-based estimators. Four estimators are considered for the distribution function:

with

For

we have chosen n/2 quantiles. For the density estimator, we compare a standard kernel estimator and the Fourier transform estimators based on the IFS. It is well known that kernel estimators are particular Fourier expansion estimators, obtained by a proper choice of the multiplier c_k when the e.d.f. is used in the Fourier transform. The number of terms used in the Fourier series estimators of the distribution function is chosen according to the following rule

then use the first m coefficients, as suggested in Tarter and Lock (1993). The rule of thumb we use cannot be considered optimal in any sense, but its principle is to minimize the integrated MSE. The software used is R (Ihaka and Gentleman, 1996) with a beta 'ifs' package available on CRAN, http://cran.r-project.org/, in the contributed packages directory. Kernel density estimation is as in Silverman (1986) and is implemented in R by the density() function (see also Venables and Ripley, 2002). All the estimates are evaluated at 512 points in order to calculate the AMSE and the sup-norm. For density estimation we calculate the mean absolute error (MAE) instead of the sup-norm, as the latter is influenced by the bad performance of density estimators at the endpoints (0 and 1) of the support of the distributions.

Figure 2: The fractal nature of the IFS distribution function estimator. The dotted line is the underlying truncated Gaussian distribution. The dotted rectangle marks the area zoomed in the next plot (left to right, top to bottom). The dotted boxes are in the order:
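The evaluation scheme described above (Beta samples, 512 evaluation points, AMSE and sup-norm) can be sketched as follows. This is a minimal Python illustration under several assumptions that are not in the text: only the e.d.f. is evaluated, the Beta(2,2) law is used because its distribution function has the closed form 3x^2 - 2x^3, the AMSE is taken to be the grid average of the squared error averaged over replications, and the number of replications (200) and the function names are hypothetical. The study itself was carried out in R.

```python
import bisect
import random


def edf(sample):
    """Empirical distribution function: x -> #{X_i <= x} / n."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n


def beta22_cdf(x):
    """Closed-form distribution function of the Beta(2,2) law on [0, 1]."""
    return 3.0 * x ** 2 - 2.0 * x ** 3


def monte_carlo(n, reps=200, grid_size=512, seed=0):
    """Average squared error and average sup-norm of the e.d.f. over reps samples."""
    rng = random.Random(seed)
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    truth = [beta22_cdf(x) for x in grid]
    amse = 0.0
    sup = 0.0
    for _ in range(reps):
        f_n = edf([rng.betavariate(2.0, 2.0) for _ in range(n)])
        errs = [f_n(x) - t for x, t in zip(grid, truth)]
        amse += sum(e * e for e in errs) / grid_size   # grid-averaged squared error
        sup += max(abs(e) for e in errs)               # sup-norm on the grid
    return amse / reps, sup / reps


# Sample sizes as in the text; each value is a pair (AMSE, sup-norm).
results = {n: monte_carlo(n) for n in (10, 20, 30, 50, 75, 100)}
```

Any other estimator of the distribution function could be plugged in place of `edf` to obtain the corresponding pair of error measures on the same grid.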

Tables 2 and 3 are organized as follows: there are five main columns, one for the distribution investigated, two for the distribution function estimators and two for density estimation. Under the column AMSE, the $W_1$ column reports the ratio, in percentage, between the AMSE of $W_1$ and the AMSE of $\hat{F}_n$, and similarly for the rest of the row. That is, we report the relative efficiency of the three estimators $W_1$, $\hat{T}_N$ and $\hat{F}_{FT}$ with respect to $\hat{F}_n$. Under the column marked SUP-NORM the same scheme is applied, but considering the sup-norm distance.
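The ratio reported in the tables is simple arithmetic; a minimal Python sketch is given here to fix the direction of the ratio. The function name and the numerical values are hypothetical and are not taken from the tables.

```python
def relative_efficiency(err_estimator, err_reference):
    """Ratio in percent of an estimator's error to the reference error.

    Values below 100 mean the estimator has a smaller error than the reference.
    """
    return 100.0 * err_estimator / err_reference


# Hypothetical error values, for illustration only.
errors = {"estimator": 0.0040, "reference": 0.0050}
ratio = relative_efficiency(errors["estimator"], errors["reference"])
```

With this convention a reported value of 80 reads as "20% better than the reference", whichever distance (AMSE, sup-norm or MAE) is used.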


The last two columns are for density estimation. This time the columns report the ratio, in percentage, between the distance for the Fourier estimator and that for the kernel estimator. The tables show that, on average, the

estimator is equivalent to the e.d.f. for ugly distributions like the beta(.1,.9) or beta(.1,.1), while it is somewhat better in the other cases (10 to 20% better). The Fourier series estimator based on the IFS is preferable to the e.d.f. only for bell-shaped distributions and seems unbeatable for symmetric laws. This is somewhat expected of a Fourier expansion estimator. The same argument applies to the density estimator: for bell-shaped symmetric distributions, it seems as good as the kernel estimator and in some cases even better. For the beta(.1,.9) and the beta(.1,.1) the density estimators (both kernel and our Fourier) are of no use, so we have omitted the corresponding ratios.
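To make the MAE criterion for density estimation concrete, here is a minimal Python sketch of a generic Fourier (cosine) series density estimator on [0,1] whose coefficients are plain sample means, that is, they are computed from the e.d.f. It is not the IFS-based estimator of this paper. The Beta(2,2) target, the sample size, the number of terms m = 5 and the function names are all assumptions made for illustration; in particular m is not chosen by the rule of thumb used in the study.

```python
import math
import random


def cosine_series_density(sample, m):
    """Density estimate on [0, 1] from the first m empirical cosine coefficients.

    Uses f(x) = 1 + 2 * sum_k a_k cos(k pi x), with a_k = mean(cos(k pi X_i)).
    """
    n = len(sample)
    coef = [sum(math.cos(k * math.pi * x) for x in sample) / n
            for k in range(1, m + 1)]

    def f_hat(x):
        return 1.0 + 2.0 * sum(a * math.cos(k * math.pi * x)
                               for k, a in enumerate(coef, start=1))

    return f_hat


def mean_absolute_error(f_hat, f_true, grid):
    """Average of |f_hat - f_true| over the evaluation grid."""
    return sum(abs(f_hat(x) - f_true(x)) for x in grid) / len(grid)


def beta22_pdf(x):
    """Density of the Beta(2,2) law on [0, 1]."""
    return 6.0 * x * (1.0 - x)


rng = random.Random(42)
sample = [rng.betavariate(2.0, 2.0) for _ in range(50)]
grid = [i / 511 for i in range(512)]          # 512 evaluation points
f_hat = cosine_series_density(sample, m=5)    # m = 5 is an arbitrary choice
mae = mean_absolute_error(f_hat, beta22_pdf, grid)
```

Dividing this MAE by the MAE of a kernel estimate computed on the same sample and grid, and multiplying by 100, gives a ratio of the kind reported in the last two columns.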

Stefano M. Iacus, Davide La Torre
