Mythbusting ECW decompression

Anyone who frequents the ESRI ArcGIS Desktop v10 publishing wizard will be familiar with the screenshot below. But has anyone stepped back and thought: hang on, why is wavelet compression bad? Many readers may recall we have had an ESRI ArcPad ECW plugin since 2003. Way back then it was powered by tiny 300 MHz mobile CPUs, and draw performance for ECWs was instant. What has changed with wavelet technology that makes it something to be fearful of in server use in 2012? Have we really gone backwards?

Full disclosure: As of writing, I’m currently Product Manager for APOLLO IWS

As per, http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//00sq0000000t000000.htm

*ahem, in my best robot voice ..*

You have a raster map layer that uses wavelet image compression (for example, MrSID, JPEG 2000, ECW), which can impede map drawing performance.

What should I do Raster robot!?

..convert your wavelet-compressed raster dataset to a more efficient display format

Why?

… because the data does not have to be uncompressed at display time.

Ok, semi-plausible by the sounds. Is there anything else published to qualify these claims I wonder?

http://proceedings.esri.com/library/userconf/pug11/papers/arcgis_server-performance_and_scalability-optimization_and_testing.pdf

“Avoid wavelet compression-based raster types (MrSID, JPEG2000)” (page 6)

“Tiled, JPEG compressed TIFF is the best (10-400% faster)” (page 11)

-          Andrew Sakowicz, ESRI Professional Services Redlands, April 2011

Wow. This seems to be scattered everywhere; it must be true.

Let's verify what another image server has to say, just in case. We can't go past the FOSS4G "Geoserver on steroids" paper, though I'm not quite sure what happened to the text ..

So, armed with an ECW file that I know meets the above objectives, I wonder what Geoserver states about performance ..

PROPRIETARY!? Oh no! But hang on, what is the performance? More digging at http://opengeo.org/publications/geoserver-production/ gives a small nugget, but alas no details:

In addition to adding overviews, using raster formats based on wavelet transforms such as ECW and MrSID will also improve performance

Further digging into the FOSS4G '09 raster benchmarking results, which used the same test data I'm about to use, shows Geoserver ECW (260 MB) peaking at 11.2 maps/sec versus uncompressed tiled TIF (~16,000 MB) at 13.7 maps/sec. Admittedly these results are out of date; nevertheless, a performance gain of roughly 20% came at an enormous storage cost. Image quality unfortunately wasn't analyzed, nor was JPEG compression. So there does seem to be at least a small performance penalty, in that old Geoserver version anyway, and only when compared against uncompressed tiled TIF. Not really a useful reference in this context.
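
A quick back-of-the-envelope on those quoted figures puts the trade-off in focus. This is just the arithmetic on the benchmark numbers above, nothing more:

```python
# FOSS4G '09 benchmark figures quoted above: peak throughput in maps/sec
# and approximate on-disk size in MB.
ecw_maps, tif_maps = 11.2, 13.7
ecw_mb, tif_mb = 260, 16_000

speed_gain = tif_maps / ecw_maps   # uncompressed tiled TIF over ECW, ~1.22x
storage_cost = tif_mb / ecw_mb     # disk multiplier paid for that gain, ~62x

print(f"~{(speed_gain - 1):.0%} faster for ~{storage_cost:.0f}x the storage")
```

In other words, every percentage point of throughput cost roughly three times the entire ECW file in extra disk.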

And now the kicker that I bet half of you are salivating over. ECW SDK EULA requires a paid license from Intergraph | ERDAS for use in a server environment. Money!? Absurd.

So how about we actually verify these claims? Based on the variety of literature found in a whole ten minutes, there are countless people saying that JPEG compressed TIF should at least match ECW performance. ESRI tells me that wavelet compression is bad, takes longer to decompress and is proprietary. JPEG TIFF is supposedly only 30% bigger, looks the same and is just as fast as ECW. (ref)

But if all of that is true, why on Earth do Intergraph | ERDAS think they can charge for it, and why is it nigh impossible to validate all these claims? Let's find out through a simple example ..

Take my favourite small sample image,

Driver: GTiff/GeoTIFF
Files: world-topo-bathy-200406-3x86400x43200.tif
Size is 86399, 43199
Coordinate System is:
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0],
UNIT["degree",0.0174532925199433],
AUTHORITY["EPSG","4326"]]
Origin = (-180.000000000000000,90.000000000000000)
Pixel Size = (0.004166666666667,-0.004166666666667)
Image Structure Metadata:
INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  (-180.0000000,  90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"N)
Lower Left  (-180.0000000, -89.9958333) (180d 0' 0.00"W, 89d59'45.00"S)
Upper Right ( 179.9958333,  90.0000000) (179d59'45.00"E, 90d 0' 0.00"N)
Lower Right ( 179.9958333, -89.9958333) (179d59'45.00"E, 89d59'45.00"S)
Center      (  -0.0020833,   0.0020833) (  0d 0' 7.50"W,  0d 0' 7.50"N)
Band 1 Block=256x256, ColorInterp=Red
Min=1.000 Max=10.000
Minimum=1.000, Maximum=10.000, Mean=1.615, StdDev=0.746
Overviews: 43200x21600, 21600x10800, 10800x5400, 5400x2700
Band 2 Block=256x256, ColorInterp=Green
Min=3.000 Max=30.000
Minimum=3.000, Maximum=30.000, Mean=8.209, StdDev=1.499
Overviews: 43200x21600, 21600x10800, 10800x5400, 5400x2700
Band 3 Block=256x256, ColorInterp=Blue
Min=12.000 Max=68.000
Minimum=12.000, Maximum=68.000, Mean=22.177, StdDev=2.873
Overviews: 43200x21600, 21600x10800, 10800x5400, 5400x2700

Take the uncompressed dataset (a whopping 15 GB!) and create the preferred format to "optimize space, quality and speed" using GDAL 1.8.1. You will see I have enabled all the optimal flags, including the YCbCr colour space and average resampling for the overviews.

gdal_translate -of GTiff -co "TILED=yes" -co "PHOTOMETRIC=YCBCR" -co "COMPRESS=JPEG" world-topo-bathy-200406-3x86400x43200.tif gdal_compressed_world.tif

Input file size is 86399, 43199

0…10…20…30…40…50…60…70…80…90…100 – done.

gdaladdo -r average --config COMPRESS_OVERVIEW JPEG --config PHOTOMETRIC_OVERVIEW YCBCR --config INTERLEAVE_OVERVIEW PIXEL gdal_compressed_world.tif 2 4 8 16 32 64 128

0…10…20…30…40…50…60…70…80…90…100 – done.

Compression time was recorded, with gdal_translate taking 8 minutes and gdaladdo 19 minutes, bringing total creation time to 27 minutes.
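
As an aside, the overview dimensions gdalinfo reports below are predictable from those gdaladdo levels: GDAL rounds up when the base dimension doesn't divide evenly, which is why an 86399-pixel width produces a 43200-wide first overview. A quick sketch, plain Python, no GDAL required:

```python
import math

# Base raster dimensions and the gdaladdo overview levels used above
width, height = 86399, 43199
levels = [2, 4, 8, 16, 32, 64, 128]

# GDAL's overview size at each level is ceil(dimension / level)
overviews = [(math.ceil(width / f), math.ceil(height / f)) for f in levels]
print(overviews)  # first entry (43200, 21600), last entry (675, 338)
```

Compare against the Overviews lines in the gdalinfo dump below; they match level for level.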

Driver: GTiff/GeoTIFF
Files: gdal_compressed_world.tif
Size is 86399, 43199
Coordinate System is:
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0],
UNIT["degree",0.0174532925199433],
AUTHORITY["EPSG","4326"]]
Origin = (-180.000000000000000,90.000000000000000)
Pixel Size = (0.004166666666667,-0.004166666666667)
Metadata:
TIFFTAG_SOFTWARE=ERDAS IMAGINE
TIFFTAG_XRESOLUTION=1
TIFFTAG_YRESOLUTION=1
TIFFTAG_RESOLUTIONUNIT=1 (unitless)
AREA_OR_POINT=Area
Image Structure Metadata:
SOURCE_COLOR_SPACE=YCbCr
COMPRESSION=YCbCr JPEG
INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  (-180.0000000,  90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"N)
Lower Left  (-180.0000000, -89.9958333) (180d 0' 0.00"W, 89d59'45.00"S)
Upper Right ( 179.9958333,  90.0000000) (179d59'45.00"E, 90d 0' 0.00"N)
Lower Right ( 179.9958333, -89.9958333) (179d59'45.00"E, 89d59'45.00"S)
Center      (  -0.0020833,   0.0020833) (  0d 0' 7.50"W,  0d 0' 7.50"N)
Band 1 Block=256x256, ColorInterp=Red
Min=1.000 Max=10.000
Minimum=1.000, Maximum=10.000, Mean=1.615, StdDev=0.746
Overviews: 43200x21600, 21600x10800, 10800x5400, 5400x2700, 2700x1350, 1350x675, 675x338
Band 2 Block=256x256, ColorInterp=Green
Min=3.000 Max=30.000
Minimum=3.000, Maximum=30.000, Mean=8.209, StdDev=1.499
Overviews: 43200x21600, 21600x10800, 10800x5400, 5400x2700, 2700x1350, 1350x675, 675x338
Band 3 Block=256x256, ColorInterp=Blue
Min=12.000 Max=68.000
Minimum=12.000, Maximum=68.000, Mean=22.177, StdDev=2.873
Overviews: 43200x21600, 21600x10800, 10800x5400, 5400x2700, 2700x1350, 1350x675, 675x338

An equivalent ECW was then created using ERDAS Imagine from the same uncompressed TIF with a 20:1 target compression ratio. Compression / creation time as below.

For consistency, gdalinfo output is below.

Driver: ECW/ERDAS Compressed Wavelets (SDK 3.x)
Files: world-topo-bathy-200406-3x86400x43200-20x.ecw
Size is 86399, 43199
Coordinate System is:
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]],
TOWGS84[0,0,0,0,0,0,0],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0,
AUTHORITY["EPSG","8901"]],
UNIT["degree",0.0174532925199433,
AUTHORITY["EPSG","9108"]],
AXIS["Lat",NORTH],
AXIS["Long",EAST],
AUTHORITY["EPSG","4326"]]
Origin = (-180.000000000000030,90.000000000000014)
Pixel Size = (0.004166666666667,-0.004166666666667)
Corner Coordinates:
Upper Left  (-180.0000000,  90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"N)
Lower Left  (-180.0000000, -89.9958333) (180d 0' 0.00"W, 89d59'45.00"S)
Upper Right ( 179.9958333,  90.0000000) (179d59'45.00"E, 90d 0' 0.00"N)
Lower Right ( 179.9958333, -89.9958333) (179d59'45.00"E, 89d59'45.00"S)
Center      (  -0.0020833,   0.0020833) (  0d 0' 7.50"W,  0d 0' 7.50"N)
Band 1 Block=86399x1, ColorInterp=Red
Overviews: 43199x21599, 21599x10799, 10799x5399, 5399x2699, 2699x1349, 1349x674, 674x337, 337x168
Band 2 Block=86399x1, ColorInterp=Green
Overviews: 43199x21599, 21599x10799, 10799x5399, 5399x2699, 2699x1349, 1349x674, 674x337, 337x168
Band 3 Block=86399x1, ColorInterp=Blue
Overviews: 43199x21599, 21599x10799, 10799x5399, 5399x2699, 2699x1349, 1349x674, 674x337, 337x168

Storage requirements for both outputs,

-          JPEG Compressed TIFF (75%):  362,953 KB

-          ECW (20:1): 109,650 KB

So we have at least squashed one of the referenced claims. JPEG compressed TIFF (with YCbCr) is approximately 3.3x larger for this dataset, not just 30% larger. Note that the storage requirements include the gdaladdo embedded overviews; you'd be surprised how many people do not factor these in.
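
For the record, that ratio is simple arithmetic on the KB figures listed above:

```python
jpeg_tif_75_kb = 362_953   # tiled JPEG TIFF (75%), overviews included
ecw_20_kb = 109_650        # ECW at a 20:1 target ratio

ratio = jpeg_tif_75_kb / ecw_20_kb
print(f"{ratio:.1f}x")     # ~3.3x, a long way from the "30% bigger" claim
```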

As I wanted to investigate decompression performance, I created additional ECWs, this time with a much lower target ratio, to get a file size similar to the JPEG TIFF. In theory this makes the disk I/O "fairer" in any subsequent performance test. It is, however, critical to note that we recommend a target ratio of 15:1 to 20:1 to retain visually lossless RGB imagery. Creating a file with such a small target ratio gives minimal quality difference (take a look at the mp4 below) at the expense of storing and reading a lot more data.

-          ECW (6:1): 263,691 KB

-          ECW (3:1): 289,106 KB

Because of the way the compression algorithm works, even with a very small target ratio the actual compression rate was still quite high, due to the image having large water bodies that compress very well.
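
The gap between target and actual is easy to quantify. Dividing the raw 8-bit RGB size (pixel dimensions from the gdalinfo output; overviews ignored, so this slightly understates) by the on-disk size shows the 3:1-target file actually landed closer to 38:1:

```python
# Raw uncompressed size of the base raster: 8-bit RGB, no overviews
width, height, bands = 86399, 43199, 3
raw_kb = width * height * bands / 1024     # ~10.9 million KB

actual_rate = raw_kb / 289_106             # the ECW built with a 3:1 target
print(f"actual compression rate ~ {actual_rate:.0f}:1")
```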

An additional JPEG compressed TIF was created with -co "JPEG_QUALITY=90", plus the corresponding gdaladdo JPEG_QUALITY_OVERVIEW config option, to lift the default 75% quality (see here). With all other GDAL flags being equal this produced,

-          JPEG Compressed TIFF (90%): 587,439 KB

Now that we have some data, let's configure some tests. I will of course be testing ERDAS APOLLO IWS v11.0.2 on Windows 7 x64. Test hardware is a quad core, 8 thread Core i7 with 8 GB RAM and a single 7200 RPM attached disk.

I usually find a visual comparison helpful for making sense of any subsequent JMeter metrics. After all, how much difference is 200 ms, or 50 ms? To do this I created an OpenLayers website with synchronised maps; on each map the loadend event is captured, so a JS table can display indicative performance values. Remember, this is not a load test, so system resources are not in contention.

MP4 screen recording: ecw-vs-tif-format-comparison (3 mins)

JMeter was configured to iterate through a small test plan of 100 WMS requests across the image in a single thread group, making for a repeatable test. Unlike the FOSS4G benchmarking, the tests are all cold-start, as I hate "warm" tests with a passion. Servers are restarted and then each thread group is run consecutively with 1 thread. FYI, IWS internally uses the v4.2 ECW SDK, as well as GDAL 1.7 for the TIF reading.
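
For anyone wanting to reproduce this, a plan of random-but-repeatable GetMap requests can be generated outside JMeter along these lines. The endpoint and layer name here are placeholders, not my actual test configuration; the seeded RNG is what makes restart-then-replay comparisons meaningful:

```python
import random

# Hypothetical endpoint and layer name -- substitute your own server details.
BASE_URL = "http://localhost:8080/iws/wms"
LAYER = "world_topo"

def build_getmap_plan(n, extent=(-180.0, -90.0, 180.0, 90.0),
                      span=2.0, width=512, height=512, seed=42):
    """Build n WMS 1.1.1 GetMap request URLs with random bounding boxes.

    A fixed seed keeps the plan identical between runs, so every server
    restart replays exactly the same 100 requests.
    """
    rng = random.Random(seed)
    min_x, min_y, max_x, max_y = extent
    urls = []
    for _ in range(n):
        x = rng.uniform(min_x, max_x - span)
        y = rng.uniform(min_y, max_y - span)
        bbox = f"{x:.6f},{y:.6f},{x + span:.6f},{y + span:.6f}"
        urls.append(
            f"{BASE_URL}?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap"
            f"&LAYERS={LAYER}&STYLES=&SRS=EPSG:4326&BBOX={bbox}"
            f"&WIDTH={width}&HEIGHT={height}&FORMAT=image/jpeg"
        )
    return urls

plan = build_getmap_plan(100)
```

Feed the resulting URLs into a JMeter HTTP sampler (or curl, for that matter) and you have the same 100-request plan every time.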

Before analyzing the results, I can already hear you thinking: damn, your software just sucks at reading TIF, doesn't it!? You are skewing the results!

So the next step is independent verification. Without this, I would be no better than the other quotes spouting 400% improvements compared with … *undisclosed*

  1. Grab the latest Geoserver v2.1.3 jetty build.
  2. Use the existing JDK 1.7 x64 version.
  3. Configure two new data stores/layers pointing to the same TIF datasets
  4. Enable JAI “JPEG Native Acceleration”
  5. Coverage Access / Queue Type: “UNBOUNDED”
  6. Suggested Tile size 512,512
  7. Default interpolation type: Nearest Neighbour

Note: This is just a ballpark comparison. No JVM tuning or any other settings were changed from default. Remember, IWS was also an out-of-the-box configuration so I paid Geoserver some extra attention :)

Both servers were then restarted and JMeter test plan refreshed. Results are as follows,

Label                   Samples  Average  Median  90% Line  Min  Max   Error %  Throughput  KB/sec
IWS 4326 ECW 20:1       100      116      115     159       59   278   0        8.50557115  586.0675
IWS 3857 ECW 20:1       100      171      167     214       111  261   0        5.79038796  432.9373
IWS 4326 ECW 3:1        100      111      118     154       52   193   0        8.90234132  612.3722
IWS 3857 ECW 3:1        100      174      167     214       109  644   0        5.70678537  426.2425
IWS 4326 75% TIF        100      198      203     290       74   546   0        5.01781324  313.2798
IWS 3857 75% TIF        100      225      226     302       119  444   0        4.40858793  281.9394
IWS 4326 90% TIF        100      201      206     289       74   510   0        4.927322    311.4298
IWS 3857 90% TIF        100      228      237     299       126  611   0        4.35179947  286.429
Geoserver 4326 TIF 75%  100      201      179     245       94   1449  0        4.92926505  244.4857
Geoserver 3857 TIF 75%  100      289      252     511       100  1044  0        3.43902607  194.1044
Geoserver 4326 TIF 90%  100      194      169     238       103  902   0        5.1266277   261.2856
Geoserver 3857 TIF 90%  100      276      257     469       98   606   0        3.6101083   208.9514

So what does all this mean?

  • Firstly, IWS TIF throughput performance was very similar to Geoserver's. I don't really want to get into arguments with OpenGeo/GeoSolutions on this, as we could be here for days tuning and that is not really the point of the post. For all intents and purposes, one can adequately say the APOLLO IWS TIF implementation is not encumbered in any way
  • ECW outperforms JPEG compressed TIF by almost 2x
  • ECW 20:1 produces higher image quality than a TIF compressed at 90% JPEG quality
  • ECW 20:1 requires 5.3x less data storage than the 90% JPEG compressed TIF
  • The image quality difference between 3:1 and 20:1 ECW was trivial and not worth a 2.5x increase in storage.
  • A performance drop was not recorded with the varying compression levels, regardless of format. This is expected given the low concurrency
  • ECW 20:1 compression/creation time is 1.5x faster than 75% JPEG compressed TIF
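
Reading the throughput column straight back out of the table: on IWS, ECW 20:1 over 75% JPEG TIF works out to roughly 1.7x in native EPSG:4326 and 1.3x when reprojecting to EPSG:3857 (the average response times, 116 ms vs 198 ms, tell a similar story):

```python
# Throughput (requests/sec) copied verbatim from the JMeter table above
throughput = {
    "IWS 4326 ECW 20:1": 8.50557115,
    "IWS 3857 ECW 20:1": 5.79038796,
    "IWS 4326 75% TIF": 5.01781324,
    "IWS 3857 75% TIF": 4.40858793,
}

speedup_4326 = throughput["IWS 4326 ECW 20:1"] / throughput["IWS 4326 75% TIF"]
speedup_3857 = throughput["IWS 3857 ECW 20:1"] / throughput["IWS 3857 75% TIF"]
print(f"4326: {speedup_4326:.2f}x, 3857: {speedup_3857:.2f}x")
```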

Summary

ECW when compared with the “JPEG compressed tiled Geotiff with embedded overviews” alternative format,

  • Can be created quicker (compress)
  • Requires 5x less disk storage
  • Retains higher image quality
  • Serves output imagery 2x faster (decompress)
  • Requires an ECW write and server license from ERDAS

For many of our customers with hundreds of terabytes and even petabytes of image data, the business justifications for license acquisition are all there: whether it be talking to your IT area that has to manage increasing SAN costs, the data capture area that wants to ensure quality is retained, or the end customer who just wants the imagery served to them as quickly as possible. The license fees are what we determine to be a fair market price, but unfortunately many would prefer to ignore the benefits (or pretend they didn't exist) and in the end cost their organization more money in the process.

I suspect this post will attract the usual suspects, but if I can leave one thing in your minds it would be the following tweet from me,

Take-aways

  1. The wavelet-based formats ECW, JPEG2000 and MrSID should never be tarred with the same "wavelet" brush and grouped together. They do not perform the same, and grouping them makes about as much sense as me grouping the variety of compression and structure options available for GeoTIFF and just saying "GeoTIFF is slow"
  2. If your current server software does in fact perform slower when reading ECW, then that is likely an architectural constraint or a poor implementation of the ECW SDK. If you are still using DCOM multi-process based software then your mileage will vary. Fear not though, there is a better solution out there :)

Now where's my popcorn? It's nice to post again ..

3 thoughts on “Mythbusting ECW decompression”

  1. Why is ECW bad ?

    Nice post! :)

    One thing that is very hard to quantify is,

    how much does using open-source formats (Tiff, JP2000) make decision-makers feel good ??

  2. Paul Ramsey

    I’s dotted, T’s crossed, very enjoyable.

    Now, you did your tests without load contention, and I’m wondering if some of the mythology around wavelets is related to the fact that they really do (did?) slam the CPU. Certainly I found that Back In The Day. Or maybe things just have improved a lot since Back In The Day.

    However, even if they haven’t improved (and that’s unlikely), one independent thing that’s changed a great deal since Back In The Day is that the relative price of cores compared to I/O has gone down down down, so I think it’s time to re-evaluate the wavelet value proposition. Cause if I have 64 cores at my disposal (and these days, who doesn’t?), a little extra CPU time is not the end of the world.

    Thanks for the great post.

  3. Chris Tweedie

    Hey Paul thanks for the comments,

    Indeed, CPU time is not the end of the world, especially when it is offset by reduced disk I/O and increased RAM utilisation. As you point out, the increase in cores/clock speeds/SSE instructions has meant the value proposition has vastly improved and certainly not gone backwards. Anyone who says otherwise, well … I'm not sure.

    Slamming the CPU kind of varies. Certainly JPEG2000 is more expensive than ECW, and I'm not really in a position to comment on MrSID. Which is primarily why people need to stop grouping the formats together, as usage will always vary

    The circular argument on wavelet compression these days usually hinges on "but storage is cheap now". Sadly that just goes back to my swallowing-one-liners comment, as that's never been the only advantage
