Thursday, April 20, 2006

http compression

One of the things I use regularly at work is wget. To work on HTTP compression I needed a tool that would recursively spider a web site and had the right feature set, but that also supported compressed transfers. At the time nothing out there did, so a year or so ago I finished a patch to wget that adds compression support.

You can check out the latest version of wget from the subversion repository outlined here. Apply this patch from the top-level directory, and as long as your OS has zlib support you should be able to use the '-z' switch the patch adds to request compressed files from a webserver. A rough sketch of the full checkout-patch-build sequence follows the patch link below.

wget-zlib-04202006.patch
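
Roughly, the whole sequence looks like the sketch below. Treat it as a sketch only: the repository URL is the one in the subversion link above, and the -p level and autotools steps are assumptions you may need to adjust for your checkout.

$ svn checkout <wget-repo-url> wget            # repository URL is in the subversion link above
$ cd wget
$ patch -p0 < ../wget-zlib-04202006.patch      # apply from the top of the source tree
$ ./configure && make                          # needs zlib headers; a trunk checkout may need autoconf first
$ ./src/wget -z http://172.16.17.1/index.html  # the new '-z' switch asks the server for compressed files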

Here's an example of how a 39K download turns into a 10K download with compression, and completes in about half the time.


$ ./wget http://172.16.17.1/index.html
--13:22:32-- http://172.16.17.1/index.html
Connecting to 172.16.17.1:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 39599 (39K) [text/html]
Saving to: `index.html'

100%[=======================================>] 39,599 --.-K/s in 0.005s

13:22:32 (8.39 MB/s) - `index.html' saved [39599/39599]

$ ./wget -z http://172.16.17.1/index.html
--13:22:39-- http://172.16.17.1/index.html
Connecting to 172.16.17.1:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10546 (10K) [text/html]
Saving to: `index.html.1'

100%[=======================================>] 10,546 --.-K/s in 0.002s

13:22:39 (5.62 MB/s) - `index.html.1' saved [10546/10546]

$ diff index.html index.html.1
$ echo $?
0
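
The transcript only shows the size difference. To see the actual negotiation (independent of this patch), you can have curl make an equivalent request and watch the headers; this assumes the server at 172.16.17.1 compresses text/html when asked, which is what the -z run above suggests.

$ curl -v --compressed -o /dev/null http://172.16.17.1/index.html 2>&1 | grep -i encoding

If the server honors the request, the outgoing headers should include Accept-Encoding and the response should carry Content-Encoding: gzip.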

4 comments:

  1. I switched to curl for my compressed traffic tests. curl -v is more useful than wget, too.

  2. Ahh, but the recursive retrieval of links is the clincher. wget FTW!

  3. True dat. So, did you submit the patch?

  4. http://article.gmane.org/gmane.comp.web.wget.patches/1630
