Tuesday, 25 December 2012

Autoproxy: How it works, why it sucks and why transparent proxy is so much better

Most sites I visit have some form of proxy in place. Most of the time it is pretty basic, with the proxy details either set up in the SOE or pushed out (to Windows clients) via Group Policy. In a typical scenario there will be two proxies: one requiring authentication and one that doesn't. The unauthenticated proxy is used by devices, IT staff and any user running an app that can't handle the authenticated proxy for some reason. Changes, when required, are made manually in most cases.

Considering the automation available to us, why is this process done manually? All major browsers support the ability to perform autoproxy, yet very few sites implement it.

I'm only going to cover the basics of autoproxy here. This is essentially a 5 minute guide to get it up and running.

Firstly, the Proxy Auto Config (PAC) file is simply an HTTP-delivered script that tells your browser which proxy to use in specific situations. It can be anything from mind-numbingly simple to extremely complex.

At the most basic level, you can type the location of the PAC script into your browser options. Each browser keeps that setting in a different place. However, in my opinion, this defeats the purpose of autoproxy in the first place - you may as well just type in the proxy address.

The Web Proxy Auto-Discovery protocol (WPAD) uses either DHCP or DNS to discover the PAC script. The browser will first request option 252 from the DHCP server. This field must be populated in DHCP with the URL of the PAC script. DHCP has the highest priority and if present, DNS will not be used.

This brings us to the first problem: Firefox does not support the DHCP method.

Oh, and one gotcha with the DHCP method: Internet Explorer expects the string to be null terminated. If it is not, it will strip off the last octet for you. Try troubleshooting that one!
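To make that gotcha concrete, here is a minimal sketch in plain JavaScript of what happens to the option 252 value. The helper names and the wpad.example.com URL are made up for illustration, and how you actually enter the trailing NUL depends on your DHCP server.

```javascript
// Hypothetical helper: the raw octets for DHCP option 252, with the
// trailing NUL that Internet Explorer expects. Assumes a plain ASCII URL.
function encodeOption252(pacUrl) {
   var bytes = [];
   for (var i = 0; i < pacUrl.length; i++) {
      bytes.push(pacUrl.charCodeAt(i) & 0xff);
   }
   bytes.push(0);
   return bytes;
}

// Hypothetical model of a client that always drops the last octet.
function dropLastOctet(bytes) {
   return String.fromCharCode.apply(null, bytes.slice(0, -1));
}

var url = "http://wpad.example.com/wpad.dat";
var terminated = encodeOption252(url);
var unterminated = terminated.slice(0, -1);   // same value without the NUL

console.log(dropLastOctet(terminated));     // http://wpad.example.com/wpad.dat
console.log(dropLastOctet(unterminated));   // http://wpad.example.com/wpad.da
```

Without the terminator the browser goes looking for "wpad.da", which is why the failure is so hard to spot.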

If DHCP fails (or is not used) then DNS is used for WPAD. This is simply a DNS lookup for "wpad.domainname". If not found, it walks up the DNS tree until an entry is found or the lookups are exhausted. If an entry is found, it attempts to load a wpad.dat file from that host. For example, for the local domain "department.branch.internal.net" the successive lookups will be:

http://wpad.department.branch.internal.net/wpad.dat
http://wpad.branch.internal.net/wpad.dat
http://wpad.internal.net/wpad.dat
http://wpad.net/wpad.dat
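
The walk up the tree can be sketched in a few lines of JavaScript. This is only an illustration that mirrors the list above; wpadCandidates is a made-up name, and real browsers may differ in where they stop walking.

```javascript
// Hypothetical helper: list the wpad.dat URLs tried for a given domain,
// dropping the leftmost label each time until only the top level is left.
function wpadCandidates(domain) {
   var labels = domain.split(".");
   var urls = [];
   for (var i = 0; i < labels.length; i++) {
      urls.push("http://wpad." + labels.slice(i).join(".") + "/wpad.dat");
   }
   return urls;
}

console.log(wpadCandidates("department.branch.internal.net").join("\n"));
```

The last entry it produces is the top-level one (wpad.net here).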

This leads us to the second problem: security. If a site is not careful, "wpad.net" can resolve externally and a malicious PAC script can be executed on the browser. This is most likely to happen with notebooks taken off-site.

The web server that the wpad name resolves to should have a virtual host that redirects requests for wpad.dat to a proxy.pac file. In the case of Apache this is done simply with the following lines in httpd.conf:

Redirect permanent /wpad.dat /proxy.pac

and

AddType application/x-ns-proxy-autoconfig .dat

Finally we are at the proxy.pac script. This script is basically a simplified form of JavaScript, designed to run in the browser, that implements a single function called FindProxyForURL(). There are a limited number of additional built-in functions and you can also write your own. At its simplest, a proxy.pac file will be:


   function FindProxyForURL(url, host) {
      return "PROXY proxy.example.com:8080; DIRECT";
   }


For most organisations, this will probably be enough. However, a more complex script may be needed. For example:


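   function FindProxyForURL(url, host) {
      // Hosts that resolve into 10.0.0.0/21 go via the fast proxy
      if (isInNet(host, "10.0.0.0", "255.255.248.0"))
      {
         return "PROXY fastproxy.example.com:8080";
      }
      // Everything else uses the default proxy, then falls back to a direct connection
      return "PROXY proxy.example.com:8080; DIRECT";
   }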


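The above example changes the proxy according to the subnet of the destination: any host that resolves into 10.0.0.0/21 goes via fastproxy. To choose by the client's own subnet instead, pass myIpAddress() rather than host as the first argument to isInNet().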

This brings us to our next problem: the isInNet() function can be completely unpredictable on Windows clients if the .NET 2.0 framework is loaded. The myIpAddress() function can also be unpredictable: if you have more than one adapter, it could return either IP address, or it could even (under certain circumstances) return 127.0.0.1.
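
One way to soften the 127.0.0.1 case is to sanity-check the address before trusting it. The following is a rough sketch, not a cure: isUsableAddress is a made-up helper, the proxy names are the same placeholders used above, and it selects by the client's subnet rather than the destination's. It does nothing about the multi-adapter ambiguity.

```javascript
// Hypothetical helper: accept only a dotted-quad address that is not loopback.
function isUsableAddress(ip) {
   return typeof ip === "string" &&
          /^\d{1,3}(\.\d{1,3}){3}$/.test(ip) &&
          ip.indexOf("127.") !== 0;
}

function FindProxyForURL(url, host) {
   // myIpAddress() and isInNet() are supplied by the browser's PAC environment
   var ip = myIpAddress();
   // Only trust the subnet test when the reported address looks sane
   if (isUsableAddress(ip) && isInNet(ip, "10.0.0.0", "255.255.248.0")) {
      return "PROXY fastproxy.example.com:8080";
   }
   return "PROXY proxy.example.com:8080; DIRECT";
}
```

If the address is missing or loopback, the client simply gets the default proxy instead of whatever a bogus subnet match would have picked.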

In fact, how your proxy.pac behaves depends on the local environment, including any limitations of the JavaScript engine.

The irony is that in the environments where autoproxy is most useful, it is most likely to be unpredictable and simply not work for many clients. It is also very difficult to troubleshoot.

There are many sites dedicated to enabling you to write the perfect proxy.pac script. There are tips to trap all the vagaries listed plus dozens more. They also detail ways to debug your script. If maintaining a long and complex script that deals with more exceptions than rules is right up your alley then go for it! However, for me this just indicates that autoproxy simply sucks and should be avoided in all but the simplest of circumstances.
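
For what it's worth, the simplest debugging approach I know of is to run the script outside the browser with stand-ins for the built-in functions. The sketch below does that under Node for the subnet example from earlier in this post. The name-to-address table and the isInNet() stand-in are assumptions made for testing only, and they will not reproduce the quirks of a real browser.

```javascript
// Made-up name-to-address table standing in for DNS.
var FAKE_DNS = {
   "intranet.example.com": "10.0.1.20",
   "www.example.org": "192.0.2.10"
};

// Convert a dotted quad to an unsigned 32-bit number.
function toInt(ip) {
   var p = ip.split(".");
   return ((parseInt(p[0], 10) << 24) | (parseInt(p[1], 10) << 16) |
           (parseInt(p[2], 10) << 8) | parseInt(p[3], 10)) >>> 0;
}

// Stand-in for the browser's isInNet(); remove it from a real proxy.pac.
function isInNet(host, network, mask) {
   var ip = FAKE_DNS.hasOwnProperty(host) ? FAKE_DNS[host] : host;
   if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(ip)) {
      return false;   // name we cannot resolve
   }
   return (toInt(ip) & toInt(mask)) === (toInt(network) & toInt(mask));
}

// The script under test.
function FindProxyForURL(url, host) {
   if (isInNet(host, "10.0.0.0", "255.255.248.0")) {
      return "PROXY fastproxy.example.com:8080";
   }
   return "PROXY proxy.example.com:8080; DIRECT";
}

console.log(FindProxyForURL("http://intranet.example.com/", "intranet.example.com"));
// PROXY fastproxy.example.com:8080
console.log(FindProxyForURL("http://www.example.org/", "www.example.org"));
// PROXY proxy.example.com:8080; DIRECT
```

This only proves the logic of the script, which is exactly the part that was never the problem.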

Which brings me to the concept of a Transparent Proxy. Implementing this is simplicity itself. All you need to do is set up the following on a spare server with a single network card. It doesn't have to be powerful:

1) CentOS Linux (preferably)
2) Squid Proxy
3) Shorewall firewall
4) Webmin (for administration)

Set the Squid proxy to listen on ports 80 and 443. You can run two instances.

Set up the Shorewall firewall to redirect all non-proxy traffic to the router.

Set up DHCP so that the default route is the Linux server.

That's basically it! There are variations, but this is the nuts and bolts of it. Because the server is the default route and the proxy is listening on the HTTP and HTTPS ports, it will proxy transparently.

There is also another (newer) way of doing this using TPROXY, which performs the transparent proxy at layer 3. I have never done this because it looks a lot more complicated, but more info is available here.
