Wednesday, May 4, 2011

What is Proxy and Reverse Proxy?

Web Proxies :

         A proxy server is a gateway for users to the Web at large. Users configure the proxy in their browser settings, and all HTTP requests are routed via the proxy. Proxies are typically operated by ISPs and network administrators, and serve several purposes: for example,

* to speed access to the Web by caching pages fetched, so that popular pages don't have to be re-fetched for every user who views them.
* to enable controlled access to the web for users behind a firewall.
* to filter or transform web content.

Reverse Proxies :

          A reverse proxy is a gateway for servers, and enables one web server to provide content from another transparently. As with a standard proxy, a reverse proxy may serve to improve performance of the web by caching; this is a simple way to mirror a website. Loadbalancing a heavy-duty application, or protecting a vulnerable one, are other common usages. But the most common reason to run a reverse proxy is to enable controlled access from the Web at large to servers behind a firewall.

The proxied server may be a webserver itself, or it may be an application server using a different protocol, or an application server with just rudimentary HTTP that needs to be shielded from the web at large. Since 2004, reverse proxying has been the preferred method of deploying JAVA/Tomcat applications on the Web, replacing the old mod_jk (itself a special-purpose reverse proxy module).

A Reverse Proxy Scenario:

           Company example.com has a website at www.example.com, which has a public IP address and DNS entry, and can be accessed from anywhere on the Internet.

The company also has a couple of application servers which have private IP addresses and unregistered DNS entries, and are inside the firewall. The application servers are visible within the network - including the webserver, as "internal1.example.com" and "internal2.example.com", But because they have no public DNS entries, anyone looking at internal1.example.com from outside the company network will get a "no such host" error.

A decision is taken to enable Web access to the application servers. But they should not be exposed to the Internet directly, instead they should be integrated with the webserver, so that http://www.example.com/app1/any-path-here is mapped internally to http://internal1.example.com/any-path-here and http://www.example.com/app2/other-path-here is mapped internally to http://internal2.example.com/other-path-here. This is a typical reverse-proxy situation.

Load following Apache Proxy Modules :

* mod_proxy: The core module deals with proxy infrastructure and configuration and managing a proxy request.
* mod_proxy_http: This handles fetching documents with HTTP and HTTPS.
* mod_proxy_ftp: This handles fetching documents with FTP.
* mod_proxy_connect: This handles the CONNECT method for secure (SSL) tunnelling.
* mod_proxy_ajp: This handles the AJP protocol for Tomcat and similar backend servers.
* mod_proxy_balancer implements clustering and load-balancing over multiple backends.
* mod_cache, mod_disk_cache, mod_mem_cache: These deal with managing a document cache. To enable caching requires mod_cache and one or both of disk_cache and mem_cache.
* mod_proxy_html: This rewrites HTML links into a proxy's address space.
* mod_xml2enc: This supports internationalisation (i18n) on behalf of mod_proxy_html and other markup-filtering modules. space.
* mod_headers: This modifies HTTP request and response headers.
* mod_deflate: Negotiates compression with clients and backends.

Most important are mod_proxy_balancer, mod_cache, mod_disk_cache, mod_mem_cache, mod_deflate

Building Apache for Proxying :

Use options during compiling apache using source code :

$ ./configure --enable-so --enable-mods-shared="proxy cache ssl all"

Using apxs tool on existing apache installation :

apxs -c -i [module-name].c

Configuring the Proxy :

Load following modules in http.conf :

LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
#LoadModule proxy_ftp_module modules/mod_proxy_ftp.so
#LoadModule proxy_connect_module modules/mod_proxy_connect.so
LoadModule headers_module modules/mod_headers.so
LoadModule deflate_module modules/mod_deflate.so
LoadFile /usr/lib/libxml2.so
LoadModule xml2enc_module modules/mod_xml2enc.so
LoadModule proxy_html_module modules/mod_proxy_html.so

The fundamental configuration directive to set up a reverse proxy is ProxyPass. We use it to set up proxy rules for each of the application servers in the httpd.conf file on the webserver:

ProxyPass /app1/ http://internal1.example.com/
ProxyPass /app2/ http://internal2.example.com/

       However, this is not the whole story. ProxyPass just sends traffic straight through. So when the application servers generate references to themselves (or to other internal addresses), they will be passed straight through to the outside world, where they won't work. The proxy needs to re-map the Location header to its own address space and return a valid URL. The command to enable such rewrites in the HTTP Headers is ProxyPassReverse. The Apache documentation suggests the form:

ProxyPassReverse /app1/ http://internal1.example.com/
ProxyPassReverse /app2/ http://internal2.example.com/

No comments:

Post a Comment