Transparent Web Proxy with Antivirus Check and URL Blacklisting

The purpose of this document is to describe the creation of a Web Proxy with antivirus check of web pages and site blacklisting/whitelisting. The document is organized into the sections that follow.

Why use a web proxy with antivirus?

Web pages are more and more frequently the means by which worms and viruses are spread on the Internet. Websites, whether intentionally or because they are vulnerable and are therefore modified without the knowledge of their legitimate authors, sometimes contain references to executable code that can infect users’ computers. Moreover, the situation has worsened since a number of vulnerabilities in image display systems have allowed viruses to be carried in JPEG files. Lastly, the growing use of Java applets is increasing the number of multiplatform viruses spread via http, which operate regardless of the platform (PC, palmtop, mobile phone) or operating system involved.
The best solution for this type of problem is to provide all client devices that connect to the Internet with a good antivirus program with real-time protection, checking every single incoming file. However, this may not be enough, for two reasons: no antivirus program, even one with a signature self-updating mechanism, can provide a 100% guarantee against every virus; and the real-time check of incoming content is computationally burdensome, so that, particularly on devices with modest performance, it can slow down the system to the point of making users disable real-time antivirus protection.
For these reasons, virus checking is increasingly done upstream, before potential viruses are able to reach the user’s client. In other words, centralized antivirus systems are used on the servers offering a particular service. The most widespread example is that of e-mail servers, which have a system that analyzes incoming and outgoing SMTP messages and scans attachments for viruses. In this case, applying the antivirus check on an SMTP gateway is quite natural, since e-mails are obliged to pass through it before reaching the user’s inbox. For the http service, matters are not so simple, since a LAN client may potentially connect directly to any of the web servers available on the Internet.
The solution to this problem involves introducing an application-level gateway on the LAN to collect client http requests and forward them to the relevant web servers. This application gateway is called a Web Proxy and, since it is capable of interpreting the http protocol, it can not only filter on the basis of URLs, but also break down the content being carried (HTML, JavaScript, Java applets, images, …) and scan it for viruses. One of the most common functions of proxies so far has been the web cache, that is, the archiving on disk of web pages that have already been visited, in order to accelerate the display of the same URLs for later requests and to reduce Internet bandwidth consumption. One of the best-known proxy systems capable of performing web cache functions is Squid, distributed under an Open Source license.
Zeroshell does not integrate Squid, since it does not perform web caching. The tasks of antivirus scanning of web pages and of content filtering via URL blacklists are instead handled by HAVP as the proxy system and by ClamAV as the antivirus engine. Both are distributed under the GPL.

Transparent Proxy Mode

One of the biggest problems when using a proxy server is that of configuring all the web browsers to use it: it is necessary to specify its IP address or host name and the TCP port on which it responds (usually port 8080). This can be burdensome in LANs with numerous users and, even worse, it does not prevent users from removing this configuration to gain direct access to the web, thus avoiding the antivirus check, access logging and blacklists.
To solve this problem, Zeroshell uses Transparent Proxy mode, which involves automatically capturing client requests on TCP port 80. Obviously, for Zeroshell to be able to capture these web requests, it must be configured as a network gateway, so that client Internet traffic goes through it. Zeroshell will automatically capture http requests whether it acts as a layer 2 gateway (bridge between Ethernet, Wi-Fi or VPN interfaces) or a layer 3 gateway (router). It is nevertheless important to specify on which network interfaces or IP subnets these requests are to be redirected. This is done by adding so-called HTTP Capturing Rules, as shown in the figure below:

Configuration of the Proxy Capturing Rules

In the example in the figure, http requests coming from the ETH00 and ETH03 network interfaces are captured. Excluded from capture are requests directed at web servers belonging to a given IP subnet and requests coming from a client with a given IP address. There may be several reasons why it is necessary to exclude the intervention of the transparent proxy for some clients and some web servers. For example, a web server may restrict access in its ACLs to clients with certain IP addresses only. In this case, if the proxy captured requests to that server, they would reach it from the proxy’s IP address rather than the client’s, and access would be denied. On the other hand, it would not be possible to authorize the IP address of the proxy in the web server’s ACLs, since this would mean allowing indiscriminate access to all clients using the proxy. It is clear, then, that the only solution is to prevent the transparent proxy from capturing those requests.
Lastly, note that the iptables rules redirecting traffic towards the proxy service (TCP port 8080) are placed downstream of those belonging to the Captive Portal. Thanks to this, the Captive Portal and the Transparent Proxy can be enabled simultaneously on the same network interface.
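Zeroshell builds these rules automatically; purely as a sketch of the underlying logic (interface names, addresses and rule layout are assumptions, and the rules actually generated may differ), the capture and exclusion behaviour corresponds to nat PREROUTING rules of this kind:

```shell
# Illustrative only: exclusions are evaluated before the redirect rules.
# Skip capture for requests directed at a protected server subnet...
iptables -t nat -A PREROUTING -i ETH00 -p tcp --dport 80 -d 10.0.1.0/24 -j RETURN
# ...and for requests coming from an excluded client.
iptables -t nat -A PREROUTING -i ETH00 -p tcp --dport 80 -s 10.0.0.15 -j RETURN
# Redirect the remaining port-80 traffic to havp on port 8080.
iptables -t nat -A PREROUTING -i ETH00 -p tcp --dport 80 -j REDIRECT --to-ports 8080
iptables -t nat -A PREROUTING -i ETH03 -p tcp --dport 80 -j REDIRECT --to-ports 8080
```

Because a RETURN rule ends traversal of the chain for matching packets, placing the exclusions first is what makes them take precedence over the redirect.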

Configuration and activation of the proxy service

As illustrated in the figure below, configuration of the proxy service with antivirus check is very simple. After configuring the Zeroshell box to act as a router (and setting it as the default gateway on the clients), or configuring it as a bridge interposed at a point of the LAN through which traffic flows to and from the Internet, simply enable the [Enabled] flag for the proxy to start working. As mentioned in the previous paragraph, the web requests that are actually intercepted and submitted to the proxy are those specified through the configuration of the [HTTP Capturing Rules].

HTTP Proxy Configuration Web Interface

Note that start-up of the proxy service is very slow compared to other services; on hardware that is not very fast it can take up to 30-40 seconds. This is due to the need for the ClamAV antivirus libraries to load a large number of virus signatures into memory. To prevent this from blocking the web configuration interface and the start-up scripts for long intervals, the service is started asynchronously. Hence, when the proxy is enabled or reconfigured, the Status item is not displayed as ACTIVE (green) immediately, but first passes through the STARTING state (orange), which shows that the service is loading the signatures. To find out when the proxy has actually started working, click on [Manage] to reload the configuration page, or simply click on [Proxy log] to view the havp daemon’s start-up messages. During the start-up period of the havp daemon, the iptables rules that capture http requests are temporarily removed, allowing web traffic to flow regularly, but without being scanned for viruses.
A few configuration items are analysed in more detail in the following paragraphs.

Access log and privacy

Being an application gateway capable of interpreting http requests, a web proxy necessarily examines the URLs visited by users in order to work correctly. By default, Zeroshell does not send this information to the system logs, since, if associated with the IP addresses of the clients requesting the web pages, it could be used to trace the content visited by each user.
Nevertheless, the logging of this information can be enabled by changing the [Access Logging] item from “Only URL containing Virus” to “Any Access”. By doing this, each URL visited is recorded in the log together with the client’s IP address. Before enabling this option, it is necessary to consult the legislation of your country to verify that logging the URLs visited does not violate national privacy laws.
Moreover, it is important to be aware that, just as enabling NAT on an Internet access router makes every external client request appear to be made by the router itself, http requests passing through a proxy appear to come from the IP address of the proxy server. This may make it difficult to trace the identity of a user who has performed illicit actions on remote servers. A possible solution to this problem, which is less invasive in privacy terms, is to activate logging of Connection Tracking (from the Zeroshell web interface, [Firewall][Connection Tracking]). In this way, every TCP/UDP connection is recorded in the logs with its source IP, source port, destination IP and destination port. Hence, it will not be possible to track the content of user activity, but a trace will be kept of the connections made. Again, in this case it is necessary to consult local legislation before enabling connection tracking.

Antivirus check of images

For a long time it was thought that a file containing a JPEG or GIF image could not carry a virus, because it simply consists of data in a preset format, interpreted by the operating system’s viewing components. Recently, however, some image rendering components have proved vulnerable when not updated with patches: a suitably constructed image can cause a buffer overrun and execute arbitrary code on the system. It is easy to understand the seriousness of this, given that most hypertext content on the WWW is in image form.
The HAVP proxy configured in Zeroshell scans images by default using the ClamAV antivirus engine. Nevertheless, on slow hardware, the scanning of images could delay the opening of web pages containing many images. In this case it is possible to disable the scanning of image files by setting the [Check Images (jpg, gif, png)] option from “Enabled” to “Disabled”.

Automatic update of ClamAV signatures

The speed with which new viruses appear on the Internet and are identified means that antivirus signature databases grow and change frequently. The ClamAV database is no exception and, thanks to the freshclam daemon, it can be updated online at regular intervals.
Zeroshell configures freshclam by default to check the signature database 12 times a day. This interval can be set using the [Number of Checks per Day] parameter, from a minimum of 1 to a maximum of 48 checks per day. It is also important to set the [Country of the Mirror] correctly, through which freshclam chooses the nearest site from which to download the virus signatures. Note, however, that regular updating is a fast operation which does not generate much traffic, since a differential update system is used.
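These two web-interface parameters map onto standard freshclam.conf directives. As a purely illustrative sketch (Zeroshell generates the actual file itself; the country code “it” and the values shown are examples):

```
# freshclam.conf (fragment) - illustrative values only
DatabaseMirror db.it.clamav.net
DatabaseMirror database.clamav.net
Checks 12
```

The country-specific mirror is tried first, with the main database server as a fallback; Checks is the number of signature checks per day.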

Website blacklisting and whitelisting

It is often necessary to block the display of a number of websites since their content is considered unsuitable for the users of the web service. An example is adult-only material, which should not be displayed on computers to which children have access. One very effective solution for this problem is forcing web clients to access the Internet through a proxy, which, through Content Filtering software such as DansGuardian, examines the content of html pages, blocking those thought to belong to an undesired category. The mechanisms of these filters can be compared to those of antispamming systems. Unfortunately, however, it is not clear whether the DansGuardian release licence is compatible with integration into a system such as Zeroshell and, hence, it was not used, in order to avoid the risk of licence violation.
At the moment, the only way to block or allow display of web pages is the blacklisting and whitelisting of web pages as shown in the figure.

Configuration of the Web Proxy Blacklist

Blacklists and whitelists consist of a sequence of URLs arranged on distinct lines. Each line may correspond to several web pages when the * character is used: to block an entire site, place its address followed by /* on the blacklist, whereas the address alone, without the *, would only block the home page of that site.
The whitelist has priority over the blacklist. In other words, if a web page corresponds to a blacklist item and, at the same time, is found on the whitelist, access is allowed to the page.
Moreover, note that the purpose of the whitelist is not only to allow access to pages that would otherwise be prohibited by the blacklist, but also to bypass antivirus check. Please take careful note of this.
If the LAN administrator wants to adopt a policy of allowing access to only a limited number of sites, the */* line can be specified in the blacklist, which will prevent access to all pages except those included on the whitelist.
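The decision logic described above (lists of one pattern per line, * matching any sequence of characters, the whitelist taking priority over the blacklist) can be sketched as a small standalone shell function. This is a hypothetical re-implementation for illustration, not Zeroshell’s actual code; the list contents and the decide_url name are made up:

```shell
#!/bin/sh
# One pattern per entry; '*' matches any sequence of characters.
WHITELIST="www.example.com/docs/*"
BLACKLIST="www.example.com/*"

# matches URL LIST -> succeeds if URL matches any pattern in LIST
matches() {
  _url=$1
  for _pat in $2; do
    case "$_url" in
      $_pat) return 0 ;;
    esac
  done
  return 1
}

# decide_url URL -> prints "allowed" or "blocked"
decide_url() {
  if matches "$1" "$WHITELIST"; then
    echo allowed            # whitelist overrides the blacklist
  elif matches "$1" "$BLACKLIST"; then
    echo blocked
  else
    echo allowed            # no match: access permitted by default
  fi
}
```

Here www.example.com/docs/intro.html is allowed even though the blacklist pattern www.example.com/* also matches it, illustrating the whitelist’s priority.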

Testing proxy and antivirus function

There are basically two reasons why the proxy might not work correctly. First of all, it is necessary to make sure that the Zeroshell box is configured as a router or a bridge and that traffic to and from the Internet actually goes through it. Secondly, you must be certain of the correct configuration of the [HTTP Capturing Rules], which determine which http requests are actually redirected towards the proxy process (havp listens on TCP port 8080). In particular, if http request capture is imposed on a network interface that is part of a bridge, you must be sure that at least one IP address has been assigned to the bridge.
The easiest way to check whether the proxy is working correctly is to temporarily enable the logging of all accesses and then display the proxy log after requesting some web pages from a client.
Once certain that the proxy captures the web requests as expected, check that the ClamAV antivirus software is working correctly. To do this, first check in the freshclam logs that the signatures are updated regularly. Then, download the EICAR-AV-Test file (said to be harmless by its authors) and check whether this test virus is captured and blocked.
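As a complement to the proxy test, the standard EICAR test string can also be generated locally and scanned with clamscan, to verify the antivirus engine on its own; the /tmp path is arbitrary and the clamscan invocation is shown only as a suggestion:

```shell
# The standard EICAR test string (68 bytes, harmless by design).
EICAR='X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*'
printf '%s' "$EICAR" > /tmp/eicar.com
# Scanning the file exercises the ClamAV signatures directly:
#   clamscan /tmp/eicar.com
```

If the signatures are loaded correctly, clamscan reports the test file as infected.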
Lastly, note that the proxy cannot serve https requests (http encrypted with SSL/TLS) given that, not having the private key of the web server, it cannot decrypt the content and the URLs of these requests, which are encapsulated in encrypted tunnels.