Here's a little bit of information on how I've integrated Squid into our Samba fileservers at work.
Previously, we had NetWare servers running BorderManager. We were attempting to use BM to provide content filtering and access control to the internet. However, like most of our other experiences with NetWare, this proved less than stellar.
After we migrated our servers to Linux and Samba, we were left without a content filtering solution. We briefly spent time trying to make the SonicWall Content Filter service work, but the licensing costs proved prohibitive, and the hardware was underpowered for the task at hand.
I finally convinced the powers-that-be to let me try Squid at one of our sites. I was originally going to set it up with transparent proxying, but we really wanted access control, and authentication doesn't work with a transparent-proxy setup. So, I set up a traditional authenticated proxy, and used System Policies to force our workstations to use it. Additionally, I blocked port 80 outbound on the firewall, so that users could not bypass the proxy. This worked so well that I deployed it to all of our sites, running it on the Samba server to save hardware costs.
I configured Squid to use NTLM authentication, with a fallback to basic auth. IE and Firefox both support NTLM authentication, but IE on Windows will silently hand over the logged-in user's credentials when asked by the proxy server. This means that our users are never prompted to log in, so the proxy is effectively invisible to them even though it is not a transparent proxy.
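A minimal sketch of what NTLM with a basic fallback can look like in squid.conf. The helper paths are placeholders (which helper programs exist, and what arguments they take, depends on the Squid version and distribution), and the child counts and realm text are made-up values for illustration:

```
# Schemes are offered to the browser in the order they are configured,
# so NTLM comes first and basic acts as the fallback
auth_param ntlm program /path/to/your/ntlm/helper
auth_param ntlm children 10
auth_param basic program /path/to/your/basic/helper
auth_param basic children 5
auth_param basic realm Internet proxy

# Require a valid login for every request
acl authenticated proxy_auth REQUIRED
http_access deny !authenticated
```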
I tried using the Squid NTLM auth helper to provide me with group-based access control, but the helper I ended up using was too primitive for that. (The other helper required a winbindd server on each proxy, which is something I wasn't willing to set up at the time.) So I used the NTLM helper to provide me with a username, and then I wrote an external ACL helper that checks whether that user is in a particular group. We back our Samba servers with an LDAP directory, so the helper simply calls ldapsearch for a particular group and checks whether the user is a member. This allows me to provide different levels of access for students and staff, for example.
The user_in_group script looks like this:
#!/bin/sh
# Squid external ACL helper: reads "username group" lines and answers OK or ERR.
while read -r USERNAME GROUP; do
    # ldapsearch only prints a "numEntries" line when at least one entry matched
    if ldapsearch -x -b dc=example,dc=net "(&(cn=$GROUP)(memberUid=$USERNAME))" memberUid | grep -qi numentries; then
        echo OK
    else
        echo ERR
    fi
done
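To show how a helper like this gets hooked into Squid, here is a rough squid.conf sketch. The install path, the ttl value, the group names and the example rules are all assumptions made for illustration, not a copy of a production config:

```
# %LOGIN passes the authenticated username; the group name given on the acl
# line is appended, so the helper reads lines such as "jsmith staff"
external_acl_type user_in_group ttl=300 %LOGIN /usr/local/bin/user_in_group

acl staff external user_in_group staff
acl students external user_in_group students
acl student_blocked url_regex -i youtube

# Students get a stricter policy than staff
http_access deny students student_blocked
http_access allow staff
http_access allow students
http_access deny all
```

Squid caches each helper answer for the ttl period, so ldapsearch is not run on every single request.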
We also needed a way for staff to be able to turn off all internet access to a given group of machines (a computer lab, for example). To accomplish this, I used a two-stage process. Stage one involves the user running a script that creates a file in a particular directory on the Samba server. Stage two sees Squid call an external ACL helper that checks the list of filenames in that directory against the NetBIOS name of the machine that is requesting access through the proxy. The filenames are treated as regular expressions for the comparison, so I can block access to all machines starting with QSS229, for example. With short TTLs on these ACL entries, staff can turn internet access in a given room on or off almost immediately.
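Stage one can be as simple as creating or deleting an empty file. The sketch below is a hypothetical server-side version of that step: the function names block_room and unblock_room are invented for illustration, and it assumes the script runs somewhere with write access to the blacklist directory.

```shell
#!/bin/sh
# Hypothetical stage-one helper: block or unblock machines by name pattern.
# BLACKLIST_DIR defaults to the directory that the proxy helper scans.
BLACKLIST_DIR=${BLACKLIST_DIR:-/home/samba/netlogon/squid_blacklist}

# Create an empty file whose name is the pattern to block, e.g. QSS229
block_room() {
    case "$1" in
        ""|*/*) echo "invalid pattern: $1" >&2; return 1 ;;
    esac
    : > "$BLACKLIST_DIR/$1"
}

# Remove the file again to restore access
unblock_room() {
    case "$1" in
        ""|*/*) echo "invalid pattern: $1" >&2; return 1 ;;
    esac
    rm -f -- "$BLACKLIST_DIR/$1"
}
```

Calling block_room QSS229 blocks every machine whose name matches QSS229, and unblock_room QSS229 lifts the block once the cached ACL entries expire. Patterns containing a slash are rejected so that nothing can be written outside the blacklist directory.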
The netbios_auth script looks like this:
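#!/bin/sh
# Squid external ACL helper: reads one client IP per line and answers OK or ERR.
BLACKLIST_DIR=/home/samba/netlogon/squid_blacklist
while read -r IP; do
    # Reverse WINS lookup: take the first <00> name from the node status reply
    NBNAME=$(nmblookup -U winsserver -R -A "$IP" | grep -m1 "<00>" | awk '{print $1}')
    BLACKLISTED=0
    for f in "$BLACKLIST_DIR"/*; do
        [ -e "$f" ] || continue    # nothing to check if the directory is empty
        # Each filename is used as a case-insensitive regular expression
        if echo "$NBNAME" | grep -qi -- "${f##*/}"; then
            BLACKLISTED=1
        fi
    done
    # Plain [ ] test, since [[ ]] is a bashism and this script runs under /bin/sh
    if [ "$BLACKLISTED" -eq 1 ]; then
        echo "ERR"
    else
        echo "OK"
    fi
done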
The script gets the NetBIOS name of the machine by doing a reverse WINS lookup on the IP requesting access, then uses the filenames in /home/samba/netlogon/squid_blacklist as regular expressions to match against the name. If any of them match, access is denied.
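A rough sketch of the matching squid.conf wiring follows. The install path, the ACL names and the 30-second TTLs are assumptions chosen for illustration; the TTLs are what determine how quickly a block or unblock takes effect.

```
# %SRC passes the client IP address to the helper, one per line
external_acl_type netbios_check ttl=30 negative_ttl=30 %SRC /usr/local/bin/netbios_auth

acl machine_allowed external netbios_check

# Refuse blacklisted machines before any other rule is considered
http_access deny !machine_allowed
```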
The above controls allow me to create whitelists and blacklists of URLs. For most of these, I use Squid's url_regex ACL type, so, for example, I can put youtube in the list and block any request that contains the string "youtube" in the URL (www.youtube.com, static.youtube.ca, and so on).
Because our proxies are distributed across many servers, I have set up two sets of lists, one that is global and replicated from our master server, and the other that is local and unique to each server.
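One way to combine a global and a local list is to define the same ACL name twice, since Squid merges the values of acl lines that share a name and type. The file locations and ACL names below are assumptions for illustration, and the files themselves would contain one regular expression per line:

```
# Global lists are replicated from the master server; local lists are per-site
acl blacklist url_regex -i "/etc/squid/lists/global/blacklist"
acl blacklist url_regex -i "/etc/squid/lists/local/blacklist"
acl whitelist url_regex -i "/etc/squid/lists/global/whitelist"
acl whitelist url_regex -i "/etc/squid/lists/local/whitelist"

# A whitelist match overrides a blacklist match
http_access deny blacklist !whitelist
```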
After trying and discarding many Squid logfile analysis tools, I finally settled on MySAR. MySAR parses the logfiles into a MySQL database, and then the reporting web-frontend draws information from that database. This solution works quite well, but I'm still tuning our database server to contend with the remarkable amount of data. (Currently there are over 35 million rows in the largest table used by MySAR. This is owing to the fact that I'm collecting logs from twenty Squid servers, and our traffic ranges from 15GB to over 100GB on busy days.)
I want to look into cache peering, so that I can further reduce our internet traffic.