Looking for grep, sed or awk assistance.


r31griffo

G'day everyone,

I'm really stuck on this one, Google doesn't seem to know the answer...unless I'm asking the wrong question.

First a bit of background (if you're interested):
I've inherited a Windows network that currently uses a proxy provided by the ISP, enforced in the domain by a Group Policy .PAC file. It's difficult to manage because many users are local admins and can install a different browser that isn't covered by the GPO (engineers can be crafty little buggers!).
Anyway, I've figured that a DNS filtering system will solve many of the current issues. Ideally OpenDNS would have been great, but in Australia we seem to get the wrong server IP addresses for CDNs and suffer a performance hit because of it. So I sat back and thought about it: there's no reason I shouldn't be able to make my own filtering service similar to OpenDNS that resolves names using Australian DNS servers to avoid the CDN issues.

The design:
The DNS filter is a very simple but effective design. I'm using DNSMASQ as a DNS forwarder, with a hosts file of domains that resolve to 127.0.0.1 (when I'm finished the address will change to point to the filter itself, and a webserver there will record the IP address, time and domain of each attempt to catch the naughty people). The blacklist is created by a bash script run as a daily cron job, which downloads a DNS content list, extracts the domains of the chosen blacklist categories and creates the hosts file. It then restarts DNSMASQ, which reads the hosts file on startup.
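
As a rough sketch of the hosts-file step in that cron job: this assumes the extracted category list is a plain file with one domain per line. The file names are hypothetical, and the download and category extraction are left out because the list's format isn't stated here.

```shell
# Scratch directory and sample data stand in for the real downloaded list.
work=$(mktemp -d)
printf '%s\n' domain.com domain.com.au domain.com > "$work/domains.txt"

# De-duplicate, then write one "IP<TAB>domain" line per blocked domain.
sort -u "$work/domains.txt" |
    awk '{ printf "127.0.0.1\t%s\n", $1 }' > "$work/hosts.new"

cat "$work/hosts.new"
```

dnsmasq has an `addn-hosts` option for reading an additional hosts file, which may be a convenient way to keep the generated blacklist separate from the system's own /etc/hosts.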

The AD DNS servers will point to my new DNS forwarder, so all external requests will be filtered automatically. To stop people from resolving off an external DNS server, I will block outbound DNS for all IPs except the forwarder.
The blacklist categories and whitelist domain files are created through a basic PHP page with a form. It loops through the categories in the DNS content list and shows a checkbox for each, and there is also a textarea for the whitelist...when you click submit, magic happens!
The DNSMASQ instance with about 1.5 million domains used just under 100MB of RAM and resolves names instantly. It may even feel like a faster internet connection by the time I'm finished.



The problem:
The hosts file (blocked domains) looks like IP address 'tab' domain name:
127.0.0.1 domain.com
127.0.0.1 domain.com.au
127.0.0.1 domain.co.nz
127.0.0.1 domain.co.uk
etc

After that file is created I'm attempting to remove a list of whitelisted domains:
domain.com
domain.co.nz
etc

The issue I'm having is that every time I attempt to filter out 'domain.com', it removes 'domain.com.au' as well. Here are just a few of the things I've tried:

# Loop through each line in the whitelist file (quite inefficient), by the end hosts.new didn't have any entries...
cat /var/www/blacklist/whitelist.txt | while read -r remove ; do
grep -v $remove /tmp/hosts.new >> /tmp/hosts.tmp
mv /tmp/hosts.tmp /tmp/hosts.new
done
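
The loop above hands each whitelist entry to grep as an unanchored regular expression, so `domain.com` also matches inside `domain.com.au`, and each `.` matches any character. A minimal sketch of the same loop with the pattern escaped and anchored to the end of the line, assuming each hosts line ends with the domain and the whitelist has one domain per line (the scratch directory and sample data are stand-ins for the real files):

```shell
# Sample data in a scratch directory (stand-ins for the real files).
work=$(mktemp -d)
printf '127.0.0.1\t%s\n' domain.com domain.com.au domain.co.nz domain.co.uk > "$work/hosts.new"
printf '%s\n' domain.com domain.co.nz > "$work/whitelist.txt"

# Strip carriage returns in case the list was saved from a web form.
tr -d '\r' < "$work/whitelist.txt" | while IFS= read -r remove; do
    [ -n "$remove" ] || continue                            # skip blank lines
    escaped=$(printf '%s\n' "$remove" | sed 's/[.]/\\./g')  # make each dot literal
    # Drop only lines where the domain is the whole last field.
    grep -v -E "[[:space:]]${escaped}\$" "$work/hosts.new" > "$work/hosts.tmp"
    mv "$work/hosts.tmp" "$work/hosts.new"
done

cat "$work/hosts.new"
```

With the sample data this leaves only the domain.com.au and domain.co.uk lines. It is still one grep per whitelist entry, so it stays slow on a hosts file with over a million lines.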



# Used as a pattern file
grep -vf /var/www/whitelist.txt /tmp/hosts.tmp > /tmp/hosts
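
With `-f`, every whitelist line is again an unanchored pattern, and an empty line in the pattern file is an empty pattern that matches every line, so `-v` then drops the whole hosts file. One possible way around both, sketched here under the same assumptions about the file layout, is to convert the whitelist into anchored patterns first (the `patterns` file and the scratch directory are hypothetical):

```shell
# Sample data, including a blank whitelist line to show it is handled.
work=$(mktemp -d)
printf '127.0.0.1\t%s\n' domain.com domain.com.au domain.co.nz domain.co.uk > "$work/hosts.tmp"
printf '%s\n' domain.com '' domain.co.nz > "$work/whitelist.txt"

# Build one anchored pattern per entry: drop blank lines, escape the dots,
# then require whitespace before the domain and end-of-line after it.
tr -d '\r' < "$work/whitelist.txt" |
    sed -e '/^[[:space:]]*$/d' -e 's/[.]/\\./g' -e 's/.*/[[:space:]]&$/' > "$work/patterns"

# A single grep pass over the hosts file.
grep -v -E -f "$work/patterns" "$work/hosts.tmp" > "$work/hosts"

cat "$work/hosts"
```

The generated patterns look like `[[:space:]]domain\.com$`, so `domain.com.au` no longer matches.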


# I had high hopes for this one, the idea is grep is run once with each whitelisted domain in the command delimited by a | and with the dots escaped, this example needed eval or it wouldn't work at all:
# in this example, $removelines=domain.com|domain.com.au (I've got no idea about sed, I really didn't understand the need for so many slashes but the idea was to escape every dot)

removelines_escaped=`echo $removelines | sed "s/\./\\\\\\\./g"`
eval grep -vwE '`echo ${removelines_escaped}`' /tmp/hosts.tmp > /tmp/hosts
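
The `-w` flag does not help in the attempt above because `.` is not a word character, so `domain.com` still counts as a whole word inside `domain.com.au`. A sketch that avoids regular expressions, escaping and `eval` altogether: awk loads the whitelist into an array and compares the second field of each hosts line exactly. It assumes the domain is the second whitespace-separated field; the scratch directory and sample data are stand-ins.

```shell
# Sample data; the whitelist uses CRLF endings to show they are tolerated.
work=$(mktemp -d)
printf '127.0.0.1\t%s\n' domain.com domain.com.au domain.co.nz domain.co.uk > "$work/hosts.tmp"
printf '%s\r\n' domain.com domain.co.nz > "$work/whitelist.txt"

awk '
    NR == FNR {                  # first file: the whitelist
        sub(/\r$/, "")           # strip a trailing carriage return
        if ($0 != "") allow[$0]  # remember each non-empty domain
        next
    }
    !($2 in allow)               # second file: print lines whose domain is not whitelisted
' "$work/whitelist.txt" "$work/hosts.tmp" > "$work/hosts"

cat "$work/hosts"
```

This reads each file once, so it should cope with a large hosts file. One caveat: the `NR == FNR` idiom misbehaves if the whitelist file is completely empty, so it may be worth skipping the filter in that case.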



Anyway, I'm keen to know your thoughts and suggestions on how this can be made better but most importantly if there's anything I might be able to do to filter out the whitelisted domains.

Cheers,

Griffo
