Wednesday 19 October 2016

Retrieving the top requesting hosts from the nginx access logs

We will first inspect the log format:

tail -f /var/log/nginx/access.log.1

89.248.160.154 - - [18/Oct/2016:21:58:38 +0000] "GET //MyAdmin/scripts/setup.php HTTP/1.1" 301 178 "-" "-"
89.248.160.154 - - [18/Oct/2016:21:58:38 +0000] "GET //myadmin/scripts/setup.php HTTP/1.1" 301 178 "-" "-"

Fortunately nginx's default log format follows the same standardized layout as Apache's combined format, so we can parse the logs fairly easily. We will first use a regex to extract the requester IP from the log file (note the '^' anchors the match to the start of the line, so we don't pick up an IP anywhere else, e.g. in the requested URL):

grep -o '^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' /var/log/nginx/access.log.1
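To see what the anchor does in practice, here is a small sketch that runs the same pattern over two made-up log lines. The addresses come from the documentation ranges and `extract_ip` is only a hypothetical helper name, not something from the original post. The second line carries another IP inside the requested URL, which the anchored pattern should ignore:

```shell
# Hypothetical helper: print the leading IPv4 address of each line on stdin.
extract_ip() {
    grep -o '^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}'
}

# Two made-up log lines; the second has a different IP inside the URL.
sample='192.0.2.10 - - [18/Oct/2016:21:58:38 +0000] "GET / HTTP/1.1" 200 612 "-" "-"
198.51.100.7 - - [18/Oct/2016:21:58:39 +0000] "GET /?host=203.0.113.5 HTTP/1.1" 200 612 "-" "-"'

# Only the requester addresses at the start of each line are printed.
printf '%s\n' "$sample" | extract_ip
```

Without the '^', grep -o would also print 203.0.113.5 from the query string.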

We then want to remove any duplicates so we are presented with unique hosts, ideally sorted. Since uniq only removes adjacent duplicates, the list has to be sorted first:

grep -o '^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' /var/log/nginx/access.log.1 | sort | uniq
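A quick way to check how uniq treats duplicates that are not next to each other, using a made-up list of addresses (illustrative values only):

```shell
# Made-up list of requester IPs with a non-adjacent duplicate.
ips='192.0.2.10
198.51.100.7
192.0.2.10'

# uniq alone keeps both copies of 192.0.2.10, since they are not adjacent.
printf '%s\n' "$ips" | uniq | wc -l

# Sorting first brings the duplicates together so uniq can drop one.
printf '%s\n' "$ips" | sort | uniq | wc -l
```

The first pipeline reports three lines and the second reports two, which is why sort should come before uniq when deduplicating (sort -u does both in one step).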

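Now we can use a while loop to count the requests made by each host:

#!/bin/bash

input='/var/log/nginx/access.log.1'

# sort must come before uniq, since uniq only removes adjacent duplicates.
grep -o '^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' "$input" | sort | uniq | while read -r line ; do

    # Count only the lines whose first field is exactly this IP.
    count=$(awk -v ip="$line" '$1 == ip { n++ } END { print n + 0 }' "$input")

    echo "Result for: $line is $count"

done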
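The loop above rereads the whole log once per host. As an alternative sketch (not the approach this post uses), `uniq -c` can do the counting in a single pass, since it prefixes each distinct line with the number of times it appeared. `count_hosts` is a hypothetical helper name and the sample lines are made up:

```shell
# Hypothetical helper: read log lines on stdin and print "<count> <ip>"
# for each host, busiest host first.
count_hosts() {
    grep -o '^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' |
        sort | uniq -c | sort -nr |
        awk '{ print $1, $2 }'   # strip the padding uniq -c adds
}

# Made-up log lines: two requests from one host, one from another.
sample='198.51.100.7 - - [18/Oct/2016:21:58:38 +0000] "GET / HTTP/1.1" 200 612 "-" "-"
192.0.2.10 - - [18/Oct/2016:21:58:39 +0000] "GET / HTTP/1.1" 200 612 "-" "-"
198.51.100.7 - - [18/Oct/2016:21:58:40 +0000] "GET /about HTTP/1.1" 200 612 "-" "-"'

printf '%s\n' "$sample" | count_hosts
```

Against a real log this could be called as count_hosts < /var/log/nginx/access.log.1.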


But you might only want the top 5 requesters - so we can expand the script as follows:



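#!/bin/bash

input='/var/log/nginx/access.log.1'

# Start from an empty file so results from an earlier run are not mixed in.
: > temp.output

# sort must come before uniq, since uniq only removes adjacent duplicates.
grep -o '^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' "$input" | sort | uniq | while read -r line ; do

    # Count only the lines whose first field is exactly this IP.
    count=$(awk -v ip="$line" '$1 == ip { n++ } END { print n + 0 }' "$input")

    # Bash creates a subshell since we are piping data - so variables set within the loop will not be available outside the loop. We write the results to a file instead.

    echo "$count for $line" >> temp.output

done

echo "Reading file..."

# Sort by count, highest first, and keep the top 5.
sort -nr temp.output | head -n 5

# cleanup
rm -f temp.output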
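The subshell remark in the script can be seen in isolation with a short sketch. It assumes bash with its default settings (some other shells, such as ksh or zsh, run the last part of a pipeline in the current shell and would behave differently). The variable names are made up for the illustration:

```shell
# Piped loop: under bash the loop body runs in a subshell, so the
# increments are lost once the pipeline finishes.
piped=0
printf '%s\n' a b c | while read -r line; do
    piped=$((piped + 1))
done
echo "after piped loop: $piped"

# Redirected loop: there is no pipeline, so the loop runs in the current
# shell and the variable keeps its value.
redirected=0
while read -r line; do
    redirected=$((redirected + 1))
done <<EOF
a
b
c
EOF
echo "after redirected loop: $redirected"
```

Writing the counts to a temporary file, as the script does, is one way around this. Feeding the loop through a redirection is another.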
