Tag Archives: Analytics

AWStats Setup on CentOS

Introduction

AWStats is a great tool for gathering statistics about your website. It acquires everything it needs to know about your site strictly through your websites log files. AWStats is able to scan through these logs line by line and present them in a fantastic report. This report can really help you make strategic decisions going forward as well as spot any anomalies that might be taking place. The tool is smart enough to only scan newer log entries (from when it last ran) allowing you to run it again and again (as often as you want). Thus, once you set this tool up to run daily (or even hourly), you’ll have detailed statistics about your website you can call upon anytime.

AWStats collects information such as such as:AWStats Report

  • Who is visiting your site.
  • How many visitors you’re getting daily.
  • Where are they’re coming from (did a site link to you?)
  • Where is the visitor from (geographical location
  • … and on and on
  • The presentation of these collected statistics can be either via a website (HTML), XML and/or as a PDF file. The PDF is especially useful since it combines all of the multiple HTML pages (as presented) into one great big report with a table of contents and hyperlinks throughout it! The PDF is also really easy to navigate and pass along to others who might also be interested.

    Why Use AWStats over Google Analytics?

    Google Analytics Inaccuracy
    Google Analytics Inaccuracy
    The number one reason is because AWStats is much (,much) more accurate! AWStats also just works without ‘any’ changes to your website (literally – none at all). Google Analytics however requires you to add a small piece of JavaScript to every web page you want to track. Every time this tiny bit of JavaScript code executes, it passes the information along to Google. The problem is… if that little snippet of JavaScript doesn’t execute, then Google doesn’t track that user (and you’ll never know) because it just won’t get reported.

    It’s really easy to prevent this chunk of JavaScript from running too, you just have to have installed something like Ad-Blocker Plus, Disconnect and/or uBlock into your Web Browser (such as Firefox or Chrome). These plugins specifically block these tracking techniques and eliminate most (if not all) advertising the website might have too.

    It doesn’t mean that online analytic tools (like Google Analytics) are not good; no, not at all! But it’s just important to understand that they can’t (and truly aren’t) reporting everything that’s going on with your website and the traffic generated from it.

    Another point worth mentioning is that Google Analytics can not monitor and report statistics on traffic used by third party tools. Therefore you can’t use it to monitor any RESTful API services because the programs accessing it will never call these JavaScript snippets of code.

    It’s worth pointing out now that if you use AWStats, you’ll have the full picture! You’ll be able to easily identify any anomalies and detect certain forms of malicious intent! You’ll be able to monitor all of your internal (web based) services you may manage. From the public standpoint, you might be very surprised at how much more traffic your website is getting despite what online analytic tools will tell you!

    Let’s Get Started

    First you’ll want to install the proper packages. You should hook up to my repository and the EPEL repository as well! The EPEL repository hosts AWStats too, but mine is a newer version. We need the EPEL repository for it’s GeoIP packages since they get updated more often there:

    # CentOS 7 users can connect to EPEL this way:
    rpm -Uhi https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
    
    # Similarly, you can hook up to my repository at https://nuxref.com
    # but here is a quick way of doing it (for CentOS/RedHat 7):
    rpm -Uhi https://repo.nuxref.com/centos/7/en/x86_64/custom/nuxref-release-1.0.0-4.el7.nuxref.noarch.rpm
    

    You should be good to go now; the following installs AWStats and a few extra tools to get the best out of it:

    # install awstats
    # install htmldoc too because it'll allow you to create a pdf
    # install geoip-geolite for the ability to track the IPs
    #        to countries
    # install perl-Geo-IP to look up the IP Addresses
    yum install awstats htmldoc geoip-geolite perl-Geo-IP
    
    

    AWStat In a Nutshell

    The steps below will require that you have set up the environment defined below. Obviously you’ll want to change these environment variables to suite your own needs:

    # First define our website as a variable.
    # We will use this value to track and store in an
    # organized structure.
    
    # Those who host other websites for people can
    # change this and virtually everything below will
    # and re-run everything to get stats for that too!
    WEBSITE=nuxref.com
    
    # AWSTATS Variable Data
    DATADIR=/var/lib/awstats/$WEBSITE
    
    

    Configuring AWStats: Step 1 of 3

    AWStats works from configuration files you create in /etc/awstats/. But it also needs a directory it can work within (we use /var/lib/awstats/). I’ve provided documentation around each line so you know what’s going on:

    # Make sure our environment variables are defined
    # WEBSITE and DATADIR
    
    # First we need to setup our DATADIR; this is where
    # all our statistics and generated data will be placed
    # into:
    [ ! -d $DATADIR/static ] && \
        mkdir -p $DATADIR/static
    ln -snf /usr/share/awstats/wwwroot/icon \
       $DATADIR/static/icon
    ln -snf /usr/share/awstats/wwwroot/cgi-bin \
       $DATADIR/static/cgi-bin
    
    # Create a configuration file using our website
    # based on the awws.model.conf example file that
    # ships with AWStats
    sed -e "s|localhost\.localdomain|$WEBSITE|g" \
    	/etc/awstats/awstats.model.conf > \
    		/etc/awstats/awstats.$WEBSITE.conf
    
    ########################################
    # Now update our new configuration
    ########################################
    # Update the LogFile with our access.log file we'll
    # reference. This path doesn't exist yet but
    # we'll be creating it soon enough; leave this entry
    # untouched (don't change it to your real log path!):
    sed -i -e "s|^\(LogFile\)=.*$|\1=\"$DATADIR/access.log\"|g" \
       /etc/awstats/awstats.$WEBSITE.conf
    
    # Disable DNS (for speed mostly)
    sed -i -e "s|^\(DNSLookup\)=.*$|\1=0|g" \
       /etc/awstats/awstats.$WEBSITE.conf
    
    # For PDF Generation we need to update the relative
    # paths for the icons.
    sed -i -e "s|^\(DirIcons\)=.*$|\1=\"icon\"|g" \
       /etc/awstats/awstats.$WEBSITE.conf
    sed -i -e "s|^\(DirCgi\)=.*$|\1=\"cgi-bin\"|g" \
       /etc/awstats/awstats.$WEBSITE.conf
    
    

    Optionally Configuring GeoIP Updates

    The geolite data fetches us a great set of (meta) data we can reference when looking up IP Addresses (of people who visited our site) and determining what part of the world they came from. This information is fantastic when putting together statistics and web page traffic like AWStats does.

    First we want to configure AWStats to use the GEO IP Plugin:

    # Now configure our GEOIP Setup
    sed -i -e '/^LoadPlugin=.*/d' /etc/awstats/awstats.$WEBSITE.conf
    cat << _EOF >> /etc/awstats/awstats.$WEBSITE.conf
    LoadPlugin="geoip GEOIP_STANDARD /usr/share/GeoIP/GeoIP.dat"
    LoadPlugin="geoip_city_maxmind GEOIP_STANDARD /usr/share/GeoIP/GeoIPCity.dat"
    _EOF
    

    Next we want to set up our GEO IP to update itself with the latest meta data for us automatically (so we don’t have to worry about it):

    # downloads all of the latest GEO IP content to
    # /usr/share/GeoIP with this simple command:
    geoipupdate
    
    # This IP information changes often; so the next
    # thing you want to do is create a cronjob to have
    # this tool fetch regular updates automatically for
    # us to keep the GEO IP Content fresh and up to date!
    cat << _EOF > /etc/cron.d/geoipdate
    0 12 * * 3 root /usr/bin/geoipupdate &>/dev/null
    _EOF
    

    Apache Users: Step 2a of 3

    AWStats depends on the log files to build it’s statistics from, so it’s important we point it to the right directory. Apache logs have been pretty much standardized and AWStats just works with them. If your web page is being hosted through Apache then your log files are most likely being placed in /var/log/httpd. If you’re using NginX (and not Apache), you can skip over this section and to Step 2b of 3 instead.

    Make sure AWStats knows it’s dealing with Apache log files (make sure you’ve still got the $WEBSITE variable defined from above):

    # Make sure our environment variables are defined
    # WEBSITE and DATADIR
    ########################################
    # Apache Users Should run The Following
    ########################################
    # Now if you're logs are created from Apache you
    # need to run the following:
    # Log Format (Type 1 is for Apache)
    sed -i -e "s|^\(LogFormat\)=.*$|\1=1|g" \
       /etc/awstats/awstats.$WEBSITE.conf
    
    

    Now what we want to do is take all of the logs files associated with our website in /var/log/httpd and build one great big (sorted) log file we can get all of our statistics out of:

    # logresolvmerge.pl is a fantastic tool that ships with
    # awstats and merges (and sorts) all of our logs. We
    # place the output into our $DATADIR (which we declared
    # earlier):
    /usr/share/awstats/tools/logresolvemerge.pl \
       /var/log/httpd/access.log \
       /var/log/httpd/access.log-????????.gz \
        > $DATADIR/access.log
    
    

    Nginx Users

    NginX logs have a slightly different format then the Apache logs and therefore require a slightly different configuration to work. If your web page is being hosted through NginX then your log files are most likely being placed in /var/log/nginx. If you’re using Apache (and not NginX), then you can skip over this section as long as you’ve already done Step 2a of 3 instead.

    Make sure AWStats knows it’s dealing with NginX log files otherwise it won’t be able to interpret them. Also be sure to have your $WEBSITE variable defined:

    # Make sure our environment variables are defined
    # WEBSITE and DATADIR
    ########################################
    # NginX Users Should run The Following
    ########################################
    # If you're using NginX, you'll want to adjust
    # your awstat LogFormat entry as follows:
    sed -i -e "s|^\(LogFormat\)=.*$|\1=\"%host %other %logname %time1 %methodurl %code %bytesd %refererquot %uaquot\"|g" \
       /etc/awstats/awstats.$WEBSITE.conf
    

    Now we take all of the logs files associated with our website in /var/log/nginx and build one great big (sorted) log file we can get all of our statistics out of:

    # logresolvmerge.pl is a fantastic tool that ships with
    # awstats and merges (and sorts) all of our logs. We
    # place the output into our $DATADIR (which we declared
    # earlier):
    /usr/share/awstats/tools/logresolvemerge.pl \
       /var/log/nginx/access.log \
       /var/log/nginx/access.log-????????.gz \
        > $DATADIR/access.log
    

    Statistic Generation: Step 3 of 3

    At this point we have all the info we need

    # Make sure our environment variables are defined
    # WEBSITE and DATADIR
    ########################################
    # (Create) and/or Update our Stats
    ########################################
    /usr/share/awstats/wwwroot/cgi-bin/awstats.pl \
       -config=$WEBSITE
    
    # The following builds us a PDF file containing all
    # of our statistics in addition to a website we can
    # optionally host if we want.
    # The following would allow you to gather statistics for
    # a given year:
    #   /usr/share/awstats/tools/awstats_buildstaticpages.pl \
    #      -config=$WEBSITE -buildpdf \
    #      -month=all -year=$(date +'%Y') \
    #      -dir=$DATADIR/static \
    #      -buildpdf=/usr/bin/htmldoc
    
    # This will build statistics with all the information we have:
    /usr/share/awstats/tools/awstats_buildstaticpages.pl \
       -config=$WEBSITE -buildpdf \
       -dir=$DATADIR/static \
       -buildpdf=/usr/bin/htmldoc
    
    # - The main website will appear as:
    #      $DATADIR/static/awstats.$WEBSITE.html
    #    But this 'main' website links to several other websites
    #    that can also all be found in the $DATADIR/static
    #    directory
    # - The pdf file will appear as:
    #      $DATADIR/static/awstats.$WEBSITE.pdf
    

    Consider throwing the above into a script file and having it ran in a cron job!

    Hosting The Statistics

    This option is purely optional; but but here is some simple configurations you can use if you want to access these generated statistics from your browser.

    Note: I intentionally keep things simple in this section. AWStats can be configured so that you can update your statistics via it’s very own website (see AllowToUpdateStatsFromBrowser directive in the site configuration). However I don’t recommend this option and therefore do not document it below.

    NginX

    A simple NginX configuration might look like this:

    # Make sure our environment variables are defined
    # WEBSITE and DATADIR
    cat << _EOF > /etc/nginx/default.d/awstats.$WEBSITE.conf
       # Visit your statistics by browsing to:
       # if WEBSITE was equal nuxref.com, you'd visit the stats:
       # http://localhost/stats/nuxref.com/
       location /stats/$WEBSITE/ {
          alias   $DATADIR/$WEBSITE/static/;
          index  awstats.$WEBSITE.html;
    
          ## Set 1.2.3.4 to your own IP address and uncomment
          ## the entries below to 'only' allow yourself access to
          ## these stats:
          # allow 1.2.3.4/32;
          # deny all;
    
          location /stats/css/ {
              alias /usr/share/awstats/wwwroot/css/;
          }
    
          location /stats/icon/ {
              alias /usr/share/awstats/wwwroot/icon/;
          }
       }
    _EOF
    

    Don’t forget to reload NginX so it takes on your new configuration (and makes that statistics page visible):

    # Reload NginX
    systemctl reload nginx.service
    

    Apache

    # Make sure our environment variables are defined
    # WEBSITE and DATADIR
    cat << _EOF > /etc/httpd/conf.d/awstats.$WEBSITE.conf
       # Visit your statistics by browsing to:
       # if WEBSITE was equal nuxref.com, you'd visit the stats:
       # https://localhost/stats/nuxref.com/
       Alias /stats/$WEBSITE/ "$DATADIR/$WEBSITE/static/"
       <Directory "$DATADIR/$WEBSITE/static/">
          Options FollowSymLinks
          AllowOverride None
          Order allow,deny
          Allow from all
    
          ## Set 1.2.3.4 to your own IP address and uncomment
          ## the entries below to 'only' allow yourself access to
          ## these stats:
          # Order deny,allow
          # Deny from all
          # Allow from 1.2.3.4/255.255.255.255
       </Directory>
    _EOF
    

    Don’t forget to reload Apache so it takes on your new configuration (and makes that statistics page visible):

    # Reload Apache
    systemctl reload httpd.service
    

    Credit

    This blog took me a long time to put together and test! The repository hosting alone accommodates all my blog entries up to this date. I took the open source available to me and rebuilt it to make it an easier solution and decided to share it. If you like what you see and wish to copy and paste this HOWTO, please reference back to this blog post at the very least. It’s really all I ask.

    Sources