Introduction
AWStats is a great tool for gathering statistics about your website. It acquires everything it needs to know about your site strictly through your website’s log files. AWStats scans these logs line by line and presents the results in a fantastic report. This report can really help you make strategic decisions going forward as well as spot any anomalies that might be taking place. The tool is smart enough to only scan log entries that are newer than its last run, so you can run it again and again (as often as you want). Thus, once you set this tool up to run daily (or even hourly), you’ll have detailed statistics about your website that you can call upon anytime.
AWStats collects information such as:
- Who is visiting your site.
- How many visitors you’re getting daily.
- Where they’re coming from (did a site link to you?).
- Where the visitor is from (geographical location).
- … and on and on.
The presentation of these collected statistics can be a website (HTML), XML and/or a PDF file. The PDF is especially useful since it combines all of the HTML pages into one big report with a table of contents and hyperlinks throughout. The PDF is also really easy to navigate and to pass along to others who might be interested.
Why Use AWStats over Google Analytics?
The number one reason is that AWStats is much (much) more accurate! AWStats also just works without any changes to your website (literally none at all). Google Analytics, however, requires you to add a small piece of JavaScript to every web page you want to track. Every time this tiny bit of JavaScript executes, it passes the information along to Google. The problem is that if that little snippet of JavaScript doesn’t execute, Google doesn’t track that user, and you’ll never know because it just won’t get reported.
It’s really easy to prevent this chunk of JavaScript from running too; you just have to have installed something like Adblock Plus, Disconnect and/or uBlock in your web browser (such as Firefox or Chrome). These plugins specifically block these tracking techniques and eliminate most (if not all) advertising the website might have too.
This doesn’t mean that online analytic tools (like Google Analytics) are not good; no, not at all! But it’s important to understand that they can’t report (and truly aren’t reporting) everything that’s going on with your website and the traffic generated from it.
Another point worth mentioning is that Google Analytics cannot monitor and report statistics on traffic generated by third party tools. Therefore you can’t use it to monitor any RESTful API services, because the programs accessing them will never run these JavaScript snippets.
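To make the point about API traffic a little more concrete, here is a minimal sketch that counts requests under a hypothetical /api/ path straight from an access log. This is exactly the kind of traffic a JavaScript snippet never sees. The sample log entries and the /api/ prefix are invented purely for illustration.

```shell
# Build a tiny sample access log (invented entries, combined log format)
SAMPLE_LOG=$(mktemp)
cat << '_EOF' > "$SAMPLE_LOG"
192.0.2.10 - - [01/Mar/2017:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
192.0.2.20 - - [01/Mar/2017:10:00:05 +0000] "GET /api/v1/status HTTP/1.1" 200 87 "-" "curl/7.29.0"
192.0.2.20 - - [01/Mar/2017:10:00:09 +0000] "POST /api/v1/items HTTP/1.1" 201 43 "-" "python-requests/2.12"
_EOF

# When splitting on whitespace, field 7 is the requested path
API_HITS=$(awk '$7 ~ /^\/api\// { n++ } END { print n + 0 }' "$SAMPLE_LOG")
echo "API requests seen in the log: $API_HITS"
rm -f "$SAMPLE_LOG"
```

Neither of the two API clients above would ever execute a tracking script, yet both show up in the server log, which is the only place AWStats looks.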
It’s worth pointing out now that if you use AWStats, you’ll have the full picture! You’ll be able to easily identify any anomalies and detect certain forms of malicious intent! You’ll be able to monitor all of your internal (web based) services you may manage. From the public standpoint, you might be very surprised at how much more traffic your website is getting despite what online analytic tools will tell you!
Let’s Get Started
First you’ll want to install the proper packages. You should hook up to my repository and the EPEL repository as well! The EPEL repository hosts AWStats too, but mine is a newer version. We need the EPEL repository for its GeoIP packages since they get updated more often there:
# CentOS 7 users can connect to EPEL this way:
rpm -Uhi https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

# Similarly, you can hook up to my repository at https://nuxref.com
# but here is a quick way of doing it (for CentOS/RedHat 7):
rpm -Uhi https://repo.nuxref.com/centos/7/en/x86_64/custom/nuxref-release-1.0.0-4.el7.nuxref.noarch.rpm
You should be good to go now; the following installs AWStats and a few extra tools to get the best out of it:
# install awstats
# install htmldoc too because it'll allow you to create a pdf
# install geoip-geolite for the ability to track the IPs
#         to countries
# install perl-Geo-IP to look up the IP Addresses
yum install awstats htmldoc geoip-geolite perl-Geo-IP
AWStats in a Nutshell
The steps below require that you have set up the environment defined here. Obviously you’ll want to change these environment variables to suit your own needs:
# First define our website as a variable.
# We will use this value to track and store in an
# organized structure.
# Those who host other websites for people can
# change this and re-run virtually everything below
# to get stats for that site too!
WEBSITE=nuxref.com

# AWSTATS Variable Data
DATADIR=/var/lib/awstats/$WEBSITE
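Because every later command leans on these two variables, a small guard at the start of a session can save some confusion. The sketch below uses a made-up helper named require_vars (it is not part of AWStats) that complains if any variable it is given is unset or empty.

```shell
# Hypothetical helper: return non-zero (and complain) if any of the
# named variables is unset or empty
require_vars() {
    missing=0
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "ERROR: \$$name is not set" >&2
            missing=1
        fi
    done
    return $missing
}

WEBSITE=nuxref.com
DATADIR=/var/lib/awstats/$WEBSITE

require_vars WEBSITE DATADIR && echo "Environment looks good: $DATADIR"
```

Running the same check at the top of any script that wraps the later steps stops an empty $WEBSITE from turning into paths like /etc/awstats/awstats..conf.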
Configuring AWStats: Step 1 of 3
AWStats works from configuration files you create in /etc/awstats/. But it also needs a directory it can work within (we use /var/lib/awstats/). I’ve provided documentation around each line so you know what’s going on:
# Make sure our environment variables are defined
# WEBSITE and DATADIR

# First we need to setup our DATADIR; this is where
# all our statistics and generated data will be placed
# into:
[ ! -d $DATADIR/static ] && \
   mkdir -p $DATADIR/static

ln -snf /usr/share/awstats/wwwroot/icon \
   $DATADIR/static/icon
ln -snf /usr/share/awstats/wwwroot/cgi-bin \
   $DATADIR/static/cgi-bin

# Create a configuration file using our website
# based on the awstats.model.conf example file that
# ships with AWStats
sed -e "s|localhost\.localdomain|$WEBSITE|g" \
   /etc/awstats/awstats.model.conf > \
      /etc/awstats/awstats.$WEBSITE.conf

########################################
# Now update our new configuration
########################################

# Update the LogFile with our access.log file we'll
# reference. This path doesn't exist yet but
# we'll be creating it soon enough; leave this entry
# untouched (don't change it to your real log path!):
sed -i -e "s|^\(LogFile\)=.*$|\1=\"$DATADIR/access.log\"|g" \
   /etc/awstats/awstats.$WEBSITE.conf

# Disable DNS (for speed mostly)
sed -i -e "s|^\(DNSLookup\)=.*$|\1=0|g" \
   /etc/awstats/awstats.$WEBSITE.conf

# For PDF Generation we need to update the relative
# paths for the icons.
sed -i -e "s|^\(DirIcons\)=.*$|\1=\"icon\"|g" \
   /etc/awstats/awstats.$WEBSITE.conf
sed -i -e "s|^\(DirCgi\)=.*$|\1=\"cgi-bin\"|g" \
   /etc/awstats/awstats.$WEBSITE.conf
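If you’d like to see what the sed substitutions in this step do before touching anything in /etc/awstats, the same expressions can be tried against a scratch file first. This is a minimal sketch that assumes a three-line stand-in for the model configuration; the real file is far longer and its default values may differ.

```shell
WEBSITE=nuxref.com
DATADIR=/var/lib/awstats/$WEBSITE

# A tiny stand-in for awstats.model.conf (values invented for this demo)
SCRATCH=$(mktemp)
cat << '_EOF' > "$SCRATCH"
LogFile="/var/log/httpd/access_log"
SiteDomain="localhost.localdomain"
DNSLookup=2
_EOF

# The same substitutions as in this step, applied to the scratch copy
sed -i -e "s|localhost\.localdomain|$WEBSITE|g" \
       -e "s|^\(LogFile\)=.*$|\1=\"$DATADIR/access.log\"|g" \
       -e "s|^\(DNSLookup\)=.*$|\1=0|g" "$SCRATCH"

RESULT=$(cat "$SCRATCH")
echo "$RESULT"
rm -f "$SCRATCH"
```

Each expression keeps the directive name (captured as \1) and replaces only what follows the equals sign, which is why the commands can safely be re-run against the same file.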
Optionally Configuring GeoIP Updates
The geolite data fetches us a great set of (meta) data we can reference when looking up IP Addresses (of people who visited our site) and determining what part of the world they came from. This information is fantastic when putting together statistics and web page traffic like AWStats does.
First we want to configure AWStats to use the GEO IP Plugin:
# Now configure our GEOIP Setup
sed -i -e '/^LoadPlugin=.*/d' /etc/awstats/awstats.$WEBSITE.conf
cat << _EOF >> /etc/awstats/awstats.$WEBSITE.conf
LoadPlugin="geoip GEOIP_STANDARD /usr/share/GeoIP/GeoIP.dat"
LoadPlugin="geoip_city_maxmind GEOIP_STANDARD /usr/share/GeoIP/GeoIPCity.dat"
_EOF
Next we want to set up our GEO IP to update itself with the latest meta data for us automatically (so we don’t have to worry about it):
# downloads all of the latest GEO IP content to
# /usr/share/GeoIP with this simple command:
geoipupdate

# This IP information changes often; so the next
# thing you want to do is create a cronjob to have
# this tool fetch regular updates automatically for
# us to keep the GEO IP Content fresh and up to date!
cat << _EOF > /etc/cron.d/geoipdate
0 12 * * 3 root /usr/bin/geoipupdate &>/dev/null
_EOF
Apache Users: Step 2a of 3
AWStats depends on the log files to build its statistics, so it’s important we point it to the right directory. Apache logs have been pretty much standardized and AWStats just works with them. If your web page is being hosted through Apache then your log files are most likely being placed in /var/log/httpd. If you’re using NginX (and not Apache), you can skip this section and go to Step 2b of 3 instead.
Make sure AWStats knows it’s dealing with Apache log files (make sure you’ve still got the $WEBSITE variable defined from above):
# Make sure our environment variables are defined
# WEBSITE and DATADIR

########################################
# Apache Users Should run The Following
########################################
# Now if your logs are created from Apache you
# need to run the following:

# Log Format (Type 1 is for Apache)
sed -i -e "s|^\(LogFormat\)=.*$|\1=1|g" \
   /etc/awstats/awstats.$WEBSITE.conf
Now what we want to do is take all of the log files associated with our website in /var/log/httpd and build one great big (sorted) log file we can get all of our statistics out of:
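# logresolvemerge.pl is a fantastic tool that ships with
# awstats and merges (and sorts) all of our logs. We
# place the output into our $DATADIR (which we declared
# earlier).
# Note: a stock CentOS httpd names its log access_log
# (with an underscore); adjust the file names below to
# match what you actually find in /var/log/httpd:
/usr/share/awstats/tools/logresolvemerge.pl \
   /var/log/httpd/access.log \
   /var/log/httpd/access.log-????????.gz \
      > $DATADIR/access.log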
NginX Users: Step 2b of 3
NginX logs have a slightly different format than the Apache logs and therefore require a slightly different configuration to work. If your web page is being hosted through NginX then your log files are most likely being placed in /var/log/nginx. If you’re using Apache (and not NginX), you can skip this section as long as you’ve already done Step 2a of 3.
Make sure AWStats knows it’s dealing with NginX log files otherwise it won’t be able to interpret them. Also be sure to have your $WEBSITE variable defined:
# Make sure our environment variables are defined
# WEBSITE and DATADIR

########################################
# NginX Users Should run The Following
########################################
# If you're using NginX, you'll want to adjust
# your awstat LogFormat entry as follows:
sed -i -e "s|^\(LogFormat\)=.*$|\1=\"%host %other %logname %time1 %methodurl %code %bytesd %refererquot %uaquot\"|g" \
   /etc/awstats/awstats.$WEBSITE.conf
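To see how that LogFormat string lines up with an actual entry, the sketch below pulls a few fields out of a single sample line in NginX’s default "combined" format. The log line is invented, and awk is only standing in for AWStats’ own parser here.

```shell
# One invented entry in NginX's default "combined" format
LINE='203.0.113.7 - alice [12/Mar/2017:08:15:42 +0000] "GET /blog/ HTTP/1.1" 200 1534 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64)"'

# Whitespace-separated fields map onto the first LogFormat tags:
#   $1 = %host   $2 = %other   $3 = %logname   $4 and $5 = %time1
HOST=$(printf '%s\n' "$LINE" | awk '{ print $1 }')
LOGNAME_FIELD=$(printf '%s\n' "$LINE" | awk '{ print $3 }')

# Splitting on double quotes isolates the quoted parts:
#   field 2 = %methodurl      field 3 = %code and %bytesd
#   field 4 = %refererquot    field 6 = %uaquot
REQUEST=$(printf '%s\n' "$LINE" | awk -F'"' '{ print $2 }')
STATUS=$(printf '%s\n' "$LINE" | awk -F'"' '{ split($3, a, " "); print a[1] }')
AGENT=$(printf '%s\n' "$LINE" | awk -F'"' '{ print $6 }')

echo "host=$HOST user=$LOGNAME_FIELD request=$REQUEST status=$STATUS"
echo "agent=$AGENT"
```

If you have customized the log_format directive in your NginX configuration, the LogFormat string needs to change to match it, tag for tag.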
Now we take all of the log files associated with our website in /var/log/nginx and build one great big (sorted) log file we can get all of our statistics out of:
# logresolvemerge.pl is a fantastic tool that ships with
# awstats and merges (and sorts) all of our logs. We
# place the output into our $DATADIR (which we declared
# earlier):
/usr/share/awstats/tools/logresolvemerge.pl \
   /var/log/nginx/access.log \
   /var/log/nginx/access.log-????????.gz \
      > $DATADIR/access.log
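The access.log-????????.gz pattern used in the merge command is an ordinary shell glob, and it is worth knowing exactly which files it picks up. The sketch below tries it against a scratch directory; the file names are invented stand-ins for what logrotate might leave behind.

```shell
# Scratch directory with invented file names standing in for a log directory
LOGDIR=$(mktemp -d)
touch "$LOGDIR/access.log" \
      "$LOGDIR/access.log-20170301.gz" \
      "$LOGDIR/access.log-20170308.gz" \
      "$LOGDIR/access.log-20170308" \
      "$LOGDIR/error.log-20170308.gz"

# Each '?' matches exactly one character, so only the gzipped
# access logs carrying an 8 character date stamp are picked up
MATCHED=$(cd "$LOGDIR" && ls access.log-????????.gz)
echo "$MATCHED"
rm -rf "$LOGDIR"
```

Notice that the uncompressed access.log-20170308 is skipped. If your logrotate setup does not compress rotated logs (or delays compression), the pattern in the merge command has to be adjusted or those entries never reach AWStats.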
Statistic Generation: Step 3 of 3
At this point we have all the info we need to generate our statistics:
# Make sure our environment variables are defined
# WEBSITE and DATADIR

########################################
# (Create) and/or Update our Stats
########################################
/usr/share/awstats/wwwroot/cgi-bin/awstats.pl \
   -config=$WEBSITE

# The following builds us a PDF file containing all
# of our statistics in addition to a website we can
# optionally host if we want.

# The following would allow you to gather statistics for
# a given year:
# /usr/share/awstats/tools/awstats_buildstaticpages.pl \
#    -config=$WEBSITE -buildpdf \
#    -month=all -year=$(date +'%Y') \
#    -dir=$DATADIR/static \
#    -buildpdf=/usr/bin/htmldoc

# This will build statistics with all the information we have:
/usr/share/awstats/tools/awstats_buildstaticpages.pl \
   -config=$WEBSITE -buildpdf \
   -dir=$DATADIR/static \
   -buildpdf=/usr/bin/htmldoc

# - The main website will appear as:
#     $DATADIR/static/awstats.$WEBSITE.html
#   But this 'main' page links to several other pages
#   that can also all be found in the $DATADIR/static
#   directory
# - The pdf file will appear as:
#     $DATADIR/static/awstats.$WEBSITE.pdf
Consider throwing the above into a script file and having it run in a cron job!
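One way such a script could look is sketched below. The function name refresh_stats and its DRY_RUN switch are my own invention for this example; the commands it wraps are the ones already shown in Steps 2 and 3. With DRY_RUN=1 it only prints what it would run, which makes it safe to try anywhere.

```shell
# Hypothetical helper: refresh_stats <website> <log directory>
# Set DRY_RUN=1 to print the commands instead of running them.
refresh_stats() {
    website=$1
    logdir=$2
    datadir=/var/lib/awstats/$website
    tools=/usr/share/awstats/tools

    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
        echo "$tools/logresolvemerge.pl $logdir/access.log $logdir/access.log-????????.gz > $datadir/access.log"
    else
        run=""
        # 1. Merge the current and rotated logs into one sorted file
        "$tools/logresolvemerge.pl" \
            "$logdir"/access.log "$logdir"/access.log-????????.gz \
            > "$datadir/access.log" || return 1
    fi

    # 2. Update the statistics database
    $run /usr/share/awstats/wwwroot/cgi-bin/awstats.pl \
        -config="$website" || return 1

    # 3. Rebuild the static pages and the PDF
    $run "$tools/awstats_buildstaticpages.pl" \
        -config="$website" -dir="$datadir/static" \
        -buildpdf=/usr/bin/htmldoc
}

# Preview what a run for nuxref.com would do, without touching anything
DRY_RUN=1 refresh_stats nuxref.com /var/log/nginx
```

Saved as a script (say /usr/local/bin/refresh_stats.sh, a made-up path) with the last line changed to a plain call, it could be scheduled the same way as the GeoIP update, for example with a line such as "0 1 * * * root /usr/local/bin/refresh_stats.sh" in a file under /etc/cron.d/.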
Hosting The Statistics
This step is purely optional, but here are some simple configurations you can use if you want to access these generated statistics from your browser.
Note: I intentionally keep things simple in this section. AWStats can be configured so that you can update your statistics via its very own website (see the AllowToUpdateStatsFromBrowser directive in the site configuration). However, I don’t recommend this option and therefore do not document it below.
NginX
A simple NginX configuration might look like this:
# Make sure our environment variables are defined
# WEBSITE and DATADIR
cat << _EOF > /etc/nginx/default.d/awstats.$WEBSITE.conf
# Visit your statistics by browsing to:
# if WEBSITE was equal nuxref.com, you'd visit the stats:
#     http://localhost/stats/nuxref.com/
location /stats/$WEBSITE/ {
   alias $DATADIR/static/;
   index awstats.$WEBSITE.html;

   ## Set 1.2.3.4 to your own IP address and uncomment
   ## the entries below to 'only' allow yourself access to
   ## these stats:
   # allow 1.2.3.4/32;
   # deny all;
}

# These sit beside (not inside) the location above; NginX
# rejects a nested location that falls outside its parent's
# prefix
location /stats/css/ {
   alias /usr/share/awstats/wwwroot/css/;
}
location /stats/icon/ {
   alias /usr/share/awstats/wwwroot/icon/;
}
_EOF
Don’t forget to reload NginX so it takes on your new configuration (and makes that statistics page visible):
# Reload NginX
systemctl reload nginx.service
Apache
A simple Apache configuration might look like this:
# Make sure our environment variables are defined
# WEBSITE and DATADIR
cat << _EOF > /etc/httpd/conf.d/awstats.$WEBSITE.conf
# Visit your statistics by browsing to:
# if WEBSITE was equal nuxref.com, you'd visit the stats:
#     https://localhost/stats/nuxref.com/
Alias /stats/$WEBSITE/ "$DATADIR/static/"
<Directory "$DATADIR/static/">
   Options FollowSymLinks
   AllowOverride None
   Order allow,deny
   Allow from all

   ## Set 1.2.3.4 to your own IP address and uncomment
   ## the entries below to 'only' allow yourself access to
   ## these stats:
   # Order deny,allow
   # Deny from all
   # Allow from 1.2.3.4/255.255.255.255
</Directory>
_EOF
Don’t forget to reload Apache so it takes on your new configuration (and makes that statistics page visible):
# Reload Apache
systemctl reload httpd.service
Credit
This blog took me a long time to put together and test! The repository hosting alone accommodates all my blog entries up to this date. I took the open source available to me and rebuilt it to make it an easier solution and decided to share it. If you like what you see and wish to copy and paste this HOWTO, please reference back to this blog post at the very least. It’s really all I ask.
When I try to add your repo I am getting errors..
warning: /var/tmp/rpm-tmp.1eIw43: Header V4 RSA/SHA1 Signature, key ID efe82e3b: NOKEY
################################# [100%]
Updating / installing…
################################# [100%]
error: /etc/pki/rpm-gpg/RPM-GPG-KEY-nuxref-com: import read failed(2).
It sounds like you’re installing the wrong rpm for the wrong distribution. Here is a similar case (but with EPEL Repos).
What version/OS are you using?
This was with a VPS
centos 7.3.1611 x64
The machine was brand new, just booted, tried again tonight did the same thing.
You asked how the EPEL repo does, I used this instead of yours:
sudo yum install -y epel-release
But when I did try to add it via the command you listed, it stated it was already added,
I did manage to get everything else working with your repo though thanks to your great instructions.
Feel free to email me if you need additional details.
Sorry.. should have replied here, you can delete the other reply..
I was convinced it was your end, but now I’m convinced it is mine.
I pushed a small update; can you try the new RPM I posted?
rpm -Uhi http://repo.nuxref.com/centos/7/en/x86_64/custom/nuxref-release-1.0.0-4.el7.nuxref.noarch.rpm
You can also do it through yum if you want:
yum install http://repo.nuxref.com/centos/7/en/x86_64/custom/nuxref-release-1.0.0-4.el7.nuxref.noarch.rpm
Also… Thanks for bringing this to my attention!
I started up another machine..
I installed it via the yum command and it seemed to install OK, but.. I ran:
yum update
and got:
http://repo.nuxref.com/centos/7/en/x86_64/custom/repodata/repomd.xml: [Errno 14] curl#6 – “Could not resolve host: repo.nuxref.com; Name or service not known”
/me kicks the internet
I then waited a few minutes… and it worked!
Thanks a bunch..
Just FYI.. awstats is now updated to version 7.6-1.el7 on EPEL repo
That’s great news honestly! I updated my copy to v7.6 too so that it will include all of my personal patches. But I do like hearing that EPEL is actively maintaining this package. I have contacted the package manager of it directly and will submit my subtle changes that fix a few things for us CentOS/RedHat users. Hopefully they will be accepted upstream and I won’t have to maintain a fork of it anymore!
Also, I’m glad to hear you got your repository setup working again. Thanks for pointing out my issue! I’m a bit slow responding because I was out of town this past week without internet access.
Two things….
first, I’m not sure I follow the different sections of environment code. What file do these go into? If you’re referring to “/etc/awstats/awstats.MYSITE.COM.conf”, then I’m confused on the need for defining the $WEBSITE variable, since the conf file already has ‘SiteDomain’.
Second, someone stole your article.
http://www.techguru.my/web-server/awstats/awstats-nginx-log-and-nginx-server/
Looks like all the content is the same, even this:
# Those who host other websites for people can
# change this and virtually everything below will
# and re-run everything to get stats for that too!
WEBSITE=nuxref.com
The blog was kind of written to just set everything up for you by just copying and pasting everything you see in front of you and having it ‘just work’.
The environment variables are about the only thing in the configuration that will shift/change per user. As long as you set the environment variables for your own configuration, you can just copy and paste the next sections and be ready to go and monitor your stats!
Thanks for sharing the stolen content. Blogs are constantly stolen; I can’t really do much about it. I don’t make any money off of this blog, which is why I don’t create as many entries as I used to. No doubt the person who took my information isn’t making much money himself either. If you’re still having problems, just reply and let me know what specifically you’re getting caught up on. I’ll gladly try to help you out!
L2g, thank you so much for the guide! It’s rather hard to find “Nginx, on CentOS, with awstats, and display those awstats on the main domain as a virtual path and not a subdomain with nginx” on the Internet!
I offer an selinux policy to the masochists out there. I hope the formatting works:
$ cat awstats1.te
module awstats1 1.0;
require {
type httpd_t;
type awstats_var_lib_t;
class file { getattr open read };
}
#============= httpd_t ==============
allow httpd_t awstats_var_lib_t:file { open read getattr };
To apply an selinux policy from a .te file:
sudo checkmodule -M -m -o awstats1.mod awstats1.te && sudo semodule_package -o awstats1.pp -m awstats1.mod && sudo semodule -i awstats1.pp ;