Tag Archives: RPM

A Redhat Usenet Solution

Introduction

The two top programs in the Usenet world are NZBGet and SABnzbd. Both of these tools have their strengths and weaknesses. Both of them also have a strong community that chooses one over the other and have their own preferences why they did. But at the end of the day, both of them are amazing and well maintained by their developers and the community that follows them.

I’ve recently begun packaging these tools in addition to some of the plugins that go with them. My goal was to just bring awareness to those who use the CentOS/Fedora/RedHat world that their lives just got easier! At the time this blog was written CentOS 8, Fedora 31, Fedora 32, and Fedora 33 were the current distributions.

The Setup and Target Audience

This blog was centered around the Nuxref repositories I maintain. Thus, this is the cleanest CentOS/RedHat RPM based installation with very little effort required. Simply:

  1. Connect to my repository.
  2. Install your desired packages (explained below):
  3. Enjoy!

The Nuxref Repositories

Before you proceed with the rest of this blog; you’ll first want to get yourself set up with the Nuxref repositories. This is where all of the packages are that will allow you to proceed with one (or both) of the solutions below.

The easiest way to do this is to visit the repository website and follow the instructions to get connected.

CentOS 8 users will want to connect to EPEL if they haven’t already done so:

# Connect to EPEL:
sudo dnf install -y epel-release

NZBGet Usenet Solution


The NZBGet solution comes with the following:

Provides a SystemD service ready to go for those who want to tie it to their systems startup and shutdown.

# Install NZBGet from the Nuxref repository
sudo dnf install -y nzbget

# Add NZBGet to the startup of your system
sudo systemctl enable nzbget.service

# Start it up now if you like too!
sudo systemctl start nzbget.service

You’ll now be able to access the web page through your browser accessing:
http://localhost:6789.
Note: The default login/password is nzbget/tegbzn6789 when prompted.

Here is a breakdown of how the custom NZBGet package works:

  • A default configuration (/etc/nzbget.conf) is pre-prepared for you which
  • sets up the logging directory to be /var/log/nzbget/nzbget.log (with log rotation on).
  • All of the variable data (file processing, etc) will be located in /var/lib/nzbget/*. In fact this is a very important directory because unless you configure things differently, all downloaded content will appear in /var/lib/nzbget/downloads.

To grant users on your system access to the data available through NZBGet, simply just add them to the nzbget group:

# swap the USER with your username you want to add to the
# nzbget group:
sudo usermod -aG nzbget USER

You can optionally choose to just run nzbget -d using your general account to launch a personal instance of NZBGet.

NZBGet Notifications

NZBGet can keep you posted on what it’s doing by sending you emails when a download completes (or fails). But it’s not just emails it can use as a medium; it can be Gotify, Twitter, Telegram, Amazon SNS, Slack, etc.

To leverage the notification features, you just need to install the NZB-Notify Plugin:

# Install the notification plugin from the Nuxref repository
sudo dnf install -y nzbget-script-notify

# Reload NZBGet so it takes on the new configuration
sudo systemctl restart nzbget

To enable it, you simply need to need to access the Notifications tab from within the NZBGet Settings section.

NZBGet Notification Setup
  1. Select the SETTINGS entry at the top right.
  2. Select the NOTIFY entry on the left hand side
  3. Notifications basically have to be defined as Apprise URLs; you can learn to construct the one for the services you wish to use here.

Finally it is a good idea to ensure the Notifications are triggered to run at the proper times:

NZBGet Settings – NZB-Notify Ordering
  1. Select the SETTINGS entry at the top right (if you’re not there already still from the previous steps).
  2. Select the EXTNSION SCRIPTS entry on the left hand side
  3. Make sure the Notify.py is listed in both the Extensions and ScriptOrder section.

NZBGet Video Sorting

Another great plugin for NZBGet is one that allows all content downloaded to be automatically sorted/renamed into a very conventional format. The amazing tool that does this is VideoSort. I’ve also gone ahead and packaged this so it would easy to use in a CentOS/Fedora enviroment like everything else:

# Install the videosort plugin from the Nuxref repository
sudo dnf install -y nzbget-script-videosort

# VideoSort can be a bit confusing to setup the first time; so the best way I can
# make this incredibly easy for you is to pre-load you with some good default
# configuration.  Here is how you can do it:

# First stop NZBGet:
sudo systemctl stop nzbget

# Backup the configuration file (so you can restore things they way they
# were if you don't like my approach):
sudo cp /etc/nzbget.conf /etc/nzbget.conf.backup

# Now thrown in our default configuration
# (just copy and paste the below):
NZBGETCFG=/etc/nzbget.conf

# First tidy up any conflicting entries (don't worry
# you backed this file up remember, we can roll back if you're
# not happy after:
sudo sed -i '/Videosort/d' $NZBGETCFG
sudo sed -i '/Category[0-9]\+/d' $NZBGETCFG

# Now copy in our new settings:
( cat << _EOF
# Custom Categories
Category1.Name=movies
# (option <DestDir>) is used. In this case if the option <AppendCategoryDir>
Category1.DestDir=\${DestDir}/Movies
Category1.Unpack=yes
Category1.Extensions=VideoSort.py, Notify.py
Category1.Aliases=movies*
Category2.Name=tv
Category2.DestDir=\${DestDir}/TVShows
Category2.Unpack=yes
Category2.Extensions=VideoSort.py, Notify.py
Category2.Aliases=hdtv, tv*, s??e??

# VideoSort Script
VideoSort.py:MoviesDir=\${DestDir}/Movies
VideoSort.py:SeriesDir=\${DestDir}/TVShows
VideoSort.py:DatedDir=\${DestDir}/TVShows
VideoSort.py:OtherTvDir=\${DestDir}/TVShows
VideoSort.py:TvCategories=tv
VideoSort.py:VideoExtensions=.mkv,.avi,.divx,.xvid,.mov,.wmv,.mp4,.mpg,.mpeg,.vob,.iso
VideoSort.py:SatelliteExtensions=.srt,.sub,.idx,.subs.rar,.rar,.nfo,.jpg
VideoSort.py:MinSize=100
VideoSort.py:MoviesFormat=%.t.(%y)/%.t.%y.%qss.%qf.%qvc-%qrg
VideoSort.py:SeriesFormat=%s.n/S%0s/%s.n.S%0sE%0e.%e.n.%qss.%qf.%qvc-%qrg
VideoSort.py:EpisodeSeparator=E
VideoSort.py:SeriesYear=yes
VideoSort.py:DatedFormat=%s.n/%s.n-%e.n-%y-%0m-%0d
VideoSort.py:OtherTvFormat=%t
VideoSort.py:LowerWords=the,of,and,at,vs,a,an,but,nor,for,on,so,yet
VideoSort.py:UpperWords=III,II,IV
VideoSort.py:DNZBHeaders=yes
VideoSort.py:PreferNZBName=yes
VideoSort.py:Overwrite=no
VideoSort.py:Cleanup=yes
VideoSort.py:Preview=no
VideoSort.py:Verbose=no
_EOF
) | sudo tee -a $NZBGETCFG

# Restart our NZBGet instance so it takes on the new configuration
sudo systemctl start nzbget

Firewall Configuration

The package will provide you the files needed to set up the firewall and make NZBGet available to you from other stations by simply doing the following:

# Assuming your network is set up to the `home` zone, the following
# sets you up to expose TCP Port 6789 on your network
firewall-cmd --zone=home --add-service nzbget --permanent

# You can optionally expose your secure instance of NZBGet on TCP
# port 6791 as well if you want
firewall-cmd --zone=home --add-service nzbget-secure --permanent

# Now reload your firewall to take on the new change:
firewall-cmd --reload

SABnzbd Usenet Solution

You can start it up with the command:

# Install SABnzbd from the Nuxref repository
sudo dnf install -y sabnzbd

# Add SABnzbd to the startup of your system
sudo systemctl enable sabnzbd.service

# Start it up now if you like too!
sudo systemctl start sabnzbd.service

You’ll now be able to access the web page through your browser accessing:
http://localhost:9080.

Here is a breakdown of how the custom SABnzbd package works:

  • All of your log files will show up in /var/log/sabnzbd/sabnzbd.log
  • All of configuration will get written to /etc/sabnzbd/sabnzbd.conf
  • All of the variable data (file processing, etc) will be located in /var/lib/sabnzbd/*. In fact this is a very important directory because unless you configure things differently, all downloaded content will appear in /var/lib/sabnzbd/complete.

To grant users on your system access to the data available through SABnzbd, simply just add them to the sabnzbd group:

# swap the USER with your username you want to add to the
# sabnzbd group:
sudo usermod -aG sabnzbd USER

Firewall Configuration

The package will provide you the files needed to set up the firewall and make SABnzbd available to you from other stations by simply doing the following:

# Assuming your network is set up to the `home` zone, the following
# sets you up to expose TCP Port 9080 on your network
firewall-cmd --zone=home --add-service sabnzbd --permanent

# You can optionally expose your secure instance of SABnzbd on TCP
# port 9090 as well if you want
firewall-cmd --zone=home --add-service sabnzbd-secure --permanent

# Now reload your firewall to take on the new change:
firewall-cmd --reload

SABnzbd Notifications

SABnzbd can keep you posted on what it’s doing by sending you emails when a download completes (or fails). But it’s not just emails it can use as a medium; it can be Gotify, Twitter, Telegram, Amazon SNS, Slack, etc.

To leverage the notification features, you just need to install the NZB-Notify Plugin:

# Install the notification plugin from the Nuxref repository
sudo dnf install -y sabnzbd-script-notify

To enable it, you simply need to need to access the Notifications tab from within the SABnzbd Configuration section.

SABnzbd Notification Setup
SABnzbd Notification Setup
  1. Select the Enable notification script checkbox
  2. Select the Notify.py script from the dropdown list next to the Script category
  3. Next to the Parameter category, you must specify the URL(s) identifying which service(s) you want to notify.Depending on what you want plan on alerting, the URL(s) you specify in the Parameter field will vary. You can get a better understanding of the URL options supported here.

NGinX Frontend

Regardless of what downloader you chose to setup from the instructions above, you should NEVER directly expose SABnzbd or NZBGet to the internet… EVER! ALWAYS uses some sort of proxy to bridge the internet from systems you run on your local network. NGinX (pronounced Engine-X) can solve this very thing for you. NginX is much more actively maintained and focuses entirely protecting your servers while allowing them to safely host their backend HTTP(S) solutions.

# Install NGinX and httpd-tools (if they're not there already)
sudo dnf install nginx httpd-tools

# Add nginx to the startup of your system
sudo systemctl enable nginx.service

# Start it up now if you like too!
sudo systemctl start nginx.service

# Expose port 80 and port 443
sudo firewall-cmd --zone=home --add-service http --permanent
sudo firewall-cmd --zone=home --add-service https --permanent

# If you're installing NginX on the same server that you've installed
# SABnzbd or NGinX on, make sure you no longer expose the ports associated
# with them anymore:
sudo firewall-cmd --zone=home --remove-service nzbget --permanent
sudo firewall-cmd --zone=home --remove-service nzbget-secure --permanent
sudo firewall-cmd --zone=home --remove-service sabnzbd --permanent
sudo firewall-cmd --zone=home --remove-service sabnzbd-secure --permanent

# Now reload your firewall to take on the new change:
firewall-cmd --reload

Now add the following into your /etc/nginx/default.d/usenet.conf file:

   # NZBGet wrapper
   location /nzbget/ {
      proxy_pass   http://localhost:6789;
      proxy_cache  off;

      # Simple Security
      auth_basic            "Restricted Usenet Area";
      auth_basic_user_file  usenet.htpasswd;
   }

   # SABnzbd wrapper
   location /sabnzbd/ {
      proxy_pass   http://localhost:9080;
      proxy_cache  off;

      # Simple Security
      auth_basic            "Restricted Usenet Area";
      auth_basic_user_file  usenet.htpasswd;
   }

Now let’s set up a simple password file we can use to secure our content.

# Generate a password for the user 'usenet'.  You might want to
# switch this for your own name if you like:
sudo htpasswd -c /etc/nginx/usenet.htpasswd usenet

# you will be prompted to enter a password to associate with your
# new user.  Go ahead and provide one! :)

# You can generate more account if you want too; just remove the
# -c for future calls. e.g:
# sudo htpasswd  /etc/nginx/usenet.htpasswd user2

# It wouldn't hurt to just enhance the security associated with our
# new password file (for prying eyes):
chown root.nginx /etc/nginx/usenet.htpasswd
chmod 640 /etc/nginx/usenet.htpasswd

# You're almost done!
# Make sure your configuration won't break anything
nginx -t -c /etc/nginx/nginx.conf

# If the above passed okay, then it's safe to make your configuration
# live.
sudo systemctl reload nginx

Credit

Please note that this information took me several days to put together and test thoroughly. There was a tremendous effort put in place to make all versions of these programs compatible with all recent RPM based Linux distributions. I may not blog often; but I want to re-assure the stability and testing I put into everything I intend share.

If you like what you see and wish to copy and paste this HOWTO, please reference back to this blog post at the very least. It’s really all I ask.

Sources

Pan: A Useful NewsReader for Linux

Introduction

PAN is a newsreader that has been around for ages. It allows you to sift through the massive clutter that Usenet has become through its really fast interface loaded with tons of features!

It’s development died off way back in 2012, but recently it’s development has picked right back up again. Not only is this product feature rich and open source, but it’s written purely in C++ which makes it incredibly light weight (thus very, very fast). Some of the subtle product enhancements this product has seen in the past few months make it worthy to be in the spot light again.

So What Can It Do?

  • Header Caching: Tell it the group(s) you want and how much of it you want to see and it will download the headers it retrieves to a local cache file. This is awesome because now you can sift through this content offline.
    Cache Headers
    Cache Headers
  • Header Scoring: You can flag key aspects of articles with a score. By default every header retrieved has a score of zero (0) unless you start dabbling in this area.

    Anything that scores less than (or equal to) -9999 can be configured to not list itself at all. Some well set scores can greatly clean up your ability to locate content in groups.

    You can score content higher and/or lower based on the posts author, subject, size, age, etc. You can even apply scoring through regular expressions too!

    Scoring is very powerful when used properly! I’ll talk about it again a bit later in this blog once you’ve gotten set up. But if we were to apply scoring to the previous screenshot (above), it might look like this (all garbage cleaned up and content prioritized with color coding too):

    Header Scoring
    Header Scoring
  • Multiple Server Support: Got a block account? No problem, you can add it as a secondary server and only pull from it if the Primary one is unavailable.
  • NZB-File Support: The treasure maps of Usenet can be loaded into Pan too and downloaded through it. True automation of these come through systems like NZBGet and SABnzbd, but it’s still worth knowing that not only is this a newsreader, but it can pass as a downloader as well!
  • Concurrent Connections: Like any great browser/downloader of any system; files are retrieve concurrently. This means that you can just keep browsing and tagging content of interest seamlessly without interruptions.
  • Header Compression Support: One of the new enhancements surfaced with the new development of this project. This makes a world of difference when retrieving hundreds of thousands of headers from a Usenet group. Enabling this feature along (if your Usenet provider supports it) will greatly reduce wait times!

Pan’s Disabled Features

The features page on PANs website explains about a parent company (called ChimPanXi) that tries to sell this free product with added functionality. I guess the deal they have with the developers is to just disable a few features so that they can be re-enabled them the paid version (purely speculation)?

But since the (Pan) code is open source, the options are right there in front of us but just disabled. Quite honestly… of all this disabled functionality, only one is truly worth pointing out: Pan restricts you to just 4 allowable concurrent connections to your Usenet provider at a time. Here is a small patch I created which increases this number to 99. The build I provide in this blog already has this patch applied. Here are the rest of the missing features (with some of my comments as well); maybe some might see value in the others?

Pans Missing Features
Pans Missing Features

The Goods

For those hooked up to my repository are already set, just type the following:

# install the new version of Pan
yum install pan --enablerepo=nuxref

You can also reference this table too for direct links:

Package Download Description
pan el7.rpm, fc22.rpm, fc23.rpm, fc24.rpm, fc25.rpm The Newsreader: This is the program that this blog focuses on.

Note: The source rpm can be obtained here which builds everything you see in the table above. It’s not required for the application to run, but might be useful for developers or those who want to inspect how I put the package together.

It’s also worth noting (again) that this build includes a small patch to increase the maximum allowable number of concurrent connections from 4 to 99.

Securing Your Connection

There is very little security built into Pan from a connection point of view. What little security is (normally) in place is built using GnuTLS. GnuTLS has a history of not keeping up with the security exploits and vulnerabilities that surface with encryption libraries. It doesn’t make it unsafe; it just doesn’t make it as reliable as it’s competition (OpenSSL and Crypto). For this reason the packages I provide are intentionally not built against it (GnuTLS).

It’s really not a problem at the end of the day because there are other ways of securing this connection (properly). The way I use (and recommend) is through Stunnel.

Stunnel allows you to take an unencrypted input (from Pan) and connect it to a secure connected one (at your Usenet provider). The best thing about stunnel is that it links to your (OpenSSL) shared system libraries libssl.so and libcrypto.so which are actively maintained and patched! Basically what I’m saying is by attaching Pan to Stunnel: you get the feature rich usage of Pan and the ongoing (reliable) security of OpenSSL.

The following will get you set up with stunnel; you’ll want to be root before running the command below:

# Install stunnel (if it's not installed already)
# you'll need to be connected to either EPEL or NuxRef for this
# to work:
yum install stunnel

You can also reference this table too for direct links:

Package Download Description
stunnel el7.rpm, fc22.rpm, fc23.rpm, fc24.rpm, fc25.rpm Secure Tunnel: for data encryption.

Note: This RPM is not required by PAN to run correctly. It does however offer you a safer and more secure method of encrypting your communication to (and from) your NNTP Server.
# You must have root permissions when setting up
# stunnel

# Create relay bound to local server only (semi-colons are for
# comments):
cat << _EOF > /etc/stunnel/stunnel.conf
; Use it for client mode
; This is the pass through mode you need to encrypted
; your NNTP traffic:
client = yes
 
[nntp]
;
; --- IN ---
;
; local port to listen on (on this PC)
; You will configure PAN to connect here:
accept = 127.0.0.1:119

;
; --- OUT ---
;
; The Remote Usenet Server's (encrypted) connection to use:
; In this example, I'm just pointing to Astraweb, but you
; can provide any Usenet server you wish here. Just be sure
; to point it to their secure transport point!
connect = ssl.astraweb.com:563
_EOF

# This line below is useless, but it allows you revisit this blog
# entry and continue and copy and paste these instructions at a later
# time. The line removes any previous entries set to prevent the
# creation of duplicate entries  in your startup file at another time
# It's harmless to run at any point:
sed -i -e '/bin\/stunnel/d' /etc/rc.d/rc.local

# Configure stunnel to start after each boot
echo "# Start /usr/bin/stunnel on boot each time:" >> /etc/rc.d/rc.local
echo "/usr/bin/stunnel" >> /etc/rc.d/rc.local

# By default stunnel is configured to read 
# it's configuration from /etc/stunnel/stunnel.conf
# on startup:
stunnel

The next step is to update your PAN server configuration to point to your local server (localhost or 127.0.0.1) instead of the remote one you’re accessing. Make sure to set the port to 119 too like so:

Stunnel Pan Configuration
Stunnel Pan Configuration

You’ll provide the same username and password you would have otherwise provided to your Usenet provider.

The end result is a secure connection between you and your Usenet provider like so:
Pan Setup With Stunnel

Scoring

Scoring articles can greatly ease your life when looking through all of the headers in front of you; it’s great for:

  • Eliminating SPAM
  • Filtering out potential malicious content (such as Trojans and Viruses)
  • Increasing the visibility of items of interest
  • Locating Authors of interest with ease

All scores can be optionally associated with a time limit too. When the limit expires, so does the score. This is useful when you only want to temporarily filter content. Otherwise the permanent scores will make up most of your configuration. To add a score, simply click Articles > Add a Scoring Rule…

Add Scoring Rule
Add a Scoring Rule

Here is an example of a rule you might add; this one greatly reduces the score of all entries that have potentially dangerous file extensions in the subject line:

Block Potentially Malicious Content
Block Potentially Malicious Content

Pan’s built in filter field allows you to sift through all of the articles you found with keywords. Pairing this functionality with the scoring one really shows off the power of Pan.

All created scores are kept in ~/.pan2/Scores so don’t worry if you mess one up. You can just as easily open this file and fix it. Any manual changes to this file will however require you to exit out of Pan (if it’s open) and restart it.

Here is just a few entries of what you might have in your Score file:

%BOS
% Greatly reduce score of potentially malicious content
[alt.bin*]
Score:: -9999
Subject: .*\.(exe|bat|vbs|cpl|msi|scr|vb(script)?|ws(f|h))[^A-Za-z0-9].*
%EOS

%BOS
% Moderately increase the score of compressed content
[alt.bin*]
Score:: 2500
Subject: .*\.(z(ip|[0-9]{2})|r(ar|[0-9]{2})|7z|iso)[^A-Za-z0-9]([0-9]{3}[^A-Za-z0-9])?.*
%EOS

%BOS
% Very slightly decrease the content of PAR content
% This allows it to not quite have the same spot light as
% the item it matches up against. If it were a compressed file
% it would already have +2500 from the previous score entry
% identified above.  These will just sit at +2400 instead.
[alt.bin*]
Score:: -100
Subject: .*(\.vol[0-9]+\+[0-9]+)?\.(par2|sfv)[^A-Za-z0-9].*
%EOS

%BOS
% Very slightly increase the score of NZB-Files
[alt.bin*]
Score:: 250
Subject: .*\.(nzb)[^A-Za-z0-9].*
%EOS 

%BOS
% Mildly drop the score of cross-posted content
[alt.bin*]
Score:: -750
Xref: (.*:){2} % cross-posted to 2 or more groups 
%EOS

Wrapping It Up

I’m certainly not asking anyone to change from their existing system if it works for them. What I am pointing out though is that Pan is completely free, it’s open source and the features it offers are comparable (if not better) than all of it’s competition. Although it works great on Linux, it also works on many other platforms as well such as Microsoft and Apple.

It might not have a beautiful interface, but it wasn’t built to fill your systems memory with bloated eye candy. It was built to be fast and effective… and truly, it really is.

The newer versions coming out are really great! If you haven’t given it a try since it’s dated ones, you really should! If you’re interested in seeing how Usenet is structured, than this is also a great tool to learn with. If you run an indexer (such as newznab or the many forks of it) you can practice your regular expressions (regexs) using Pan. For an Indexer Admin, this tool is especially great in debugging your regexs!

Credit

This blog took me a very long time to put together and test! The repository hosting alone accommodates all my blog entries up to this date. All of the custom packaging described here was done by me personally. I took the open source available to me and rebuilt it to make it an easier solution and decided to share it. If you like what you see and wish to copy and paste this HOWTO, please reference back to this blog post at the very least. It’s really all I ask.

Sources

AWStats Setup on CentOS

Introduction

AWStats is a great tool for gathering statistics about your website. It acquires everything it needs to know about your site strictly through your websites log files. AWStats is able to scan through these logs line by line and present them in a fantastic report. This report can really help you make strategic decisions going forward as well as spot any anomalies that might be taking place. The tool is smart enough to only scan newer log entries (from when it last ran) allowing you to run it again and again (as often as you want). Thus, once you set this tool up to run daily (or even hourly), you’ll have detailed statistics about your website you can call upon anytime.

AWStats collects information such as such as:AWStats Report

  • Who is visiting your site.
  • How many visitors you’re getting daily.
  • Where are they’re coming from (did a site link to you?)
  • Where is the visitor from (geographical location
  • … and on and on
  • The presentation of these collected statistics can be either via a website (HTML), XML and/or as a PDF file. The PDF is especially useful since it combines all of the multiple HTML pages (as presented) into one great big report with a table of contents and hyperlinks throughout it! The PDF is also really easy to navigate and pass along to others who might also be interested.

    Why Use AWStats over Google Analytics?

    Google Analytics Inaccuracy
    Google Analytics Inaccuracy
    The number one reason is because AWStats is much (,much) more accurate! AWStats also just works without ‘any’ changes to your website (literally – none at all). Google Analytics however requires you to add a small piece of JavaScript to every web page you want to track. Every time this tiny bit of JavaScript code executes, it passes the information along to Google. The problem is… if that little snippet of JavaScript doesn’t execute, then Google doesn’t track that user (and you’ll never know) because it just won’t get reported.

    It’s really easy to prevent this chunk of JavaScript from running too, you just have to have installed something like Ad-Blocker Plus, Disconnect and/or uBlock into your Web Browser (such as Firefox or Chrome). These plugins specifically block these tracking techniques and eliminate most (if not all) advertising the website might have too.

    It doesn’t mean that online analytic tools (like Google Analytics) are not good; no, not at all! But it’s just important to understand that they can’t (and truly aren’t) reporting everything that’s going on with your website and the traffic generated from it.

    Another point worth mentioning is that Google Analytics can not monitor and report statistics on traffic used by third party tools. Therefore you can’t use it to monitor any RESTful API services because the programs accessing it will never call these JavaScript snippets of code.

    It’s worth pointing out now that if you use AWStats, you’ll have the full picture! You’ll be able to easily identify any anomalies and detect certain forms of malicious intent! You’ll be able to monitor all of your internal (web based) services you may manage. From the public standpoint, you might be very surprised at how much more traffic your website is getting despite what online analytic tools will tell you!

    Let’s Get Started

    First you’ll want to install the proper packages. You should hook up to my repository and the EPEL repository as well! The EPEL repository hosts AWStats too, but mine is a newer version. We need the EPEL repository for it’s GeoIP packages since they get updated more often there:

    # CentOS 7 users can connect to EPEL this way:
    rpm -Uhi https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
    
    # Similarly, you can hook up to my repository at https://nuxref.com
    # but here is a quick way of doing it (for CentOS/RedHat 7):
    rpm -Uhi https://repo.nuxref.com/centos/7/en/x86_64/custom/nuxref-release-1.0.0-4.el7.nuxref.noarch.rpm
    

    You should be good to go now; the following installs AWStats and a few extra tools to get the best out of it:

    # install awstats
    # install htmldoc too because it'll allow you to create a pdf
    # install geoip-geolite for the ability to track the IPs
    #        to countries
    # install perl-Geo-IP to look up the IP Addresses
    yum install awstats htmldoc geoip-geolite perl-Geo-IP
    
    

    AWStat In a Nutshell

    The steps below will require that you have set up the environment defined below. Obviously you’ll want to change these environment variables to suite your own needs:

    # First define our website as a variable.
    # We will use this value to track and store in an
    # organized structure.
    
    # Those who host other websites for people can
    # change this and virtually everything below will
    # and re-run everything to get stats for that too!
    WEBSITE=nuxref.com
    
    # AWSTATS Variable Data
    DATADIR=/var/lib/awstats/$WEBSITE
    
    

    Configuring AWStats: Step 1 of 3

    AWStats works from configuration files you create in /etc/awstats/. But it also needs a directory it can work within (we use /var/lib/awstats/). I’ve provided documentation around each line so you know what’s going on:

    # Make sure our environment variables are defined
    # WEBSITE and DATADIR
    
    # First we need to setup our DATADIR; this is where
    # all our statistics and generated data will be placed
    # into:
    [ ! -d $DATADIR/static ] && \
        mkdir -p $DATADIR/static
    ln -snf /usr/share/awstats/wwwroot/icon \
       $DATADIR/static/icon
    ln -snf /usr/share/awstats/wwwroot/cgi-bin \
       $DATADIR/static/cgi-bin
    
    # Create a configuration file using our website
    # based on the awws.model.conf example file that
    # ships with AWStats
    sed -e "s|localhost\.localdomain|$WEBSITE|g" \
    	/etc/awstats/awstats.model.conf > \
    		/etc/awstats/awstats.$WEBSITE.conf
    
    ########################################
    # Now update our new configuration
    ########################################
    # Update the LogFile with our access.log file we'll
    # reference. This path doesn't exist yet but
    # we'll be creating it soon enough; leave this entry
    # untouched (don't change it to your real log path!):
    sed -i -e "s|^\(LogFile\)=.*$|\1=\"$DATADIR/access.log\"|g" \
       /etc/awstats/awstats.$WEBSITE.conf
    
    # Disable DNS (for speed mostly)
    sed -i -e "s|^\(DNSLookup\)=.*$|\1=0|g" \
       /etc/awstats/awstats.$WEBSITE.conf
    
    # For PDF Generation we need to update the relative
    # paths for the icons.
    sed -i -e "s|^\(DirIcons\)=.*$|\1=\"icon\"|g" \
       /etc/awstats/awstats.$WEBSITE.conf
    sed -i -e "s|^\(DirCgi\)=.*$|\1=\"cgi-bin\"|g" \
       /etc/awstats/awstats.$WEBSITE.conf
    
    

    Optionally Configuring GeoIP Updates

    The geolite data fetches us a great set of (meta) data we can reference when looking up IP Addresses (of people who visited our site) and determining what part of the world they came from. This information is fantastic when putting together statistics and web page traffic like AWStats does.

    First we want to configure AWStats to use the GEO IP Plugin:

    # Now configure our GEOIP Setup
    sed -i -e '/^LoadPlugin=.*/d' /etc/awstats/awstats.$WEBSITE.conf
    cat << _EOF >> /etc/awstats/awstats.$WEBSITE.conf
    LoadPlugin="geoip GEOIP_STANDARD /usr/share/GeoIP/GeoIP.dat"
    LoadPlugin="geoip_city_maxmind GEOIP_STANDARD /usr/share/GeoIP/GeoIPCity.dat"
    _EOF
    

    Next we want to set up our GEO IP to update itself with the latest meta data for us automatically (so we don’t have to worry about it):

    # downloads all of the latest GEO IP content to
    # /usr/share/GeoIP with this simple command:
    geoipupdate
    
    # This IP information changes often; so the next
    # thing you want to do is create a cronjob to have
    # this tool fetch regular updates automatically for
    # us to keep the GEO IP Content fresh and up to date!
    cat << _EOF > /etc/cron.d/geoipdate
    0 12 * * 3 root /usr/bin/geoipupdate &>/dev/null
    _EOF
    

    Apache Users: Step 2a of 3

    AWStats depends on the log files to build it’s statistics from, so it’s important we point it to the right directory. Apache logs have been pretty much standardized and AWStats just works with them. If your web page is being hosted through Apache then your log files are most likely being placed in /var/log/httpd. If you’re using NginX (and not Apache), you can skip over this section and to Step 2b of 3 instead.

    Make sure AWStats knows it’s dealing with Apache log files (make sure you’ve still got the $WEBSITE variable defined from above):

    # Make sure our environment variables are defined
    # WEBSITE and DATADIR
    ########################################
    # Apache Users Should run The Following
    ########################################
    # Now if you're logs are created from Apache you
    # need to run the following:
    # Log Format (Type 1 is for Apache)
    sed -i -e "s|^\(LogFormat\)=.*$|\1=1|g" \
       /etc/awstats/awstats.$WEBSITE.conf
    
    

    Now what we want to do is take all of the logs files associated with our website in /var/log/httpd and build one great big (sorted) log file we can get all of our statistics out of:

    # logresolvmerge.pl is a fantastic tool that ships with
    # awstats and merges (and sorts) all of our logs. We
    # place the output into our $DATADIR (which we declared
    # earlier):
    /usr/share/awstats/tools/logresolvemerge.pl \
       /var/log/httpd/access.log \
       /var/log/httpd/access.log-????????.gz \
        > $DATADIR/access.log
    
    

    Nginx Users

    NginX logs have a slightly different format then the Apache logs and therefore require a slightly different configuration to work. If your web page is being hosted through NginX then your log files are most likely being placed in /var/log/nginx. If you’re using Apache (and not NginX), then you can skip over this section as long as you’ve already done Step 2a of 3 instead.

    Make sure AWStats knows it’s dealing with NginX log files otherwise it won’t be able to interpret them. Also be sure to have your $WEBSITE variable defined:

    # Make sure our environment variables are defined
    # WEBSITE and DATADIR
    ########################################
    # NginX Users Should run The Following
    ########################################
    # If you're using NginX, you'll want to adjust
    # your awstat LogFormat entry as follows:
    sed -i -e "s|^\(LogFormat\)=.*$|\1=\"%host %other %logname %time1 %methodurl %code %bytesd %refererquot %uaquot\"|g" \
       /etc/awstats/awstats.$WEBSITE.conf
    

    Now we take all of the logs files associated with our website in /var/log/nginx and build one great big (sorted) log file we can get all of our statistics out of:

    # logresolvmerge.pl is a fantastic tool that ships with
    # awstats and merges (and sorts) all of our logs. We
    # place the output into our $DATADIR (which we declared
    # earlier):
    /usr/share/awstats/tools/logresolvemerge.pl \
       /var/log/nginx/access.log \
       /var/log/nginx/access.log-????????.gz \
        > $DATADIR/access.log
    

    Statistic Generation: Step 3 of 3

    At this point we have all the info we need

    # Make sure our environment variables are defined
    # WEBSITE and DATADIR
    ########################################
    # (Create) and/or Update our Stats
    ########################################
    /usr/share/awstats/wwwroot/cgi-bin/awstats.pl \
       -config=$WEBSITE
    
    # The following builds us a PDF file containing all
    # of our statistics in addition to a website we can
    # optionally host if we want.
    # The following would allow you to gather statistics for
    # a given year:
    #   /usr/share/awstats/tools/awstats_buildstaticpages.pl \
    #      -config=$WEBSITE -buildpdf \
    #      -month=all -year=$(date +'%Y') \
    #      -dir=$DATADIR/static \
    #      -buildpdf=/usr/bin/htmldoc
    
    # This will build statistics with all the information we have:
    /usr/share/awstats/tools/awstats_buildstaticpages.pl \
       -config=$WEBSITE -buildpdf \
       -dir=$DATADIR/static \
       -buildpdf=/usr/bin/htmldoc
    
    # - The main website will appear as:
    #      $DATADIR/static/awstats.$WEBSITE.html
    #    But this 'main' website links to several other websites
    #    that can also all be found in the $DATADIR/static
    #    directory
    # - The pdf file will appear as:
    #      $DATADIR/static/awstats.$WEBSITE.pdf
    

    Consider throwing the above into a script file and having it ran in a cron job!

    Hosting The Statistics

    This option is purely optional; but but here is some simple configurations you can use if you want to access these generated statistics from your browser.

    Note: I intentionally keep things simple in this section. AWStats can be configured so that you can update your statistics via it’s very own website (see AllowToUpdateStatsFromBrowser directive in the site configuration). However I don’t recommend this option and therefore do not document it below.

    NginX

    A simple NginX configuration might look like this:

    # Make sure our environment variables are defined
    # WEBSITE and DATADIR
    cat << _EOF > /etc/nginx/default.d/awstats.$WEBSITE.conf
       # Visit your statistics by browsing to:
       # if WEBSITE was equal nuxref.com, you'd visit the stats:
       # http://localhost/stats/nuxref.com/
       location /stats/$WEBSITE/ {
          alias   $DATADIR/$WEBSITE/static/;
          index  awstats.$WEBSITE.html;
    
          ## Set 1.2.3.4 to your own IP address and uncomment
          ## the entries below to 'only' allow yourself access to
          ## these stats:
          # allow 1.2.3.4/32;
          # deny all;
    
          location /stats/css/ {
              alias /usr/share/awstats/wwwroot/css/;
          }
    
          location /stats/icon/ {
              alias /usr/share/awstats/wwwroot/icon/;
          }
       }
    _EOF
    

    Don’t forget to reload NginX so it takes on your new configuration (and makes that statistics page visible):

    # Reload NginX
    systemctl reload nginx.service
    

    Apache

    # Make sure our environment variables are defined
    # WEBSITE and DATADIR
    cat << _EOF > /etc/httpd/conf.d/awstats.$WEBSITE.conf
       # Visit your statistics by browsing to:
       # if WEBSITE was equal nuxref.com, you'd visit the stats:
       # https://localhost/stats/nuxref.com/
       Alias /stats/$WEBSITE/ "$DATADIR/$WEBSITE/static/"
       <Directory "$DATADIR/$WEBSITE/static/">
          Options FollowSymLinks
          AllowOverride None
          Order allow,deny
          Allow from all
    
          ## Set 1.2.3.4 to your own IP address and uncomment
          ## the entries below to 'only' allow yourself access to
          ## these stats:
          # Order deny,allow
          # Deny from all
          # Allow from 1.2.3.4/255.255.255.255
       </Directory>
    _EOF
    

    Don’t forget to reload Apache so it takes on your new configuration (and makes that statistics page visible):

    # Reload Apache
    systemctl reload httpd.service
    

    Credit

    This blog took me a long time to put together and test! The repository hosting alone accommodates all my blog entries up to this date. I took the open source available to me and rebuilt it to make it an easier solution and decided to share it. If you like what you see and wish to copy and paste this HOWTO, please reference back to this blog post at the very least. It’s really all I ask.

    Sources