If you'd like some basic data about your site's visitors, but don't want to let spyware vendors track them around the web, AWStats makes a good solution. It parses your server log files and tells you who came by and what they did. There's no spying, no third-party code bloat. AWStats just analyzes your visitors' footprints.

Here's how I got AWStats up and running on an Ubuntu 18.04 VPS server running over at [Vultr.com](https://www.vultr.com/?ref=6825229) ([non-affiliate link](https://www.vultr.com/) if you prefer). 

### AWStats with GeoIP

The first step is to install the AWStats package from the Ubuntu repositories:

~~~~console
sudo apt install awstats
~~~~

This will install the various tools and scripts AWStats needs. Because I like to have some geodata in my stats, I also installed the tools necessary to use the AWStats geoip plugin. Here's what worked for me. 

First we need build-essential and libgeoip:

~~~~console
sudo apt install libgeoip-dev build-essential
~~~~

Next you need to fire up the cpan shell:

~~~~console
cpan
~~~~

If this is your first time in cpan you'll need to run two commands to get everything set up. If you've already got cpan set up, you can skip to the next step:

~~~~perl
make install
install Bundle::CPAN
~~~~

Once cpan is set up, install GeoIP:

~~~~perl
install Geo::IP
~~~~

That should take care of the GeoIP stuff. You can double-check that the database files exist by looking in the directory `/usr/share/GeoIP/` and verifying that there's a file named `GeoIP.dat`. 
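
If you'd rather script that check, here's a minimal sketch. The function names `check_geoip_db` and `check_geoip_module` are made up for this example. It assumes the database path mentioned above and that `perl` is on your path.

```shell
# Report whether a GeoIP database file is present and non-empty.
# The path defaults to the location mentioned above.
check_geoip_db() {
    db="${1:-/usr/share/GeoIP/GeoIP.dat}"
    if [ -s "$db" ]; then
        echo "found: $db"
    else
        echo "missing: $db"
        return 1
    fi
}

# Report whether Perl can load the Geo::IP module installed via cpan.
check_geoip_module() {
    if perl -MGeo::IP -e 1 2>/dev/null; then
        echo "Geo::IP loads"
    else
        echo "Geo::IP not loadable"
        return 1
    fi
}

check_geoip_db || true
check_geoip_module || true
```

If either check reports a problem, re-run the install steps above before moving on.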

Now, on to the log file setup.

#### Optional Custom Nginx Log Format

This part isn't strictly necessary. To get AWStats working the next step is to create our config files and build the stats, but first I like to overcomplicate things with a custom log format for Nginx. If you don't customize your Nginx log format you can skip this section, but make a note of where Nginx is putting your logs; you'll need that in the next step.

Open up `/etc/nginx/nginx.conf` and add these lines inside the `http` block:

~~~~nginx
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                '$status $body_bytes_sent "$http_referer" '
                '"$http_user_agent" "$http_x_forwarded_for"';    
~~~~

Now we need to edit our individual nginx config file to use this log format. If you follow the standard nginx practice, your config file should be in `/etc/nginx/sites-enabled/`. For example this site is served by the file `/etc/nginx/sites-enabled/luxagraf.net.conf`. Wherever that file may be in your setup, open it and add this line somewhere in the `server` block.

~~~~nginx
server {
    # ... all your other config ...
    access_log  /var/log/nginx/yourdomain.com.access.log main;
    # ... all your other config ...
}
~~~~
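
To get a feel for what that format produces, here's a rough sketch that pulls the fields out of a single log line. The sample line is invented (the IP, URL and user agent are placeholders), and `parse_main_line` is just a name I picked for the example.

```shell
# A made-up access log line in the "main" format defined above.
sample='203.0.113.7 - - [12/Mar/2019:08:15:42 +0000] "GET /jrnl/ HTTP/1.1" 200 5120 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64)" "-"'

# Splitting on double quotes isolates the quoted fields:
#   $2 = request, $3 = status and bytes, $4 = referer, $6 = user agent
parse_main_line() {
    printf '%s\n' "$1" | awk -F'"' '{
        split($1, head, " ");   # remote_addr, dash, remote_user, time, zone
        split($3, nums, " ");   # status, body_bytes_sent
        print "host=" head[1];
        print "request=" $2;
        print "status=" nums[1];
        print "bytes=" nums[2];
        print "referer=" $4;
        print "agent=" $6;
    }'
}

parse_main_line "$sample"
```

Run it against a real line from your own log (for example the output of `tail -n 1` on the access log) to confirm Nginx is writing the format you expect.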

### Configure AWStats for Nginx

AWStats is ancient; it hails from a very different era of the internet. One legacy from the olden days is that AWStats is very strict about configuration files. You have to have one config file per domain you're tracking and that file has to be named in the following way: `awstats.domain.tld.conf`. Those config files must be placed inside the `/etc/awstats/` directory.
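
To make the naming rule concrete, here's a small sketch. The helper names `awstats_conf_path` and `awstats_conf_domain` are hypothetical; the point is that the part between `awstats.` and `.conf` is the name you pass to `awstats.pl` with `-config=` later on.

```shell
# Build the config path AWStats expects for a given domain.
awstats_conf_path() {
    printf '/etc/awstats/awstats.%s.conf\n' "$1"
}

# Recover the domain (the value passed to -config=) from a config path.
awstats_conf_domain() {
    name=$(basename "$1")      # awstats.yourdomain.com.conf
    name=${name#awstats.}      # yourdomain.com.conf
    printf '%s\n' "${name%.conf}"
}

awstats_conf_path yourdomain.com
awstats_conf_domain /etc/awstats/awstats.yourdomain.com.conf
```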

If you go take a look at the `/etc/awstats` directory you'll see two files in there: `awstats.conf` and `awstats.conf.local`. The first is a main conf file that serves as a fallback if your own config file doesn't specify a particular setting. The second is an empty file that's meant to be used to share common config settings, which really doesn't make much sense to me.

I took a tip from [this tutorial](https://kamisama.me/2013/03/20/install-configure-and-protect-awstats-for-multiple-nginx-vhost-on-debian/) and dumped the contents of `awstats.conf` into `awstats.conf.local`. That way my actual site config file is very short. If you want to do that, then all you have to put in your config file are a few lines.

Using the naming scheme mentioned above, my config file resides at `/etc/awstats/awstats.luxagraf.net.conf` and it looks like this (drop your actual domain in place of "yourdomain.com"):

~~~~ini
# Path to your nginx log file
LogFile="/var/log/nginx/yourdomain.com.access.log"

# Domain of your vhost
SiteDomain="yourdomain.com"

# Directory where to store the awstats data
DirData="/var/lib/awstats/"

# Other domains/subdomain you want included from your logs, for example the www subdomain
HostAliases="www.yourdomain.com"

# If you customized your log format above add this line:

LogFormat = "%host - %host_r %time1 %methodurl %code %bytesd %refererquot %uaquot %otherquot"

# If you did not, uncomment and use this line:
# LogFormat = 1
~~~~

Save that file and open the fallback file `awstats.conf.local`. Now set a few things:

~~~~ini
# if your site doesn't get a lot of traffic you can leave this at 1
# but it can make things slow
DNSLookup = 0

# find the geoip plugin line and uncomment it:
LoadPlugin="geoip GEOIP_STANDARD /usr/share/GeoIP/GeoIP.dat"
~~~~

Then delete the LogFile, SiteDomain, DirData, and HostAliases settings in your `awstats.conf.local` file. We've got those covered in our site-specific config file.

Okay, that's it for configuring things, let's generate some data to look at.

### Building Stats and Rotating Log Files

Now that we have our log files, and we've told AWStats where they are, what format they're in and where to put its analysis, it's time to actually run AWStats and get the raw data analyzed. To do that we use this command:

~~~~console
sudo /usr/lib/cgi-bin/awstats.pl -config=yourdomain.com -update
~~~~

Alternately, if you have a bunch of config files you'd like to update all at once, you can use this wrapper script conveniently located in a completely different directory:

~~~~console
/usr/share/doc/awstats/examples/awstats_updateall.pl now -awstatsprog=/usr/lib/cgi-bin/awstats.pl
~~~~

You're going to need to run that command regularly to update the AWStats data. One way to do that is with a crontab entry, but there are better ways. Instead of cron we can hook into logrotate, which rotates Nginx's log files periodically anyway and conveniently includes a `prerotate` directive that we can use to execute some code. Technically logrotate runs via `/etc/cron.daily` under the hood, so we haven't really escaped cron, but it's not a crontab we need to keep track of anyway.

Open up the file `/etc/logrotate.d/nginx` and replace it with this:

~~~~
/var/log/nginx/*.log {
    daily
    missingok
    rotate 30
    compress
    delaycompress
    notifempty
    create 0640 www-data adm
    sharedscripts
    prerotate
        /usr/share/doc/awstats/examples/awstats_updateall.pl now -awstatsprog=/usr/lib/cgi-bin/awstats.pl
        if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
            run-parts /etc/logrotate.d/httpd-prerotate; \
        fi \
    endscript
    postrotate
        invoke-rc.d nginx rotate >/dev/null 2>&1
    endscript
}
~~~~

The main things we've changed here are the frequency, moving from weekly to daily rotation in line 2, keeping 30 days' worth of logs in line 4, and then calling AWStats in line 11.

One thing to bear in mind is that if you re-install Nginx for some reason this file will be overwritten. 

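Now force a rotation to make sure you don't have any typos or other problems. Note that `-f` really does rotate the logs and run the `prerotate` script; swap it for `-d` if you'd rather do a true dry run that changes nothing: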

~~~~console
sudo logrotate -f /etc/logrotate.d/nginx
~~~~

### Serving Up AWStats 

Now that all the pieces are in place, we need to put our stats on the web. I used a subdomain, awstats.luxagraf.net. Assuming you're using something similar here's an nginx config file to get you started:

~~~~nginx
server {
    server_name awstats.luxagraf.net;

    root    /var/www/awstats.luxagraf.net;
    error_log /var/log/nginx/awstats.luxagraf.net.error.log;
    access_log off;
    log_not_found off;

    location ^~ /awstats-icon {
        alias /usr/share/awstats/icon/;
    }

    location ~ ^/cgi-bin/.*\.(cgi|pl|py|rb) {
        auth_basic            "Admin";
        auth_basic_user_file  /etc/awstats/awstats.htpasswd;

        gzip off;
        include         fastcgi_params;
        fastcgi_pass unix:/var/run/php/php7.2-fpm.sock; # change this line if necessary
        fastcgi_index   cgi-bin.php;
        fastcgi_param   SCRIPT_FILENAME    /etc/nginx/cgi-bin.php;
        fastcgi_param   SCRIPT_NAME        /cgi-bin/cgi-bin.php;
        fastcgi_param   X_SCRIPT_FILENAME  /usr/lib$fastcgi_script_name;
        fastcgi_param   X_SCRIPT_NAME      $fastcgi_script_name;
        fastcgi_param   REMOTE_USER        $remote_user;
    }

}
~~~~

This config is pretty basic. It passes requests for icons to the AWStats icon dir and sends requests for scripts under `/cgi-bin/` to php-fpm. The only tricky part is that AWStats is a Perl script, but we're handing the request to a PHP file, namely `/etc/nginx/cgi-bin.php`. How's that work?

Well, in a nutshell, this script passes all our server variables to the Perl script as environment variables, runs it, and then reads the response from stdout, passing it on to Nginx. Pretty clever, so clever in fact that I did not write it. Here's the file I use, taken straight from the Arch Wiki:

~~~~php
<?php
$descriptorspec = array(
   0 => array("pipe", "r"),  // stdin is a pipe that the child will read from
   1 => array("pipe", "w"),  // stdout is a pipe that the child will write to
   2 => array("pipe", "w")   // stderr is a pipe that the child will write to
);
$newenv = $_SERVER;
$newenv["SCRIPT_FILENAME"] = $_SERVER["X_SCRIPT_FILENAME"];
$newenv["SCRIPT_NAME"] = $_SERVER["X_SCRIPT_NAME"];
if (is_executable($_SERVER["X_SCRIPT_FILENAME"])) {
   $process = proc_open($_SERVER["X_SCRIPT_FILENAME"], $descriptorspec, $pipes, NULL, $newenv);
   if (is_resource($process)) {
       fclose($pipes[0]);
       $head = fgets($pipes[1]);
       while (strcmp($head, "\n")) {
           header($head);
           $head = fgets($pipes[1]);
       }
       fpassthru($pipes[1]);
       fclose($pipes[1]);
       fclose($pipes[2]);
       $return_value = proc_close($process);
   } else {
       header("Status: 500 Internal Server Error");
       echo("Internal Server Error");
   }
} else {
   header("Status: 404 Page Not Found");
   echo("Page Not Found");
}
?> 
~~~~

Save that mess of PHP as `/etc/nginx/cgi-bin.php` and then install php-fpm if you haven't already:

~~~~console
sudo apt install php-fpm
~~~~
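
If the wrapper's header handling seems opaque, this sketch shows the split it relies on: a CGI script prints header lines, then one blank line, then the body. The sample output and the function names `cgi_headers` and `cgi_body` are invented for illustration. They are not part of AWStats or the wrapper.

```shell
# Made-up CGI output: header lines, one blank line, then the body.
cgi_output='Content-Type: text/html
Status: 200 OK

<html>stats go here</html>'

# Everything before the first blank line is a header.
cgi_headers() {
    printf '%s\n' "$1" | sed -n '/^$/q;p'
}

# Everything after the first blank line is the body.
cgi_body() {
    printf '%s\n' "$1" | sed '1,/^$/d'
}

cgi_headers "$cgi_output"
cgi_body "$cgi_output"
```

The PHP wrapper does the same thing with `fgets` and `fpassthru`: it forwards each header line until it hits the blank line, then streams the rest through untouched.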

Next we need to create the password file referenced in our Nginx config. We can create a .htpasswd file with this little shell command, just use an actual username in place of `username`:

~~~~console
printf "username:`openssl passwd -apr1`\n" >> awstats.htpasswd
~~~~

Enter your password when prompted and your password file will be created in the expected format for basic auth files.
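
If you want to double-check the result before moving it, here's a small sketch. `check_htpasswd` is a made-up helper name, and the pattern only checks the general `user:$apr1$salt$hash` shape, not whether the hash is valid. As far as I can tell, a mistyped confirmation password leaves you with a bare `username:` line, which this would catch.

```shell
# Succeeds if the file is non-empty and every line looks like
# "user:$apr1$salt$hash".
check_htpasswd() {
    [ -s "$1" ] && ! grep -Evq '^[^:]+:\$apr1\$[^$:]+\$[./0-9A-Za-z]+$' "$1"
}

if check_htpasswd awstats.htpasswd; then
    echo "awstats.htpasswd looks fine"
else
    echo "awstats.htpasswd is missing or malformed"
fi
```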

Then move that file to the proper directory:

~~~~console
sudo mv awstats.htpasswd /etc/awstats/
~~~~

Now we have an Nginx config, a script to pass AWStats from PHP to Perl and some basic password protection for our stats site. The last, totally optional, step is to serve it all over HTTPS instead of HTTP. Since we have a password protecting it anyway, this is arguably unnecessary. I do it more out of habit than any real desire for security. I mean, I did write an article [criticizing the push to make everything HTTPS](https://arstechnica.com/information-technology/2016/07/https-is-not-a-magic-bullet-for-web-security/). But habit.

I have a separate guide on [how to set up Certbot for Nginx on Ubuntu 18.04](/src/certbot-nginx-ubuntu-1804) that you can follow. Once that's installed you can just invoke Certbot with:

~~~~console
sudo certbot --nginx
~~~~

Select the domain name you're serving your stats at (for me that's awstats.luxagraf.net), then select 2 to automatically redirect all traffic to HTTPS and certbot will append some lines to your Nginx config file.

Now restart Nginx:

~~~~console
sudo systemctl restart nginx
~~~~

Visit your new site in the browser at this URL (changing yourdomain.com to the domains you've been using): [https://awstats.yourdomain.com/cgi-bin/awstats.pl?config=yourdomain.com](https://awstats.yourdomain.com/cgi-bin/awstats.pl?config=yourdomain.com). If all went well you should see AWStats with a few stats in it. If all did not go well, feel free to drop whatever your error message is in a comment here and I'll see if I can help.

### Motivations

And now the why. The "why the hell don't I just use --insert popular spyware here--" part.

My needs are simple. I don't have ads. I don't have to prove to anyone how much traffic I get. And I don't really care how you got here. I don't care where you go after here. I hardly ever look at my stats. 

When I do look all I want to see is how many people stop by in a given month and if there's any one article that's getting a lot of visitors. I also enjoy seeing which countries visitors are coming from, though I recognize that VPNs make this information suspect.

Since *I* don't track you I certainly don't want third-party spyware tracking you, so that means any hosted service is out. Now there are some self-hosted, open source spyware packages that I've used, Matomo being the best. It is nice, but I don't need or use most of what it offers. And I really dislike running MySQL on the cheap, underpowered VPS servers I use. It uses way too much memory. Unfortunately Matomo requires MySQL, as does Open Web Analytics. 

By process of elimination (no MySQL), and my very paltry requirements, the logical choice is a simple log analyzer. I went with AWStats because I'd used it in the past. Way in the past. But you know what, AWStats ain't broke. It doesn't spy. It uses no server resources. And it tells you 95 percent of what any spyware tool will tell you (provided you actually [read the documentation](http://www.awstats.org/docs/)).

In the end, AWStats is good enough without being too much. But for something as simple as it is, AWStats is surprisingly complex to get up and running, which is what inspired this guide.

##### Shoulders stood upon:

* [AWStats Documentation](http://www.awstats.org/docs/awstats_config.html)
* [Ubuntu Community Wiki: AWStats](https://help.ubuntu.com/community/AWStats)
* [Arch Wiki: AWStats](https://wiki.archlinux.org/index.php/Awstats)
* [Install, configure and protect Awstats for multiple nginx vhost on Debian](https://kamisama.me/2013/03/20/install-configure-and-protect-awstats-for-multiple-nginx-vhost-on-debian/)