In this series, we introduce a modern observability stack. We will cover how you can enhance your monitoring capabilities on your server using Nginx, Docker, Clickhouse, Vector, and Grafana. In this particular post, we explore tracking & visualizing user activity (mostly bots) on our VPS with the help of Maxmind’s GeoIP module.

So, You Deployed Your Application…

Self-hosting is stressful. When you buy a server for the first time and deploy an application to it, chances are you feel a strong sense of vulnerability. Your server is out there, accepting requests from all over the world. Without some form of monitoring, you’re in the dark about how your app is performing.

  • Are there any malicious requests?
  • How many visitors do you have, and how does this number change?
  • Which IP addresses are reaching your server?

In this series, we introduce a powerful yet simple stack to answer these questions and more, without turning ourselves into absolute control freaks.

Let’s Track Our Visitors

If you’ve ever tried to self-host a server, you know that bots often visit it more than humans do. You’ll see logs of requests coming from various IPs. And if you search those IPs using a GeoIP service, you might find that these requests come from all over the world. Today, our focus is to develop a solution to monitor these requests on a map and even deny or allow those requests by region.

Bots, bots everywhere

Nginx: Our gateway to the internet

For the sake of simplicity, we will consider a scenario where we want to serve a simple index.html file to our users. For that we need a web server, and we will use Nginx. We covered Nginx in a little more detail in the previous post, so in this one we go right into the implementation. We will use Docker to make our pipeline portable and platform independent; the previous post covers that in detail as well.

Nginx is the front face of your server. Every request should pass through Nginx, which acts as a reverse proxy. This reduces the number of points we need to observe: Nginx writes a one-line log entry for each request, in a format that we will configure.

docker-compose.yml

Let's start with a simple compose file. This will run an Nginx server and route requests to the locations we configure.

version: "3.8"
services:
  nginx:
    container_name: nginx
    image: nginx:latest
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - logs:/var/log/nginx
      - certs:/etc/letsencrypt

# Named volumes referenced above must be declared at the top level
volumes:
  logs:
  certs:

  • We bind ports 80 and 443 for HTTP and HTTPS respectively.
  • We mount an nginx.conf file, which serves as the main configuration file for our Nginx service.
  • We use named volumes to store the log files and certificates. The values to the right of the : are the paths inside the container, which we leave at their defaults.

nginx.conf

Mounting the entire nginx.conf file is not something I would usually recommend; using a separate conf file under /etc/nginx/conf.d is a more granular approach. For the sake of simplicity, though, let's continue with only one file.

# Nginx refuses to start without an events block when the whole nginx.conf is replaced
events {}

http {
    include mime.types;

    server {
        server_name mcagridurgut.com;  # Server name or domain
        access_log /var/log/nginx/website.log;

        location / {
            root /usr/share/nginx/website;
            index index.html;
        }

        location = @geoip {
            internal;
            proxy_pass http://geoip:8080/;
            # proxy_pass_request_body off;
            proxy_set_header X-Geoip-Address $remote_addr;
        }

    }
}

Let’s enrich the logs

When you open a shell in the container with docker exec -it nginx bash and check the logs in /var/log/nginx/website.log, you will probably see something like:

[IP REDACTED] - - [03/Apr/2024:18:40:36 +0000] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:124.0) Gecko/20100101 Firefox/124.0"
[IP REDACTED] - - [03/Apr/2024:18:40:36 +0000] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:124.0) Gecko/20100101 Firefox/124.0"
[IP REDACTED] - - [03/Apr/2024:18:40:37 +0000] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:124.0) Gecko/20100101 Firefox/124.0"
[IP REDACTED] - - [03/Apr/2024:18:40:37 +0000] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:124.0) Gecko/20100101 Firefox/124.0"
[IP REDACTED] - - [03/Apr/2024:18:40:56 +0000] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:124.0) Gecko/20100101 Firefox/124.0"
...

This is Nginx's default log format. Although it has some useful information such as the IP address, user agent, and request URL, we want to enrich this data with geolocation.

What is GeoIP

IP address geolocation is the process of uncovering an online user’s geographic location based on the IP address of their computer or mobile device. IP address geolocation is a useful tool for businesses that rely on the insights provided by knowing where their users are located in the world.

Maxmind’s Geolocation Databases

In order to match IP addresses with geolocation information, we need an IP geolocation database. Maxmind is a platform where you can find an accurate and up-to-date database.
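To make the idea of a geolocation database concrete, here is a minimal sketch of the lookup it performs: map an address to the most specific network range that contains it. The table below is entirely hypothetical (documentation-only address ranges and made-up place names), and a real database such as Maxmind's uses its own file format and a far larger data set.

```python
import ipaddress
from typing import Optional

# Hypothetical toy table: reserved documentation ranges mapped to made-up places.
TOY_DB = [
    (ipaddress.ip_network("192.0.2.0/24"), {"country": "AA", "city": "Example City"}),
    (ipaddress.ip_network("198.51.100.0/24"), {"country": "BB", "city": "Sample Town"}),
    (ipaddress.ip_network("198.51.100.128/25"), {"country": "BB", "city": "Sample Harbor"}),
]


def toy_lookup(ip: str) -> Optional[dict]:
    """Return the entry of the most specific range containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [(net, geo) for net, geo in TOY_DB if addr in net]
    if not matches:
        return None
    # When ranges overlap, the longest prefix is the most specific match.
    _, geo = max(matches, key=lambda match: match[0].prefixlen)
    return geo
```

Calling toy_lookup("198.51.100.200") picks the /25 entry over the broader /24, while an address outside every range returns None.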

ObservabilityStack’s GeoIP API

Feeding Nginx with a geolocation database is a bit complicated, especially when you use a dockerized Nginx image: you need to add the GeoIP module to Nginx and recompile it. Instead, we will use the neat solution from ObservabilityStack. Their repo provides a REST API for geolocation queries. We will use the Docker image as explained in its README file.
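If you want to poke at the API outside of Nginx, the sketch below mirrors what our Nginx configuration does: it sends the address in an X-Geoip-Address request header and collects the X-Geoip-* response headers. The function names and the base URL are assumptions for illustration; in particular, http://localhost:8080/ only works if you publish the container's port 8080 to the host, which the compose file in this post does not do.

```python
import urllib.request
from typing import Dict, Mapping

GEO_HEADER_PREFIX = "x-geoip-"


def extract_geo(headers: Mapping[str, str]) -> Dict[str, str]:
    """Keep only the X-Geoip-* headers, keyed by the lowercased suffix."""
    geo = {}
    for name, value in headers.items():
        lowered = name.lower()
        if lowered.startswith(GEO_HEADER_PREFIX):
            geo[lowered[len(GEO_HEADER_PREFIX):]] = value
    return geo


def lookup_via_headers(ip: str, base_url: str = "http://localhost:8080/") -> Dict[str, str]:
    """Ask the GeoIP service about ip (base_url is an assumed address)."""
    request = urllib.request.Request(base_url, headers={"X-Geoip-Address": ip})
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request, timeout=5) as response:
        return extract_geo(dict(response.headers.items()))
```

Only extract_geo is pure; lookup_via_headers needs a running geoip container to return anything.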

Updated docker-compose.yml

version: "3.8"
services:
  nginx:
    container_name: nginx
    image: nginx:latest
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - logs:/var/log/nginx
      - certs:/etc/letsencrypt
    depends_on:
      - geoip

  geoip:
    image: ghcr.io/observabilitystack/geoip-api

# Named volumes referenced above must be declared at the top level
volumes:
  logs:
  certs:

Updated nginx.conf

# Nginx refuses to start without an events block when the whole nginx.conf is replaced
events {}

http {
  include mime.types;

  # Caution: 0.0.0.0/0 trusts X-Forwarded-For from any client.
  # Restrict this to your proxy's address range where possible.
  set_real_ip_from 0.0.0.0/0;
  real_ip_header X-Forwarded-For;

  log_format geoip_enriched '$remote_addr | $remote_user | [$time_iso8601] | "$request" | '
                  '$status | $body_bytes_sent | "$http_referer" | '
                  '"$http_user_agent" | "$http_x_forwarded_for" | '
                  '"$geoip_country" | "$geoip_stateprov" | "$geoip_city" | '
                  '"$geoip_latitude" | "$geoip_longitude" | "$geoip_continent" | '
                  '"$geoip_timezone" | "$geoip_asn" | "$geoip_asnorganization"';

  server {
    server_name mcagridurgut.com;  # Server name or domain

    auth_request @geoip;

    # Transfer header values returned from the auth_request into
    # Nginx variables
    auth_request_set $geoip_country $upstream_http_x_geoip_country;
    auth_request_set $geoip_stateprov $upstream_http_x_geoip_stateprov;
    auth_request_set $geoip_city $upstream_http_x_geoip_city;
    auth_request_set $geoip_latitude $upstream_http_x_geoip_latitude;
    auth_request_set $geoip_longitude $upstream_http_x_geoip_longitude;
    auth_request_set $geoip_continent $upstream_http_x_geoip_continent;
    auth_request_set $geoip_timezone $upstream_http_x_geoip_timezone;
    auth_request_set $geoip_asn $upstream_http_x_geoip_asn;
    auth_request_set $geoip_asnorganization $upstream_http_x_geoip_asnorganization;

    # Use the variables we defined above to send header values back
    # to the client. To send those values further to a downstream
    # reverse proxy target, use proxy_set_header directive
    add_header X-Geoip-Country $geoip_country always;
    add_header X-Geoip-StateProv $geoip_stateprov always;
    add_header X-Geoip-City $geoip_city always;
    add_header X-Geoip-Latitude $geoip_latitude always;
    add_header X-Geoip-Longitude $geoip_longitude always;
    add_header X-Geoip-Continent $geoip_continent always;
    add_header X-Geoip-Timezone $geoip_timezone always;
    add_header X-Geoip-Asn $geoip_asn always;
    add_header X-Geoip-AsnOrganization $geoip_asnorganization always;

    access_log /var/log/nginx/website.log geoip_enriched;

    location / {
      proxy_set_header Host      $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_pass_request_body off;
      proxy_set_header X-Geoip-Address $remote_addr;

      proxy_set_header X-Geoip-Lat $geoip_latitude;
      proxy_set_header X-Geoip-Long $geoip_longitude;
      root /usr/share/nginx/website;
      index index.html;
    }

    location = @geoip {
      internal;

      proxy_pass http://geoip:8080/;
      # proxy_pass_request_body off;
      proxy_set_header X-Geoip-Address $remote_addr;
    }
  }
}

If you are using a different compose file than the one above, or no compose file at all, make sure that the geoip and nginx services are on the same network.

Once we set up these services, we will have the geolocation information of our visitors in the Nginx logs.

GeoIP enriched logs

54.91.80.197 | - | [2024-04-07T10:56:36+00:00] | "GET /exec HTTP/1.1" | 404 | 555 | "-" | "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36" | "-" | "US" | "Virginia" | "Ashburn" | "39.0469" | "-77.4903" | "NA" | "America/New_York" | "14618" | "AMAZON-AES"
54.91.80.197 | - | [2024-04-07T10:56:37+00:00] | "GET /99vt HTTP/1.1" | 404 | 555 | "-" | "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36" | "-" | "US" | "Virginia" | "Ashburn" | "39.0469" | "-77.4903" | "NA" | "America/New_York" | "14618" | "AMAZON-AES"
54.91.80.197 | - | [2024-04-07T10:56:38+00:00] | "GET / HTTP/1.1" | 200 | 22819 | "-" | "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36" | "-" | "US" | "Virginia" | "Ashburn" | "39.0469" | "-77.4903" | "NA" | "America/New_York" | "14618" | "AMAZON-AES"
54.91.80.197 | - | [2024-04-07T10:56:38+00:00] | "GET /Res/login.html HTTP/1.1" | 404 | 555 | "-" | "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36" | "-" | "US" | "Virginia" | "Ashburn" | "39.0469" | "-77.4903" | "NA" | "America/New_York" | "14618" | "AMAZON-AES"
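Since every entry uses the same " | " separator, these lines are straightforward to parse. The sketch below splits one line into the 18 fields of the geoip_enriched format and keeps a handful of them. The Visit class and parse_line function are hypothetical names for illustration, and the approach assumes that " | " never appears inside a field such as the user agent.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

FIELD_COUNT = 18  # number of fields in the geoip_enriched log_format


@dataclass
class Visit:
    remote_addr: str
    time: datetime
    request: str
    status: int
    body_bytes_sent: int
    user_agent: str
    country: str
    city: str
    latitude: Optional[float]
    longitude: Optional[float]
    asn_organization: str


def _to_float(value: str) -> Optional[float]:
    """Coordinates may be missing when no geolocation was found."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_line(line: str) -> Optional[Visit]:
    """Parse one enriched log line; return None if the shape is unexpected."""
    parts = [part.strip().strip('"') for part in line.strip().split(" | ")]
    if len(parts) != FIELD_COUNT:
        return None
    return Visit(
        remote_addr=parts[0],
        time=datetime.fromisoformat(parts[2].strip("[]")),
        request=parts[3],
        status=int(parts[4]),
        body_bytes_sent=int(parts[5]),
        user_agent=parts[7],
        country=parts[9],
        city=parts[11],
        latitude=_to_float(parts[12]),
        longitude=_to_float(parts[13]),
        asn_organization=parts[17],
    )
```

Running parse_line over each line of website.log gives you structured records that you could, for example, count per country with collections.Counter.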

What’s in store for the next part?

So we enriched our Nginx logs with GeoIP information. In the next part we will visualize this information with Grafana.

We will use Vector.dev to pass the logs to a Clickhouse database. Then we will configure Grafana to visualize this information on a geomap.

So, stay tuned!