One of the things we at TinkerDifferent believe in most of all is transparency. We want you to know how we run. This post will focus on the nuts and bolts of the forum infrastructure.
We want you to have faith in our ability to run the forum, not just from a user-facing perspective but behind the scenes as well.
We're going to show you how we've set everything up, and then explain why we've done it this way.
Infrastructure
First of all, TinkerDifferent is not a monolithic entity on the back end. Our servers are hosted at Linode, a VPS (Virtual Private Server) provider.
The first node is the web front end running XenForo. It's configured as a dedicated 2-CPU node with 4GB RAM and 80GB of disk, running Ubuntu 20.04 LTS. Do those specs seem low? The forum is currently in its infancy, and current projections show this will support several hundred concurrent users. Linode allows us to scale the server if needed: we put the site in maintenance mode, power the node down, and drag the sliders to add more CPU, memory, or disk space. It really is that simple. Start small, and scale when needed. This helps keep our running costs down, too.
The second node is the SQL server. It's configured as a shared, single-CPU node with 1GB RAM and 30GB of disk, also running Ubuntu 20.04 LTS. Again, sparse at the moment, as the SQL server isn't exactly under load. Because we can scale both nodes when needed, it's really not a problem if we start running into resource shortages; upgrading is a 10-minute process.
The two nodes are connected via a private network. The rationale for having two separate nodes? The web server can concentrate on being a solid web-only front end, while the SQL server does nothing but serve queries, minute after minute, hour after hour, day after day. Each node does its own separate thing. This has a number of advantages, speed and scalability among them, but it also helps with backups.
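In practice, a split like this usually means pointing the database at the private interface only, so it never listens on the public internet. A minimal sketch of what that could look like on the SQL node, assuming MySQL/MariaDB (which XenForo uses); the address shown is a hypothetical private-network IP, not our actual one:

```ini
# /etc/mysql/mysql.conf.d/mysqld.cnf (sketch; 192.168.139.10 is a
# hypothetical private-network address on the SQL node)
[mysqld]
# Listen only on the private interface shared with the web node,
# so the database is unreachable from the public internet.
bind-address = 192.168.139.10
```

The web node's XenForo config would then use that private IP as its database host, and a firewall rule on the SQL node can further restrict port 3306 to the web node's private address.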
In front of all of this sits Cloudflare, which caches images and pages to reduce load on the web server and provides some DDoS protection.
Backups
Linode provides a service that snapshots each server every 24 hours. These snapshots can be used to restore to a new server.
But it's not just snapshots! We also have both nodes set up to dump their contents to a Backblaze B2 bucket on a different schedule. This gives us multiple avenues to recover data, should the worst happen. On top of that, we periodically test the backups.
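A nightly dump job of this kind can be sketched as follows. The bucket name, paths, and function names here are hypothetical illustrations, not our actual script:

```python
import datetime
import gzip
import pathlib
import subprocess


def backup_name(day: datetime.date) -> str:
    """Date-stamped archive name, e.g. tinkerdifferent-2021-06-01.sql.gz."""
    return f"tinkerdifferent-{day.isoformat()}.sql.gz"


def dump_and_upload(outdir: str = "/var/backups",
                    bucket: str = "forum-backups") -> pathlib.Path:
    """Dump all databases, compress the dump, and hand it to the B2 CLI.

    Assumes `mysqldump` and the `b2` CLI are on PATH and already
    authenticated. --single-transaction keeps the dump consistent
    without locking InnoDB tables while the forum stays online.
    """
    path = pathlib.Path(outdir) / backup_name(datetime.date.today())
    dump = subprocess.run(
        ["mysqldump", "--single-transaction", "--all-databases"],
        capture_output=True, check=True,
    ).stdout
    path.write_bytes(gzip.compress(dump))
    subprocess.run(
        ["b2", "upload-file", bucket, str(path), f"sql/{path.name}"],
        check=True,
    )
    return path
```

A job like this would typically run from cron on a schedule offset from the Linode snapshots, so the two backup avenues never fail at the same moment.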
Lastly, we are working on a fully offline archive of the site, published at regular intervals, so that even if there ever is an issue, the site's content will be preserved for years to come.
Monitoring
We use a few different tools to monitor the infrastructure:
Uptime Monitoring - We make an HTTP call to the site every 30 seconds from an externally hosted uptime-kuma instance. It pings our admins on Discord if the site is ever unreachable.
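Uptime Kuma handles the polling and the Discord notification for us, but the core check is simple. A minimal sketch of the same idea, with an illustrative URL and timeout:

```python
import urllib.request

SITE_URL = "https://tinkerdifferent.com"  # the page the monitor polls


def site_is_up(url: str, timeout: float = 10.0) -> bool:
    """Return True if the site answers with a 2xx/3xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except Exception:
        # DNS failure, refused connection, timeout, or an HTTP 4xx/5xx
        # (urlopen raises HTTPError for error statuses) all count as down.
        return False
```

The important part is that the checker runs *externally*: a probe hosted on the same node as the forum would go down with it and never fire.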
Metrics - Datadog gives us granular metrics on how the infrastructure is doing: free disk space, CPU load, SQL queries, and so on. We have monitors set up to alert our Discord if any of those go sideways. Datadog also has “anomaly” detection, which alerted us when we were on the front page of Hacker News. The site was fine and none of the traditional alerts fired, but this one did. We continually evaluate whether new monitors should be added or existing ones adjusted.
Logging - Sure, we could log in to each node and grep through logs, but it's great to have them centralized and easily searchable. LogDNA provides this service for us.
Conclusion
We continually monitor, test, and review our decisions and assumptions to make sure the site is running smoothly now and for years to come.