When building Caucus, my first personal project, I had a few core goals and philosophies that guided all my decisions.
- Keep it simple. Keep it small. Be very lazy.
- Focus ruthlessly on your user experience and value-add. For all other distractions: outsource and minimize.
- Enable rapid scalability to millions of visitors if needed.
- Minimize costs and automate everything. The site should run itself with zero manual maintenance, on a hobbyist budget.
By focusing on the above, I was able to put together this entire tech stack in ~3 months, while still holding down a full-time job, juggling my other hobbies and social life, keeping site costs down to ~$50/month, and doing zero manual maintenance. All this despite my being a complete beginner to web development at the time.
As you read through the rest of the technology stack below, keep in mind that these decisions were all taken on the basis of the above principles. If you’re a hobbyist working on a pet project, or someone bootstrapping a startup while still holding down a full-time job and a social life, I highly encourage following the above principles as well.
Table of Contents
Disclaimer: This post was written and published in 2016. Some of the information mentioned below may now be out-of-date, but most of it is still applicable.
– Back-End API Deployment: Heroku
– Data Storage: AWS RDS PostgreSQL
– Data Caching
– User Authentication: StormPath
– Payment processing: Stripe
– Email: Mandrill
– Performance Monitoring: New Relic
– System Alerts: AWS SNS and OpsGenie
– Back-End Platform: Java 1.8
– SQL Tooling: JOOQ
– API Documentation: Swagger
– Test Coverage: Jacoco
– File Storage: AWS S3
– Logging: FlyData
– DNS Provider: GoDaddy
– Server Implementation: Jersey and Jetty
– Front End Deployment: AWS Cloudfront CDN
– Front End Development: Freelancer
– Graphical Design: Upwork
– Wish List
Back-End API Deployment: Heroku
One of the most important decisions I took early on was that I wanted my project to be completely server-less. I have zero sysadmin experience, and know next to nothing about maintaining a single Linux system, let alone a server cluster. I have seen too many teams struggle mightily with server health issues and server maintenance, and I wanted no part of that.
With that in mind, the easiest and most straightforward path that I found was Heroku. Deployment literally consists of doing a git push. Adding more servers (dynos) to your application literally consists of sliding a bar on the GUI dashboard, with new servers getting spun up and going live within minutes. There is absolutely no server maintenance or OS management required on your part whatsoever. Focus purely on your app, keep it generic instead of tailoring it to any particular machine, and let Heroku deal with the system issues.
Pricing is reasonable, at least for hobbyists. You can spin up a completely free dyno (server) that runs 18 hours/day and performs well enough to support a small number of concurrent users. When you want a dyno that runs 24 hours/day (i.e., when you're ready for release), you can upgrade to a hobby dyno for a reasonable $7/month. Upgrading to the next level is unfortunately significantly more expensive: it costs $25/month/dyno just to get the same dyno that you earlier received for $7. I don't know how that compares to the alternatives, but when I started off on this project, I figured the chances of the site having enough visitors to warrant scaling were pretty low. Given my above bias towards minimal designs and minimal effort, Heroku seemed like the best choice.
One of the other significant benefits of using Heroku, is the wide variety of addons and ecosystem that exists around the Heroku platform. Many of these addons are free as well, and greatly minimize the amount of manual work you’ll need to do. You can find examples of this mentioned below in this article.
One downside that I did notice very early on is their lack of support for scheduled tasks, of the kind you would accomplish using cron on a server. There is a free Scheduler addon that lets you schedule tasks/executables to be run on a periodic basis. However, their documentation warns strongly that this scheduler is not 100% reliable, and certain scheduled tasks can be missed. They recommend using an alternative tool if you want guaranteed scheduling, but it's much more complicated to set up. And if you read the fine print, you'll learn that because of daily dyno restarts, even this alternative can potentially miss some scheduled tasks.
Ultimately, I decided to just go with the simple Scheduler addon, and configured it to email me every time it runs. That way, if it happens to miss a run, I can tell by checking my email. Not a great solution (it somewhat violates my zero-manual-maintenance policy), but it seemed like the least of all evils.
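The shape of that workaround is simple to sketch in code: wrap whatever task the Scheduler invokes so that a heartbeat goes out on every run; a silent mailbox then means a missed run. A toy illustration (the class is hypothetical; here the "heartbeat" just records a timestamp where the real version would send an email):

```java
import java.util.ArrayList;
import java.util.List;

// Wraps a scheduled task so that every invocation emits a heartbeat.
// If the heartbeats stop showing up, a scheduled run was missed.
class HeartbeatTask implements Runnable {
    private final Runnable task;
    private final List<Long> heartbeats = new ArrayList<>();  // stand-in for outbound emails

    HeartbeatTask(Runnable task) {
        this.task = task;
    }

    @Override
    public void run() {
        try {
            task.run();
        } finally {
            // Emit the heartbeat even if the task failed, so "run that crashed"
            // stays distinguishable from "run that never happened".
            heartbeats.add(System.currentTimeMillis());
        }
    }

    int runsObserved() {
        return heartbeats.size();
    }
}
```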
If I were to start over again with a blank slate, I might investigate even more abstract services such as AWS Lambda and AWS API Gateway. I know little about the nitty gritty details of how they work, but they do seem promising, and AWS has a track record of producing high-scalability services at reasonable prices. For now though, I’ve been pleased with Heroku.
Data Storage: AWS RDS PostgreSQL
One of the other major pieces of your application is figuring out how you want to store your data. Given how central data is to your application, and how frequently it is read and written, this is a major decision that you should think about carefully.
The first decision I made early on was to go with an AWS-managed database solution. As discussed above, I had no interest in running my own servers, and I had even less interest in installing and managing my own database. I also wanted something robust, reliable, and capable of rapid scaling when needed, and Amazon's AWS databases seemed perfect for all of these goals. You just click a few buttons on their web console, and they spin up a database for you, set it up with reasonable default configs, and handle all the ongoing maintenance themselves. With their multi-AZ deployment, you can ensure that your database remains functional even if one instance goes down, and scaling up to a larger box, or creating additional read-only replicas to grow your read throughput, is just a few clicks away as well.
The next decision is figuring out exactly what kind of data store you want. NoSQL might be the hot trend at the moment, but I've always favored relational databases. Schema enforcement and triggers/constraints/checks automate the process of ensuring that your data store never gets corrupted and remains consistent. Relational JOINs might not be performance-efficient at "web scale", but they keep your data DRY, and let you query data aggregations for all sorts of interesting use cases, including unforeseen ones, without having to write any manual code. Going with a relational SQL database was an easy choice for me.
The last, and hardest, decision was choosing which particular SQL database to go with. I decided to go with an open-source database, just to maximize my flexibility in the future, which crossed off Oracle and MSSQL. I would have loved to use Aurora. It's MySQL-compatible and offers the best scalability of all the AWS RDS options. Given that scaling out a database is much harder than scaling out your servers, this was a huge draw. Unfortunately, Aurora's pricing is simply not friendly to hobbyists. Their cheapest offering comes in at >$150 per month. Compare this to AWS's PostgreSQL offerings, which start off at ~$10-20 per month.
Of the remaining options, MySQL and PostgreSQL seemed to have the widest community adoption. On the one hand, MySQL follows the exact same standard as Aurora; so if I wanted to switch to Aurora in the future, MySQL would make the switch much easier. On the other hand, PostgreSQL seems to be favored over MySQL by a number of people who've analyzed their differences. Ultimately, I decided to go with PostgreSQL, and hope that if the day ever comes when the site outgrows RDS PostgreSQL's maximum scalability options, we'll be able to find a more sophisticated solution then.
Data Caching

It's worth keeping in mind that if the performance you're getting from your database is still not sufficient, the problem can be solved by adding a layer of caching. This also minimizes the load on your database, which is vital for the reasons discussed above. Tools like ElastiCache are oriented towards cloud-based in-memory data caching: they let you cache the most heavily used portions of your data in a performance-optimized store, in order to better serve user requests and minimize database load. Because it's only a temporary cache, you can even use performance shortcuts that may not be 100% accurate or consistent, secure in the knowledge that your "real data" is safe in the database.
That said, using a tool like ElastiCache seemed like overkill for the early stages of an app, before such tools are known to be needed. At low loads, AWS RDS performance is good enough not to be a bottleneck for your overall performance. In fact, having to make any network request at all is one of the most significant bottlenecks, and in that sense, making a network request to ElastiCache is just as bad as making a network request to RDS.
Keep in mind that there's an even higher-performance layer of caching you could use, and it's simpler as well: your application's heap! Certain queries (e.g., community information for a specific community) are so frequently used by so many users that caching them in your application's heap can yield massive improvements for a minimal memory footprint.
The Guava Cache library is a great way to accomplish this. It basically behaves like a concurrent map, except that you can configure it to automatically evict old entries, put size limits on the cache to constrain its memory footprint, and define generator methods that the cache will automatically call when a lookup fails (i.e., pull data from the database if it isn't already cached). This way, you can route all data accesses through your local cache, and the cache will handle the details of pulling data from the database or ElastiCache where needed.
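Guava's cache handles the eviction and loading details for you, but the core pattern is easy to see with just the JDK: a concurrent map whose lookups fall through to a loader function, plus a crude TTL check. A minimal sketch (class and method names are mine, not Guava's or Caucus's):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// A minimal heap cache: map.compute gives thread-safe "load on miss"
// semantics, and entries older than ttlMillis are reloaded.
// Guava's LoadingCache layers size limits and smarter eviction on this same idea.
class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAt;
        Entry(V value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader;  // e.g. a database query
    private final long ttlMillis;

    TtlCache(Function<K, V> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
    }

    V get(K key) {
        long now = System.currentTimeMillis();
        return map.compute(key, (k, old) ->
                (old != null && now - old.loadedAt < ttlMillis)
                        ? old                                   // fresh: serve from the heap
                        : new Entry<>(loader.apply(k), now))    // missing/stale: reload
                .value;
    }
}
```

With the database query as the loader, every read goes through get(), and the cache decides when to actually hit the database.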
Of course, the best performance optimization of all is eliminating REST requests completely. This may sound like hand-waving, but it's easily doable by tweaking the HTTP response that you generate for each request. Setting the Cache-Control header on your responses tells your user's browser (and other points along the way) to cache the result for a specified period of time, which can dramatically reduce the number of requests that hit your server. If the user makes the exact same request again within that window, the request is fulfilled immediately by the browser.
Keep in mind that this should only be used for requests where the same response can be reused multiple times, and not for requests where getting fresh data is important. Note also that if the same route returns unique, user-specific data to each user, you should make use of the Vary header to ensure that user A doesn't get a cached response meant for user B.
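Here's a self-contained sketch of what setting these headers looks like, using the JDK's built-in HttpServer rather than Jersey just to keep it dependency-free. The route and header values are illustrative, not Caucus's actual ones:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class CacheHeaderDemo {
    // Starts a throwaway server on a random port, fetches one response,
    // and returns the Cache-Control header that the client saw.
    static String fetchCacheControl() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
            server.createContext("/communities", exchange -> {
                byte[] body = "{\"name\":\"politics\"}".getBytes(StandardCharsets.UTF_8);
                // Let browsers (and proxies along the way) reuse this response for 5 minutes...
                exchange.getResponseHeaders().set("Cache-Control", "public, max-age=300");
                // ...but never share cached copies across different logged-in users.
                exchange.getResponseHeaders().set("Vary", "Authorization");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.start();
            try {
                URL url = new URL("http://localhost:" + server.getAddress().getPort() + "/communities");
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                return conn.getHeaderField("Cache-Control");
            } finally {
                server.stop(0);
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchCacheControl());
    }
}
```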
User Authentication: StormPath
Every month, we hear about another major website getting hacked and users' passwords getting leaked. Because most people reuse the same password across many different websites, this can compromise their security everywhere around the web, not just on one site. Safeguarding your users' passwords is thus a big responsibility, and given that even major tech companies like [LinkedIn have fallen prey](https://en.wikipedia.org/wiki/2012_LinkedIn_hack) to amateurish mistakes in this field, I didn't think it would be fair to my users for me to take on this problem myself.
Fortunately, I didn't have to! Stormpath, a company whose entire business model revolves around safe and robust user-authentication APIs, specializes in this field and caters very effectively to startups and hobbyists like myself. As a startup, they still have a few rough edges in their APIs, but they certainly excel at their core offering and value-add, at very reasonable prices. We route all login authentication through Stormpath, ensuring that even if our servers were compromised, our users' passwords would still be completely safe.
One point to note: login verification and session handling are actually 2 very different problems. Login verification is the process of checking a user’s email/password in order to log them into their account. Session handling is the process of tracking and maintaining a user’s session after they have already logged in, up until the point where they have logged out. Login verification requires dealing with user passwords, but session handling does not. Hence, when you’re building a social media website, as long as you protect your users’ passwords, the liability involved in session handling is much more limited.
Initially, we tried utilizing Stormpath's API offerings for session management as well. However, our first pass ran into some roadblocks. We also realized that almost every single request on the server requires session handling, and making a network request to Stormpath every time would introduce significant latency that would worsen our user experience. Hence, over the next two days, we researched this issue to death and built our own session management system. I could try to walk you through it here, but it would simply be a watered-down version of this excellent conference talk. There are also some excellent resources online that describe the common errors to avoid when building your own session management system.
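To give a flavor of what a stateless session handler involves, here is a bare-bones sketch of the signed-token idea using the JDK's HMAC support. This is not our actual implementation, and a real system also needs secure cookie flags, key rotation, and logout handling; the token format below is just illustrative:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class SessionTokens {
    // Token format: base64url(payload) + "." + base64url(hmac(payload)).
    // The payload here is just "userId:expiryMillis"; real tokens carry more.
    static String issue(String userId, long expiresAtMillis, byte[] secret) {
        String payload = userId + ":" + expiresAtMillis;
        return b64(payload.getBytes(StandardCharsets.UTF_8)) + "." + b64(hmac(payload, secret));
    }

    // Returns the userId if the token is authentic and unexpired, else null.
    // No database or third-party round trip needed: the signature is the proof.
    static String verify(String token, long nowMillis, byte[] secret) {
        String[] parts = token.split("\\.", 2);
        if (parts.length != 2) return null;
        String payload = new String(Base64.getUrlDecoder().decode(parts[0]), StandardCharsets.UTF_8);
        byte[] given = Base64.getUrlDecoder().decode(parts[1]);
        // Constant-time comparison, so attackers can't probe the HMAC byte by byte.
        if (!MessageDigest.isEqual(hmac(payload, secret), given)) return null;
        int sep = payload.lastIndexOf(':');
        if (Long.parseLong(payload.substring(sep + 1)) < nowMillis) return null;  // expired
        return payload.substring(0, sep);
    }

    private static byte[] hmac(String payload, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    private static String b64(byte[] bytes) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }
}
```

Because verification is pure computation, any server in the fleet can validate any user's session, which is exactly what makes this approach scale horizontally.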
Looking back, there is indeed some complexity involved here, and if I had to start all over again as a fresh beginner, I would just use Stormpath’s API offering to avoid having to get my hands dirty. But if you really want to minimize user latency and enable wide scalability of your servers, you should consider looking into the above and taking the time to build your own stateless session handler.
2018 update: Stormpath has now been shut down. I recommend using FirebaseAuth instead.
DNS Provider: GoDaddy
Yes, I know what you're thinking. GoDaddy doesn't have the best reputation in the industry, and they certainly do engage in some gimmicky marketing. However, I already had a GoDaddy account, and in the interest of moving quickly, I used my existing account to buy the domain name for thecaucus.net. From there, it just stuck around through force of inertia. To their credit, their prices are reasonable enough, and their services good enough, that taking the time to migrate to a different provider never seemed worth the effort.
Note the scope of what I actually use GoDaddy for: I don't use them to host or vend any data. That part is done entirely by Cloudfront (covered further below). The only thing I use GoDaddy for is tying together the domain name (thecaucus.net) and the Cloudfront URL (d35yazrqbfyltz.cloudfront.net).
The above sounds simple enough, but astoundingly, I had to spend an entire day tearing my hair out before finally getting something I wanted up and running. Here are the various DNS options available, and the problems with each of them:
– Forwarding thecaucus.net -> d3zrqyltz.cloudfront.net: users are going to see an ugly d3zrqbltz.cloudfront.net URL, not a pretty thecaucus.net URL.
– Forwarding with masking: users see thecaucus.net, which is great, but all URL suffixes are stripped out. If the user refreshes a page, they are taken all the way back to the home page, not the page they were on.
– CNAME www.thecaucus.net -> d41gnjkg.cloudfront.net, and then registering the www.thecaucus.net CNAME on the Cloudfront dashboard as well: www.thecaucus.net works! Suffixes and page refreshes are properly handled. However, your users always have to type www.thecaucus.net; if they simply type thecaucus.net, nothing comes up.
– Setting up a CNAME for thecaucus.net -> d3rqfyl.cloudfront.net: apex CNAMEs are not allowed by most services. AWS Route 53 claims to offer apex CNAMEs, but it only works if the target is one of their own services, not an external domain. Cloudflare claims to offer apex CNAMEs, but they only offer DNS service if you also purchase other expensive plans.
– Which finally brought me to my final solution, which solves all of the above problems: forwarding thecaucus.net -> www.thecaucus.net, combined with a CNAME from www.thecaucus.net -> d4wgnjg.cloudfront.net.
Any user who visits thecaucus.net now gets redirected to www.thecaucus.net, which is in turn CNAMEd to d16wgakg.cloudfront.net. All URL suffixes and query parameters are handled properly as well.
My only grumble at this point is that users who browse to thecaucus.net don't actually see the green lock sign in their browser, even though the CNAMEd Cloudfront domain itself supports HTTPS, and all user interactions and content transfers between the client's front-end app and the Heroku back-end server happen over HTTPS, making the communication fully encrypted and secure. I investigated getting a free SSL certificate from [LetsEncrypt](https://letsencrypt.org/), but the resulting certificate doesn't seem to be usable on GoDaddy.
Admittedly I'm still a novice and don't understand many of the things happening in this section, so maybe I'm doing something wrong (let me know if that's the case!). But my current plan is to wait and see if this site actually becomes popular, and if so, I'll then fork out the money for a GoDaddy SSL certificate. If I could do it all over again, I might consider using a different service like [AWS Route53](https://aws.amazon.com/route53/). But for now, GoDaddy does seem like a good enough solution for my needs.
File Storage: AWS S3
As discussed above, we rely on a relational database to store most of our data, so that doesn't leave much need for file storage. We simply needed a place to dump log files, and a place to keep the front-end application that is vended by Cloudfront. AWS S3 synergizes well in this regard, and is very well supported and well regarded by the general community, so I chose it without too much debate or thought.
Logging: FlyData

Your app on Heroku is likely producing copious amounts of log statements, but Heroku only persists a small sliver of the data. So how do you save the rest for later use? Thankfully, the large Heroku ecosystem of addons comes in very handy here. The FlyData addon will compress and transfer up to 5GB of log data every month, for free, to an AWS S3 bucket of your choice. For development purposes, this is more than sufficient, and it's hard to beat free. In addition, their web console interface is beautiful. You can select any date/time range, and it will automatically fetch all log files from the matching range and download them to your desktop. You can then simply unzip them and start grepping away, without having to manually browse through the various files on S3 and download them individually.
I haven’t done a detailed price comparison with competing services for larger bandwidths, but at least for hobbyists, Heroku FlyData seems perfect.
2018 Update: I now recommend Papertrail instead.
Test Coverage: Jacoco
Completely free. Relatively quick and simple set up. Automatically generates code/branch coverage reports whenever I run my test suite. Formats all the results in a user-friendly and intuitive HTML file that can be pulled up and navigated from your browser. What’s not to love about it? Any serious team that cares about quality should definitely set up a tool like this and monitor it regularly.
That said, if I had more time to work on this, one thing I would do is set up a mutation testing framework. Having done plenty of test development in my past, one lesson I’ve learnt very clearly is the following: it’s very easy to write tests that cover 100% of your code. *Whether these tests will actually catch bugs is a completely different matter*. It’s very easy to introduce bugs that produce an incorrect result in some portion of the output, but still leave most of the output and the code-execution-path intact. Tests that simply “exercise all of your code” but don’t check 100% of the resulting output behavior, will overlook such bugs.
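To make that concrete, here's a contrived illustration. Both tests below drive mean() through 100% of its lines, but only the one that checks actual output would ever expose the integer-division bug:

```java
class Stats {
    // Intended behavior: arithmetic mean. Hidden bug: integer division truncates.
    static double mean(int[] xs) {
        int sum = 0;
        for (int x : xs) sum += x;
        return sum / xs.length;  // bug: should be (double) sum / xs.length
    }
}

class StatsTest {
    // "Coverage" test: executes every line of mean() but asserts nothing about
    // the result. Jacoco reports 100% coverage, and the bug sails through.
    static boolean weakTest() {
        Stats.mean(new int[]{1, 2});  // runs fine; result ignored
        return true;
    }

    // A test that checks the actual output catches the bug immediately:
    // mean({1, 2}) returns 1.0 instead of 1.5.
    static boolean strongTest() {
        return Stats.mean(new int[]{1, 2}) == 1.5;
    }
}
```

A mutation testing tool would flag the weak test automatically: flip the division to multiplication and weakTest() still passes.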
A mutation test solves this problem head on: It deliberately introduces subtle bugs into your code, and then checks to see if your test suite is capable of catching the bugs it just introduced. I haven’t had the time to use such a framework myself, but in theory, it represents the best possible indicator of your test suite’s effectiveness (I might be biased because I had suggested exactly this same idea to my company back in 2010, half a decade before I ever heard the term “mutation testing”, but I digress).
That said, despite the flaws in code coverage tools, using one is still miles better than using no coverage tools at all. Code coverage tools can be easily gamed, but if used with sincerity, they can help you identify important holes in your test suite. As someone who loves making refactoring changes constantly, I rely heavily on my test suite to enable safe refactorings, and in turn, I rely heavily on code coverage tools to ensure that my test suite is good enough to get the job done. Jacoco has been massively helpful in that regard.
Back-End Platform: Java 1.8
I know Java gets a lot of hate, probably more so than any other major language. So let me go against the grain and state this: *I love Java!* I spent a decade programming in C and C++, and within months of discovering Java just two years ago, it had become my favorite language.
It offers performance comparable to languages like C++, but without any pointers or memory management, two of the most challenging tasks when programming in C++. Without having to worry about such challenges, I was able to code in Java faster than I was ever able to code in C++. It’s true that you lose some run-time performance because of these features, but in return, you get a code base that is so much simpler and guaranteed to be free of memory leaks and pointer-bugs.
People complain about Java being “verbose” compared to dynamic languages like Python, but what others see as verbosity, I see as type safety and readability. When reading a function in any dynamic language, it takes quite a bit of effort to figure out what exactly the various input and variable types are. With Java, this is a non-problem. The type is stated explicitly, without any guess work involved.
Such explicitness also offers tremendous compile-time checks for your code. If you've made a type mistake somewhere, and are trying to do something with an object that it cannot do, Java will flag the error immediately at compile time. No test running (or even writing) necessary. The number of times this feature has saved my behind is mind-boggling.
People see OOP as being out of fashion, but personally, I can’t imagine building any major software system without using the OOP paradigm. Being able to break down extremely complex problems, into many smaller and simpler problems, developing objects that can solve each of those simple problems in isolation, and integrating all of them together at the top level with all the low level details being abstracted away, is so fundamental to system design, that I would be lost without it.
People mock Java as being "boring", but to me, this "boredom" means endless community tooling and support for any and all needs. The TIOBE index has Java firmly in first place, and this certainly shows in the size of the language's ecosystem. As a new developer building my first web app, I constantly faced a barrage of questions and doubts on any and all matters. And when I did, StackOverflow and other such resources always had an answer for me, given the huge amount of Java Q&A they host.
Whatever utility I wanted, I rarely had to reinvent it myself; I could instead find a solid open-source Java library that fulfilled my needs. Whatever 3rd-party service I wanted to use, they always had a Java SDK available that made integration a breeze. And the IDEs! The number of features that IDEs like IntelliJ can support on Java, thanks to static types and fast compile times, is astounding. If I ever had any doubt about what methods were available on a custom class, where to find the source code for a method, or where certain variables/methods were being used, IntelliJ was always there with a quick answer. The tremendous amount of community support and tooling around Java easily cut the problem space in half.
To quote Atwood quoting McConnel quoting Dijkstra, “Nobody is really smart enough to program computers. Fully understanding an average program requires an almost limitless capacity to absorb details and an equal capacity to comprehend them all at the same time… Most of programming is an attempt to compensate for the strictly limited size of our skulls. The people who are best at programming are the people who realize how small their brains are… The more you learn to compensate for your small brain, the better a programmer you’ll be.”
The Java approach of auto-garbage-collected, verbose, statically typed, IDE-driven OOP style with oodles of community tooling, might not be fashionable. But personally, it’s been of immense help in compensating for the very small size of my brain.
Server Implementation: Jersey and Jetty
To be honest with you, choosing a server framework was something I gave very little thought to. In order to get going quickly, I started off with a Jersey-Jetty combination, and I've seen no reason to change since.
As a novice in this field, I found the Jersey framework very intuitive. Using the framework's annotations (@Path, @PathParam, @Consumes, etc.), I was able to write REST routes that looked just like generic Java methods, without ever having to deal with intermediaries like WebAppContexts or ServletRequests.
I never did any deep dive into Jetty, but it seemed relatively simple to set up and configure, after which it fulfilled all my needs without me needing to get my hands dirty. Given how popular both Jersey and Jetty are, there are vast amounts of documentation around both, which gave me the confidence that I wasn't making a catastrophic mistake by picking either one. All in all, both Jersey and Jetty just worked silently, without much fanfare, which I suppose is the best recommendation you can give in support of any tool.
API Documentation: Swagger
When it came time for me to hire a front-end developer to build the site's front end, I knew I had to start documenting all of the REST routes on my back end, their input parameters, and what they produced. Given the large number of routes that I had, and how often they were changing, the idea of manually documenting everything sounded like torture. I did some googling to see what tools existed to solve this problem, and repeatedly ran into one name: Swagger.
It took me a little while to set it up and deal with frustrating CORS-related errors. But once I did, the result was magical. The tool automatically parses your entire API (routes, paths, request types, and input parameters) and generates an output JSON that encodes all of this information neatly. It then exposes this JSON object on a REST route of its own, allowing anyone to query your back end and extract the most up-to-date documentation of your REST API. There's even a Swagger web GUI where you can enter your API's URL and see a nicely formatted, interactive documentation of your entire API.
Overall, Swagger was a little bit of a pain to set up, but once I did, it auto-generated beautiful documentation that I never had to worry about updating manually.
Payment Processing: Stripe
This was by far the most complicated 3rd-party integration that I had to deal with, and the source of many sleepless nights. I was extremely tempted to scrap the payment system from my site entirely, merely to avoid having to deal with these headaches. I ultimately came up with a good-enough solution, but the number of UX choices and financial liabilities involved here is simply staggering.
To give you a quick rundown of the very major choices you have to make:
– Do you want to offload the account-creation process to Stripe, or manage it yourself?
– When facilitating transfers from senders to receivers, do you want to do the transfer directly? Or through yourself as the middleman?
– What do you do if the receiver doesn’t even have an account set up?
– What do you do if a customer cancels a charge, after you’ve already paid it out?
– Every time a customer disputes a charge, you get hit with a $15 fee, even if the charge was just for $1. How do you deal with this?
– How do you limit your liability from stolen credit cards?
– How do you ensure that your users are never made liable for any potentially malicious behavior on the part of other users?
– Stripe charges 30c + 3% transaction fees. For a $1 charge, this comes out to 33c. Do you really want 33% of your revenue going to Stripe fees?
The number of major decisions involved here, their technical complexity, their significant impact on your user-experience, and the financial costs and liabilities involved with these decisions, are simply mind boggling.
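The fee arithmetic alone is worth working through. With a flat 30¢ plus the 3% mentioned above, the effective rate is brutal on micro-transactions and tolerable on large ones:

```java
class StripeFees {
    // Fee model from the bullet above: 30 cents flat + 3% of the charge amount.
    static int feeCents(int chargeCents) {
        return 30 + (int) Math.round(chargeCents * 0.03);
    }

    // Fraction of the charge that goes to fees.
    static double effectiveRate(int chargeCents) {
        return (double) feeCents(chargeCents) / chargeCents;
    }

    public static void main(String[] args) {
        for (int cents : new int[]{100, 1000, 10000}) {
            System.out.printf("$%.2f charge -> %d cent fee (%.1f%% of the charge)%n",
                    cents / 100.0, feeCents(cents), 100 * effectiveRate(cents));
        }
    }
}
```

A $1 charge loses 33% to fees, while a $100 charge loses only 3.3%: exactly the 33c problem called out in the list above.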
I investigated other alternatives, but all of them seemed to have the same problems. Braintree seemed to be the best alternative, on par with Stripe in most ways, but suffering from the same issues. I investigated PayPal-Classic briefly as well, but tossed it out the window immediately because of their data lock-in policy, which would handcuff you to them forever. Dealing with money is always hard, and it appears that using Stripe is no different.
Email: Mandrill

I started off just wanting a simple way to send email to users, but as I started exploring various 3rd-party services, I realized Mandrill offered a lot of features that I hadn't even thought of, but really needed: sub-group unsubscribe options, bounce/click rates, HTML formatting, the list goes on.
I considered a few other options as well, before settling on Mandrill. The other industry leaders in this space seemed to be AWS SES and Sendgrid. Of the three, SES is the most bare-bones and cheapest option, whereas Sendgrid offers the best user experience and most fully featured product, but costs significantly more if you need to scale up your site. Mandrill seemed to be a happy medium between the two: on par with Sendgrid in terms of features and user experience, but with better pricing options if you need to grow your service.
A couple of complaints: they don't offer a Java SDK, so I had to deal with their HTTP REST API manually. Partly as a consequence of this, but also due to other complications, it was somewhat of a pain to set up. I had to spend quite a bit of time figuring out how to properly use their many features and get everything set up and working. Lastly, their free account only allows 2,000 lifetime emails and ~25 emails per hour. I ended up exceeding this quota in my integration tests alone. Hence, I had to subscribe to their $10/month paid service, which seems somewhat expensive for a relatively simple service. (Update: Mandrill now offers a test mode that comes with 1,000 free emails every day.)
2018 Update: I now recommend Sendgrid instead, since the above Mandrill service has been shut down.
Performance Monitoring: New Relic
Towards the end of the project, when I found myself with some free time to spare, I started asking myself how I could monitor my server’s performance, in order to determine when it’s necessary to scale things out. Both Heroku and RDS produce logs that track your transaction latencies, but I had no interest in manually watching these logs in order to detect problematic spikes. Some quick googling revealed another Heroku Addon addressing this problem: New Relic.
Let me start out with the bad: their Heroku-addon documentation is a little disorganized, which made it somewhat clunky to set up. Besides that, I love almost everything else about this service. Their cheapest plan is completely free, and gives you some amazing functionality. It monitors every single transaction that hits your API, tracks its latency, and even breaks it down into Heroku queueing delays, JVM latencies, database latencies, and external-service latencies. Seeing these numbers charted out gives an immensely valuable perspective on which portions actually need optimizing, and which don’t. On the alerts side, you can configure it to alert you if transaction latencies increase beyond a specified limit, or if the server stops responding to regular pings entirely.
Unfortunately, they don’t support SNS alerts, which is the primary notification channel I use. However, they offer both email and PagerDuty integrations, which, when used together with your PagerDuty/OpsGenie account, will ensure that you get timely notifications of system issues.
Lastly, their support is amazing. When I had trouble setting things up, the initial support rep I emailed was of limited help. After a few days of the issue not being resolved, someone else stepped in and suggested that we talk over phone and set up a screen-share. She was very flexible in working around my schedule, was knowledgeable on technical matters but still honest enough to admit that there were certain things she wasn’t sure about, and after doing some research on her side, was able to get back to me quickly with a working solution. Out of all the free services I’ve used, theirs was probably the most customer-friendly.
System Alerts: AWS SNS + OpsGenie
As part of my drive to build a high-quality API, I wanted to be able to catch and fix any bug, issue or unexpected behavior that my users may encounter. However, as part of my zero-maintenance mantra, I didn’t want to do any manual work when it came to examining my log files periodically for errors and warnings. Hence, one of the first things I did was to integrate an alert notification system into the guts of my application itself.
I didn’t do too much research into this, but Amazon’s SNS seemed like a great service to accomplish this goal. Their publisher-subscriber model allows you to create multiple publisher channels, and configure each with a unique collection of subscribers. A subscriber can be an email address, cellphone number (for text-messages), the SNS app, or even a REST route. For example, you can create a channel for warnings, and configure it with your email address as the subscriber. You could then create a second channel for fatals, and configure it with both your email address and cell phone number as subscribers. This way, any warning event will result in emails, whereas fatals will result in both emails and text messages.
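To make that fan-out concrete, here is a toy sketch of the channel/subscriber model in plain Java. The real wiring is done in the AWS console or via the SNS SDK; the topic names and endpoints below are purely illustrative, and the `delivered` list stands in for actual email/SMS delivery.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of SNS-style topics: each topic fans a message out to all of its
// subscribers. Endpoints are illustrative stand-ins for email addresses and
// phone numbers; `delivered` stands in for real delivery.
public class AlertTopics {
    private final Map<String, List<String>> subscribers = new HashMap<>();
    public final List<String> delivered = new ArrayList<>();

    public void subscribe(String topic, String endpoint) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(endpoint);
    }

    public void publish(String topic, String message) {
        for (String endpoint : subscribers.getOrDefault(topic, new ArrayList<>())) {
            delivered.add(endpoint + ": " + message); // real code would email/text here
        }
    }
}
```

With this setup, subscribing your email to "warnings" and both your email and phone to "fatals" reproduces the fan-out described above: warnings reach one endpoint, fatals reach two.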
Unlike performance monitoring though, your app itself is responsible for publishing events to the SNS channels and generating alerts. I accomplished this by using a top-level wrapper on my REST API, an ExceptionMapper provider, which catches every single exception/error thrown by my application. If it’s a WebApplicationException (i.e., an exception specifically thrown due to illegal user input), it logs a warning and returns the specified response to the user. For all other exceptions and errors, the mapper converts them into a 500 server error and publishes an SNS fatal event. This ensures that if an unexpected exception is ever thrown by the application, I get notified, and the user is given a graceful generic error response. At a finer-grained level, I also published SNS events in various portions of my code, so as to track events that may not quite be errors, but are suspicious or worthy of human examination (for example, failed logins, invalid tokens, content-not-found errors, etc.).
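A minimal sketch of that catch-all pattern, in plain Java. In the real app this logic lives in a JAX-RS ExceptionMapper provider; here, the hypothetical ClientException stands in for WebApplicationException, and the list of alerts stands in for an SNS publish.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a top-level error handler: expected client errors pass through with
// their own status code, while anything unexpected triggers an alert and a
// generic 500 response.
public class TopLevelErrorHandler {
    // Stand-in for WebApplicationException: thrown deliberately for bad user input.
    public static class ClientException extends RuntimeException {
        public final int status;
        public ClientException(int status, String message) {
            super(message);
            this.status = status;
        }
    }

    public final List<String> fatalAlerts = new ArrayList<>(); // stand-in for the SNS fatal topic

    // Returns the HTTP status code to send back to the user.
    public int toResponse(Throwable error) {
        if (error instanceof ClientException) {
            // Expected: illegal user input. Log a warning, return the specific status.
            return ((ClientException) error).status;
        }
        // Unexpected: alert a human, give the user a graceful generic 500.
        fatalAlerts.add("FATAL: " + error);
        return 500;
    }
}
```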
Overall, I’ve been very pleased with the SNS service and integration. The text messaging is expensive enough that you don’t want to get spammed by thousands of them, but the emails are so cheap that you can afford to generate any number of them. This allows you to lazily monitor the events that really matter on your servers, and respond quickly to issues and bugs, without having to get your hands dirty with server log files.
The biggest downside with SNS, unfortunately, is its lack of support for phone calls or other intrusive alerts. Ideally, I would like errors to text me, and fatals to call my phone and wake me up in the middle of the night. In order to achieve this functionality, I had to use OpsGenie in parallel with SNS. When you create an OpsGenie account, you’re assigned an OpsGenie email address, and any emails sent to this address will automatically trigger phone calls and/or mobile-app alerts. To get started, I simply configured New Relic to email my OpsGenie address when it notices problems, and also subscribed the OpsGenie email address to my SNS fatal topic. This way, whenever an urgent issue comes up, I get immediate notifications which can wake me up, even in the middle of the night.
If you’re wondering why I chose OpsGenie over PagerDuty: PagerDuty is indeed the more polished of the two, and OpsGenie doesn’t have the prettiest GUI. However, PagerDuty’s cheapest plan costs $9/month. In contrast, if all you want is mobile-app alerts, OpsGenie offers that completely free. And with regards to fulfilling its core functionality (alerting me and waking me up when things break), it’s just as good as PagerDuty. Overall, I’ve been very pleased with the OpsGenie service.
SQL Tooling: JOOQ
Because so much of the backend’s functionality revolves around interactions with the database, I had to decide early on the methodology I was going to use for executing SQL queries. This mainly comes down to 3 questions:
– How do you parse user inputs
– How do you construct the SQL query itself
– How do you integrate your table names and column names into the SQL query
Hopefully, the answer to the 1st question is already clear to you. Use prepared statements! Virtually every single SQL injection attack comes down to one cause: Not using prepared statements. By using prepared statements, you isolate the user-data completely from the SQL query compilation. This ensures that no matter what user input you receive, the query itself won’t change. A select statement won’t suddenly cause a table to get dropped.
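A minimal sketch of the pattern with plain JDBC. The DAO, table, and column names here are hypothetical; the point is that the SQL text is a fixed template, and user input only ever flows in through bound parameters.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical lookup illustrating the prepared-statement pattern.
public class UserDao {
    // The query template is a constant: it is compiled once, and never changes
    // regardless of what the user types in.
    public static final String FIND_BY_EMAIL = "SELECT id, name FROM users WHERE email = ?";

    public String findNameByEmail(Connection conn, String email) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(FIND_BY_EMAIL)) {
            stmt.setString(1, email); // bound as data; it can never alter the query itself
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        }
    }
}
```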
With that resolved, let’s approach the 2nd question. The “simplest” way to construct a SQL query is using Strings. Looking back, this is what most of my past teams have done as well. However, it’s also the least functional and most error prone. You’re responsible for the syntax of every query you build, and any errors on your part will only be caught during run-time. You also lose a lot of functionality, in terms of constructing a query using OOP approaches.
I started searching online for a library that can help you construct SQL queries, and was not disappointed. JOOQ, an open source library/tool, enables construction of SQL queries programmatically. The syntax generation is automatically done for you, so you don’t have to deal with hacking away on Strings. You can even construct a single query using multiple components, each of which can be generated individually. To illustrate, just compare the following:
Database.query("SELECT * FROM " + table + " WHERE " + column + " = '" + value + "'");
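For comparison, a sketch of the same query in JOOQ's fluent DSL, assuming a DSLContext named db and a code-generated USERS class with an EMAIL column (the names are illustrative):

```java
// Same query via JOOQ. The SQL syntax is assembled by the library, and
// `value` is bound as a parameter rather than concatenated into a string.
Result<Record> result = db.select()
                          .from(USERS)
                          .where(USERS.EMAIL.eq(value))
                          .fetch();
```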
Which brings us to the last question: How do you integrate your table and column names into the above query? Again, the “simplest” way is to use strings to represent all tables and columns. So now, if you ever decide to change a column name, you’ll need to hunt down every use of that name, and if you miss one, your code is going to blow up.
We could go one step further, and create constants to represent every table-name and column-name. This certainly makes your code more DRY, but every time you change your database schema or names, you need to go update your constants as well. Not as bad as before, but it’s still tedious work, and if you ever forget, your code is going to break during run-time.
Which brings us to the best solution: Use JOOQ code-gen to automatically generate all of your constants for you! JOOQ’s code-gen has the incredibly powerful ability to connect to your database, extract all your tables and their schema, and automatically generate constants representing them. Setting it up was a little complicated and took some effort, but once I did, it was a thing of beauty. I stuck it in my integration tests, and it automatically ensures that my constants are perfectly in sync with the database.
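As a rough sketch of what that setup can look like with the Maven code-gen plugin (the coordinates, database name, and package names here are illustrative, and the exact configuration options vary by JOOQ version):

```xml
<plugin>
  <groupId>org.jooq</groupId>
  <artifactId>jooq-codegen-maven</artifactId>
  <executions>
    <execution>
      <goals><goal>generate</goal></goals>
    </execution>
  </executions>
  <configuration>
    <jdbc>
      <driver>org.postgresql.Driver</driver>
      <url>jdbc:postgresql://localhost:5432/mydb</url>
      <user>dev</user>
    </jdbc>
    <generator>
      <database>
        <inputSchema>public</inputSchema>
      </database>
      <target>
        <packageName>com.example.db.generated</packageName>
        <directory>target/generated-sources/jooq</directory>
      </target>
    </generator>
  </configuration>
</plugin>
```

Once this runs, the generated constants live alongside your other sources, and any schema drift shows up as a compile error rather than a run-time surprise.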
It even came with an additional killer feature: compile time checking and type safety. JOOQ code-gen doesn’t just generate String constants. It generates entire classes. It tracks every single column present in every single table, along with that column’s data type. This ensures that you can programmatically construct queries that have valid syntax for the specific data-type associated with a column. This also ensures that when you execute a query, you don’t just get a “blob” of results; you get an output class that is type-safe according to the column you queried for. All this helps immensely in programmatically generating queries and parsing their output, in a type safe manner. Catching errors at compile-time is immensely preferable to catching them during run-time, and JOOQ makes this possible even when dealing with database interfaces.
The support that Lukas Eder, JOOQ’s creator, puts into the library is immense, and it is a joy to work with. Deciding to use JOOQ early on was one of the best project decisions I’ve made.
Front End Development: Freelancer
In keeping with my outsource-and-minimize philosophy, I hired a freelance front-end developer to take on this project. I’ll write another blog post at some point in the future detailing my experiences with hiring and managing freelancers, but overall, it’s been a great experience. He was very communicative, came up with a number of ideas on his own, was receptive to feedback, and understood the latest technologies well. It was a pleasure working with him, and through his efforts, we were able to produce a front-end client that lived up to my idealistic vision and feature set. Most importantly, outsourcing this project allowed me to focus on building and polishing other vital aspects of the site. To give an order-of-magnitude estimate of the costs, it was in the $1,000 – $10,000 range.
Some quick words about the technologies underlying this site’s front-end: It’s a Single-Page-App, built using React, making heavy use of AJAX and infinite scroll, and deployed using AWS Cloudfront.
Graphical Design: Upwork
I’ve always been optimistic about my ability to pick up and learn new things, but one area I would not touch with a 10-foot pole is Graphical Design. Given my lack of artistic talent, the last thing I wanted to do was scare away any and all visitors. Hence why I hired Yuriy, also a freelancer, whom I met through Upwork.
In order to communicate my basic site vision, and user flow, I drew a series of sketches, illustrating the various content and buttons that comprise each page, and how the user would navigate the site, both within each page and across multiple pages. I also shortlisted a couple of sites that I thought best illustrated the minimalist, elegant, and pleasing design style that I admired and wanted to emulate with Caucus. Yuriy was then able to convert my stick-figure drawings, and vague style wishes, into beautiful PSDs, with coherent color schemes, and all details fleshed out in an elegant manner. In a couple of instances, he even suggested changes that I hadn’t thought of, and are now an integral part of the site and user experience.
It took a couple of iterations and feedback cycles, but Yuriy was a pleasure to work with and we were soon able to get to a great design. To give an order-of-magnitude estimate of the costs, it was in the $100 – $1,000 range.
The above might sound like one long laundry list, but there are still a couple of other neat tools that I would have liked to explore further and never quite got the chance to.
Automated Heroku dyno scaling: Adept addon
In theory, it sounds great. As part of my scale-infinitely and automate-everything mantra, I would definitely love to have a tool that can scale my Heroku dynos automatically based on traffic. The last thing I want is for a surge of users to crash the site, or to be on call 24/7 when it comes to scaling up the servers to handle traffic fluctuations. Based on all the research I did, Adept seems to be the best solution to this problem. Their pricing starts at $18/month, but if you’re operating multiple dynos, Adept’s ability to scale down dynos when not needed can actually pay for itself.
I’d love to get my hands on it except for one thing: I’m currently on a Heroku hobby dyno ($7/dyno vs $25/dyno for exactly the same hardware), and while you’re on the hobby plan, you’re not allowed to scale out at all. This means I can either pay $7/month for a single non-scaling hobby dyno, or pay $43/month ($25 for a standard dyno plus $18 for Adept) for a single dyno that is capable of scaling. As cool as this feature sounds, that price jump is simply too steep for me on a hobbyist budget.
Hence why I decided to settle for New Relic/OpsGenie alerts that notify me of traffic spikes, allowing me to then respond manually. But in the future, if traffic grows to the point where additional dynos are needed, I will certainly be installing this addon immediately.
Code Quality Analysis
I’ve always been a huge fan of code analysis tools that can help point out badly designed portions of your codebase. When working with freelancers whom you don’t know intimately, this can also be a powerful tool in ensuring that they are writing clean code that is easy to maintain in the future. I did use IntelliJ’s built-in code-analysis tools a couple of times, but for the most part, this has never been something that I prioritized highly. But if I had to do the project all over again, it’s certainly something I would integrate more deeply into my workflow.
And there you have it. The entire tech stack that powers this website and its API. My gratitude goes to all companies/organizations listed above who provide great services at very reasonable prices. If I had to reinvent the wheel myself on each of the above, I certainly would not have made it this far. If you have any thoughts or other suggestions, I’d love to get your feedback in the comments below!