The great NoSQL vs SQL flamewars of a few years ago may have calmed down, but what did we learn?
Aside from the fact that arguing on the internet doesn’t get anyone anywhere, maybe not much at all. If anything, the market is even more confusing now than ever - just take a look at the 225+ entries on nosql-database.org.
And with new entrants like CockroachDB and Google Spanner coming onto the market, and traditional database vendors such as Microsoft and Oracle offering more and more NoSQL features, in 2017, are we any closer to knowing what’s the best way to store your data?
Everyone seems to get bogged down in whether they should be using SQL or NoSQL database technology. This is the wrong way to think about it. As we’ve said before, databases are tools and each excels at different tasks.
Instead of “NoSQL or SQL?” the question should be “what job do I need my database to do?”
Variation among tools
RDBMS and SQL databases excel as general purpose storage. As such, for 90% of the time they’ll be the right tool for the job. The 10% of the time you have specific needs, NoSQL databases can be better. For example, handling geophysical data, surveillance data, or search indexes. But, by their very nature, these use cases are few and far between.
Then among the broad groups of ‘SQL’ and ‘NoSQL’ there are different solutions.
There’s MySQL, PostgreSQL, MSSQL and various others for RDBMS.
What’s with all the animals? (Wikipedia Commons)
There’s RedisDB, MongoDB, Cassandra, CouchDB, CockroachDB and countless others for NoSQL. (Ken Alger wrote a really good post breaking down the different types of NoSQL database.)
Each one of these specific databases will do some things better than others, some things worse than others, some things not very well at all - but require little thought to use.
Again, it comes down to what you need your database for.
But what about scalability?
Ah. This is where discussions get heated and words fly like bullets. “SQL databases don’t scale!” Or so the cries go.
Try telling that to some of the biggest websites and web apps in the world that use SQL databases as a core part of their database stack. Are the technical teams behind Wikipedia, Facebook, Wix, YouTube and others just not as pro as the developers on Hacker News commenting it out over whether SQL databases scale?
No one said it was simple or straightforward to scale a RDBMS, just that it’s perfectly reasonable and possible to do. Scaling out and scaling up and sharding are the things you need to know about when it comes to scalability for SQL databases.
But, frankly, a majority of developers will never work on a scale where the issues of database scalability are something they need to think about. And if Facebook can use MySQL (albeit heavily modified) to handle pretty much every user interaction (likes, shares, status updates, alerts, requests), then you shouldn’t even be thinking about NoSQL if you do need to consider scalability (unless it’s for purely cerebral purposes).
It’s not “out-of-the-box easy” to scale traditional RDBMS, but it’s possible. Recently, especially in 2017, a slew of ‘New SQL’ databases have tried to make this a thing of the past. Google is trying to simplify the global scalability of RDBMS and SQL-esque databases with their Spanner Database. Even with all their time-stamped cleverness, who knows if it’ll work? And only very recently, CockroachDB announced version 1.0 - its production-ready “open source, cloud-native SQL database.” Cloud-native simply meaning software created specifically for and in the cloud rather than on-site.
Both of these focus on making the scalability of RDBMS and SQL-style databases easier for the age of distributed computing.
Databases: it’s all about the tradeoff
Like the broad spectrum of software development in general, choosing database technology means tradeoffs. For databases, it’s usually a toss up between performance and functionality.
Even choosing how to scale a SQL database comes with tradeoffs (do you want optimal write speed or read speed?). It’s not just a simple line between SQL and NoSQL.
Poki wrote a good piece about the tradeoffs they made opting to move from PHP to GO and sticking with MySQL instead of moving to NoSQL.
The NoSQL tradeoff
When it comes to NoSQL databases you sacrifice:
- Query capability
- In-built database schema and structure
- Simplified database design (read: not normalized and no/minimal schema)
- Rapid read performance
Admittedly, the one time it might be beneficial to start with a NoSQL database is when dealing with data not kept in a normalized data model. But even then, [SQL databases don’t need to be normalized] (http://highscalability.com/blog/2008/7/16/the-mother-of-all-database-normalization-debates-on-coding-h.html).
The SQL tradeoff
This doesn’t mean SQL is without its tradeoffs or problems.
It’s pricy to scale up (adding resources to a single database node) because proprietary hardware is expensive. However, this cost is somewhat reduced with the rise of cloud database storage as a service and cloud-native technology.
To scale out (add more nodes to a system, such as adding a new database server somewhere else in the world) is trickier. A good knowledge of database design and administration are fundamental here because the only solution is to shard your database.
A whole other kind of tradeoff
Sometimes the choice is about the tradeoffs between the ease of use, installation and planning.
Tradeoffs. The database (and software) world is full of them.
Really, you should use SQL
Data is doubling every two years. Big data is becoming more and more pervasive and doesn’t like rigid schema. But this is far from the daily norm and NoSQL technology is still very new, especially compared to battle-tested SQL and RDBMS. Given some time, NoSQL databases may develop and overcome their current limitations. But until then…
The best thing you can possibly do is start with a SQL solution.
Realistically, the only time you should use NoSQL is when you have no other choice. Like if your database is too big to fit on one server or you need a massive number of writes per second. Otherwise, SQL will give you better performance, much more security, and better data integrity. Using a NoSQL solution will just mean a loss of functionality.
If companies such as Pinterest and Facebook can accomplish flexible schema with fast reads using MySQL or PostgreSQL, simply through different data modeling, then SQL is almost certainly going to be fine for you. Heck, Wix even reckons that a MySQL database makes a better NoSQL database than a NoSQL database!
If you do ever find yourself considering NoSQL, ask yourself…
- What data do I need to manage?
- Why do I need to manage it?
Then Google your specific use case and read about how various (usually NoSQL) databases struggle to handle simple uses well. Most of the time you’ll find that a RDBMS is perfectly suitable for what you need to do. There’s plenty of debate and examples over the years to help you on your way.
Don’t just jump on the NoSQL or SQL bandwagon. This is not a war where you need to pick sides. Think about your very own specific use case and go from there. It’s perfectly fine to use NoSQL for one project and SQL for another - a good developer picks the tool that’s going to be best for the job. Just always keep in mind that you’ll probably end up at a SQL/RDBMS. And that’s OK.