Primer on NewSQL

Covers what NewSQL is and when to use it

October 05, 2018

Intro

Note	This article goes well with this primer on databases, so you may want to read that one first.

If the thing most missing from your life is another type of SQL, then you’ve come to right article! I speak, of course, of NewSQL.

WTF is NewSQL?

You get NewSQL when you take NoSQL (or a plain KV store) and put SQL on top to form a sort of unholy sandwich.

Obviously.

A Real Explanation

Jokes aside, NewSQL attempts to solve the scalability problems of pure SQL DBs by taking a KV store (key-value) and layering SQL semantics on top of it. In theory, this gives you the ability to scale horizontally without multiple blood sacrifices, while also letting you use all (or at least most of) the fancy SQL query statements you love.

An Example

If we were to store a customer in MongoDB, we’d probably do it as a JSON blob that looks something like:

{
  "id": "123456845",
  "name": {
    "first": "Fletcher",
    "last": "Haynes"
  }
}

In Postgres or MySQL, we’d probably have a table that looked like this:

Table 1. Table Customers
id	first_name	last_name
123456845	Fletcher	Haynes

Now, let’s say we have 1 MILLION customers. Find me all the customers with the last name of Haynes.

In the MongoDB world, we get to do:

for eachCustomer in customers:
  if eachCustomer["last_name"] == "Haynes":
    # whatever

In SQL, we’d do something like this:

SELECT id, first_name, last_name FROM customers WHERE last_name = 'Haynes';

Both of these are going to be slow. Why? Because they require examining each record and checking a field. In Postgres, we could add an index to the last_name column, and get the results much faster. Even in some NoSQL DBs (MongoDB now included), you can index on fields. Before that, your logic had to be in your application code, and for some NoSQL DBs it still does.

As data grows (it always does) and gets more complicated (it always does) and becomes more interconnected, i.e., relational (it always does), you start needing to do JOINs, etc. NoSQL DBs tend to start sweating at this point. They can’t keep up with the query load, you have duplicate data everywhere, every entry has different data because it turns out that whole schemaless thing was a lie, and so on.

Of course, on the SQL side, once you need two masters that can accept writes, you have to start digging into manpages bound in human skin and saying things like "OK, we’re going to need a rake with one broken tine, a spark plug, and two rolls of duct tape (not the cheap shiny kind, the dull silver kind), and we’ll meet up at midnight. GMT of course."

Introducing the New Kids on the Block

Like the rest of technology, the solution is to just combine old stuff in new and exciting ways. The top contenders (that I know of), are:

CockroachDB - They are my favorite, and work very well.
TiDB - A solid project using Rust and Go. It is in production in multi-PB systems.
ActorDB - I’ve not used this one, but heard good things
RethinkDB - This is an older NoSQL option that lost out to MongoDB years ago, even though it was (and is) a better product in the technical sense. I include it here because they have added on SQL-esque things.

Note	And, of course, the NoSQL DBs are adding SQL features. MongoDB can do indexes and such. And Postgres can now do most NoSQL-y things with JSON. Really, all databases are just coalescing into one giant ball at this point.

The Price

There is, of course, a price for SQL semantics and easy clustering. Latency.

Let’s take CockroachDB and say we’ve built one server in the US, one in EU, and one in AP. We want to do an insert. That insert has to be sent all over the world. Now, most of those systems have knobs you can turn. For example, you can say you only want an acknowledgement from the US-based node. For the most part, these systems work well and strike a good balance between the two worlds. A certain amount of care and feeding is required.

End

So when should you try a NewSQL solution?

You have a team with little experience with clustering and scaling Postgres
You’ve outgrown Postgres
You need master-master topologies that span wide geographic distances

That’s all for this article! But don’t worry, there’s a lot more to DBs we haven’t covered! I’m trying to keep these articles short and digestible.

If you need some assistance with any of the topics in the tutorials, or just devops and application development in general, we offer consulting services. Check it out over here or click Services along the top.