Building a Data Pipeline - Languages and Stack

Choosing a language and stack


Hello! Welcome to the exciting topic of ingesting massive amounts of pastries data! We talked about ingest a bit in the previous tutorial; in this one, we’ll examine various architectures in great detail.

Let’s get the controversial part out of the way. Language. =)

So You Want to Build a Language VM - Part 33 - Cluster Syncing

Cluster Syncing


I don’t know about you, but I’m getting tired of all this clustering. But, the end is in sight! For real this time! I promise!

When we ended last tutorial, we had been able to send bincoded messages. A cluster member could join another cluster, and receive a list of other nodes back.

Our next task is for the new node to take each node it receives and establish its own, independent TCP connection to it.
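As a rough illustration of that task, here is a minimal sketch using only the Rust standard library. It assumes the received node list has already been decoded into `SocketAddr` values; the `connect_to_peers` function and the local listener standing in for an existing cluster member are hypothetical names made up for this example, not code from the series.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener, TcpStream};

/// Try to open one independent TCP connection per peer address.
/// Returns (address, result) pairs so the caller can see which peers failed.
/// Hypothetical helper: assumes the node list is already decoded into addresses.
fn connect_to_peers(peers: &[SocketAddr]) -> Vec<(SocketAddr, io::Result<TcpStream>)> {
    peers
        .iter()
        .map(|addr| (*addr, TcpStream::connect(addr)))
        .collect()
}

fn main() -> io::Result<()> {
    // Stand-in for an existing cluster member: a listener on an OS-assigned local port.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let peers = vec![listener.local_addr()?];

    for (addr, result) in connect_to_peers(&peers) {
        match result {
            Ok(stream) => println!("connected to {} from {}", addr, stream.local_addr()?),
            Err(e) => eprintln!("could not connect to {}: {}", addr, e),
        }
    }
    Ok(())
}
```

Returning a result per peer, rather than stopping at the first error, lets a joiner keep the connections that did succeed and retry the rest later.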

So You Want to Build a Language VM - Part 32 - More Clustering?!

More Clustering?!


Hello again everyone! In this tutorial, we’re going to continue to work on clustering. When we left off in the last tutorial, we had the joiner node sending a hello message and the server node adding it to its list. The next tasks are:

  1. Send a hello back

  2. Send a list of all known nodes to the new joiner

Full-mesh Network

Remember how I mentioned we’d be doing a full mesh network? I realized an illustration might be handy, so more beautiful text art!
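In a full mesh, every node holds a connection to every other node, so n nodes need n * (n - 1) / 2 links. A small sketch that lists those links, assuming nodes are identified by plain string names (`full_mesh_links` and the node names are hypothetical, chosen only for this example):

```rust
/// List every link in a full mesh of `nodes`: one pair per unordered combination.
fn full_mesh_links<'a>(nodes: &[&'a str]) -> Vec<(&'a str, &'a str)> {
    let mut links = Vec::new();
    for (i, a) in nodes.iter().enumerate() {
        // Pair each node only with the nodes after it, so no link is listed twice.
        for b in &nodes[i + 1..] {
            links.push((*a, *b));
        }
    }
    links
}

fn main() {
    let nodes = ["alpha", "beta", "gamma", "delta"]; // hypothetical node names
    let links = full_mesh_links(&nodes);
    for (a, b) in &links {
        println!("{} <-> {}", a, b);
    }
    // n nodes need n * (n - 1) / 2 links.
    assert_eq!(links.len(), nodes.len() * (nodes.len() - 1) / 2);
}
```

The link count grows quadratically with the number of nodes, which is worth keeping in mind for larger clusters.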

Building a Data Pipeline - Part 0

Covers general elements of data pipelines, what they do, and why they do it

Intro

When I started this blog, I wanted it to show building multiple different types of projects from a practical perspective. As an astute reader, I’m sure you’ve noticed that Iridium has been the only project we’ve been working on. Time to change that! Welcome to the first post of Project Grimwhisker!

What?

It’s a data pipeline.

So You Want to Build a Language VM - Part 30 - Cleanup Time

We've got a lot of warnings and clippy things to fix!


As fun as it has been working on the clustering, we’ve accumulated quite a bit of technical debt. There are tons of warnings, and I haven’t dared to run clippy. So this post is all about going through and cleaning them up. =) It won’t be as exciting as adding features, but making sure to take time to do cleanups is just as important, if not more so. Tech debt has a way of growing faster than credit card debt.

Primer on Database Sharding

Covers what database sharding means and different implementations

Intro

Another exciting database article! I bet you couldn’t wait! =) I’m going to keep this one short, and cover one topic: database sharding.

What Is It?

Let’s say you have an application that needs a backend database. You’re using Postgres, and you’ve scaled it as much as you can vertically. Like, you’re using an AWS x1e.32xlarge instance that costs around $20,000 USD per month. What can you do next?
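One answer, and the subject of this article, is to shard: split the data across several database servers by key. A minimal sketch of hash-based shard routing, assuming string keys and a fixed number of shards (`shard_for` and the shard count are hypothetical, and this is only one of several ways to pick a shard):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map a key to one of `shard_count` shards by hashing it.
fn shard_for(key: &str, shard_count: u64) -> u64 {
    assert!(shard_count > 0, "need at least one shard");
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    // The same key always hashes to the same value, so it always lands on the same shard.
    hasher.finish() % shard_count
}

fn main() {
    let shard_count = 4; // hypothetical number of Postgres servers
    for user in ["alice", "bob", "carol"] {
        println!("{} -> shard {}", user, shard_for(user, shard_count));
    }
}
```

Plain modulo routing is easy to reason about, but changing the shard count moves most keys to a different shard, so it suits setups where the number of shards rarely changes.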

Primer on NewSQL

Covers what NewSQL is and when to use it

Intro

Note: This article goes well with this primer on databases, so you may want to read that one first.

If the thing most missing from your life is another type of SQL, then you’ve come to the right article! I speak, of course, of NewSQL.

WTF is NewSQL?

You get NewSQL when you take NoSQL (or a plain KV store) and put SQL on top to form a sort of unholy sandwich.