The Premise.
Can you use Redis as your primary DB?
The Answer.
Yes - obviously. Though eventually, someone will probably mount a push to remove it as the primary database in your stack, citing maintainability (with genuinely solid reasons.) Maybe this exploration can help head that off early in your greenfield - or maybe push you back to something more streamlined.
Most, if not all, comparisons/reference points will be made against Postgres. A lot of this information already exists today, but I think it might be helpful to aggregate it and write it through this specific lens instead.
Storage
Redis is persistent, and can function in roughly the same way you would expect from something like a WAL. Two Redis-themed durability methods exist:
- RDB - A point-in-time snapshot of the DB as it currently exists.
- AOF - A file that records every command written since the start of the file, and rebuilds the DB from the commands used to get to the most recent state.
Generally you’d want to use these in tandem, giving you two different layers of durability - roughly what you would get from a combined base backup / pg_dump (RDB) and the WAL (AOF), when compared to Postgres.
The key premise here is balancing how often Redis flushes its outstanding AOF writes to disk against how much of a performance hit you’re willing to take. For this reason, you need to decide whether settling writes every second is tolerable (the default as of now), or whether every command should be flushed to disk before it is acknowledged.
... AOF ...
Command Issued -> Redis acts upon it and stores it to AOF -> fsync every second
... RDB ...
Backup Starts -> Redis Forks -> Snapshots the current state of the DB -> Writes to backup -> ... -> Restore Redis to that exact state.
So with all that said: AOF gives you (in default usage) data resilience down to the resolution of a second, and if that were to fail, you can fall back to the nearest cadence of your RDB!
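To make that concrete, here is a minimal sketch of where those knobs live, assuming a local Redis and the redis-py client. The directives are the standard `appendonly`, `appendfsync`, and `save` settings (normally set in `redis.conf`, shown here via `CONFIG SET`), and the specific values are illustrative rather than a recommendation.

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# AOF: append every write to the log, fsync once per second (the current default policy).
# Switching "everysec" to "always" fsyncs before acknowledging each write,
# trading throughput for (at most) one lost command.
r.config_set("appendonly", "yes")
r.config_set("appendfsync", "everysec")

# RDB: keep periodic snapshots as the coarser fallback layer.
# "3600 1" means: snapshot if at least 1 key changed in the last hour.
r.config_set("save", "3600 1")

# A snapshot can also be forced out-of-band (Redis forks a child to write it).
r.bgsave()
```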
Notes:
- Since Redis operates on its datastore in-memory, it’s important to note that your maximum possible database size is capped by your available RAM. Not surprisingly, this observation also applies to any read replicas you introduce, which makes this costly at the extremes. (If you’re using Redis Cluster, this intuition changes quite a bit for the positive.)
- Mistakes from erroneous data creation that consume available memory - through uncapped code errors or abuse - are felt more readily without proper key-name schemas.
- Logical separation (Redis’s numbered database indexes) can be good for isolating/cordoning off the damage here.
- At some point, you’ll need to consider that your AOF might just go bad. Honestly this can be super rare, and you’ll probably never deal with it. To get ahead of it though, it’s best to treat your RDB cadence as your RPO - and keep it as small as you can (hourly can be appropriate to start!)
Data Design
This isn’t really so much about the design of the data structures themselves, but rather about how I’ve noticed these things naturally end up used with meta-data-structures built around a K/V access pattern.
The first hurdle you’ll probably face is realizing that a Redis data query is addressed by:
- The selected Index (configured at the client constructor in the relevant language.)
- Then by the data structure accessor method (for example, `GET` won’t work on a Hash, and `HGET` won’t work on a String key.)
- Followed by the key name itself.
- Finally, where appropriate - the data structure’s subkey or subvalue (e.g. a Hash field.)
This differs wildly from the idea of: select schema, select table, perform a composite query to select rows. Redis (notwithstanding RedisSearch) doesn’t give you secondary indexes, so things get tight.
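As a hedged illustration of that addressing chain with redis-py (the key names here are purely illustrative):

```python
import redis

# db=0 is the "selected index" from the first bullet above.
r = redis.Redis(host="localhost", port=6379, db=0, decode_responses=True)

# Accessor method + key name + subkey must all agree with how the key was written.
r.hset("users:settings:abc123", mapping={"theme": "dark"})

r.hget("users:settings:abc123", "theme")   # Hash accessor + subkey -> "dark"
try:
    r.get("users:settings:abc123")         # String accessor against a Hash key
except redis.exceptions.ResponseError as e:
    print(e)  # WRONGTYPE Operation against a key holding the wrong kind of value
```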
Here are some approaches that model your data structure for Redis specifically.
Flatbed: How I describe irregular items with shallow-depth (often zero-depth.)
Approach 1: Flatbed Key Space
Generally this is the most common design I’ve ended up using in production when we needed to move fast. It asserts:
- Tables are represented by the key-space (`KS`) they are within: `<key_space>:<table>:<row_index, often the pseudo id column>:<.. nth id>`
  - Example: `orgs:memberships:abc123:xxxyyyyzzz` - where:
    - `orgs` → The key space.
    - `memberships` → The “table” relating to memberships.
    - `abc123` → The “primary ID column”, in this case the Organization ID.
    - `xxxyyyyzzz` → The optional “second/third/n(th) ID column”. It might not be relevant in this example, but it can be used to illustrate how you would recognize that access to this resource is by composite in expected usage, and you can go from here.
- With schema visibility reduced here, be careful of key names. Mistakes in code can lead to duplicate keys being made, or key names being incorrectly saved/fetched.
All keys, across all KS’ and “tables”, would exist together at one flat level, with different data structures per KS. Generally from here, you would decide ahead of time what data structure is most appropriate for the purpose (a quick sketch follows this list.) In our membership example above, you would find that either:
- `Sets` would be nominal for a list of IDs who have access to this Org.
- `Hashes` would be nominal when you want each sub-key (field) to carry a value - a store for things like Permissions, Ownership, or a JSON object (the latter being without atomic guarantees in this case, unfortunately.)
- `String` would be nominal if you wanted to work through the atomic JSON commands available.
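A quick sketch of the flatbed layout for the membership example, assuming redis-py; the helper name is hypothetical, and the key layout follows the `<key_space>:<table>:<id>` convention above.

```python
import redis

r = redis.Redis(decode_responses=True)

def membership_key(org_id: str) -> str:
    # Centralizing key construction is the main defense against typo'd key names.
    return f"orgs:memberships:{org_id}"

# Using a Set: all we record is which user IDs have access to this org.
r.sadd(membership_key("abc123"), "user_1", "user_2")
r.sismember(membership_key("abc123"), "user_1")   # -> True
r.smembers(membership_key("abc123"))              # -> {"user_1", "user_2"}

# Swap SADD/SMEMBERS for HSET/HGETALL (or a String holding JSON) once each member
# needs a value attached, per the trade-offs listed above.
```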
Downsides here are often comparable to MongoDB’s document paradigm (which is why MongoDB-specific libraries like Mongoose end up being quite helpful) when performing migrations or needing to reshape data at scale. This is often the strongest argument for retaining a strict-schema DB (like Postgres) and using a read-through-cache approach (explained later.) You also (as explained in the next approach) suffer from data duplication and disagreement without proper diligence (and even then.)
As a personal nit, I’ve found it easier across the developer lifecycle to use `::` or `:` as the separator between the components of the key name, rather than `-`, `_`, `~`, or mixed styles like `myKeyName_Space:abc`. People don’t make as many mistakes, and we preserve intention-based hierarchy (although just as a symbol rather than any specific technical validation, unless present) a lot more cleanly.
Approach 2: Flatbed Key Values
There are technically two approaches here, both more opinionated takes on Approach 1. One asserts that we consider all keys to be Hashes and adopt the Document model (capped at the root-depth), while the second asserts that all keys are Strings and we use the native Redis JSON commands to operate upon them with atomic guarantees (Hashes also have this guarantee naturally.)
Hashes and Documents
We’re asserting that we’re following a Document model comparable to MongoDB here.
Reusing the key-based approach, we have to start considering how our data is going to be shaped. For instance, when we re-approach our Membership example: which entity (the organization, or the member) has ownership over the membership list? When a member supplies their credentials to an API call, and we want to reply with which organizations they’re a member of - how do we identify which organizations they are a part of without prior knowledge of that user in this stateless request?
Q: Why Hashes instead of Sets for things like Memberships?
A: I’ve always found that we eventually need to store either JSON data or something more about the user (RBAC, Org Permissions), and having a value portion attached to each field always seems to become necessary eventually. Hashes don’t allow duplicate fields, so you’re not in danger of concerns like that (as expected - it’s a hash!)
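A small sketch of that point, assuming redis-py and an illustrative key layout: the Hash field is the member’s ID, and the value carries whatever extra data ends up being needed.

```python
import json
import redis

r = redis.Redis(decode_responses=True)

member_record = json.dumps({"role": "admin", "perms": ["billing:read", "members:write"]})

# Field = member ID, value = the "something more" (RBAC, permissions, a JSON blob).
r.hset("orgs:memberships:abc123", "user_1", member_record)

# Re-setting the same field just overwrites it - no duplicate-member concerns.
r.hset("orgs:memberships:abc123", "user_1", member_record)

raw = r.hget("orgs:memberships:abc123", "user_1")
record = json.loads(raw) if raw else None
```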
Even in normal data model designs, this does produce a challenge - which is often solved by an intermediary table that records the link between an Organization and a Member.
To solve for this in Redis, you generally would need to pick one of:
- Have a Hash key for per-Org memberships, and a Hash key for per-Member memberships (as composite queries don’t really exist for this purpose in Redis) - see the sketch after this list.
  - Data drift can be common here, and a regular source of ongoing developer pain to correct for post-incidents/issues.
  - The major upside is near-O(1) average access time (yes, `HGETALL` is O(n) at worst - but we don’t expect these Hashes to grow large for this use case at least.)
  - More intelligent downstream data design surrounding access control (API keys bound to an organization and user, JWTs, etc.) can remove the need for a per-Member key, though this can still be tricky for patterns like Org-Switching.
- Avail of `SCAN`-based approaches where you would employ the Key Space paradigm, scanning with a wildcard over the component of the key name that contains the user’s ID, like: `orgs:memberships:*:<the user's id>`
  - You can’t expect better than O(n) here, but you could argue the speed of Redis would “make up for this”. Having to iteratively scan on each API call to derive membership RBAC is rather ludicrous though, and I’ve seen people run into this blindly before.
  - This is probably the firmest example of a bad data access pattern. The example in this case assigns a single user ID’s membership to one org, referenced by the key name. A less evil example here is `user:memberships:<user_id>` containing all orgs the user is a part of.
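For the first option, a hedged sketch of keeping both Hashes in step, assuming redis-py: wrapping the two writes in one MULTI/EXEC means a single code path can’t half-apply a membership, though it does nothing for drift introduced by other writers.

```python
import json
import redis

r = redis.Redis(decode_responses=True)

def add_membership(org_id: str, user_id: str, role: str) -> None:
    record = json.dumps({"role": role})
    pipe = r.pipeline(transaction=True)  # queued client-side, sent as MULTI ... EXEC
    pipe.hset(f"orgs:memberships:{org_id}", user_id, record)   # per-Org view
    pipe.hset(f"user:memberships:{user_id}", org_id, record)   # per-Member view
    pipe.execute()

add_membership("abc123", "xxxyyyyzzz", "member")

# "Which orgs is this user in?" is now a single HGETALL - no SCAN required.
r.hgetall("user:memberships:xxxyyyyzzz")   # -> {"abc123": '{"role": "member"}'}
```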
If we availed of SCAN, some hypothetical performance expectations (as informed by the Redis benchmarks) would look like:
Assume we have 100k, 1m, or 20m keys in the format above, running on a machine from the c7i generation of AWS. We’re processing 10 RPS and 100 RPS, against either one master alone or three read replicas (each combination respectively in this matrix), and we want to search for an arbitrary user’s memberships via the wildcard `SCAN` above.
| RPS | Replicas | Total Keys in Redis | Request Latency (Avg) | Request Latency (Worst) |
|---|---|---|---|---|
| 10 | None (Master Only) | 100k / 1m / 20m | 5 ms / 100 ms / unbounded | 10 ms / 200 ms / timeout-bound |
| 100 | None (Master Only) | 100k / 1m / 20m | 10 ms / unbounded / unbounded | 20 ms / timeout-bound / timeout-bound |
| 10 | 3 (Read Only) | 100k / 1m / 20m | 5 ms / 60 ms / unbounded | 10 ms / 120 ms / timeout-bound |
| 100 | 3 (Read Only) | 100k / 1m / 20m | 6 ms / unbounded / unbounded | 12 ms / timeout-bound / timeout-bound |
- Unbounded - Too hard to measure for quick estimates; too many dependent variables, tbh!
- Timeout-bound - The upper ceiling comes down to the underlying network timeout, or other timeouts involved.
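For completeness, here is roughly what the SCAN approach from the table looks like with redis-py. `scan_iter` walks the entire keyspace cursor-by-cursor, and the `match` filter is applied per batch on the server - every key still gets visited, which is where the O(n) behaviour above comes from.

```python
import redis

r = redis.Redis(decode_responses=True)

def orgs_for_user(user_id: str) -> list[str]:
    # Pattern mirrors orgs:memberships:<org_id>:<user_id> from earlier.
    pattern = f"orgs:memberships:*:{user_id}"
    org_ids = []
    for key in r.scan_iter(match=pattern, count=1000):  # count is a hint, not a limit
        org_ids.append(key.split(":")[2])               # pull out <org_id>
    return org_ids
```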
If you wanted to assert that field values stick to native Redis-friendly representations so you can further avail of Hash-specific commands, that would be another thing to weigh up against the next sub-approach.
JSON and Strings
A relatively newer approach, which requires Redis to have the first-party RedisJSON module installed, is probably the most interesting of all - though still held back mostly by the reasons we’ve touched on previously for Hashes and Documents. The reason you’d use this approach is for a richer data structure with depth when compared to Hashes. Hashes are more performant though, due to their internal representation compared to JSON - so this is something to consider.
Generally you’d want to assert that we’re using the dual-copy approach which while memory-wasteful, keeps our reads very performant. With this said, you would be able to avoid the annoyingly dangerous pattern of:
Read -> Deserialize -> Modify -> Serialize -> Write (Potentially Stale Write)
By having the DB itself atomically modify the JSON document.
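A sketch of that atomic in-place modification, assuming the RedisJSON module is loaded and a redis-py version new enough to expose the `.json()` command group; the key and fields are illustrative.

```python
import redis

r = redis.Redis(decode_responses=True)

r.json().set("org:settings:abc123", "$", {"name": "Acme", "seats": 5, "flags": {"beta": False}})

# The server mutates the document itself - no read/deserialize/modify/serialize/write
# round trip, so two concurrent writers can't clobber each other's fields.
r.json().set("org:settings:abc123", "$.flags.beta", True)
r.json().numincrby("org:settings:abc123", "$.seats", 1)

r.json().get("org:settings:abc123", "$.seats")   # -> [6]
```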
Approach 3: RedisSearch
For roundness of this exploration, and without going into detail - RedisSearch can in a lot of cases correct for the blindspots created by a pure KV approach. Though, as this is not installed as a default module (although first-party) and is basically a rich drop-in for this problem - I don’t think it’s worth reprinting everything from their documentation here. Go look!
The major advantage here is the ability to perform more ad hoc queries, whether user-borne or product-development-borne.
tl;dr - A first-party upgrade that lets Redis act like a Document DB natively. It’s more what I’d call “Neo-Redis”, so if you’re looking for bleeding edge then it’s a production-ready shot!
Approach 4: Read Through Cache
Not mentioning the most common way I’ve found to use Redis in a more active way would be a lost opportunity! I am cheating though, by involving Postgres directly, so I’ve left this until last. This is the primary way I currently use Redis in production. It is, to no surprise, a variation of Approach 1!
The general idea here is: what if we just amortized the cost of one Postgres query, and proactively used it to feed front-line data structures? Basically, we treat Redis as a first-class database, and not just a 1:1 JSON dump of any particular Postgres row (but critically: backed by Postgres.)
In our real-world case, we simply invalidate any key through a DB hook for specific tables that are in our hot path (we settle cache-drop requests within 90ms from DB to Redis, usually), and have logic flows that fetch and then store the pre-computed / pre-filtered data. Though, this should be used sparingly and for the right reasons.
Now there are more moving parts here, and caching is hard. Really hard. But if you can more proactively give yourself data hints ahead of time, it can be really interesting.
One example implementation here that reduces the hypothetical middleware lookup (in the example of asserting what API Key or User is in the stateless request) to about 10-20ms per request goes like:
First Set of Queries
Key 1: api_keys:key_to_org:<first character of the key>
- A hash key that ties the API key to a specific org.
- Stores the value as `<org_id>:<user_id>`
- On no match, either decide one of:
- Do we want to keep this data store authoritative (as opposed to the keys in the second query?) And consider missing items here an application/logic bug?
- Expend query requests on the Postgres DB to re-assert that the user doesn’t exist.
- Rate limiting here is not a concern of Redis, but of the upstream WAF. So we don’t really consider abusive enumeration to be a failure mode here.
Second Set of Queries (In Parallel)
Key 2: org:settings:<org_id>
- A JSON cached view of the corresponding postgres row.
- On no match, assert that we need to check the DB.
Key 3: orgs:memberships:<org_id>
- A hash key storing the user ID and JSON data of that user.
- In our real world, this is an example of amortizing the cost of searching for every member belonging to an organization. We use this in combination with `HEXPIRE` and DB-based hooks to simply drop the subkey, forcing a DB read. Finding a membership entry here becomes O(1) versus the variability of scanning within a DB (though a properly indexed and maintained DB might argue with this point.)
- You can also store this as a String JSON key, and parse it per-request for varying reasons.
Key 4: users:settings:<user_id>
- A raw JSON object. Generally this is 1:1, much like Key 2.
- On no match, assert that we need to check the DB.
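A hedged sketch of the second set of queries batched into one round trip, assuming redis-py, with Key 2 and Key 4 stored as plain String JSON blobs for simplicity; the `HEXPIRE` call at the end assumes Redis >= 7.4 and a recent redis-py.

```python
import redis

r = redis.Redis(decode_responses=True)

def fetch_request_context(org_id: str, user_id: str):
    pipe = r.pipeline(transaction=False)               # plain batching; reads need no MULTI
    pipe.get(f"org:settings:{org_id}")                 # Key 2
    pipe.hget(f"orgs:memberships:{org_id}", user_id)   # Key 3
    pipe.get(f"users:settings:{user_id}")              # Key 4
    org_settings, membership, user_settings = pipe.execute()
    # Any None here means: fall back to Postgres, then repopulate that key.
    return org_settings, membership, user_settings

# Per-field TTL on the membership hash keeps a stale member from living forever,
# even if a DB hook is missed (requires Redis 7.4+).
r.hexpire("orgs:memberships:abc123", 3600, "xxxyyyyzzz")
```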
With all this said, we’ve shown how a cache doesn’t necessarily have to be a 1:1 representation of a DB row. A nice place where Redis shines is storing (ironically, given the theme of this post) the results of composite queries, or post-filtered results - in either raw form or native Redis data structures. The footgun with this approach is not having a strong way to decide when to invalidate any particular key or subkey.
It is logically sounder to drop the cache key for regeneration, only once the DB updates (near/in/around a transaction code block for example) - but you may have other consumers of the DB (think Retool) that could modify the DB through other means. While this is rather horrible practice (though quite common) this can be one reason to decide upon DB Hooks as opposed to application lifecycle management for cache invalidation on this approach.
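As a sketch of the application-lifecycle variant being contrasted with DB hooks here: drop the cached view only after the transaction commits. The `db` handle, SQL, and table name are hypothetical stand-ins for whatever driver/ORM is in use.

```python
import redis

r = redis.Redis(decode_responses=True)

def update_org_settings(db, org_id: str, settings_json: str) -> None:
    # Hypothetical driver usage: `db.begin()` commits on clean exit, rolls back on error.
    with db.begin():
        db.execute(
            "UPDATE org_settings SET settings = %(settings)s WHERE org_id = %(org_id)s",
            {"org_id": org_id, "settings": settings_json},
        )
    # Only after a successful commit: drop the cached view so the next read repopulates it.
    # A DB hook would fire this same DELETE, but also catches writes from Retool and friends.
    r.delete(f"org:settings:{org_id}")
```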
Q: How would you correct for a dropped/missed DB Hook?
A: This is largely the biggest problem with this approach. A mixture of eventual-expiry, and increasing the durability and settling of your DB-hooks (Queue/Temporal) is often the solution to reduce this. More discussion is always great here though!
Migrations
As inevitable as taking off your shoes at some random SF house party - comes Migrations!
The primary problem we’re dealing with is that our schema is not shared across all “quasi-rows/keys”. They’re all individual keys, holding their own field names, and structures. Similar to what you would experience in MongoDB, you’re ultimately going to need to consider the path of how you want to approach migrations. One possible way that gives you some guarantees is Append-Only.
- `Append-Only` - You only add new fields (never modify or drop old ones), and migrate old values to new fields - online at runtime, or in one (big) Lua-based transaction.
- `Online` - Take the example of adding a new field `is_admin`.
  - If you’re using Approach 4 (or similar) - it’s easiest to add the new schema to Postgres, and then mass delete keys in that key space to force re-reads.
  - If you’re using more of an Approach 1 or 2 - it’s more involved. During a read, you could insert a shim to check whether the field needs to be added (see the sketch after this list.) Remove it after a year and cross your fingers (or leave it there!) These patterns tend to build up quite a lot of clutter in code after a while.
- `Lua` - Generally safer, as long as you have not incurred too much drift. If there are more than a handful of cases, it’s best to stick to `Online` so you can model these more easily. That said, this can give you (provided it is constructed correctly) more peace of mind that existing users who may not log in for a few months can be correctly updated.
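A minimal sketch of that Online shim for the `is_admin` example, assuming redis-py and a hypothetical Hash-per-user layout under `users:settings:<user_id>`.

```python
import redis

r = redis.Redis(decode_responses=True)

def get_user_settings(user_id: str) -> dict:
    key = f"users:settings:{user_id}"
    # The shim: backfill the new field on read. HSETNX is a no-op when the field
    # already exists, so this is safe to leave in place (at the cost of code clutter).
    r.hsetnx(key, "is_admin", "false")
    return r.hgetall(key)
```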
This really and ultimately comes down to the underlying data structures used per class of migration you’re trying to do. The theme that’s becoming more evident is that you need to put more thought into your data design prior to building, versus having an easier time in Postgres moving from representation A → B. Also evident is that it should be part of your thinking to consider how malleable the data structure you choose is to change, and at scale.
The big question!
I’ve for sure left out some things I’ve wanted to say - but I do feel like I’ve tried to add more interpretation than just reprinting the Redis docs alone, to ground it more in the “how could I do this in reality.”
Now would I want to use this in production for this use case (as a primary database?)
Unless it was Approach 4, no.
After a year, or less - you probably would be in a mess.
Okay that’s everything, thank you!