One of the many things we love about the software world is how frequently new frameworks, languages, or tools will swoop into the scene and offer up new approaches to solving the problems we deal with every day. We get excited about new methods that challenge the old ways of doing things to help us do better work, faster. While it can be easy to get caught up in the hype, there are some paradigms that have impressively withstood the test of time and quite frankly, have done so for good reason.

Take databases for example. Since the early ’80s, relational databases have been the standard for structuring and connecting extensive sets of data. However, in the last decade, non-relational databases have gained avid support from the community, causing developers and data architects alike to reconsider using the former tried-and-true convention. At present, these two database types are vastly different and selecting one or the other will surely have significant impacts, for better or worse, on the applications you develop.  

Welcome to Part I of the two-part series, The Database Dilemma. In this article, we will give a brief overview for those who are “new” to the world of databases, distinguishing structured data from unstructured data and relational databases from non-relational. In Part II, we will take a deeper dive, leveraging some of our own experience working with the two paradigms. We will also give a few tips to consider when choosing the appropriate database type for your project. 

Structured vs Unstructured Data

To understand the trade-offs that come with either data paradigm, it’s important to recognize the types of data they appeal to: relational databases are great for structured data while non-relational databases tend to be better for semi/unstructured data. But what’s the difference between structured and unstructured data? 

Structured data is sometimes referred to as quantifiable data, while others will categorize it as being easily recognized and understood by computer programming languages. When we are working with structured data, we have a general sense of the nature and format of the incoming data and can thereby establish validations to keep data entries tidy, accurate, and organized with high confidence. For example, order information collected at checkout on an ecommerce website is an example of structured data. No matter the order, we can assume that a customer name, billing address, shipping address, and payment information will be retrieved and collected in a consistent, recognizable format. 

Unstructured data can be described as qualitative data, examples of which include, but are not limited to, documents, images, videos, server logs, and audio files. Data coming from multiple sources can in some cases be considered unstructured. This could be the result of each source delivering information in a unique, inconsistent format. Unstructured data tends to be less predictable than structured data, making it flexible but also difficult to impose validations on. 

Relational vs Non-Relational Databases

If you’ve ever entered information into tables on a spreadsheet, you are already somewhat familiar with how relational databases organize data. In a relational database, information is stored in columns and rows. Identifiers, referred to as primary and foreign keys, are used to establish connections between associated data housed in different tables. Prioritizing rigid consistency, relational database transactions are facilitated through SQL, the universal relational database query language, to ensure the integrity, accuracy, and durability of database transactions. This adherence to consistency and order is what makes relational databases the ideal architecture for structured data.

Non-relational (or NoSQL) databases, often referred to as document-based storage, are less like a spreadsheet and more like a file cabinet that organizes files into folders. They offer complete and total flexibility through a schemaless format, making them a godsend for unstructured or semi-structured data. At the peak of their hype, NoSQL databases boasted an inexhaustible ability to scale both horizontally and vertically, distributing computing power based on an application’s needs, a barrier that relational databases had yet to overcome. This concept of plasticity was traded for definite accuracy. The impact of this trade-off could be argued one way or the other; when dealing with gargantuan-sized data stores, the benefit of accelerated query performance could potentially outweigh an occasional transaction mistake.   

The Line is Becoming Blurred

Programmers and software engineers are always looking for opportunities to innovate, repurposing or borrowing from what’s available rather than wasting precious development time trying to “reinvent the wheel”. Over time, deciding between relational and non-relational databases might not be as difficult a choice as the overlap between the two becomes greater and there are more scenarios where either paradigm will work. For example, some non-relational databases will now allow you to employ foreign keys and indexes to sort and correlate sets of data in a way that resembles a relational database. The same goes for relational databases: by providing a JSONB field, some relational databases can offer certain aspects of the flexibility afforded by non-relational databases. Nonetheless, the current divide between the two strategies is still distinct enough that developers must consider the ramifications that will certainly follow using one paradigm in place of the other. 

In Short

Relational databases are like your Type-A personality friend: rigid and definite with minimal wiggle room. They stick to “the plan” to ensure the accuracy and credibility of database transactions. In contrast, non-relational databases are the ones with the “happy go lucky” attitude; their “go with the flow mentality” allows for flexibility when data is unstructured or semi-structured. Eventually, the two will likely inherit the benefits of the other but for the time being, deciding on one strategy or the other can have a substantial influence on the performance and maintainability of your application. For a more in-depth, comparative analysis of the two, stay tuned for Part II.