What are some common misconceptions that people have about storing data? originally appeared on Quora, the place to gain and share knowledge, empowering people to learn from others and better understand the world.
Many IT organizations confuse data storage with data preparation. Organizations should capture and store all the data they can, within cost constraints, “as is”. A common mistake is early binding: transforming (ETL) or aggregating data too early in its life cycle, which loses the ability to work with atomic data when it is needed.
Rather, organizations should store the data “as is” in an inexpensive form (such as Amazon S3) and add structure to the raw data later, according to the needs of the business. This keeps the organization responsive to those needs without losing data fidelity that may be required down the road.
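To make the contrast concrete, here is a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the sample order events, their field names, and the two helper functions. It assumes the raw events are kept as one JSON document per line, and it shows that an early aggregate by region cannot answer a later question about SKUs, while the raw lines can.

```python
import json
from collections import defaultdict

# Hypothetical raw events, kept exactly as they arrived (one JSON document per line).
RAW_LINES = [
    '{"order_id": 1, "region": "EU", "sku": "A", "amount": 20.0}',
    '{"order_id": 2, "region": "EU", "sku": "B", "amount": 5.0}',
    '{"order_id": 3, "region": "US", "sku": "A", "amount": 12.5}',
]


def early_bound_totals(lines):
    """Early binding: aggregate by region at load time; the atomic rows are not kept."""
    totals = defaultdict(float)
    for line in lines:
        event = json.loads(line)
        totals[event["region"]] += event["amount"]
    return dict(totals)


def late_bound_view(lines, key):
    """Late binding: read the raw lines and group by whichever key is needed now."""
    totals = defaultdict(float)
    for line in lines:
        event = json.loads(line)
        totals[event[key]] += event["amount"]
    return dict(totals)


# If only this aggregate had been stored, a per-SKU breakdown would be unrecoverable.
by_region = early_bound_totals(RAW_LINES)

# With the raw lines still available, a new business question is just a new read.
by_sku = late_bound_view(RAW_LINES, "sku")
```

The point of the sketch is the asymmetry: the regional totals can always be rebuilt from the raw lines, but the raw lines cannot be rebuilt from the regional totals.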
In recent years, there have been some major advancements in data platforms that make this late binding concept, or “schema on read”, possible. First, the advent of Hadoop and the data lake introduced the concept of complex data types, allowing the use of JSON and nested fields natively in queries without prior transformation. More recently, the new cloud data warehouses like Snowflake and Google BigQuery have built these capabilities into the SQL data engine itself. Delivered as managed services, these platforms let more organizations adopt this new architecture without having to invent or engineer the solutions themselves.
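The idea of “schema on read” can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not how Hadoop, Snowflake, or BigQuery implement it: the sample records, the SCHEMA mapping, and the helpers get_path and read_with_schema are all hypothetical. The nested JSON is stored untouched, and a schema (column name to nested path) is applied only when the data is read.

```python
import json

# Hypothetical raw records with nested fields, stored without any transformation.
RAW_RECORDS = [
    '{"user": {"id": 7, "geo": {"country": "DE"}}, "items": [{"sku": "A", "qty": 2}]}',
    '{"user": {"id": 8}, "items": []}',
]

# Schema chosen at read time: output column name -> path into the nested document.
SCHEMA = {
    "user_id": ("user", "id"),
    "country": ("user", "geo", "country"),
}


def get_path(doc, path):
    """Follow a sequence of keys into nested dicts; return None if any key is missing."""
    current = doc
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def read_with_schema(lines, schema):
    """Parse each raw line and project it onto the columns named in the schema."""
    rows = []
    for line in lines:
        doc = json.loads(line)
        rows.append({column: get_path(doc, path) for column, path in schema.items()})
    return rows


rows = read_with_schema(RAW_RECORDS, SCHEMA)
```

Because the schema lives in the reader rather than in the stored data, adding a column later (for example, one drawn from the items array) means changing SCHEMA, not reloading or rewriting the records.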
As organizations modernize their analytics infrastructure, it’s crucial that they avoid dragging along the old, tired and complex data modeling habits of the prior generation. Rather than modeling data with star schemas, fact tables and dimension tables, organizations should look to simplify their data models by avoiding early transformation and aggregation. By doing so, they can respond quickly to business needs and spend more time analyzing data rather than moving and transforming it.
Photo Credit: baranozdemir/Getty Images