Data Warehouse Modeling: Star Schema vs. Snowflake Schema
In computing, the star schema is the simplest style of data mart schema and the approach most widely used to develop data warehouses and dimensional data marts. A star schema consists of one or more fact tables referencing any number of dimension tables. It is called a star schema because its entity-relationship diagram resembles a star, with the fact table at the center and the dimension tables as the points.
The star schema and the snowflake schema are ways to organize data marts or entire data warehouses using relational databases. Both of them use dimension tables to describe data aggregated in a fact table.
Everyone sells something, be it knowledge, a product, or a service. Storing this information, either in an operational system or in a reporting system, is also a need.
So we can expect to find some type of sales model inside the data warehouse of nearly every company.

The Star Schema
The most obvious characteristic of the star schema is that dimension tables are not normalized.
The light blue tables are dimension tables. We decided to use these five dimensions because we need to create reports using them as parameters.
The granularity inside each dimension is also determined by our reporting needs.

The Snowflake Schema
This snowflake schema stores exactly the same data as the star schema. The fact table has the same dimensions as it does in the star schema example. The most important difference is that the dimension tables in the snowflake schema are normalized. The process of normalizing dimension tables is called snowflaking. Visually, the snowflake schema resembles its namesake, with several layers of dimension tables creating an irregular snowflake-like shape.
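The difference between a denormalized and a snowflaked dimension can be sketched with a single product dimension. This is a minimal illustration using Python's built-in sqlite3 module; every table and column name is hypothetical and not taken from the original model.

```python
import sqlite3

# All table and column names are hypothetical, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star style: one denormalized dimension; the type name repeats on every row.
CREATE TABLE dim_product_star (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    product_type TEXT NOT NULL
);
INSERT INTO dim_product_star VALUES
    (1, 'Phone A', 'phone'),
    (2, 'Phone B', 'phone'),
    (3, 'Laptop C', 'laptop');

-- Snowflake style: the product type is normalized into its own table.
CREATE TABLE dim_product_type (
    product_type_id INTEGER PRIMARY KEY,
    type_name       TEXT NOT NULL UNIQUE
);
CREATE TABLE dim_product_snow (
    product_id      INTEGER PRIMARY KEY,
    product_name    TEXT NOT NULL,
    product_type_id INTEGER NOT NULL REFERENCES dim_product_type(product_type_id)
);
INSERT INTO dim_product_type VALUES (1, 'phone'), (2, 'laptop');
INSERT INTO dim_product_snow VALUES
    (1, 'Phone A', 1),
    (2, 'Phone B', 1),
    (3, 'Laptop C', 2);
""")

# Reading the star dimension needs no join.
star_rows = conn.execute(
    "SELECT product_name, product_type FROM dim_product_star ORDER BY product_id"
).fetchall()

# Reading the same information from the snowflaked dimension needs one join.
snow_rows = conn.execute("""
    SELECT p.product_name, t.type_name
    FROM dim_product_snow AS p
    JOIN dim_product_type AS t ON t.product_type_id = p.product_type_id
    ORDER BY p.product_id
""").fetchall()
```

Both reads return the same rows, which is the sense in which the two schemas store the same data. The snowflaked version stores each type name once and pays for it with a join.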

Normalization
As mentioned, normalization is a key difference between star and snowflake schemas. Regarding this, there are a couple of things to know:
- Snowflake schemas use less space to store dimension tables, because a normalized database generally holds far less redundant data.
- Denormalized data models increase the chances of data integrity problems, which also complicate future modifications and maintenance.
- To experienced data modelers, the snowflake schema seems more logically organized than the star schema. (This is my personal opinion, not a hard fact.)

Query Complexity
In our first two articles, we demonstrated a query that could be used on the sales model to get the quantity of all phone-type products sold in Berlin stores in a given year. In the star schema, that query joins the fact table directly to the dimension tables it needs. In the snowflake schema, because the dimension tables are normalized, we need to dig deeper to get the name of the product type and the city.
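As a rough sketch of that difference, the two query shapes can be compared on a tiny in-memory SQLite database. The tables, columns, and sample rows below are assumptions made for illustration, and the time dimension (the year filter) is left out to keep the example short.

```python
import sqlite3

# Hypothetical miniature sales model; names and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared fact table: one row per sale.
CREATE TABLE fact_sales (product_id INTEGER, store_id INTEGER, quantity INTEGER);
INSERT INTO fact_sales VALUES (1, 1, 5), (1, 2, 3), (2, 1, 7), (1, 1, 2);

-- Star dimensions: product type and city are stored inline.
CREATE TABLE star_product (product_id INTEGER PRIMARY KEY, product_type TEXT);
CREATE TABLE star_store (store_id INTEGER PRIMARY KEY, city TEXT);
INSERT INTO star_product VALUES (1, 'phone'), (2, 'laptop');
INSERT INTO star_store VALUES (1, 'Berlin'), (2, 'Munich');

-- Snowflake dimensions: product type and city live one level further out.
CREATE TABLE snow_product_type (type_id INTEGER PRIMARY KEY, type_name TEXT);
CREATE TABLE snow_product (product_id INTEGER PRIMARY KEY, type_id INTEGER);
CREATE TABLE snow_city (city_id INTEGER PRIMARY KEY, city_name TEXT);
CREATE TABLE snow_store (store_id INTEGER PRIMARY KEY, city_id INTEGER);
INSERT INTO snow_product_type VALUES (1, 'phone'), (2, 'laptop');
INSERT INTO snow_product VALUES (1, 1), (2, 2);
INSERT INTO snow_city VALUES (1, 'Berlin'), (2, 'Munich');
INSERT INTO snow_store VALUES (1, 1), (2, 2);
""")

# Star: the fact table joins directly to each dimension it needs.
STAR_SQL = """
SELECT SUM(f.quantity)
FROM fact_sales AS f
JOIN star_product AS p ON p.product_id = f.product_id
JOIN star_store AS s ON s.store_id = f.store_id
WHERE p.product_type = 'phone' AND s.city = 'Berlin'
"""

# Snowflake: each normalized level inside a dimension costs one more join.
SNOW_SQL = """
SELECT SUM(f.quantity)
FROM fact_sales AS f
JOIN snow_product AS p ON p.product_id = f.product_id
JOIN snow_product_type AS t ON t.type_id = p.type_id
JOIN snow_store AS s ON s.store_id = f.store_id
JOIN snow_city AS c ON c.city_id = s.city_id
WHERE t.type_name = 'phone' AND c.city_name = 'Berlin'
"""

star_total = conn.execute(STAR_SQL).fetchone()[0]
snow_total = conn.execute(SNOW_SQL).fetchone()[0]
star_joins = STAR_SQL.count("JOIN")
snow_joins = SNOW_SQL.count("JOIN")
```

With this sample data both queries return the same total. The star version uses two joins and the snowflake version uses four.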
We have to add another JOIN for every new level inside the same dimension. In the star schema, we only join the fact table with those dimension tables we need. Each join is extra work for the DBMS, so the additional joins tend to make the snowflake query slower to process.

Model
The star schema separates business process data into facts, which hold the measurable, quantitative data about a business, and dimensions, which are descriptive attributes related to fact data.
Examples of fact data include sales price, sale quantity, and time, distance, speed and weight measurements. Related dimension attribute examples include product models, product colors, product sizes, geographic locations, and salesperson names. A star schema that has many dimensions is sometimes called a centipede schema.

Fact tables
Fact tables record measurements or metrics for a specific event. Fact tables generally consist of numeric values and foreign keys to dimensional data where descriptive information is kept.
Fact tables are designed to a low level of uniform detail (the granularity, or grain), so a large number of records can accumulate in a fact table over time.
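A minimal sketch of this layout, assuming a hypothetical `fact_sales` table with two numeric measures and one foreign key into a hypothetical `dim_product` table (Python's built-in sqlite3 module; none of these names come from the original model):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
-- Dimension: descriptive attributes (hypothetical).
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    color       TEXT NOT NULL
);
-- Fact: numeric measures plus a foreign key to the dimension (hypothetical).
CREATE TABLE fact_sales (
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity    INTEGER NOT NULL,
    sale_price  REAL NOT NULL
);
INSERT INTO dim_product VALUES (1, 'red'), (2, 'blue');
INSERT INTO fact_sales VALUES (1, 2, 10.0), (1, 1, 10.0), (2, 4, 5.0);
""")

# Measures are aggregated; the dimension supplies the descriptive grouping.
revenue_by_color = dict(conn.execute("""
    SELECT d.color, SUM(f.quantity * f.sale_price)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_key = f.product_key
    GROUP BY d.color
""").fetchall())

# A fact row pointing at a missing dimension row is rejected.
try:
    conn.execute("INSERT INTO fact_sales VALUES (99, 1, 1.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The fact table holds no descriptive text of its own. Anything a report groups or filters by comes from the dimension side of the join.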
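Fact tables are defined as one of three types:
- Transaction fact tables record facts about a specific event (e.g., a sales event).
- Snapshot fact tables record facts at a given point in time (e.g., account details at month end).
- Accumulating snapshot tables record aggregate facts at a given point in time (e.g., total month-to-date sales for a product).
Fact tables are generally assigned a surrogate key to ensure each row can be uniquely identified. This key is a simple primary key.

Dimension tables
Dimension tables usually have a relatively small number of records compared to fact tables, but each record may have a very large number of attributes to describe the fact data.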
Dimensions can define a wide variety of characteristics, but some of the most common attributes defined by dimension tables include:
- Time dimension tables describe time at the lowest level of time granularity for which events are recorded in the star schema
- Geography dimension tables describe location data, such as country, state, or city
- Product dimension tables describe products
- Employee dimension tables describe employees, such as sales people
- Range dimension tables describe ranges of time, dollar values or other measurable quantities to simplify reporting
Dimension tables are generally assigned a surrogate primary key, usually a single-column integer data type, mapped to the combination of dimension attributes that form the natural key.
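The mapping from a natural key to a surrogate key can be sketched as follows. This is an illustrative example using Python's built-in sqlite3 module; the `dim_geography` table, its columns, and the `surrogate_key` helper are hypothetical names introduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_geography (
        geography_key INTEGER PRIMARY KEY,  -- surrogate key, assigned by SQLite
        country       TEXT NOT NULL,
        city          TEXT NOT NULL,
        UNIQUE (country, city)              -- natural key
    )
""")

def surrogate_key(conn, country, city):
    """Return the surrogate key for a natural key, inserting the row if needed."""
    # INSERT OR IGNORE does nothing when the natural key already exists.
    conn.execute(
        "INSERT OR IGNORE INTO dim_geography (country, city) VALUES (?, ?)",
        (country, city),
    )
    row = conn.execute(
        "SELECT geography_key FROM dim_geography WHERE country = ? AND city = ?",
        (country, city),
    ).fetchone()
    return row[0]

berlin = surrogate_key(conn, "Germany", "Berlin")
munich = surrogate_key(conn, "Germany", "Munich")
berlin_again = surrogate_key(conn, "Germany", "Berlin")
row_count = conn.execute("SELECT COUNT(*) FROM dim_geography").fetchone()[0]
```

Looking up the same natural key twice yields the same integer, so fact rows can store the compact surrogate key instead of repeating the country and city values.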

Benefits
Star schemas are denormalized, meaning the normal rules of normalization applied to transactional relational databases are relaxed during star schema design and implementation.