Actions

Archived

Difference between revisions of "Database schema standards"

From Joomla! Documentation

 
(11 intermediate revisions by 2 users not shown)
Line 176: Line 176:
 
An example of resolving a many-to-many relationship is shown below:
 
An example of resolving a many-to-many relationship is shown below:
  
Original relationship:
+
      '''Original relationship:'''
 +
      [[File:Original.jpg‎‎‎‎|alt=Many-to-Many]]
  
[[File:Original.jpg‎‎‎‎|alt=Many-to-Many]]
+
      '''Resolved relationship:'''
 +
      [[File:Resolved.jpg‎‎‎‎|alt=Resolved Many-to-Many]]
  
 +
==Database Terminology==
 +
There are a number of terms that may be encountered when researching or designing database schemas.  These include:
  
 +
* '''Attribute/Field:''' a characteristic or property of an entity.  These will vary depending on the entity, and will generally be characteristics of that particular entity, such as name, age and gender for a person.
  
 +
* '''Cardinality:''' the number of each entity that must exist within a relationship.
  
 +
* '''Composite Key:''' several primary keys that serve as one within a single entity.
  
 +
* '''Data:''' raw information that has yet to be processed, and therefore serves no purpose on its own.
  
 +
* '''Data Item:''' a single cell within a table or entity (one value within a field or attribute).
  
 +
* '''Data Type:''' the type of data that can be inserted into a field or attribute within a table.
  
 +
* '''Database Management System (DBMS):''' a collection of related databases.
  
 +
* '''Entity:''' a person, place, object, event or idea that stores and processes data.
  
 +
* '''Entity-Relationship (ER) Diagram:''' a visual representation of a DBMS.
  
 +
* '''Floating Entity:''' an entity that does not have a direct relationship with another entity.
  
[[Category:InUse]][[Category:Working Groups]]
+
* '''Foreign Key (FK):''' a primary key in one table that is used within another table, and is used to determine the relationship between the entities. These are usually distinguished by showing them in italics in an ER diagram.
{{inuse}}
+
 
 +
* '''Information:''' data that has been processed and can be analysed and understood by humans.
 +
 
 +
* '''Normalisation:''' the act of reducing data redundancy by creating more tables or entities to store repeating information.
 +
 
 +
* '''Participation:''' whether the participation in a relationship is optional or mandatory.
 +
 
 +
* '''Primary Key (PK):''' an attribute whose values are unique for each entry.  These are underlined when designing ER diagrams to differentiate them from other attributes.
 +
 
 +
* '''Record:''' one row of data entries within a database (one entry per attribute/field).
 +
 
 +
* '''Redundancy:''' when repeating data exists within an entity or table.
 +
 
 +
* '''Relational Data Structure:''' a written representation of the structure of a DBMS.
 +
 
 +
* '''Relationship:''' the association between entities in a DBMS.
 +
 
 +
* '''Relationship Name:''' describes the relationship between entities in an ER diagram.
 +
 
 +
* '''Self-Referencing:''' when an entity has a relationship with itself in a DBMS.  An example of self-referencing in an ER diagram is:
 +
 
 +
      [[File:Self_referencing.jpg‎‎‎‎‎|alt=Self referencing]]
 +
 
 +
* '''Super-Types and Sub-Types:''' sub-type entities are entities that exist within other entities to store information that is not needed by other entities.  The container for these sub-type entities is called the super-type entity.  The sub-type entity inherits all of the attributes of the super-type, but no vice-versa. An example that includes super and sub-type entities is:
 +
 
 +
      [[File:Sub_and_Super_types.jpg‎‎‎‎‎‎|alt=Super and sub types]]
 +
 
 +
* '''Surrogate Key:''' a primary key that is automatically generated by a system.
 +
 
 +
* '''Table:''' an entity that has been implemented into a database serves as a table to store information.
 +
 
 +
* '''Weak Entity:''' an entity whose PK is partially or totally derived from another entity.
 +
 
 +
[[Category:Archived pages]]

Latest revision as of 10:50, 30 November 2013

Replacement filing cabinet.png
This page has been archived - Please Do Not Edit or Create Pages placed in this namespace. The pages in the Archived namespace exist only as a historical reference, it will not be improved and its content may be incomplete.

Contents

Naming Conventions

General Naming Conventions

There are many conventions used when naming tables or fields, so others are capable of understanding the contents of the table. While there are no absolute conventions, the following are used quite often, with the intention of clearer SQL statements.

  • No spaces. Using spaces is highly not advised when naming tables or fields. It makes SQL statements more complicated and harder to understand. In order to identify a gap in between words, underscores could be used. However, the preferred method is the use of Pascal casing.
  • Pascal casing. This practice is known by many other names, but will be referred to here as Pascal casing. This involves using a capital letter at the beginning of each new word.
      Examples:  ExampleTable, TableOfContents
  • Singular names. The use of singular names is just a simpler way of writing names. While plural names could be used, singular names are shorter and easier to understand.
      Example: Student (as opposed the Students)
  • Simple, yet describes the contents of the table or field. Names should be as small as possible, while still giving a user a clear indication of its contents.
  • Avoid similar names. When making larger databases, and having many tables, the tables should be clearly identifiable. If 2 or more tables have similar names, consider renaming one of the tables or combining the contents of them both. The same could be said for fields, to a lesser extent.

Table Naming Conventions

Using prefixes. Helpful in distinguishing the different types of tables. While the prefix should be in lowercase, use Pascal casing to begin the next word. There are other types of tables; these are just the more common types. Similar types of naming conventions could be used for the types of tables that are not listed here.

  • Data tables.Standard data tables should have the prefix: tbl
      Example: tblExample
  • Link tables. Tables that link 2 or more data table with a many-to-many relationship should have the prefix: link
      Example: linkVendorProduct

Field Naming Conventions

  • Using a prefix.

Fields in tables. Prefix should consist of the first 3 letters from table’s name. This is to easily identify which field belongs to which table. If any prefixes conflict, then use additional letters for the prefix. For example:

      Table name: tblExample
      Field name: exaField
  • Link tables. Due to the nature of link tables, the above prefix doesn’t make much sense and is redundant when used in link tables. Instead the same prefix link is used, much like the link tables.
      Table name: linkVendorProduct
      Field name: linkVenID
  • Primary keys. The general name of a table’s primary key should be the first 3 letters of the table name followed by ‘ID’. Similar to the general prefixes for other fields. Depending on the contents of the table, this may vary, however all primary keys should be unique of one another.
      Table name: Customer
      Primary key: cusID
  • Foreign keys. Use the standard prefix of 3 letters, but follow it with the referred table’s full name, followed by ‘ID’.
      Table name: Product
      Reference table: Customer
      Foreign key: proCustomerID

Data types

There are a number of data types used in databases. The following are the common ones used.

  • Integer (int): Integer is a data type that stores whole numbers.
Storage (Bytes) Minimum Value Maximum Value
4 -2,147,483,648 2,147,483,647
  • Float: A float data type is one that holds fractional values. That is they contain a decimal point that can “float” to the left or right depending on exponent. Floats have single precision. When defining a float, the total number of digits and the number of digits after the decimal point is declared.
Storage (Bytes) Minimum Value Maximum Value
4 +1.175494351E-38 +3.402823466E+38
  • Double: Double is a large floating point number. The double data type has double precision and like floats can hold fractional values.
Storage (Bytes) Minimum Value Maximum Value
8 +1.7976931348623157E+308 +2.2250738585072014E-308
  • Bit: Bit is a data type that is used to store bit field values. Bit(M) will store M bit values, where M ranges from 1 to 64. Assigning a value to the bit field data type that is less than M bits long will result in the value being padded on the left with zeros.
  • Date: This data type stores the date but no time. The format of the date is ‘YYYY-MM-DD’
Storage (Bytes) Minimum Value Maximum Value
3 1000-01-01 9999-12-31
  • Datetime: The datetime data type stores both date and time. The format of datetime is ‘YYYY-MM-DD HH:MM:SS’.
Storage (Bytes) Minimum Value Maximum Value
8 1000-01-01 00:00:00 9999-12-31 23:59:59
  • Time: The time data type stores the time value. The format of time is ‘HH:MM:SS’. The hours of the time data type is large because it can be used to display the time of day, elapsed time or the time interval between two events.
Storage (Bytes) Minimum Value Maximum Value
3 -838:59:59 838:59:59
  • Year: The year data type is used to store the year values. By declaring the year as Year (4) a four digit year is specified. Declaring the year as Year (2) a two digit year is specified.
Storage (Bytes) Minimum Value Maximum Value
1 1901 2155
  • Char: The char data type holds strings or text. This data type is a fixed length data type and is declared with the length. The length specifies the maximum number of characters that are to be stored. If the text stored is less than the length then the remaining length is padded with spaces. The total number of storage in bytes is dependent on the length of the characters.
Minimum Length Maximum Length
0 255
  • Varchar: The varchar data type is like char where it holds text or strings and has its length defined during declaration. However, unlike the char data type, the varchar data type is not fixed length. It is variable length hence the term varchar. This means that if the text stored is less than the length, the remaining length is not padded with spaces.
Minimum Length Maximum Length
0 65535

ER Notation Standards

The following are required in creating an entity relationship diagram:

  • Entities are represented by labelled rectangles. The label is the name of the entity. Entity names should be singular nouns.
  • Relationships are represented by a solid line connecting two entities. The name of the relationship is written above the line. Relationship names should be verbs.
  • Attributes, when included, are listed inside the entity rectangle. Attributes which are identifiers are underlined. Attribute names should be singular nouns.
  • Cardinality of many is represented by a line ending in a crow's foot. If the crow's foot is omitted, the cardinality is one.
  • Participation is represented by placing a circle or a perpendicular bar on the line. Mandatory existence is shown by the bar next to the entity for an instance is required. Optional existence is shown by placing a circle next to the entity that is optional.

Example of an ER diagram

Cardinality and Relationships

Defining the relationships between entities of a Database Management System (DBMS) is a crucial stepthat must be taken during the design process. They allow the designer and end user to understand how the different entities relate to each other, and understand the connections that exist between them. There are four main components that must be included when defining relationships within a database schema.

The first of these is the relationship itself. This involves identifying the relationships that exist between entities, and connecting them using the primary key/foreign key link where the primary key of one entity matches the foreign key of the other entity in the relationship. If there is no direct relationship between two entities, then they do not need to be connected. Every entity within the database must link to at least one other entity, so there should be no entities that aren’t in a relationship (floating entities).

The second component is the relationship names. The relationship names ultimately describe the relationship between entities that have a relationship with another. They are generally short, being 2-3 words long, and are identified during the development of the ER diagram. As shown below, they are placed toward the entity that the relationship name is referring to, as opposed to referring from.

The third component of a relationship is the participation. This describes whether or not entries have to exist within an entity for a relationship to occur. This is usually described by using either a circle (O) for an optional participation requirement, or a pipe symbol (|) for a mandatory participation requirement. Once again, the participation requirement is shown toward the entity that is being referred to.

The final component is the cardinality of each relationship. The cardinality determines how many of each entity in a relationship must exist for that relationship to occur. They are generally represented by a crow’s foot link between entities, with the absence of a crow’s foot representing a singular instance and the appearance of a crow’s foot representing multiple instances, as shown below:

Cardinality

An example using the above information is staff working at a university. Generally, staff members teach at one university, and a university employs many staff members. This case can be represented as:

Relationship

Here, the relationship between these entities is represented by a single line connecting the two together. The relationship names are “employs” and “teaches at”; explaining that lecturers teach at a university and a university employs lecturers. The cardinality and participation combined state that a university must employ at least one lecturer, but one or more lecturers may not necessarily teach at a university.

Resolving Many-to-Many Relationships

Many-to-many relationships in an ER diagram must not exist when they are implemented into a database, as this can cause some information to be repeated within both tables. The way to fix this problem is to resolve the many-to-many relationship.

The first step in resolving a many-to-many relationship is to create a new entity that is placed between the entities present in the relationship. This entity will then hold the repeating information that appears in both tables, so the information is no longer repeated and can be easily updated.

The next step is to fix the cardinality and participation. This involves creating two one-to-many relationships that link to the new entity. The “many” end of the relationship must always point to the new entity between the two pre-existing entities. For the participation, the pre-existing entities must have a mandatory participation requirement within the relationship, and the new entity can either have an optional or mandatory requirement depending on the information that will be stored within it.

Finally, the link between the two original entities must be created. This simply involves linking each pre-existing entity to the new entity with the primary key to foreign key link by placing the primary key from each pre-existing entity into the new table and considering it as a foreign key.

An example of resolving a many-to-many relationship is shown below:

      Original relationship:
      Many-to-Many
      Resolved relationship:
      Resolved Many-to-Many

Database Terminology

There are a number of terms that may be encountered when researching or designing database schemas. These include:

  • Attribute/Field: a characteristic or property of an entity. These will vary depending on the entity, and will generally be characteristics of that particular entity, such as name, age and gender for a person.
  • Cardinality: the number of each entity that must exist within a relationship.
  • Composite Key: several primary keys that serve as one within a single entity.
  • Data: raw information that has yet to be processed, and therefore serves no purpose on its own.
  • Data Item: a single cell within a table or entity (one value within a field or attribute).
  • Data Type: the type of data that can be inserted into a field or attribute within a table.
  • Database Management System (DBMS): a collection of related databases.
  • Entity: a person, place, object, event or idea that stores and processes data.
  • Entity-Relationship (ER) Diagram: a visual representation of a DBMS.
  • Floating Entity: an entity that does not have a direct relationship with another entity.
  • Foreign Key (FK): a primary key in one table that is used within another table, and is used to determine the relationship between the entities. These are usually distinguished by showing them in italics in an ER diagram.
  • Information: data that has been processed and can be analysed and understood by humans.
  • Normalisation: the act of reducing data redundancy by creating more tables or entities to store repeating information.
  • Participation: whether the participation in a relationship is optional or mandatory.
  • Primary Key (PK): an attribute whose values are unique for each entry. These are underlined when designing ER diagrams to differentiate them from other attributes.
  • Record: one row of data entries within a database (one entry per attribute/field).
  • Redundancy: when repeating data exists within an entity or table.
  • Relational Data Structure: a written representation of the structure of a DBMS.
  • Relationship: the association between entities in a DBMS.
  • Relationship Name: describes the relationship between entities in an ER diagram.
  • Self-Referencing: when an entity has a relationship with itself in a DBMS. An example of self-referencing in an ER diagram is:
      Self referencing
  • Super-Types and Sub-Types: sub-type entities are entities that exist within other entities to store information that is not needed by other entities. The container for these sub-type entities is called the super-type entity. The sub-type entity inherits all of the attributes of the super-type, but no vice-versa. An example that includes super and sub-type entities is:
      Super and sub types
  • Surrogate Key: a primary key that is automatically generated by a system.
  • Table: an entity that has been implemented into a database serves as a table to store information.
  • Weak Entity: an entity whose PK is partially or totally derived from another entity.