A data type is an attribute associated with a piece of data that tells a computer system how to interpret its value. Understanding data types ensures that data is collected in the preferred format and the value of each property is as expected.
For example, knowing the data type for “Ross, Bob” will help a computer know:
- whether the data is referring to someone’s full name (“Bob Ross”)
- or a list of two names (“Bob” and “Ross”)
Understanding data types will help you ensure that:
- the data you collect is always in the right format (“Ross, Bob” vs. “Bob Ross”)
- the value is as expected (“Ross, Bob” vs. “R0$$, B0b”)
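As a minimal sketch of the two interpretations above (the variable names are chosen for illustration only), the same text yields different results depending on which type you assume:

```python
raw = "Ross, Bob"

# Interpretation 1: a single string holding one full name in "last, first" order.
last, first = [part.strip() for part in raw.split(",")]
full_name = f"{first} {last}"

# Interpretation 2: a list holding two separate names.
names = [part.strip() for part in raw.split(",")]

print(full_name)
print(names)
```

Nothing in the raw text says which reading is correct; only the declared data type does.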
Note: Data types should not be confused with the two types of data that are collectively referred to as customer data: entity data and event data. To properly define event properties and entity properties, you need a good understanding of data types. A well-defined tracking plan must contain the data type of every property to ensure data accuracy and prevent data loss.
Common data types
| Data type | Definition | Examples |
|---|---|---|
| Integer (int) | Numeric data type for numbers without fractions | -707, 0, 707 |
| Floating Point (float) | Numeric data type for numbers with fractions | 707.07, 0.7, 707.00 |
| Character (char) | Single letter, digit, punctuation mark, symbol, or blank space | a, 1, ! |
| String (str or text) | Sequence of characters, digits, or symbols, always treated as text | |
| Boolean (bool) | True or false values | 0 (false), 1 (true) |
| Enumerated type (enum) | Small set of predefined unique values (elements or enumerators) that can be text-based or numerical | rock (0), jazz (1) |
| Array | List with a number of elements in a specific order, typically of the same type | rock (0), jazz (1), blues (2), pop (3) |
| Date | Date in the YYYY-MM-DD format (ISO 8601 syntax) | |
| Time | Time in the hh:mm:ss format for the time of day, time since an event, or time interval between events | |
| Datetime | Date and time together in the YYYY-MM-DD hh:mm:ss format | |
| Timestamp | Number of seconds that have elapsed since midnight (00:00:00 UTC), 1st January 1970 (Unix time) | |
Integer (int)
It is the most common numeric data type, used to store numbers without a fractional component (-707, 0, 707).
Floating Point (float)
It is also a numeric data type used to store numbers that may have a fractional component like monetary values do (707.07, 0.7, 707.00).
Please note that number is often used as a data type that includes both int and float types.
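A short sketch of how int, float, and a combined "number" type relate, using Python. The helper `is_number` is hypothetical and only illustrates the idea of one type that covers both:

```python
count = 707      # int: no fractional component
price = 707.07   # float: may have a fractional component
total = 707.00   # still a float, even though the fraction is zero


def is_number(value):
    """Return True for int or float values (a combined "number" type)."""
    # In Python, bool is a subclass of int, so it is excluded explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


print(type(count).__name__, type(price).__name__, type(total).__name__)
print(is_number(count), is_number(price), is_number("707"))
```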
Character (char)
It is used to store a single letter, digit, punctuation mark, symbol, or blank space.
String (str or text)
It is a sequence of characters and the most commonly used data type to store text. A string can also include digits and symbols, but it is always treated as text.
A phone number is usually stored as a string (+1-999-666-3333) but can also be stored as an integer (9996663333).
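The choice between the two matters because a string preserves every character as entered, while an integer keeps only the numeric value. A minimal sketch (the `leading_zero` value is a made-up example, not taken from the text):

```python
phone_text = "+1-999-666-3333"   # string: formatting characters are preserved
phone_int = 9996663333           # integer: digits only, no "+" or "-"

# Converting text with a leading zero to an integer silently drops the zero.
leading_zero = "0999"            # hypothetical value for illustration
as_int = int(leading_zero)

print(phone_text, phone_int)
print(leading_zero, "->", as_int)
```

This is one reason identifiers such as phone numbers are usually kept as strings.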
Boolean (bool)
It represents the values true and false. When working with the boolean data type, it is helpful to keep in mind that a boolean value is sometimes also represented as 0 (for false) and 1 (for true).
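Because the same boolean can arrive as true/false or as 1/0, it can help to normalize it on the way in. The sketch below assumes a small, fixed set of accepted encodings; `to_bool` is a hypothetical helper, not part of any library:

```python
def to_bool(value):
    """Normalize a few common boolean encodings to True or False."""
    if isinstance(value, bool):
        return value
    if value in (0, 1):                      # numeric encoding
        return bool(value)
    if isinstance(value, str) and value.lower() in ("true", "false"):
        return value.lower() == "true"       # text encoding
    raise ValueError(f"not a boolean value: {value!r}")


print(to_bool(1), to_bool(0), to_bool("False"))
```

Anything outside the accepted encodings raises an error instead of being guessed at.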
Enumerated type (enum)
It contains a small set of predefined unique values (also known as elements or enumerators) that can be compared and assigned to a variable of enumerated data type.
The values of an enumerated type can be text-based or numerical. In fact, the boolean data type is a pre-defined enumeration of the values true and false.
For example, if rock and jazz are the enumerators, an enumerated type variable genre can be assigned either of the two values, but not both.
Assuming that you are asked to fill in your preferences on a music app and are asked to choose either one of the two genres via a dropdown menu, the variable genre will store either rock or jazz.
With enumerated type, values can be stored and retrieved as numeric indices (0, 1, 2) or strings.
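The genre example can be sketched with Python's standard `enum` module. The class name `Genre` and the numeric values are assumptions that mirror the indices used above:

```python
from enum import Enum


class Genre(Enum):
    ROCK = 0
    JAZZ = 1


# The variable holds exactly one of the predefined values.
genre = Genre.JAZZ

# Values can be looked up by numeric index or by name.
by_index = Genre(0)
by_name = Genre["JAZZ"]

print(genre.name, genre.value)
print(by_index, by_name)
```

Trying to use a value outside the predefined set, such as `Genre(2)`, raises a `ValueError`.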
Array
Also known as a list, an array is a data type that stores a number of elements in a specific order, typically all of the same type.
Since an array stores multiple elements or values, the structure of data stored by an array is referred to as an array data structure.
Each element of an array can be retrieved using an integer index (0, 1, 2,…), and the total number of elements in an array represents the length of an array.
For example, an array variable genre can store one or more of the elements rock, jazz, and blues. The indices of the three values are 0 (rock), 1 (jazz), and 2 (blues), and the length of the array is 3 (since it contains three elements).
Continuing on the example of the music app, if you are asked to choose one or more of the three genres and you happen to like all three (cheers to that), the variable genre will store all three elements (rock, jazz, blues).
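A minimal sketch of the same array in Python, where the built-in list plays the role of an array (the variable names are illustrative):

```python
genres = ["rock", "jazz", "blues"]

first = genres[0]      # index 0
second = genres[1]     # index 1
third = genres[2]      # index 2
length = len(genres)   # total number of elements

print(first, second, third, length)
```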
Date
Needs no explanation; typically stores a date in the YYYY-MM-DD format (ISO 8601 syntax).
Time
Stores a time in the hh:mm:ss format. Besides the time of day, it can also be used to store the time elapsed or the time interval between two events, which could be more than 24 hours. For example, the time elapsed since an event took place could be 72+ hours (72:00:59).
Datetime
Stores a value containing both date and time together in the YYYY-MM-DD hh:mm:ss format.
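The three types map fairly directly onto Python's standard `datetime` module. In the sketch below, the specific date and time are made-up values, and `format_interval` is a hypothetical helper for printing an interval longer than 24 hours in hh:mm:ss form:

```python
from datetime import date, datetime, time, timedelta

d = date.fromisoformat("2021-09-28")   # date: YYYY-MM-DD
t = time(12, 0, 59)                    # time of day: hh:mm:ss
dt = datetime.combine(d, t)            # datetime: date and time together

# An elapsed time or interval is a duration, not a time of day.
elapsed = timedelta(hours=72, seconds=59)


def format_interval(delta):
    """Format a duration as hh:mm:ss, allowing more than 24 hours."""
    total = int(delta.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(d.isoformat(), t.isoformat(), dt.isoformat(sep=" "))
print(format_interval(elapsed))
```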
Timestamp
Typically represented in Unix time, a timestamp represents the number of seconds that have elapsed since midnight (00:00:00 UTC), 1st January 1970.
It is typically used by computer systems to log the precise date and time of an event, down to the second, in a format that is unaffected by time zones. Therefore, unlike datetime, a timestamp remains the same irrespective of your geographical location.
If you think about it, each one of us carries a timestamp: the date and time of our birth.
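To illustrate the time zone point, the sketch below converts one moment to a Unix timestamp and shows that viewing the same moment in another time zone does not change it. The moment and the UTC+05:30 offset are arbitrary choices for the example:

```python
from datetime import datetime, timedelta, timezone

# Timestamp 0 is midnight UTC on 1st January 1970.
epoch = datetime.fromtimestamp(0, tz=timezone.utc)

# One moment in time, expressed in UTC.
moment_utc = datetime(2021, 9, 28, 12, 0, 59, tzinfo=timezone.utc)

# The same moment as seen on a clock in a UTC+05:30 time zone.
moment_local = moment_utc.astimezone(timezone(timedelta(hours=5, minutes=30)))

print(epoch.isoformat())
print(moment_utc.isoformat(), moment_local.isoformat())
print(moment_utc.timestamp() == moment_local.timestamp())
```

The two datetime values print differently, but both correspond to the same number of seconds since the epoch.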
Example and recap
Different programming languages offer various other data types for a variety of purposes. However, the most commonly used data types that you need to know to become data-led have been covered.
A good way to think about data types is to notice them whenever you come across a form or survey.
Looking at a standard registration form, you should keep in mind that each field accepts values of a particular data type.
A text field stores the input as a string while a number field typically accepts an integer.
Names and email addresses are always of the type string, while numbers can be stored as a numerical type or as a string, since a string is a set of characters that can include digits.
In single-option or multiple-option fields, where one has to select from predefined options, the enumerated type and array data types come into play.
In the Facebook sign up form above, the Birthday field has 3 sub-fields, each of enumerated type asking you to choose one option for day, month, and year respectively.
Similarly, the Gender field wants you to choose from the two predefined choices or add a custom one, the input of which is stored as a string.
Strings like passwords should always be hashed or encrypted before they are stored.
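As a rough sketch of how such a form could map to typed fields, the record below gives each field a data type and stores a salted hash instead of the raw password. The field names, the `register` helper, and the choice of PBKDF2 with 100,000 iterations are all assumptions made for illustration, not a description of how any real sign-up form works:

```python
import hashlib
import os
from dataclasses import dataclass
from datetime import date


@dataclass
class Registration:
    first_name: str        # text field -> string
    email: str             # text field -> string
    birthday: date         # three dropdowns (day, month, year) -> date
    gender: str            # predefined choice or custom text -> string
    password_hash: bytes   # never the raw password
    salt: bytes


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted hash with PBKDF2-HMAC-SHA256 from the standard library."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register(first_name, email, day, month, year, gender, password):
    salt = os.urandom(16)
    return Registration(
        first_name=first_name,
        email=email,
        birthday=date(year, month, day),
        gender=gender,
        password_hash=hash_password(password, salt),
        salt=salt,
    )


record = register("Bob", "bob@example.com", 15, 1, 1990, "Male", "s3cret")
print(record.birthday.isoformat(), len(record.password_hash))
```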
Now let’s look at the importance of data types.
Importance of data types
You might be wondering why it’s important to know about all these data types when you are mainly concerned with understanding how to leverage customer data. There is only one main reason—to gather clean and consistent data.
Your knowledge of data types will come in handy in two stages of your data collection efforts as described below.
Instrumentation
The process of tracking behavioral data from primary data sources and syncing the data to an internal or external storage system is known as instrumentation.
The first step in the instrumentation process is to create a data tracking plan. Everything you need to know about a tracking plan is covered in this guide.
When deciding which events to track and what properties to collect (both event and entity properties), specifying the data type of each property in the tracking plan makes the instrumentation process a lot more efficient and leaves little room for error.
This is particularly helpful for engineers who are tasked with the implementation. By making sure that each property is sent with the correct data type, data inconsistency can be avoided.
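One way to picture this is a tracking plan expressed as a mapping from event names to properties and their expected data types, with a check that runs before an event is sent. The event name, property names, and `validate_event` helper below are hypothetical:

```python
# Hypothetical tracking plan: event name -> property name -> expected data type.
TRACKING_PLAN = {
    "song_played": {
        "song_id": str,
        "duration_seconds": int,
        "is_premium": bool,
    },
}


def validate_event(name, properties):
    """Return a list of problems; an empty list means the event matches the plan."""
    errors = []
    for prop, expected_type in TRACKING_PLAN[name].items():
        if prop not in properties:
            errors.append(f"{prop}: missing")
        # Exact type match, so that True is not accepted where an int is expected.
        elif type(properties[prop]) is not expected_type:
            actual = type(properties[prop]).__name__
            errors.append(f"{prop}: expected {expected_type.__name__}, got {actual}")
    return errors


good = {"song_id": "abc", "duration_seconds": 215, "is_premium": True}
bad = {"song_id": "abc", "duration_seconds": "215"}

print(validate_event("song_played", good))
print(validate_event("song_played", bad))
```

Catching the string "215" here is cheaper than discovering later that durations cannot be summed or averaged.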
Surveys
As a data-led professional, it is likely that you will gather data from your customers via surveys throughout the customer journey, from onboarding to churn.
The questions you ask in a survey could be open-ended (text or number) or come with predefined choices like a drop-down list (enum), checkboxes (array), radio buttons (boolean), or even a slider (depends).
To store the data from surveys (in a database or a third-party system), you need to specify a property name (industry_name, job_role, cancellation_reason, is_satisfied, etc.) and its data type (string, number, boolean, etc.) for every field in your survey. The property stores the value entered, and its data type validates that the value is as expected.
Doing so results in data being consistent and makes it easier to analyze and activate the data. It’s good to keep in mind that open-ended questions make for tougher analysis as you cannot aggregate the responses unless you transform the data by parsing each response and extracting the text that matches a rule.
With predefined choices, analysis is straightforward and is not affected even if you change the choices at a later stage (refer to enum and array data types).
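The difference in analysis effort can be sketched as follows. The responses are made-up sample data, and the keyword rule for the free-text field is a deliberately crude assumption:

```python
from collections import Counter

# Made-up survey responses: one predefined choice and one open-ended comment each.
responses = [
    {"cancellation_reason": "too_expensive", "comment": "Costs too much for us"},
    {"cancellation_reason": "missing_feature", "comment": "No offline mode"},
    {"cancellation_reason": "too_expensive", "comment": "The price went up"},
]

# Predefined choices aggregate directly.
reason_counts = Counter(r["cancellation_reason"] for r in responses)

# Open-ended text needs a parsing rule before it can be counted.
keywords = ("cost", "price")
price_mentions = sum(
    1 for r in responses if any(k in r["comment"].lower() for k in keywords)
)

print(reason_counts)
print(price_mentions)
```

The keyword rule only finds what it was written to find, while the predefined field counts every response without extra work.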
Putting data types into practice
Your knowledge of data types applies beyond data collection and instrumentation. Other activities such as data management, data integration, and internal application development (using no-code or low-code tools) should also become a lot easier now that you understand the various data types.