An introduction to data science and analytics

The growth in data has been driven in part by improvements in technology, the shift to an online society and the realisation that data has inherent value. Today, most organizations are capturing, or working towards capturing, data from machines, sensors, customers and employees. Consequently, we have seen increased demand for, and evolution of, data science tools and techniques. Traditional data was relatively small and highly structured; in recent years, however, the variety of data has broadened considerably, encompassing both structured and unstructured formats.

Large organizations are investing millions of dollars in storing the data they generate, and they realise the benefits of that investment through the analysis that sophisticated analytical techniques make possible. This “sophisticated analysis” is often undertaken by a data scientist or analyst.

Data science and analytics capabilities are applied in almost every industry sector. One example is building a capability to predict whether a patient has cancer.

Data science applications

Before we get into understanding data science, let’s briefly review what we mean by data.

So, what do I mean by data?

Data is a distinct piece of information.

The most basic form of data is a Boolean: a Yes or No, or a One or Zero. Believe it or not, computers fundamentally work with binary data. A bit is essentially a 1 or a 0 (binary), and a byte is a group of eight bits.
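To make the relationship between Booleans, bits and bytes concrete, here is a small Python sketch. It uses only built-in functions; the variable names and the example value 65 are arbitrary choices for illustration.

```python
# A Boolean holds one of two values; in Python, True behaves like the integer 1
is_raining = True
print(is_raining, int(is_raining))  # True 1

# A bit is a single 1 or 0; a byte groups eight bits.
# format(..., "08b") shows an integer as an 8-bit binary pattern.
value = 65
bits = format(value, "08b")
print(bits)       # 01000001
print(len(bits))  # 8

# The same byte can be interpreted as text: code point 65 is the character "A"
print(chr(value))  # A
```

The point is that one byte of raw bits has no inherent meaning; whether it is read as a number or a character depends on how the data is interpreted.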

The data collected during an analytics effort might contain text, numbers, images or sound. Fundamentally, we can classify this data into two types: qualitative and quantitative.

Quantitative data is numerical in structure and lends itself to statistical analysis. Because it can be measured numerically, it is usually also structured data. Questions such as “What is the speed of the train?” or “What is the temperature today?” typically have numerical, or quantitative, answers.

Because the responses are numerical, we can apply statistical analysis to them, such as calculating measures of central tendency (mean, median, mode) or totals.
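Python's standard `statistics` module can compute these summaries directly. The temperature readings below are made-up values used only to illustrate the calls.

```python
import statistics

# Hypothetical daily temperature readings in degrees Celsius (illustrative only)
temperatures = [21, 23, 23, 25, 28]

print(statistics.mean(temperatures))    # 24  (arithmetic average)
print(statistics.median(temperatures))  # 23  (middle value once sorted)
print(statistics.mode(temperatures))    # 23  (most frequent value)
print(sum(temperatures))                # 120 (total)
```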

Quantitative data can be generated by experiments, surveys with numerical questions (for example, “What is your salary?”), market reports, customer transactions, yearly sales and other business metrics.

Qualitative data is non-numerical in nature. It describes the characteristics of an object. For example, in the image to the left, qualitative data describes the bookshelf as “smelling like oak”, “deep brown” in colour, “made of wood”, and so on. In general, most qualitative data is thought of as semi-structured or unstructured data.
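Qualitative values cannot be averaged, but they can still be counted. A minimal sketch using Python's `collections.Counter`, assuming a hypothetical list of descriptions gathered from several people looking at the same bookshelf:

```python
from collections import Counter

# Hypothetical descriptions of one bookshelf from several observers
descriptions = [
    "deep brown", "made of wood", "smells like oak",
    "deep brown", "made of wood", "deep brown",
]

# A mean or sum makes no sense for text labels, but frequencies do
counts = Counter(descriptions)
print(counts["deep brown"])    # 3
print(counts.most_common(1))   # [('deep brown', 3)]
```

The most common category is the mode, which is the one measure of central tendency that also applies to qualitative data.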

So, what do I mean by data analytics & science?

Broadly speaking, data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and information from structured and unstructured data.

Data science projects, and the scientists and researchers who carry them out, usually follow a cycle:

  1. Capturing data
  2. Maintaining data
  3. Processing data
  4. Analyzing data; and
  5. Communicating data
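The five stages of this cycle can be sketched end to end in Python. This is a minimal illustration under simple assumptions: an in-memory CSV string stands in for captured sensor data, a plain list stands in for storage, and the function names (`capture`, `maintain`, `process`, `analyze`, `communicate`) are hypothetical labels for the stages rather than any standard API.

```python
import csv
import io
import statistics

# Hypothetical raw sensor feed; sensor B has one missing reading
RAW = "sensor,reading\nA,21.5\nB,\nA,22.5\nB,19.0\n"

def capture(text):
    """1. Capture: parse raw records into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def maintain(records, store):
    """2. Maintain: keep a stored copy (an in-memory list, as an assumption)."""
    store.extend(records)
    return store

def process(records):
    """3. Process: drop rows with missing readings and convert text to numbers."""
    return [
        {"sensor": r["sensor"], "reading": float(r["reading"])}
        for r in records
        if r["reading"]
    ]

def analyze(rows):
    """4. Analyze: average reading per sensor."""
    by_sensor = {}
    for r in rows:
        by_sensor.setdefault(r["sensor"], []).append(r["reading"])
    return {s: statistics.mean(v) for s, v in by_sensor.items()}

def communicate(summary):
    """5. Communicate: format the findings for a stakeholder."""
    return "\n".join(f"Sensor {s}: average {m:.1f}" for s, m in sorted(summary.items()))

store = []
report = communicate(analyze(process(maintain(capture(RAW), store))))
print(report)
```

Running it prints one line per sensor (`Sensor A: average 22.0`, then `Sensor B: average 19.0`). In real projects each stage is usually far larger, but keeping the stages as separate steps makes it easier to see where data quality problems enter.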

Demystifying data roles

There are three main professional roles associated with data. Like everything in the real world, these roles do not have a clear separation of duties, and the scope of each role differs from organisation to organisation. In general, however, the fundamental nature of each role is described below.

Data Scientist

Data scientists are curious and results-oriented, often with exceptional industry-specific knowledge and communication skills.

The data scientist will:

  • Test hypotheses.
  • Implement prediction and classification algorithms.
  • Look for trends, anomalies and classifications/segments within data.
  • Visualise data for presentation purposes.
  • Propose solutions and strategies aimed at improving target outcomes.
  • Explain highly technical results to key stakeholders in a non-technical manner.
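To illustrate what implementing a classification algorithm can look like, here is a toy k-nearest-neighbours classifier written with the standard library only. It is in the spirit of the earlier cancer-prediction example, but the features, values and labels are invented and are not clinical data; the function name `predict` and the choice of k = 3 are assumptions for illustration.

```python
import math
from collections import Counter

# Toy labelled examples: (feature_1, feature_2) -> label.
# All values and labels are invented for illustration; this is not clinical data.
TRAINING = [
    ((1.0, 1.2), "negative"),
    ((1.3, 0.9), "negative"),
    ((4.8, 5.1), "positive"),
    ((5.2, 4.7), "positive"),
]

def predict(point, examples, k=3):
    """Classify `point` by majority vote among its k nearest labelled examples."""
    # Sort examples by Euclidean distance from the new point and keep the k closest
    nearest = sorted(examples, key=lambda ex: math.dist(point, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(predict((1.1, 1.0), TRAINING))  # negative
print(predict((5.0, 5.0), TRAINING))  # positive
```

A production model would be trained and validated on real data, usually with an established library, and its errors would need careful evaluation before anyone relied on it.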

They often possess the following:

  • Strong quantitative background in statistics and potentially linear algebra.
  • Programming knowledge focused on data warehousing, data mining, data modeling and algorithm implementation.
  • Domain knowledge (also known as subject matter expertise). If they lack it, it is quite common for the data scientist to partner with someone in the part of the organisation where the work is being undertaken.

Data Analyst

The data analyst is involved in defining design requirements and in the activities conducted throughout the data lifecycle. They specialise in transforming data into a form that the end user can understand.

The data analyst's work typically includes:

  • Implementing, developing and maintaining databases for primary and on-request requirements.
  • Cleaning data, for example fixing bad data, incomplete records and truncation issues.
  • Analysing and visualising trends.
  • Telling stories with data to stakeholders.
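The data cleaning activities described for the analyst can be illustrated with a small, standard-library-only sketch. The records, field names and the hypothetical `clean` function are assumptions; so is the rule of dropping bad rows rather than repairing them. In practice, an analyst might flag or impute values instead.

```python
# Hypothetical raw customer records showing common quality problems
raw = [
    {"id": "1", "name": " Alice ", "age": "34"},
    {"id": "2", "name": "Bob", "age": ""},        # incomplete record
    {"id": "3", "name": "Carol", "age": "abc"},   # bad data
    {"id": "1", "name": " Alice ", "age": "34"},  # duplicate of id 1
]

def clean(records):
    """Remove duplicates, drop rows with missing or non-numeric ages, tidy text."""
    seen = set()
    cleaned = []
    for r in records:
        if r["id"] in seen:
            continue  # skip a record whose id has already been processed
        seen.add(r["id"])
        age = r["age"].strip()
        if not age.isdigit():
            continue  # drop missing or non-numeric ages
        cleaned.append({"id": int(r["id"]), "name": r["name"].strip(), "age": int(age)})
    return cleaned

print(clean(raw))  # [{'id': 1, 'name': 'Alice', 'age': 34}]
```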

The data analyst often possesses the following:

  • Strong analytical skills.
  • A strong understanding of visualisation tools.
  • An understanding of query and formula languages such as SQL and DAX.
  • Basic statistical knowledge.
  • Teamwork skills; and
  • Communication skills.

Data Engineer

A data engineer typically focuses on managing data rather than analysing it. Data engineers work with raw data that may be unvalidated and unformatted, and that may contain codes specific to the source database, such as codes representing categorical values. They are sometimes referred to as big data consultants.

The data engineer's work and skills typically include:

  • A strong understanding of programming, e.g. SQL, Java, Python and R.
  • Creating data workflows and pipelines, processing data and optimising queries.
  • A strong understanding of cloud platforms and services (such as Azure or AWS) and related technologies; Java, Hadoop, NoSQL databases and data pipelines are typical.
  • Knowledge of algorithms and data structures.
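A data pipeline of the kind a data engineer builds often follows an extract, transform, load (ETL) pattern. The sketch below is a minimal illustration using Python's built-in `csv` and `sqlite3` modules. The raw export, the status-code lookup (`STATUS_LABELS`) and the table schema are hypothetical. The sketch also shows one way to translate database-specific codes into readable categorical labels.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: status codes are system-specific (e.g. 1 = active)
RAW = "customer_id,status_code,spend\n101,1,250.0\n102,2,\n103,1,75.5\n"
STATUS_LABELS = {"1": "active", "2": "closed"}  # assumed code table

def extract(text):
    """Yield raw rows one at a time rather than loading everything at once."""
    yield from csv.DictReader(io.StringIO(text))

def transform(rows):
    """Validate rows, map status codes to labels and convert types."""
    for row in rows:
        if not row["spend"]:
            continue  # skip rows that fail validation (missing spend)
        status = STATUS_LABELS.get(row["status_code"], "unknown")
        yield (int(row["customer_id"]), status, float(row["spend"]))

def load(rows, conn):
    """Write transformed rows into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, status TEXT, spend REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
total = conn.execute("SELECT SUM(spend) FROM customers WHERE status = 'active'").fetchone()[0]
print(total)  # 325.5
```

Because each stage is a generator, rows flow through the pipeline one at a time. Production pipelines usually run on the cloud platforms and big data tools mentioned above, but they tend to follow a similar extract, transform and load structure.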

Conclusion

Data science and advanced analytics form an emerging field with vast applications in almost every industry willing to unlock the power of its data. From statistics to insights, data science can help an organisation make informed decisions and improve business outcomes. Larger organisations already have momentum in implementing data science and advanced analytics capabilities, and the return on investment for these projects can be high when they are executed properly.

Feel free to contact Lucid Insights if you need assistance with the implementation of data science and advanced analytics capabilities for your organization.

References

Berkeley. (2020, April 19). What is Data Science? Retrieved from Berkeley School of Information: https://datascience.berkeley.edu/about/what-is-data-science/

Pickell, D. (2019, March 4). Qualitative vs Quantitative Data – What’s the Difference? Retrieved from G2 Learning Hub: https://learn.g2.com/qualitative-vs-quantitative-data