Let’s discuss the first phase of CRISP-DM: Business Understanding. Recall that CRISP-DM stands for the “CRoss Industry Standard Process for Data Mining” and it’s a six-phase process for organizing and iterating through a data project. Feel free to check out my previous posts, “Why CRISP-DM is a Data Scientist’s Secret Weapon” and “What is CRISP-DM, Anyway?”


What is the Business Understanding phase of CRISP-DM?

Before doing any modeling or analytics work, the first and most crucial part of a data project is developing business understanding. You need to be able to define the business goals and expected impact of the data project. You need to be able to answer questions like “What is the current state of the problem and solution?”, “What is the goal?”, “Who is involved?”, “Who is affected by the outcome and how?”, “What is the expected future state?”, “What is the expected ROI and how will it be measured?”, among others.

This first phase of the project includes indispensable research that will guide your data prep, modeling, evaluation, and deployment efforts.

The image below shows the original diagram for CRISP-DM with subtasks and outputs. It consisted of a 4-step process: Determine Business Objectives, Assess the Situation, Determine Data Mining Goals, and Produce a Project Plan.

Process diagram of Phase 1: Business Understanding.

I have separated the first subtask, Determine Business Objectives, into two tasks: Understand the business objectives, and Determine how you are going to measure success.

With that in mind, the Business Understanding phase can be broken down into a 5-step process:

  1. Assess the situation
  2. Understand the business objectives
  3. Determine how you are going to measure success
  4. Establish data mining goals
  5. Write a project plan

Keep reading for ideas on what actions should be taken during each of these steps to get the project off on the right foot.

1. Assess the situation

shallow focus photography of black and silver compasses on top of map

Photo by Alex Andrews

By understanding the bigger picture (the lay of the land), you can find answers that have real business impact. But you need to understand the business first!

How do you assess the situation?

If you are new to a team or organization, it may be daunting to begin a new project.

You don’t know what you don’t know
Some wise person

In English, we often say to remember the 5 W’s (and an H): Who? What? Why? When? Where? and How? But we can get a little more specific. To gather the information we need, we should have several conversations to lay the groundwork at the beginning of a project. Some teams hold kick-off meetings run by a project manager (PM). In an ideal world, your PM will already understand what the data team needs to know to be successful. However, because data projects are new to some organizations, the data team may need to advocate for itself to build this culture. With that in mind, the data team should take the lead in the following ways:

Have Conversations with Stakeholders

  • Clarify problem statements
  • Clarify goals (more on this below)
  • Gather background information about the current situation
  • Gather information about the current solution, its pain points, and its method of deployment
  • Document specific business objectives decided on by stakeholders
  • Determine what success looks like
  • Establish a vision for deployment of the project (e.g., web app, static/dynamic dashboard, API endpoint, white paper, etc.)

Take Inventory of Resources

  • What personnel resources are available (including subject matter experts)?

  • What data sources are available?

  • What hardware and software resources are available?

  • Will IT support be needed, and is it available?

Ask High-Level Technical Questions

  • Are there requirements or restrictions to accessing the data?
  • Does data exist in a data warehouse?
  • Will data need to be pulled into a big data lake?
  • Will any data need to be purchased?
  • Is the necessary infrastructure in place to support the analytic and deployment goal?
  • Are there standard operating procedures for developing analytic software (scripts, models, pipelines, etc.)?
  • What software tools are available, and which of them does the organization support for deployment?

Address Some Logistics

  • Has the project been officially approved?
  • What phase is the project in? (ideation, pilot, prototype, etc.)
  • Does the organization need an intro/overview of data mining to better understand the process?

Developing situational awareness of the business problem also involves understanding the people and roles involved, so get familiar with the business structure through organizational charts. Learn the project groups and business units that will be involved or affected by the data project.

Additionally, begin documenting what you discover. This information will form the beginning of your project documentation and will serve as a reference for others on the data team. You may come back to many of these questions again. Don’t worry about making changes and updating things as you go; CRISP-DM is meant to be flexible, and your understanding of the business should evolve as you learn! Likewise, the business may evolve its vision and goals over time.

2. Understand the business objectives

 

Critically, you must understand the business problem. What area of the business does it affect, and what are the motivations for addressing it? Has there been any previous data mining effort for this business problem? How familiar is the business unit with data science?

Also, it’s important to understand the current state of the solution to the problem. Is there already a process in place to address the problem? What are its advantages and disadvantages? Who uses it? Is it automated? (If not, how many hours per week or month are spent on it?)

You may consider doing a literature review or industry review of how others in the same industry are solving this problem. Has a standard been developed? Is machine learning research being done in this area, and if so, which methods are well accepted versus state-of-the-art?

Are there compliance requirements, industry standards, or laws (e.g., GDPR) to keep in mind? This is an excellent time to discuss data ethics as well.

At this step in the first phase, it is crucial to get a more detailed inventory of the resources: hardware, software, and data. This is an additional logistical step and should be well documented. However, I believe it is important to understand the business problem before talking about data sources, because at this stage you need to be able to discuss whether you will need additional data, either purchased or generated, and whether you will need any other datasets from within the organization. Without understanding the business problem, you may not know what data will be necessary.

3. Determine how you are going to measure success

 

 

black and white dartboard representing goals

Photo by Engin Akyurt

Next, a crucial part of the Business Understanding phase is establishing how success against each business objective will be measured. What does “done” look like? What are the key performance indicators and/or metrics that will be used to evaluate the success or effect of the data mining effort? Are there objective and subjective measures? Document the metrics that will be used for every business objective.

This would be a great time in the project to discuss measuring ROI. If the goal is to reduce customer churn, for example, what is the dollar value of retaining x additional customers? If the goal is to reduce downtime of vehicles in a fleet, determine whether downtime is currently measured and, if not, how it can be measured so that the metric can be tracked.
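To make the churn example concrete, here is a minimal sketch of the ROI arithmetic in Python. The function name, the revenue per retained customer, and the project cost are hypothetical placeholders; in a real project these numbers would come from your stakeholders, and the ROI definition itself (net gain divided by cost) is one common convention, not the only one.

```python
def churn_reduction_roi(customers_retained: int,
                        annual_revenue_per_customer: float,
                        project_cost: float) -> tuple[float, float]:
    """Return (value_gained, roi) for a hypothetical churn-reduction project.

    value_gained = customers_retained * annual_revenue_per_customer
    roi          = (value_gained - project_cost) / project_cost
    """
    if project_cost <= 0:
        raise ValueError("project_cost must be positive")
    value_gained = customers_retained * annual_revenue_per_customer
    roi = (value_gained - project_cost) / project_cost
    return value_gained, roi


# Hypothetical numbers: retaining 200 customers worth $1,200/year each,
# for a project that costs $80,000 to build and deploy.
value, roi = churn_reduction_roi(200, 1_200.0, 80_000.0)
print(f"Value gained: ${value:,.0f}, ROI: {roi:.0%}")  # Value gained: $240,000, ROI: 200%
```

Writing the calculation down this explicitly forces the conversation about which inputs the business can actually measure and track.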

Explore what your organization expects to gain from data mining. Try to involve as many key people as possible in these discussions and document the results
IBM SPSS CRISP-DM Documentation

4. Establish specific data mining goals

Before moving on to the next phase, the data team should have a clear idea of what type of problem they are solving, such as clustering, prediction, or classification. Including a clear numerical goal is also helpful; for example, the team wants to predict component failure at least 1 week before catastrophic failure occurs.

These types of prediction horizons and thresholds can be related to risk and business processes, so the stakeholders are integral to deciding where the thresholds should be.

For example, in predictive maintenance, the risk tolerance of the organization and its standard operating procedures will affect how much warning time is needed before a predicted failure. If a component on a long-haul truck is predicted to fail next week, is 1 week of lead time enough to make sure that truck is off the road and being repaired before the failure? Or does the fleet manager actually need 2 weeks’ notice to plan the repair without disrupting operations?
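One way to pin down the prediction horizon is to write the stakeholders’ threshold as an explicit check. This is a minimal sketch; the function name and the specific dates are hypothetical, and the required lead time is exactly the value the fleet manager and data team would need to agree on.

```python
from datetime import date, timedelta


def warning_is_sufficient(prediction_date: date,
                          predicted_failure_date: date,
                          required_lead_time: timedelta) -> bool:
    """Return True if the warning arrives at least `required_lead_time`
    before the predicted failure (hypothetical helper for illustration)."""
    lead_time = predicted_failure_date - prediction_date
    return lead_time >= required_lead_time


today = date(2023, 5, 1)                 # hypothetical prediction date
failure = today + timedelta(weeks=1)     # component predicted to fail next week

# The data team's initial goal: at least 1 week of warning.
print(warning_is_sufficient(today, failure, timedelta(weeks=1)))  # True
# The fleet manager actually needs 2 weeks to plan the repair.
print(warning_is_sufficient(today, failure, timedelta(weeks=2)))  # False
```

The same prediction passes or fails depending on a business decision, which is why stakeholders need to set the threshold before modeling starts.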

Additionally, during this phase, the data team should be discussing the metrics and benchmarks that will be used to assess the model. The team should also consider what deployment will look like for this application.

5. Write a project plan

project plan on whiteboard

Photo by Startup Stock

After working through steps 1 through 4 of this first phase of the data mining process, you should have enough information to create an initial project plan. Some good things to include in a data project plan:

  • Phase and time estimates
  • People resources needed
  • Risks at each phase
  • Deliverables
  • Peer review and business review

Risks if the Business Understanding Step Is Skipped or Rushed

  • Scope creep: your project may suffer from a never-ending moving goalpost. Make sure to establish the scope of the project and define success criteria so the team can stay on track
  • Missing the mark: coming up with an interesting result is cool, but you need to make sure you are working in the direction that the business expects, not chasing “interesting” results
  • Missing the deadline: deadlines in data projects should be flexible due to the iterative nature of data projects, BUT if you waste time going in the wrong direction (see “Missing the mark”), or waiting for resources, data access, support, etc. (i.e., things that are discussed during this phase), then the project will drag on without results for far too long
  • Unrealistic deployment/maintenance: if you don’t discuss realistic deployment possibilities at the beginning of a project, you run the risk of the data team dreaming up something that is unmaintainable in production. For example, maybe they think the result would be a really cool real-time dashboard, but there is no infrastructure to support it.
  • Opposite outcomes from the business goal: (see “Missing the mark”) This is again a result of the data team not having clear direction regarding the business goal and not succeeding at the evaluation stage. For example, suppose a team is building a fraud detection algorithm. If only 0.01% of transactions are fraudulent, the team may create a model that detects NO fraud and still achieves 99.99% accuracy (but 0% recall), and the result for the business would be that ALL fraud goes undetected.
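The arithmetic behind the fraud example can be checked with a short Python sketch. The 10,000-transaction dataset with a single fraud case is a hypothetical illustration of the 0.01% rate, and the helper function is written from scratch rather than taken from any library. Note that a model with no positive predictions has an undefined precision (0/0), so recall, the share of actual fraud that gets flagged, is the number that exposes the failure.

```python
def accuracy_and_recall(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Compute accuracy and recall for binary labels (1 = fraud, 0 = legitimate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    # Recall: fraction of actual fraud cases the model flagged.
    # If there are no actual fraud cases, report 0.0 by convention here.
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return accuracy, recall


# Hypothetical data: 10,000 transactions, 1 of which (0.01%) is fraud.
y_true = [0] * 9_999 + [1]
# A "model" that never predicts fraud.
y_pred = [0] * 10_000

accuracy, recall = accuracy_and_recall(y_true, y_pred)
print(f"Accuracy: {accuracy:.2%}, Recall: {recall:.0%}")  # Accuracy: 99.99%, Recall: 0%
```

Agreeing during Business Understanding that recall (or a cost-weighted metric) matters more than raw accuracy is what keeps a result like this from being reported as a success.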

Benefits of a Well-Executed Business Understanding Phase

 

Getting to know the business reasons for your data mining effort helps to ensure that everyone is on the same page before expending valuable resources.
IBM SPSS CRISP-DM Documentation

  • Team cohesion
  • Clear goals create better business outcomes
  • Ending with a well-documented project: by maintaining good documentation through the Business Understanding phase, you will have a valuable resource for reference by other data teams and yourself in the future.
  • Happy employees
  • Expected outcomes

Clearly defining a team’s purpose and then setting goals that move the team forward in accomplishing that purpose can be a powerful elixir for strengthening the bond of a team. The real value comes in though once the goals are set and the team steadily works together to achieve them. Solving problems, working through conflict and experiencing progress as a team is a great morale builder

Forbes

Conclusion

There is clearly a lot of ground to cover during the Business Understanding phase of a data project. This step is crucial for giving direction to the data team, informing how to evaluate results, and planning a good deployment of the final product, whatever that might look like.

This step is also the easiest to skip in practice. Data teams are often eager to dive into the data, to begin exploring, modeling, and gleaning insights, but without the due diligence of the Business Understanding step, the project runs the risks listed above.

Hopefully, this guide has given you a clear idea of the conversations that need to happen and the many questions that should be asked at the beginning of a data project.
