The Eight Most Important Drivers of Model and Approach Selection in a Corporate Environment

And they have nothing to do with your technical skills

Image by Arek Socha on Pixabay

Finally, all the data was cleansed and ready to analyze. Andy enthusiastically started visualizing the data to get a first impression of it. There were so many dimensions and variables that he spent several days analyzing them visually and determining the best methods to apply. At the end of that week, his team manager told him he needed a draft presentation of the outcomes by the following Tuesday, because the manager had to present it to a steering committee one week later.

Andy told him that he had no results yet. But there was no room for negotiation. By Tuesday, conclusions had to be delivered and integrated into a PowerPoint presentation.

Hastily, Andy produced some regression analyses and integrated them into the presentation.

After the steering committee meeting, the team manager told him that the project would not be continued.

Andy was very frustrated. That was his second project, and the second time it ended with the same decision. He had chosen this position because of the potential to do great data science work on the large amount of available data.

This story is a real case, and it is not an atypical situation in corporations. I assume that some of you have already experienced a similar situation, too.

The reason that this happens is not your skills.

When you are thrown into a data science project in a corporate environment, the situation is different from the learning context you came from.

My experience is that most data scientists struggle to manage the project, given the many corporate constraints and expectations.

More than a few data scientists are disappointed and frustrated after their first projects and start looking for another position.

Why?

They are trained in handling data, technical methods, and programming. Nobody ever taught them project, stakeholder, or corporate data management, or educated them about corporate business KPIs.

It is the lack of experience with unspoken corporate practices.

Unfortunately, that area holds more potential pitfalls than all your technical work.

If you know the determining factors, you can plan your data science tasks accordingly, pursue satisfying projects, and steer your work.

In the following, I give you the eight most important drivers of model and approach selection in the corporate environment, and how to address them.

1. Time, timelines, and deadlines

What you need to know

Corporations have defined project processes. Stage-gate or steering committee meetings, where outcomes must be presented, are part of them. Presentations have to be submitted a few days in advance and must contain certain expected information. Corporations are also under constant pressure to deliver financial results, which leads to consistently tight deadlines. These processes are part of the corporate culture: unspoken, and employees are simply assumed to know them.

How to address it?

Ask, ask, ask. Ask about the milestones, e.g., the meeting dates where project decisions will be made.

Set up a time budget. Start at the milestone date and calculate the project schedule backward.

Include not only your own tasks but also the surrounding actions, such as coordination meetings, presentations, and deadlines for submitting the presentations. Do not forget that each presentation goes through a review round, so add a few days before the submission date. Include time margins for unexpected tasks and troubleshooting.
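
As a rough illustration, here is a minimal Python sketch of such a backward calculation. The milestone date, tasks, and durations are purely illustrative assumptions:

```python
from datetime import date, timedelta

# Hypothetical milestone: the steering committee meeting date.
steering_committee = date(2024, 6, 25)

# Surrounding actions, ordered from the meeting backward.
# All durations (in days) are illustrative assumptions.
backward_steps = [
    ("submit the final slides", 3),      # slides due a few days in advance
    ("review round with the manager", 4),
    ("build the draft presentation", 2),
    ("margin for troubleshooting", 3),
]

deadline = steering_committee
for task, days in backward_steps:
    deadline -= timedelta(days=days)
    print(f"{task}: finish by {deadline}")

print(f"Time budget for the actual analysis ends on {deadline}")
```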

Only then choose your approaches, based on whether they can be performed within the determined schedule. Choose methods that run quickly and that you are familiar with. After you have a few successful results, and hopefully time left, start experimenting with more complex and newer methods.

Example

Human Resources (HR) urgently needed the patterns behind HR management's key success factors across business departments and people. Setting up the schedule backward from the deadline, we decided to perform only simple linear regression, without considering any interdependencies between the key success factors, e.g., between the level of education and the training courses attended. We focused on fitting simpler models accurately and identifying single contribution factors with high reliability.
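
For illustration, here is a minimal sketch of such single-factor linear fits. The column names and values are hypothetical, not the project's actual HR data:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical HR data; columns and scores are invented for illustration.
df = pd.DataFrame({
    "education_level":    [1, 2, 3, 2, 4, 3, 5, 4],
    "trainings_attended": [0, 2, 1, 3, 4, 2, 5, 3],
    "success_score":      [52, 60, 63, 66, 75, 68, 88, 79],
})

# Fit each factor on its own, deliberately without interaction terms,
# so each coefficient remains a single, easily explained contribution.
for factor in ["education_level", "trainings_attended"]:
    X, y = df[[factor]], df["success_score"]
    model = LinearRegression().fit(X, y)
    print(f"{factor}: coefficient = {model.coef_[0]:.2f}, "
          f"R^2 = {model.score(X, y):.2f}")
```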

2. The accuracy required of the models and the results

What you need to know

The available and ready-to-use data determines the accuracy of a model. The level of detail of a model and the granularity of the data must match. The same is true for the expected granularity of the outcome: the method must match those expectations. Any mismatch will give unreliable results.

How to address it?

Select the model according to the granularity of the available data. Do not waste your time fitting a very detailed and accurate model when there is no adequate data for it. When data quality is poor, aggregating the data and using a less granular model gives more reliable results.
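
A minimal sketch of this aggregation step, assuming hypothetical noisy daily data that is rolled up to monthly means before fitting a simpler model:

```python
import numpy as np
import pandas as pd

# Hypothetical noisy daily series; dates and values are invented.
rng = np.random.default_rng(0)
daily = pd.DataFrame(
    {"value": rng.normal(100, 25, size=365)},  # very noisy at daily granularity
    index=pd.date_range("2023-01-01", periods=365, freq="D"),
)

# Aggregating to monthly means smooths the noise, so a simpler,
# less granular model becomes the more reliable choice.
monthly = daily.groupby(daily.index.to_period("M")).mean()
print(monthly.head())
```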

When the level of accuracy needed for decision-making does not match the level that can be achieved with the data, escalate this as early as possible. Do not try to make something up. Only transparent communication helps, prevents surprises, and manages expectations. Otherwise, you will be blamed.

Example

When we analyzed the patterns influencing nursing homes' profitability, the granular data was too inhomogeneous, and the results made no economic sense. So we aggregated the data and applied simpler models. Based on the results, the authority could already make essential decisions and put guidelines in place for future data management and collection.

3. The relevance of the methods

What you need to know

The right problem must be solved with a suitable method. The question to be answered must be clear and must not permit any ambiguity. Also, the form of the outcomes must be comparable with other internal and external analyses. Both point toward the relevant methodology that should be used.

How to address it?

Make sure that you understand the question that has to be answered. Do not assume it! Ask! It does not help to have a solution built with the most accurate method but for the wrong question.

Based on that, you can determine whether the problem falls into the descriptive, predictive, or prescriptive field. If the most influential factors are being looked for, choose descriptive methods. When the question is to forecast, choose a predictive approach. Only when optimized decision-making under various effects is the aim, choose prescriptive models. Do not try to be creative here; in my experience, it goes wrong in most cases.

Example

Three years ago, my former team heavily opposed me and pushed to implement a trendy new time series method for asset return forecasts. In the end, they simply executed it. Oh yes, I was angry, but we could not move back because of the deadline. For three years, they struggled to get adequate results without considerable adjustment effort. Recently, one of my former team members told me that they had finally moved back to the old model, because the new one included several features that were not relevant for the outcome but added too much noise.

4. Accuracy of data

What you need to know

The accuracy of the data restricts the pool of possible methods. Highly accurate methods do not bring any value when used with less accurate data; the error term will be high. Again, the accuracy of the data and the accuracy of the methods must match. Bad quality affects the results: garbage in, garbage out.

How to address it?

Understand the data as well as the requirements of the models. Do not just apply methods on a trial-and-error basis. Do not just replicate a method because it has given excellent results in other, similar cases. You need to tailor the methods to the accuracy of the data.

Example

In optimizing the operating room capacities of two hospitals, we had to apply two different approaches. In one hospital, granular data for every point of action, e.g., the beginning of anesthesia, entering the operating room, the beginning of surgery, and so on, was available. The data was of good quality because of real-time electronic recording.

In the other hospital, the data was recorded manually, sometimes with hours of delay, and thus was very imprecise. For example, the data showed eight surgeries running in parallel in six operating rooms.

In the first case, we could fit granular time series and agent-based models and account for the seasonality in the data. In the second case, by contrast, we had to rebuild the models, work with regression analysis, and smooth out inconsistencies before using the data as input for a less granular agent-based model.
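
As a rough illustration of such a cleaning step, here is a minimal sketch that caps impossible occupancy counts at the physical room capacity and smooths the rest; the numbers are invented, not the hospital's data:

```python
import pandas as pd

# Hypothetical hourly counts of parallel surgeries from manual records.
# Note the impossible values: 8 surgeries recorded for 6 operating rooms.
occupancy = pd.Series([4, 5, 8, 6, 7, 5, 4, 6, 8, 5])
N_ROOMS = 6

# Cap at physical capacity, then smooth the remaining recording noise
# with a rolling median before feeding a less granular model.
cleaned = occupancy.clip(upper=N_ROOMS)
smoothed = cleaned.rolling(window=3, center=True, min_periods=1).median()
print(smoothed)
```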

5. Data availability and cost to make data ready to use

What you need to know

How often I have heard: 'We would have the perfect model if we could have this and this data, but unfortunately, we cannot access it in due time.' The fact is that today, corporations are only able to use between 12% and about 30% of their data. In my discussions, companies mostly state that they are using around 20% of their data. The cost of accessing the rest is, in most cases, too high, and no equivalent business case is available. If no business case covers the cost of making the data available, you will not get the data in due time.

How to address it?

Before devoting all your thoughts to the fancy models you could apply, clarify what data is available in due time and what it costs to get it. Just because 'the data is available' in a company does not mean that it is available in a reasonable time frame and at a reasonable cost.

Prioritize the data based on the other seven drivers given in this article, and make a cost-benefit analysis in each case: what is the additional benefit from the business perspective of having the data, compared to the cost of getting it? Never ask, 'Can you give me all the data?' It shows that you have no understanding of the corporation's business processes, and you will be de-prioritized when you need support, e.g., from IT.
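
A minimal sketch of such a prioritization, with invented data sets, benefits, and costs:

```python
# Hypothetical candidate data sets with assumed business benefit and
# acquisition cost (in arbitrary currency units); the numbers are invented.
candidates = [
    {"data": "CRM export",         "benefit": 80, "cost": 10},
    {"data": "archived tapes",     "benefit": 60, "cost": 90},
    {"data": "web analytics feed", "benefit": 40, "cost": 15},
]

# Rank by net benefit and request only the data that pays for itself.
for c in sorted(candidates, key=lambda c: c["benefit"] - c["cost"], reverse=True):
    print(f'{c["data"]}: net benefit = {c["benefit"] - c["cost"]}')
```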

Example

We were unexpectedly faced with storage format issues during pattern recognition work on a global bank's intra-day liquidity data. One of the required data sets, the transactions of the prior year, was archived on magnetic tapes. Due to release cycles and the transformation into accessible formats, it would have taken several months until the data was available. We had to assess alternative data and adjust the models.

6. Data privacy and confidentiality

What you need to know

Customer data is often confidential. Data privacy is regulated by laws, e.g., the GDPR in the EU or the CCPA in the State of California. Financial institutions have their own regulations to protect so-called CID (client identifying data). Only authorized people have access to such data, and data scientists are rarely among them. The data can only be used in anonymized, encrypted, or aggregated form, and only after approval by the data owners, the security officer, and the legal counsel.

How to address it?

Before you start the project, clarify whether it involves any personal data that falls under these restrictions. If yes, address it as early as possible: on one side with IT, because they may already have encryption tools to deal with it, and on the other side with the legal counsel. Work with the data only after you have all approvals and appropriate encryption in place. I have seen many projects fail not because of the data privacy acts themselves, but because the issue was addressed too late and there was not enough time to get the approvals and encrypt the data in due time.
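
For illustration only, here is a minimal sketch of one common pseudonymization step: replacing client identifiers with salted one-way hashes. The actual scheme must be defined with, and approved by, the security officer and legal counsel:

```python
import hashlib
import os

# Keep the salt in a secured secret store, never hard-coded or committed.
salt = os.urandom(16)

def pseudonymize(client_id: str) -> str:
    """Replace a client identifier with a salted one-way hash."""
    return hashlib.sha256(salt + client_id.encode("utf-8")).hexdigest()

print(pseudonymize("CUST-000123"))  # hypothetical client identifier
```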

Example

In a project where credit card transaction data had to be used for third-party service analytics, the lawyers needed seven months to clarify and approve the data use. The clarification covered not only the legal aspects but also the type of encryption, the aggregation level to be used, and technical requirements such as access rights and the containerization of software.

7. Resources, infrastructure, and tools availability

What you need to know

Projects in a corporate environment involve many different departments: IT, the business, an innovation team, or an internal consulting group. All of them are involved in several projects in parallel, and their time is limited.

You need storage and computational power. Corporate rules about software installation are in place, and corresponding approvals are required. If a tool costs money and needs a license, a corporate approval process exists. As a data scientist, you do not only need Python and Jupyter Notebook but most probably other tools like Tableau or Alteryx. Some companies require containers like Docker. And some tools are not permitted by corporate policy.

How to address it?

Clarify the tools and infrastructure before you start the actual project. Estimate the storage and computational power needed, and ensure that it will be available. Clarify the corporation's policy on data science software and which tools are available. Inform the people from the other departments early about the upcoming support you will need, so they can plan some dedicated time. When working in an existing data science team, you can clarify this first with your line manager. But even in an established data science team, do not assume that everything you will need for a project is in place.
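
A back-of-envelope sketch for sizing the storage request; all figures are illustrative assumptions, not measurements:

```python
# Rough estimate of the raw data volume, assuming float64 values.
rows, cols, bytes_per_value = 500_000_000, 40, 8
raw_gb = rows * cols * bytes_per_value / 1e9

# Rough factor for indexes, intermediate copies, and result tables.
overhead = 3
print(f"Raw data: ~{raw_gb:.0f} GB; request at least ~{raw_gb * overhead:.0f} GB")
```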

Example

While working on a large amount of transactional data in a bank, we needed more computational and storage power. We worked in a private cloud environment, where it typically takes only a few minutes to a few hours until capacity is added. However, because we worked with client identifying data in a so-called red zone environment, a virtual zone with very restrictive security, the infrastructure needed to be 'red zone' certified by the security officer. That took two weeks.

8. Product and project management KPIs of the company

What you need to know

Corporations measure product and project management with KPIs. There are quantitative measures, such as the net present value for short-term projects or the break-even point for products. And there are qualitative benefits, such as a shortened time to market or project learnings that can be leveraged in other projects. Decisions and approvals of projects are based on such metrics.

How to address it?

It does not matter how great the results of your data science work are; they should always be translated into the company's KPIs. So, clarify with your line manager what the steering measures of the company are. Translate your outcomes into these metrics and communicate what the benefits for the company are. My experience is that decision-makers then stop fewer projects, more results are implemented into the company's processes, and it builds a lot of trust in the data science team's work.
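
For example, here is a minimal sketch translating a model's expected savings into a net present value, the kind of figure a steering committee compares projects with. The cash flows and discount rate are illustrative assumptions:

```python
# Illustrative assumptions: project cost, expected yearly savings, discount rate.
investment = 120_000                        # year 0: project cost
yearly_savings = [60_000, 70_000, 70_000]   # years 1-3: expected benefit
discount_rate = 0.08

npv = -investment + sum(
    cf / (1 + discount_rate) ** t
    for t, cf in enumerate(yearly_savings, start=1)
)
print(f"NPV: {npv:,.0f}")  # a positive NPV supports approval
```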

Example

One department of a life sciences company tried for months to get internal funding for its intended data science projects, even though data and data science are pillars of the company's strategy. They finally asked me to support them. We found out that the finance department had investment templates for projects, including the company's metrics. So we asked for that template and recast all the data science blueprints into it. After the next presentation round, 60% of their projects were approved. The trigger was that the executive committee could now compare them against the company's KPIs and the performance of other projects.

Connecting the Dots

Many data scientists are not aware that working in a corporate environment involves up to 80% of tasks other than setting up models and analyzing data. And you may be a bit frustrated after reading all my comments.

But knowing the above factors and addressing them early and proactively puts you back into the driver's seat and avoids bad surprises. The goal is to gain as much freedom as possible for your own tasks. It increases project success, and you keep free time for experimenting with more complex and new approaches.

Data scientists are not trained in managing such factors and often do not expect them. Managing them properly is more important than all your detailed technical knowledge.

My tips and tricks for addressing these determining factors are neither rocket science nor a secret. But it is vital to raise your awareness of them. I hope they enable you to have more control over your projects, and more fun with them.
