A hypothetical company provides digital tools to the agriculture sector using its expertise in predictive algorithm development and machine learning.

Its current irrigation decision support system (IDSS) uses field sensors to detect soil moisture, which an algorithm then combines with separately stored weather data to generate information about a crop’s expected demand for water. However, customers have reported that the IDSS does not always accurately correlate weather with soil moisture prediction.

The company’s IDSS was developed before satellite hyperspectral and imaging data became readily accessible and affordable. Including these data sources would increase the accuracy of the IDSS’s soil moisture predictions. With satellite imagery and weather bureau data now free of charge and easily accessible, the company can enhance its IDSS with more information to predict soil moisture content.

Preliminary research activities

The company decides that it will first review online information to find current methods that use satellite imagery to predict soil moisture. This research identifies similar soil moisture prediction projects that have been undertaken, the machine learning (ML) models, training and testing methods used, and their results. Some sources also provide useful information about using free-of-charge satellite imagery and weather data.

The company’s preliminary review found that researchers have observed some weather variables ranking as unimportant when predicting soil moisture. A recent article convincingly proposes that ML models are producing false negative errors because they miss the real importance of the weather variables for predicting soil moisture.

The company decides it needs to solve the variable ranking error to upgrade its IDSS and that the article about false negative errors presents the most promising avenue to a solution. A second online review does not find any follow-up research that tested the idea about false negatives to establish how to use weather variables effectively for predicting soil moisture.

The company believes that the false negatives could be caused by a high proportion of uninformative variables relative to useful ones in the dataset. This creates noise when training the algorithm because the predefined criteria are poor at selecting variables to predict soil moisture. Based on its experience, the company considers that the most likely cause is inaccurate assumptions about the relevance of weather for predicting soil moisture.

The company proposes that it will identify the most useful variables if it filters weather variables without relying so heavily on assumptions about which have the greatest relevance to soil moisture. Its background research identifies a recently proposed variable relevance framework that it believes could accurately describe the relationship between weather variables and soil moisture without relying on assumptions about their interactions. This framework could be expected to outperform current state-of-the-art filters in scoring the true correlation or dependence between weather variables and soil moisture.
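As an illustration of why a relevance score that assumes a particular kind of relationship can produce false negatives, the sketch below compares a linear correlation score with a histogram-based mutual information estimate on synthetic data. It is a minimal sketch only. The variable relevance framework is not named here, so mutual information is used as a stand-in dependence measure, and the variable names, bin count and data are all assumptions.

```python
import math
import random
from collections import Counter


def pearson(xs, ys):
    """Linear (Pearson) correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def to_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys, n_bins=8):
    """Histogram estimate of mutual information, in nats."""
    bx, by = to_bins(xs, n_bins), to_bins(ys, n_bins)
    n = len(xs)
    count_x, count_y, count_xy = Counter(bx), Counter(by), Counter(zip(bx, by))
    return sum(
        (c / n) * math.log(c * n / (count_x[i] * count_y[j]))
        for (i, j), c in count_xy.items()
    )


# Synthetic data (assumption): moisture depends on temperature in a
# symmetric, non-linear way; the second variable is unrelated noise.
rng = random.Random(0)
n_rows = 2000
temperature = [rng.uniform(-1, 1) for _ in range(n_rows)]
unrelated = [rng.uniform(-1, 1) for _ in range(n_rows)]
moisture = [1 - t * t + rng.gauss(0, 0.05) for t in temperature]

candidates = {"temperature_anomaly": temperature, "unrelated": unrelated}
scores = {
    name: {
        "abs_pearson": abs(pearson(values, moisture)),
        "mutual_info": mutual_information(values, moisture),
    }
    for name, values in candidates.items()
}
for name, score in scores.items():
    print(name, {k: round(v, 3) for k, v in score.items()})
```

In this synthetic case the linear score for the temperature variable is close to zero even though moisture depends on it strongly, while the mutual information score separates it clearly from the unrelated variable. A filter based only on the linear score would wrongly rank the temperature variable as unimportant.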

The machine learning model development starts with acquiring data to train and test its random forest regression ML models. The company also procures hardware and software for the machine learning development.

The company uses the new filter framework and current knowledge about soil properties and hydrology to select the variables most likely to predict soil moisture. After finalising its training and validation datasets, the company undertakes the routine step of tuning the hyperparameters for the machine learning algorithm used to train the soil moisture predictive model. The tuning uses a basic dataset to establish benchmarks for both the experiment and the trained model.
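The tuning step can be pictured as a simple grid search scored on a fixed validation split. The sketch below is illustrative only and makes several assumptions: a k-nearest-neighbours regressor is swapped in for the random forest so the example needs no external libraries, the dataset is synthetic, and the search grid is hypothetical.

```python
import math
import random


def knn_predict(train_x, train_y, query, k):
    """Predict by averaging the targets of the k nearest training rows."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return sum(train_y[i] for i in order[:k]) / k


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Synthetic "basic dataset" (assumption): one driver and a noisy response.
rng = random.Random(1)
rows = [(rng.uniform(0, 1),) for _ in range(300)]
target = [math.sin(3 * row[0]) + rng.gauss(0, 0.1) for row in rows]

# Fixed split so every candidate setting is scored on the same rows.
train_x, valid_x = rows[:200], rows[200:]
train_y, valid_y = target[:200], target[200:]

# Hypothetical search grid; a real grid depends on the algorithm being tuned.
grid = [1, 5, 15, 50, 200]
benchmark = {}
for k in grid:
    predictions = [knn_predict(train_x, train_y, query, k) for query in valid_x]
    benchmark[k] = rmse(valid_y, predictions)

best_k = min(benchmark, key=benchmark.get)
print("validation RMSE by setting:", {k: round(v, 3) for k, v in benchmark.items()})
print("selected setting:", best_k)
```

The recorded validation errors act as the benchmark: later experimental runs can be compared against the error of the selected setting on the same split.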

Experiment

Knowing that it will need to provide a hypothesis when it registers for the R&D Tax Incentive, the company records the hypothesis that it wants to test:

Applying a variable relevance framework to the dataset will create a machine learning model which accurately identifies which satellite imagery data and weather variables correlate to predict soil moisture.

The company runs its experiments with the variable relevance framework while training the ML model. The experiments involve iterations of training the algorithm on datasets with different selected variables and testing the resulting model for accuracy.
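The iterations described above can be sketched as a loop over candidate variable subsets, with each run trained and tested on the same split and its error recorded. This is a minimal sketch under stated assumptions: the data are synthetic, the column names (`rainfall`, `satellite_index`, `wind`) are hypothetical, and a k-nearest-neighbours regressor stands in for the random forest to keep the example dependency-free.

```python
import math
import random


def knn_predict(train_x, train_y, query, k=10):
    """Predict by averaging the targets of the k nearest training rows."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return sum(train_y[i] for i in order[:k]) / k


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def project(rows, columns):
    """Keep only the named columns, in order, as one tuple per row."""
    return [tuple(row[c] for c in columns) for row in rows]


# Synthetic data (assumption): two variables drive the target, one is noise.
rng = random.Random(2)
rows = [
    {"rainfall": rng.random(), "satellite_index": rng.random(), "wind": rng.random()}
    for _ in range(400)
]
target = [
    0.6 * r["rainfall"] + 0.4 * r["satellite_index"] ** 2 + rng.gauss(0, 0.02)
    for r in rows
]
train_rows, test_rows = rows[:300], rows[300:]
train_y, test_y = target[:300], target[300:]

# Each experimental run uses a different selection of variables.
candidate_subsets = [
    ("wind",),
    ("rainfall",),
    ("rainfall", "satellite_index"),
    ("rainfall", "satellite_index", "wind"),
]

results = []
for columns in candidate_subsets:
    train_x = project(train_rows, columns)
    test_x = project(test_rows, columns)
    predictions = [knn_predict(train_x, train_y, query) for query in test_x]
    results.append({"variables": columns, "rmse": rmse(test_y, predictions)})

# Record every run, ordered from lowest to highest test error.
for run in sorted(results, key=lambda r: r["rmse"]):
    print(run["variables"], round(run["rmse"], 4))
```

Keeping the result of every run, not only the best one, matches the record-keeping described later on this page.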

The results of the experiments are captured and then evaluated to understand why or how variables are relevant to accurately predicting soil moisture. The analysis allows the company to draw logical conclusions about the results of its experiments.

What records does the company keep?

The company keeps information about its IDSS research while undertaking its activities, including:

  • documented discussions about its intent to undertake the activities and why
  • results from its online background research including article links, as well as the discussions or notes about the information found
  • plans for how it will run the experiments, including the hypothesis
  • the results of all experimental runs and analyses of the results.

The company also keeps records for each financial transaction and activity related to its research and ensures that the expenditure can be linked to its activities. These include:

  • timesheets to cost when staff were engaged in particular IDSS research activities
  • invoices with notes about where or how equipment, software and services are used.

Make sure you keep evidence of activities and expenditure

When the company conducts its activities, it keeps the records it needs to show which of its activities are eligible for the R&DTI. It also keeps records of R&D expenditure showing how that expenditure links to its R&D activities.

Self-assessment process

At the end of the financial year the company reviews its records to decide which activities are eligible for the R&D Tax Incentive.

Starting with the activity which required experiments, the company assesses the following:

  • Outcome unknown
    • The outcome of applying its chosen machine learning training method to the satellite imagery and weather dataset was not known and could not be determined in advance from current knowledge, information and experience. Online reviews conducted by staff with expertise in the field could not identify any publicly available research into the combination of variables that may cause false negatives, nor how to evaluate and select weather variables with the strongest relationship to soil moisture.
  • Systematic progression of work
    • With no applicable current knowledge and no way to determine the outcome by observing how existing decision systems work, the outcome could only be determined through an experiment.
    • It developed a hypothesis based on using a new variable relevance framework, tested the hypothesis in an experiment, captured and evaluated observations, and drew logical conclusions from the analysis.
  • New knowledge
    • It conducted the activity to generate new knowledge about an improved IDSS, why weather variables rank as important and how to filter these to better predict soil moisture.

The company self-assesses that it can register this experiment as a core R&D activity.

The company assesses that several activities were directly related to the experiment. While several of these activities produced goods or services, these goods or services were solely or predominantly applied to the experiment or to capturing and evaluating observations. These activities are the online reviews used to develop the hypothesis, data acquisition and analysis, dataset finalisation and machine learning tuning.

The company self-assesses that it can register these activities as supporting R&D activities.
