About Flagship 3
Genetic gain in crops has increased over the past 50–100 years of breeding, but on a global scale gains are now stagnating in most crops. Genetic gain can be defined as the increase in performance achieved per unit of time resulting from artificial selection.
For quantitative traits, genetic gain (often referred to as the response to selection) can be improved by increasing the intensity of selection, introducing greater genetic variation, improving selection accuracy (for example, through higher heritability), and shortening the selection cycle. Plant breeders can therefore achieve more genetic gain by increasing the size of the breeding program to enable greater selection intensity, enhancing the accuracy of selection, ensuring adequate genetic variation, and increasing the number of breeding cycles per year.
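The four levers above correspond to the terms of the standard breeder's equation for genetic gain per year, ΔG = i·r·σ_A / L, where i is the selection intensity, r the selection accuracy, σ_A the additive genetic standard deviation and L the number of years per breeding cycle. The Python sketch below is illustrative only: it assumes truncation selection on a normally distributed trait (for which i = φ(z)/p, with z the truncation point and p the proportion selected), and the function names and scenario numbers are hypothetical.

```python
from statistics import NormalDist


def selection_intensity(p: float) -> float:
    """Selection intensity for keeping the top fraction p of a normally
    distributed trait: i = phi(z) / p, with z the truncation point in SD units."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - p)
    return std_normal.pdf(z) / p


def annual_genetic_gain(p: float, accuracy: float, sigma_a: float,
                        years_per_cycle: float) -> float:
    """Breeder's equation per year: i * r * sigma_A / L."""
    return selection_intensity(p) * accuracy * sigma_a / years_per_cycle


# Hypothetical scenario: keep the top 10%, accuracy 0.5, sigma_A = 1 trait unit,
# one cycle every 4 years...
baseline = annual_genetic_gain(p=0.10, accuracy=0.5, sigma_a=1.0, years_per_cycle=4.0)
# ...versus more accurate phenotyping (r = 0.7) and a 2-year cycle.
improved = annual_genetic_gain(p=0.10, accuracy=0.7, sigma_a=1.0, years_per_cycle=2.0)
print(f"baseline: {baseline:.3f}, improved: {improved:.3f} trait units/year")
```

Because the terms multiply, raising accuracy from 0.5 to 0.7 and halving the cycle length together increase annual gain by a factor of 1.4 × 2 = 2.8 in this toy scenario.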
Flagship Project 3 is contributing to genetic gain by developing new ways to analyze images of crops for the automatic identification of traits (phenotypes) related to plant growth, health, resilience and yield. The automated recognition of these traits by computers (rather than people in the field) will increase the speed, reliability and precision of trait identification, allowing phenotyping in larger breeding populations and improving the accuracy of selection by providing more reliable phenotypic information. We are focusing on the development of deep learning methods for the automatic estimation of phenotypes from digital images of crops in the field, which will lead to new computational tools that support the global community of plant breeders.
The following projects are currently underway within Flagship 3:
Supervised Learning of Direct Phenotypes
Developing supervised learning methods for analyzing phenotypes directly related to yield traits by increasing the size and diversity of labeled datasets, improving their reliability, and evaluating the resulting models across locations and seasons.
We are developing detection networks that locate the boundaries of objects in images, density networks trained to produce a numerical value such as an object count, and semantic segmentation networks that assign class labels to image pixels so that contiguous pixels with the same label form segmented regions. By applying these automated processes to plants, computers can replace humans in tedious manual tasks such as recording seedling emergence, counting the flowers in a drone image, and separating vegetation from features such as bare soil in rectangular and non-rectangular plots.
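To make the segmentation idea concrete: a semantic segmentation network outputs one class label per pixel, and segmented regions are the groups of contiguous pixels sharing a label. The Python sketch below recovers those regions with a breadth-first flood fill. It assumes the label mask is a list of lists and uses 4-connectivity (pixels that share an edge); the function name `label_regions` and the toy mask are hypothetical, and production code would more likely call an optimized routine such as `scipy.ndimage.label`.

```python
from collections import deque


def label_regions(mask, target):
    """Group contiguous pixels whose class equals `target` into regions.

    Returns (region_ids, n_regions): region_ids has the shape of `mask`,
    with 0 for pixels of other classes and 1..n_regions for each region.
    Uses 4-connectivity (up, down, left, right).
    """
    rows, cols = len(mask), len(mask[0])
    region_ids = [[0] * cols for _ in range(rows)]
    n_regions = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != target or region_ids[r][c] != 0:
                continue
            # Unvisited target pixel: start a new region and flood-fill it.
            n_regions += 1
            region_ids[r][c] = n_regions
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == target
                            and region_ids[ny][nx] == 0):
                        region_ids[ny][nx] = n_regions
                        queue.append((ny, nx))
    return region_ids, n_regions


# Hypothetical toy mask: 1 = vegetation, 0 = bare soil.
mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
regions, count = label_regions(mask, target=1)
print(count)  # 2 separate vegetation regions
```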
We have used near-ground-level images to detect individual mustard seedlings with some success, and are now using a larger set of annotated data to develop a model that will also work on canola, wheat and lentil. We have trained density networks on near-ground-level and drone images of canola plots, using manual annotation followed by iterative correction. We are integrating these methods into the Deep Plant Phenomics pipeline and PircPics. We have also trained a vegetation segmentation network that works with any crop; it operates on overlapping patches of orthomosaics, which are stitched together from individual drone images corrected for perspective so that the result resembles a true map. The network has been trained on rectangular plots of wheat and canola and on semi-regularly spaced lentil plots at various stages of growth.
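Patch-based processing of a large orthomosaic can be sketched as follows: fixed-size patches are placed with a stride smaller than the patch size, each patch is scored independently, and overlapping per-pixel outputs are averaged back into a full-size map. This Python sketch assumes a single-channel image stored as a list of lists and a per-pixel scoring function standing in for the segmentation network. The names `patch_origins`, `predict_tiled` and `threshold_model`, the patch and stride values, and averaging as the merge rule are all hypothetical choices, not the project's implementation.

```python
def patch_origins(length, patch, stride):
    """Start offsets along one axis so that patches of size `patch` cover
    [0, length); the last patch is shifted back to end at the border."""
    if patch >= length:
        return [0]
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] + patch < length:
        starts.append(length - patch)
    return starts


def predict_tiled(image, patch, stride, predict_patch):
    """Apply `predict_patch` to overlapping patches and average the
    per-pixel outputs wherever patches overlap."""
    h, w = len(image), len(image[0])
    total = [[0.0] * w for _ in range(h)]
    hits = [[0] * w for _ in range(h)]
    for y0 in patch_origins(h, patch, stride):
        for x0 in patch_origins(w, patch, stride):
            tile = [row[x0:x0 + patch] for row in image[y0:y0 + patch]]
            scores = predict_patch(tile)  # same shape as tile
            for dy, row in enumerate(scores):
                for dx, value in enumerate(row):
                    total[y0 + dy][x0 + dx] += value
                    hits[y0 + dy][x0 + dx] += 1
    return [[total[y][x] / hits[y][x] for x in range(w)] for y in range(h)]


# Hypothetical stand-in "network": marks pixels brighter than 0.5 as vegetation.
def threshold_model(tile):
    return [[1.0 if v > 0.5 else 0.0 for v in row] for row in tile]


image = [[(x + y) / 10 for x in range(7)] for y in range(5)]
vegetation = predict_tiled(image, patch=4, stride=2, predict_patch=threshold_model)
```

With a purely per-pixel stand-in model, the tiled result equals running the model on the whole image; with a real network, averaging the overlaps helps suppress artefacts at patch borders, where the network sees less context.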
Unsupervised Learning of Abstract Phenotypes
Supporting new precision breeding approaches by using abstract digital phenotypes to augment genomic selection and thus increase yield and yield stability.
Digital phenotypes may be directly related to yield traits, but information can also be extracted from abstract phenotypes, whose relationship with yield is not initially clear. We are developing unsupervised learning methods to extract abstract phenotypes for use in breeding programs. We are also using semi-supervised learning to identify signatures that predict performance across multiple environments during early-generation yield trials, leading to better predictions of genotype-by-environment (GxE) effects without replicated testing. In addition, we are working on enhanced early-generation yield trial designs that augment statistical analysis with digital information to better account for spatial gradients in the field.
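One classical way for a trial analysis to account for spatial gradients, shown here only to illustrate the idea and not as the project's design, is a moving-average (nearest-neighbour) adjustment: each plot's value is corrected by how far its local neighbourhood deviates from the field mean. The Python sketch assumes a complete rectangular grid of plot values with no missing plots and a square neighbourhood that excludes the plot itself; `moving_average_adjust`, `radius` and the toy field are hypothetical.

```python
from statistics import fmean


def moving_average_adjust(yields, radius=1):
    """Adjust each plot for local spatial trend (illustrative sketch).

    adjusted = observed - (mean of neighbours within `radius` - field mean)
    Neighbours are plots in the surrounding square window, excluding the
    plot itself and clipped at the field edges.
    """
    rows, cols = len(yields), len(yields[0])
    field_mean = fmean(v for row in yields for v in row)
    adjusted = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            neighbours = [
                yields[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                if (rr, cc) != (r, c)
            ]
            out_row.append(yields[r][c] - (fmean(neighbours) - field_mean))
        adjusted.append(out_row)
    return adjusted


# Hypothetical field: a linear fertility gradient across columns, plus one
# plot (row 1, column 2) carrying a genuinely higher genetic value.
field = [[2.0 + 0.5 * c for c in range(5)] for _ in range(4)]
field[1][2] += 1.0
adjusted = moving_average_adjust(field)
```

In the raw data the superior line ties with every plot on the fertile edge of the field; after adjustment it stands out as the single best plot, because the gradient is largely absorbed by the neighbourhood term. Edge plots are only partly corrected, since their neighbourhoods are one-sided.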
We have applied digital phenotyping to our structured populations of canola, wheat and lentil to generate multispectral data, and are now using an unsupervised deep learning algorithm based on our latent-space phenotyping method to augment genomic selection with additional information that improves predictions. We have developed a semi-supervised learning method that assigns labels based on the phenotypes of a single genotype grown in multiple locations, allowing predictions in early-generation yield trials when only partial datasets are available for other genotypes. Predicting stability across multiple environments early in the breeding program allows us to focus resources without complex and repetitive field trials. We have begun work on incorporating field topography, spatial layout information and even microbiome data into early-generation yield trial designs.
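As a reference point for what "stability across multiple environments" means quantitatively, a classical measure (not the semi-supervised method described above) is Finlay-Wilkinson regression: each genotype's yield is regressed on the environment mean, so a slope near 1 indicates average responsiveness, a slope below 1 indicates greater stability, and a slope above 1 indicates a stronger response to favourable environments. The Python sketch assumes a complete genotype-by-environment table with no missing cells; the function name, line names and yields are hypothetical.

```python
from statistics import fmean


def finlay_wilkinson_slopes(table):
    """Regress each genotype's yields on the environment means.

    `table` maps genotype -> list of yields, one per environment, with all
    genotypes observed in the same environments in the same order.
    Returns genotype -> regression slope b.
    """
    genotypes = list(table)
    n_env = len(table[genotypes[0]])
    # Environment index: mean yield of all genotypes in each environment.
    env_means = [fmean(table[g][e] for g in genotypes) for e in range(n_env)]
    x_bar = fmean(env_means)
    sxx = sum((x - x_bar) ** 2 for x in env_means)
    slopes = {}
    for g in genotypes:
        y_bar = fmean(table[g])
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(env_means, table[g]))
        slopes[g] = sxy / sxx
    return slopes


# Hypothetical yields (t/ha) of three lines in four environments.
yields = {
    "line_A": [2.0, 3.0, 4.0, 5.0],  # tracks the environment closely
    "line_B": [3.0, 3.2, 3.4, 3.6],  # stable across environments
    "line_C": [1.0, 2.8, 4.6, 6.4],  # highly responsive
}
print(finlay_wilkinson_slopes(yields))
```

Computing such a measure needs every genotype in every environment, which is exactly the costly multi-location data that early-generation trials lack; predicting stability from partial data aims to avoid that cost.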
Learning Temporal Phenotypes
Developing new methods for the analysis of phenotypes that change over time, using latent-space methods for time-lapse images.
End-of-season phenotypes are important targets for plant breeding but information from earlier in the season is also valuable. We are developing state-of-the-art recurrent neural network architectures to analyze images of plots throughout the growing season to better predict end-of-season phenotypes. We are using latent-space phenotyping to automatically detect and quantify responses to treatment directly from images of plants, and we are using high-temporal-resolution images of crops for the extraction of direct phenotypes related to growth and abstract phenotypes related to other traits.
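The core idea of a recurrent network for season-long imagery is that a hidden state is updated once per imaging date, so the final prediction depends on the whole ordered sequence rather than on any single flight. The Python sketch below shows only the forward pass of a single-layer Elman RNN with random, untrained weights; the per-date features, layer sizes and the names `init_params` and `predict_end_of_season` are hypothetical. Practical models would use gated cells such as LSTMs or GRUs in a deep learning framework and would be trained on observed end-of-season values.

```python
import math
import random


def init_params(n_features, n_hidden, seed=0):
    """Random (untrained) weights for a single-layer Elman RNN."""
    rng = random.Random(seed)

    def u():
        return rng.uniform(-0.5, 0.5)

    w_x = [[u() for _ in range(n_features)] for _ in range(n_hidden)]
    w_h = [[u() for _ in range(n_hidden)] for _ in range(n_hidden)]
    b_h = [0.0] * n_hidden
    w_out = [u() for _ in range(n_hidden)]
    return w_x, w_h, b_h, w_out, 0.0


def predict_end_of_season(sequence, params):
    """Run the RNN over per-date feature vectors (in date order) and map the
    final hidden state to a single end-of-season prediction."""
    w_x, w_h, b_h, w_out, b_out = params
    hidden = [0.0] * len(b_h)
    for features in sequence:
        # h_t = tanh(W_x x_t + W_h h_{t-1} + b); the comprehension reads the
        # previous hidden state before `hidden` is rebound.
        hidden = [
            math.tanh(
                sum(w * x for w, x in zip(w_x[j], features))
                + sum(w * h for w, h in zip(w_h[j], hidden))
                + b_h[j]
            )
            for j in range(len(b_h))
        ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical per-flight features for one plot: [canopy cover, mean height (m)].
season = [[0.10, 0.05], [0.35, 0.20], [0.70, 0.45], [0.85, 0.60]]
params = init_params(n_features=2, n_hidden=8)
print(predict_end_of_season(season, params))  # untrained, so the value is arbitrary
```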
We have adapted neural network architectures designed for small numbers of images and time points to accommodate more frequent sampling, allowing us to predict wheat yield, canola yield, and lentil yield and biomass from drone imagery. We have already found that plot volume is a reasonable estimator of end-of-season biomass in lentil. Our latent-space phenotyping method successfully classifies image sequences of plants grown under treatment vs. control conditions, and the learned abstract phenotype can be used as a proxy for resistance or susceptibility to the stress treatment in wheat and canola. We have analyzed high-temporal-resolution image datasets from our canola plots and used them to extract flowering-timing and flowering-rate phenotypes. We are now applying the same approach to compare lentil growth rates and stages and their relationship to differences in day length.
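To illustrate how timing and rate phenotypes can be read off a high-temporal-resolution series, the Python sketch below takes one plot's flower-cover series (for example, the fraction of plot pixels classified as flowers on each imaging day) and reports the interpolated day on which cover first reaches a chosen fraction of its peak, together with the steepest day-to-day increase. The input representation, the 50% default threshold, the function name `flowering_phenotypes` and the numbers are assumptions for illustration, not the project's phenotype definitions.

```python
def flowering_phenotypes(days, cover, fraction=0.5):
    """Return (onset_day, max_rate) for one plot (illustrative sketch).

    onset_day: first day, linearly interpolated, on which cover reaches
               `fraction` of its peak value.
    max_rate:  steepest increase in cover per day between consecutive
               observations.
    `days` must be strictly increasing and aligned with `cover`.
    """
    if len(days) != len(cover) or len(days) < 2:
        raise ValueError("need at least two aligned observations")
    threshold = fraction * max(cover)
    onset_day = None
    for i, c in enumerate(cover):
        if c >= threshold:
            if i == 0:
                onset_day = float(days[0])
            else:
                # Interpolate between the last observation below the
                # threshold and the first one at or above it.
                d0, d1 = days[i - 1], days[i]
                c0 = cover[i - 1]
                onset_day = d0 + (threshold - c0) * (d1 - d0) / (c - c0)
            break
    max_rate = max(
        (cover[i + 1] - cover[i]) / (days[i + 1] - days[i])
        for i in range(len(days) - 1)
    )
    return onset_day, max_rate


# Hypothetical series: day of year and fraction of plot area in flower.
days = [180, 183, 186, 189, 192]
cover = [0.0, 0.04, 0.16, 0.36, 0.30]
onset, rate = flowering_phenotypes(days, cover)
```

Frequent imaging matters here because the interpolation error in `onset_day` shrinks as the gap between consecutive flights shrinks.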
Learning Prospective Phenotypes
Developing machine learning approaches for prospecting novel phenotypes in other P2IRC datasets, including optical tomography for root images, multi-scale data, hyperspectral data, and high-resolution images of plant organs.
To uncover prospective phenotypes that may be useful for selection, we are developing algorithms that incorporate knowledge of how roots grow for the segmentation of 3D root images, automated systems for the reconstruction of 3D image sequences from gradually rotated samples, and architectural models of segmented root systems. We are using generative adversarial networks to derive richer datasets from low-resolution images, and we are investigating deep learning methods that combine imaging data and differences in the spatial arrangement of shoots and roots to predict the maturity and shatter rate of canola pods.
We have built on medical imaging methods to segment root images, improved reconstruction by automatically estimating the precise axis of rotation, and adapted methods for extracting architectural models of segmented root systems, in order to link novel phenotypes to genotypes and microbiome features. Using generative adversarial networks, we have generated high-resolution images from low-resolution counterparts and estimated near-infrared images from RGB data, producing acceptable estimates of agronomic properties. We have developed a deep learning algorithm that combines hyperspectral and RGB images to improve phenotype estimates. We have also developed deep learning methods that learn the features that best describe spatial structure from multiple images of root systems taken from different angles; these methods will be used for phenotyping canola pods.
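One standard way to estimate the rotation axis, shown here as a generic illustration rather than the project's method, relies on the fact that in parallel-beam tomography the projection acquired at 180° is a mirror image of the projection at 0° about the axis. If the axis lies at detector column c in a row of N pixels, flipping the 180° projection yields the 0° projection shifted by s = 2c - (N - 1), so the shift that best aligns the two gives c = (s + N - 1)/2. The Python sketch assumes parallel-beam geometry, one detector row per projection, integer-pixel shifts and mean squared difference as the alignment score; the function name, its parameters and the synthetic profile are hypothetical.

```python
def estimate_rotation_axis(proj_0, proj_180, max_shift=None, min_overlap=8):
    """Estimate the detector column of the rotation axis (illustrative).

    Assumes parallel-beam geometry, so proj_180 is proj_0 mirrored about the
    axis. Returns the axis position in pixels (may be a half-integer).
    """
    n = len(proj_0)
    if len(proj_180) != n:
        raise ValueError("projections must have the same length")
    flipped = proj_180[::-1]  # flipped[x] == proj_0[x + s] at the true shift s
    if max_shift is None:
        max_shift = n // 2
    best_shift, best_score = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        xs = [x for x in range(n) if 0 <= x + s < n]
        if len(xs) < min_overlap:
            continue
        # Mean squared difference over the overlapping columns.
        score = sum((flipped[x] - proj_0[x + s]) ** 2 for x in xs) / len(xs)
        if score < best_score:
            best_shift, best_score = s, score
    return (best_shift + n - 1) / 2


# Synthetic check: an asymmetric profile rotating about column 35 of 64.
n, axis = 64, 35
proj_0 = [float((x - 20) * (45 - x) * (x - 10)) if 20 <= x <= 45 else 0.0
          for x in range(n)]
proj_180 = [proj_0[2 * axis - x] if 0 <= 2 * axis - x < n else 0.0
            for x in range(n)]
print(estimate_rotation_axis(proj_0, proj_180))
```

A misplaced axis produces characteristic arc-shaped artefacts in the reconstruction, which is why even a sub-pixel refinement of this estimate is worthwhile in practice.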
Understanding Learned Phenotypes
Aiming to gain a better understanding of deep neural networks through novel visualization, evaluation and annotation approaches with the goal of demystifying these methods for plant breeders and making them easier to apply in practice.
Visualizations help us understand deep neural networks, for example by showing whether a network focuses on the same image regions as human annotators, but existing generic visualization tools do not directly support specialized analysis. We are developing visual analytics tools for deep neural networks, including a testing and evaluation framework for deep learning models that makes it easier for non-computer scientists to train and deploy these phenotyping tools. We have recruited personnel to build or adapt software for image annotation and to establish ground-truth measurements for images so that annotations can be evaluated for reliability.
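Evaluating annotations against ground-truth measurements can be as simple as comparing, plot by plot, the number of objects annotated in an image (for example, canola flowers or wheat heads) with an independent count. The Python sketch below reports mean absolute error, mean signed bias and the Pearson correlation coefficient; the choice of metrics, the function name `annotation_agreement` and the counts are illustrative assumptions, not the project's evaluation protocol.

```python
from math import sqrt
from statistics import fmean


def annotation_agreement(annotated, ground_truth):
    """Compare per-plot annotated counts with ground-truth counts.

    Returns a dict with mean absolute error (MAE), mean signed bias
    (annotated minus ground truth) and Pearson correlation r.
    """
    if len(annotated) != len(ground_truth) or len(annotated) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    errors = [a - g for a, g in zip(annotated, ground_truth)]
    mean_a, mean_g = fmean(annotated), fmean(ground_truth)
    cov = sum((a - mean_a) * (g - mean_g) for a, g in zip(annotated, ground_truth))
    var_a = sum((a - mean_a) ** 2 for a in annotated)
    var_g = sum((g - mean_g) ** 2 for g in ground_truth)
    return {
        "mae": fmean(abs(e) for e in errors),
        "bias": fmean(errors),
        "pearson_r": cov / sqrt(var_a * var_g),
    }


# Hypothetical counts from five plots.
result = annotation_agreement([12, 30, 18, 25, 40], [14, 29, 20, 25, 43])
print(result)
```

Reporting bias alongside correlation is deliberate: annotations can correlate strongly with ground truth while systematically under-counting, as in this toy example, and a model trained on them would inherit that bias.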
Results to Date
We have developed a tool that visualizes features learned by different convolution layers, allowing interactive visual exploration of the differences between pairs of deep neural networks. We are testing how well our deep learning models generalize across crops, seasons and locations within the P2IRC program and with international collaborators. We are comparing white box testing (examining the network's internal structure) with black box testing (examining only its input-output behaviour) and a mixture of the two (grey box testing), aiming to maximize neuron coverage in the network so that unexpected behaviours are exposed. We have developed machine learning evaluation techniques for data (we are currently adapting these to plant images) and new image annotation tools for wheat heads and canola flowers. We have also developed a novel optimization method that automatically fits plot regions on drone images, greatly speeding up the analysis of breeder plots.
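Neuron coverage, in the sense popularized by the DeepXplore testing work, is the fraction of neurons whose activation exceeds a threshold for at least one test input; test sets that raise coverage exercise more of the network's internal behaviour and so have more chances to reveal unexpected behaviour. The Python sketch below assumes activations have already been recorded (one list of per-neuron values per test input) and, as some implementations do, min-max scales each input's activations within the layer before thresholding. The threshold value, the function name `neuron_coverage` and the activation numbers are hypothetical.

```python
def neuron_coverage(layer_activations, threshold=0.25):
    """Fraction of neurons activated above `threshold` by at least one input.

    layer_activations: list over test inputs; each item lists the activation
    values of the same neurons in one layer. Each input's activations are
    min-max scaled to [0, 1] before thresholding.
    """
    n_neurons = len(layer_activations[0])
    covered = [False] * n_neurons
    for acts in layer_activations:
        lo, hi = min(acts), max(acts)
        span = hi - lo
        for j, a in enumerate(acts):
            scaled = (a - lo) / span if span > 0 else 0.0
            if scaled > threshold:
                covered[j] = True
    return sum(covered) / n_neurons


# Hypothetical activations of a 4-neuron layer for three test images.
acts = [
    [0.0, 2.0, 0.1, 0.0],
    [0.0, 1.5, 0.0, 0.2],
    [0.0, 0.4, 0.3, 0.0],
]
print(neuron_coverage(acts))  # neurons 1 and 2 are covered -> 0.5
```

Adding a test input that drives the two uncovered neurons raises coverage to 1.0; grey box testing uses this kind of internal signal to decide which additional inputs are worth generating.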