What data is included in the databases?



  • 5,000 data points on yield were aggregated and cover 524 brands from 100 site locations spanning 2012 to 2019.

  • 1,887 data points on IDC were aggregated and cover over 896 varieties spanning from 2012 to 2019.

  • Counties where data was primarily gathered include (but are not limited to) Marshall, Kittson, Polk and Roseau in Minnesota, and Walsh, Cavalier, Pembina, Grand Forks and Ramsey in North Dakota. See all plot locations here

  • All data used in the SoyZ database are derived from third-party variety trials only, including NDSU, University of Minnesota and F.I.R.S.T. trials.

    • No independent seed company trials are represented in the data set.

What geographic region does the data come from?


The region within the boundaries of the yellow circle on the left is where the yield and IDC plots were drawn from. The range of variety maturities in this area was between 0.01 and 0.6.

Why did you bother doing this?


A great amount of thought, time and effort has been spent conducting soybean variety yield trials across a number of universities and extension agencies, but those findings have never been aggregated into a comprehensive, user-friendly web-based format. Furthermore, unwieldy data tables and disconnected data within and across state lines culminate in grower confusion, frustration and poor variety selection methods.

Soybean growers wanting to use hard-won university plot data are expected to make sense of large tables dealing with multi-site data across multiple years. But the sheer amount of data is difficult to take in visually, so the process of sense-making can devolve into a subjective one that results in picking a top-yielding variety from one or two years or locations to "give it a try." While there is nothing wrong with this approach, it leaves one to wonder, "Did I use the best method for selecting the top varieties available?"

Another approach to variety selection is to rely on seed dealers who provide private company test plot data in addition to personal anecdotes and advice. While these individuals are a resource, my experience has been that they are typically armed with company data that omits unfavorable results or compares their varieties against only weaker competitors.

Why is multi-year data necessary for making decisions about IDC tolerance or yield potential?


Seed companies, dealers and farmers all have varying degrees of understanding about exactly how tolerant their varieties are to IDC. Even for varieties that have several years of data, environmental conditions can reveal weaknesses in varieties otherwise believed to be tolerant. In fact, some varieties can vary by as much as one full point in IDC score from one year to the next, so a variety with a 1.6 IDC score (on a 1-5 rating scale) in one year could be rated a 2.6 the following year! On the other hand, a number of varieties appear to be much less variable and stay very consistent year over year in their overall rating. Knowing which one you have before the planter rolls is critical. Part of the variability in IDC ratings has to do with the nature of how and when IDC scores are taken (a discussion for elsewhere). Thus, having a broader understanding of the average IDC score and its range of variability across years, as related to yield, is essential. Presenting it in a clear way so that farmers can quickly and easily access that information has great potential value.
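As a rough illustration of what "average IDC score and its range of variability across years" could look like in practice, here is a minimal Python sketch. The function name `idc_summary` and the example scores are hypothetical and are not taken from the SoyZ database; they simply mirror the 1.6 versus 2.6 example above.

```python
from statistics import mean

def idc_summary(scores_by_year):
    """Summarize one variety's IDC scores (1-5 scale) across years.

    scores_by_year maps a year to that year's IDC score.
    Returns the multi-year average, the spread between the best
    and worst year, and how many years of data were available.
    """
    scores = list(scores_by_year.values())
    return {
        "mean": mean(scores),
        "range": max(scores) - min(scores),
        "years": len(scores),
    }

# Hypothetical variety rated 1.6 one year and 2.6 the next.
summary = idc_summary({2018: 1.6, 2019: 2.6})
print(summary)
```

A variety with a low average but a wide range may deserve more caution than one with a slightly higher average that barely moves from year to year.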



What is a Z-score and why are you using it?

While some data sets use a 'percentage of the yield' as a multi-year comparison, I prefer a different kind of transformed score that represents not only the yield but also how a variety performs relative to other varieties in that plot and in other plots over different years. This transformed score is called a Z-score. I believe this approach also more fairly (or unfairly, depending on your philosophical bent) spreads out the variability in test plots that researchers attempt to control for by having multiple replications of the same variety. In my approach, the variability is simply pooled across the entire plot and spread out over all the varieties. Each variety is then scored on its own performance relative to the plot mean, taking plot variability into account. For specifics, see the bullet below.

  • A Z-score is a measure of the distance, in units of standard deviations, of a particular value in a data set from the mean of that set of data (in this case, yield). Specifically, it takes into account location variability (by subtracting the location's average from the observed value and dividing by the location's standard deviation). In this way, it allows one to report the yield as a probability of obtaining an outcome higher or lower than the average. The higher the Z-score the better.
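The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `location_z_scores`, the variety names and the yields are made up, and the population standard deviation is used here, whereas the SoyZ model may use the sample standard deviation or handle replications differently.

```python
from statistics import mean, pstdev

def location_z_scores(yields):
    """Return {variety: Z-score} for one plot location in one year.

    yields maps a variety name to its yield at that location.
    Z = (variety yield - location mean) / location standard deviation.
    """
    values = list(yields.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # Every variety yielded the same, so none stands out.
        return {variety: 0.0 for variety in yields}
    return {variety: (y - mu) / sigma for variety, y in yields.items()}

# Hypothetical yields in bushels per acre at a single site.
plot = {"Variety A": 52.0, "Variety B": 48.0, "Variety C": 44.0}
z = location_z_scores(plot)
for variety, score in z.items():
    print(f"{variety}: {score:+.2f}")
```

Because each location is scaled by its own mean and standard deviation, a variety's Z-scores from a high-yielding site and a low-yielding site can be averaged on the same footing.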


How do I interpret Z-scores?


How to Interpret Z-Scores

A variety that is 1 standard deviation above the mean (Z-score = 1.0) is better than about 84% of all other varieties, assuming the scores are roughly normally distributed. In my opinion, an average Z-score of 0.5 or more across multiple years and locations is exceptional when looking at soybean yields.
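For readers who want to convert other Z-scores into a "better than X%" figure, here is a small Python sketch. It assumes the scores follow a normal distribution, which real plot data will only approximate, and the function name `z_to_percentile` is my own label for illustration.

```python
import math

def z_to_percentile(z):
    """Percentage of a normal distribution that falls below a given Z-score."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A few Z-scores and the share of varieties they would be expected to beat.
for z in (0.0, 0.5, 1.0):
    print(f"Z = {z:.1f}: better than about {z_to_percentile(z):.0f}% of varieties")
```

Under that normality assumption, a Z-score of 0.5 works out to roughly the 69th percentile.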


Interpreting Yield Results

"Even the most superior genetics do not win every plot. When looking over plot data one may question why the product with the best average yield across multiple locations does not have the highest yield at every location, why it does not win every plot, or possibly why its average ranking across multiple locations changes..." See link to article for more...

How might I best use the information in this website?

I like to look for the varieties I know or have grown in the bar charts. Where do they fall on the yield curve in the SoyZ model? What unfamiliar varieties appear to be yielding more? Then I set the threshold for the minimum amount of data necessary to pick a variety (my minimum is 2 years and at least 5 sites). For you it might be 1 year? 3 years? How important is IDC? Other agronomic characteristics? Check them out on the webpage that lists all beans available for sale in 2019 as well as in the IDC tables.
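If you keep your own notes from the charts, the "minimum amount of data" rule can be applied mechanically. The Python sketch below is a hypothetical example, not the code behind the website: the function name `rank_varieties`, the record layout (variety, year, site, Z-score) and all of the numbers are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

def rank_varieties(records, min_years=2, min_sites=5):
    """Rank varieties by average Z-score, keeping only those with enough data.

    records is a list of (variety, year, site, z_score) tuples.
    A variety is kept only if it was tested in at least min_years
    distinct years and at least min_sites distinct sites.
    """
    by_variety = defaultdict(list)
    for variety, year, site, z in records:
        by_variety[variety].append((year, site, z))

    ranked = []
    for variety, rows in by_variety.items():
        years = {year for year, _, _ in rows}
        sites = {site for _, site, _ in rows}
        if len(years) >= min_years and len(sites) >= min_sites:
            ranked.append((variety, mean(z for _, _, z in rows)))
    # Highest average Z-score first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)

# Hypothetical records: Variety A has 2 years and 5 sites, Variety B has 1 year and 2 sites.
records = [
    ("Variety A", 2018, "Site 1", 0.6),
    ("Variety A", 2018, "Site 2", 0.4),
    ("Variety A", 2018, "Site 3", 0.5),
    ("Variety A", 2019, "Site 4", 0.7),
    ("Variety A", 2019, "Site 5", 0.3),
    ("Variety B", 2019, "Site 1", 1.2),
    ("Variety B", 2019, "Site 2", 0.9),
]
strict = rank_varieties(records)                           # my thresholds
loose = rank_varieties(records, min_years=1, min_sites=2)  # a looser choice
print(strict)
print(loose)
```

With the stricter thresholds only Variety A survives, even though Variety B has the higher average on its two data points; loosening the thresholds puts Variety B on top. That difference is exactly the judgment call each grower has to make.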

Finally, I like to talk to seed dealers. Ask them, “What is/are your most popular variety(ies)?” Farmers ‘vote’ with repeat sales. Then I go back to the SoyZ model and see where that recommendation falls in the yield comparisons, especially for new varieties, which may have very little data. Truthfully though, I’m not a huge fan of ‘new’ varieties due to the lack of genetic diversity in soybeans.

Pick a few new varieties based on what you find in the database and plant 10-20 acres up against your farm's leading variety. Rinse and repeat for at least two years (it is hard to make a solid judgement in only one year).


  • EvaluationGroup, LLC does not explicitly or implicitly endorse any variety or company. We don't sell seed and don't care what you buy. It is up to you, the reader, to decide how best to use this information to draw your own conclusions.

  • We are not responsible for any losses incurred as a result of using this data.

EvaluationGroup, LLC  2020