This is an important topic, but this post is not an especially quick read, so your TL;DR options are: 1) just read the 12 section headings or 2) jump to the end and read the linked PDF table instead. Enjoy…


Organisational Assessment - which principally means profiling individuals with regard to some aspect of their fit, performance or promotability within the organisation - is a very large market. Any HR Business Partner who begins to look for options, or indeed new alternatives to whatever is currently in use, faces a daunting task.

For example, unless you are a trained statistician, and preferably a psychometrician, it may be challenging to make sense of the typical technical manual, with its reliability and validity statistics. Here’s a number: wonderful; but is it telling me this is a highly effective analysis of some aspect of human behaviour or a bit of a muddle with scales that aren’t really independent of one another? It can be hard to tell.
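For the curious, here is a minimal sketch - with entirely invented scores, not taken from any real manual - of two of the checks a technical manual typically summarises: Cronbach's Alpha as a reliability estimate, and a simple correlation between two scale totals as a crude test of whether those scales are really independent.

```python
# Toy illustration with invented scores; not from any real assessment.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Reliability estimate for a scale; items is a respondents x items matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Five respondents answering a three-item scale.
scores = np.array([
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
    [1, 2, 1],
])
print(f"alpha = {cronbach_alpha(scores):.2f}")  # ~0.95: high internal consistency

# 'Scales that aren't really independent' show up as a high correlation
# between two supposedly separate scale totals.
scale_a = scores.sum(axis=1)
scale_b = scale_a + np.array([0, 1, -1, 0, 1])  # a suspiciously similar 'second' scale
print(f"inter-scale r = {np.corrcoef(scale_a, scale_b)[0, 1]:.2f}")  # ~0.99
```

Even with the arithmetic laid bare, judging whether the resulting numbers are good ones still calls for real training.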

So instead, here are a dozen criteria which you almost certainly can assess without needing a PhD in regression analysis or a passion for Cronbach’s Alpha. There is even a PDF table at the end you can use to do your own review of assessments.

These twelve criteria are based (of course) on our own practice and experience in Elaura, but I hope they may help you to think about what you are looking for.

1. Meaningful: what it measures, matters.

While you would expect this to be a given, not every assessment on the market tells you something meaningful. A hypothesis can pass a significance test at the 95% confidence level and still not be meaningful. Good reliability and validity numbers are desirable, but they aren’t much comfort if what you are measuring turns out to be trivial and inconsequential.
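To make that concrete, here is a minimal sketch (invented numbers, nothing to do with any real assessment) of how a trivially small difference sails through a significance test once the sample is large enough:

```python
# Toy illustration: statistical significance is not the same as meaningfulness.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=50.0, scale=10.0, size=100_000)
group_b = rng.normal(loc=50.2, scale=10.0, size=100_000)  # a tiny real difference

t_stat, p_value = stats.ttest_ind(group_a, group_b)
cohens_d = (group_b.mean() - group_a.mean()) / 10.0  # effect size, roughly 0.02

print(f"p-value   = {p_value:.4f}")   # comfortably below 0.05: 'significant'
print(f"Cohen's d = {cohens_d:.3f}")  # ~0.02: practically negligible
```

The difference is real, and the statistics confirm it; but a difference that small tells you nothing useful about anyone. So - in this context - what is meaningful?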

Since organisational assessments measure people with respect to organisations and their missions, a meaningful assessment has to have something important to say about how an individual will contribute to that mission and/or relate to others with whom they collaborate to deliver that mission.

2. Recognition: I can see myself in this mirror.

Again, this seems obvious, but it matters that the individual is able to recognise themselves in the picture painted by the assessment.

This is not the same as saying the assessment has to say exactly what the individual would have said about themselves. Quite the opposite; the main reason we need Organisational Assessments at all is that none of us are good at seeing ourselves in totality. A credible assessment will surface data that resides below the level of daily awareness.

However, if the assessment too often says “this person is like x”, but they flat-out deny that they can recognise themselves in those statements, you have a problem. Of course, if others can see issue ‘x’ clearly and the individual can’t, that may be a coaching issue. But if someone simply can’t connect what is in the report with what they find inside their own skin, at the very least you have a potential credibility problem.

3. Empirical: data have been collected in a proper manner and scientific method has been applied.

I was surprised recently to find a publisher of an assessment making a virtue of the fact that their assessment wasn’t empirical. That is like saying, “don’t worry about observable data, I have a wonderful little theory here”. The vast majority of assessments I see start with a theory (more often, a simple assertion of opinion), and then try to retrofit some kind of empirical ‘covering’.

The problem is that if you start with a theory (for example, that people who wear dark-framed glasses make better managers) and then simply collect data to support it, you cannot rightly describe your work as empirical. We all do this in daily life, finding supporting data for positions we already hold; but it doesn’t make for good science.

Empirical research starts with observation of phenomena - in our case, different kinds of behaviour, team performance, organisational culture or whatever - postulates one or more hypotheses, and then designs and conducts experiments to test those hypotheses. In the social sciences, there is a strong argument that qualitative data deserve at least as much respect as quantitative data; but data there must be, and a hypothesis must be one the data can disprove as well as support, not merely bolster.

There is a limit to how far a non-psychometrician can push this line of enquiry, but it may be worth treating claims such as “based on brain science” - or even “based on data collected during over x million interviews” - with some suspicion. By themselves, these are not proof of scientific process.

4. Predictive: given person x in situation y, z will, or is highly likely to, happen.

If your assessment is genuinely meaningful, has good levels of recognition and an empirical basis, you should find that it accurately predicts how a person is likely to respond in a given situation (at least within the scope the assessment sets for itself, e.g. Leadership or Financial Management or Teamwork or whatever).

Why does this matter? Well, that’s easy. If you use your assessment to understand an individual, who then performs in accordance with what the assessment predicted; and if that pattern repeats, so that the assessment consistently and accurately predicts the very different behaviours (or performance or whatever) of a large number of people; then it is likely that it will continue to do so, meaning that you can trust it for future employees, candidates and so on.

On the other hand, if the assessment can only ever describe events or behaviours that have already happened and been observed, but is unable to make predictions, then it is more of a performance appraisal, and not an assessment at all.
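In data terms, predictive power is something you can check after the fact: take the scores the assessment produced at the outset and see how strongly they track what actually happened. A minimal sketch, with invented numbers:

```python
# Toy illustration: assessment score at hire versus performance a year later.
import numpy as np

score_at_hire     = np.array([55, 72, 40, 65, 80, 48, 60, 70])
rating_year_later = np.array([3.1, 4.2, 2.5, 3.8, 4.6, 2.9, 3.5, 4.1])

r = np.corrcoef(score_at_hire, rating_year_later)[0, 1]
print(f"predictive correlation r = {r:.2f}")  # close to 1: the scores foretold performance
```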

5. Explanatory: now I understand why I always do such and such.

Prediction is fine, but if that is all an assessment can deliver, it becomes rather negative. A tool that tells you how you will behave, but can’t tell you why, nor what you could do to modify that behaviour (other than work to keep it under wraps), is essentially limiting your potential as an individual, and in a rather deterministic manner. (This is a serious observation: I have just described one of the most widely adopted assessments in the market.)

So by way of contrast, here is an example of how insights daisy-chain together when you coach an individual using an assessment which does have significant explanatory power:

“You probably find yourself doing x sometimes”

“You are right - how did you know that?” (Recognition)

“It will probably happen most often under conditions y or z”

“Yes, yes, that’s exactly it - tell me more!” (Predictive Power)

“It all comes back to this issue w - you simply need such and such, and when you don’t get it, x starts to happen”

“Ahhhh-ha!” (Explanatory Power)

Where the conversation goes next is to talk about strategies to turn the understanding of “why” and “how” into an exploration of “what I can do about this”. Without why and how, the what becomes a matter of “managing the optics”; and that is potentially very damaging for both the individual and the organisation.

6. Applicable: and now the rubber hits the road, with these practical applications.

We began by asking if the assessment measures something “meaningful”: “applicable” asks if all this insight has practical application to the work of the individual and the team, workgroup or other unit within the organisation. Does it help you manage more effectively (or be managed)? Can you communicate better, be more engaged, more productive, more effective?

The application should deliver outcomes that are observable and, ideally, measurable. If all that happens is that people have a new framework in their heads, but behaviour remains unchanged, then nothing really happened, did it? If, on the other hand, two people who were previously unable to collaborate, are now able to work effectively and productively together to deliver results, then the application of the assessment has had observable and potentially measurable impact.

7. Context-neutral: I am not going to ask you to ‘place yourself’ at home, work or on the beach.

My personal view is that any instrument which requires you to place yourself in a specific context (‘home’, ‘work’, ‘a meeting’) is unlikely to be able to satisfy the six criteria suggested so far. Does such an assessment actually measure anything meaningful at all? At the very least, the fact that the subject can say “I’m a different person at home, or with my friends, or when I am with people from work…” means that they are unlikely to feel there is anything fundamental they need to understand or address in their behaviour.

A credible assessment should measure something true about the individual as a person; it may play out differently (possibly very differently) in different contexts, but it should be the same person being described. Otherwise I would suggest it is not measuring anything real at all.

8. Calibrated: if you and I have the same score, we are the same.

Given all of the above, it follows absolutely that if I am assessed as Xr56iQ (I just made that up) and you are also an Xr56iQ, we had better behave in an identical manner, at least within the realm of whatever those letters and numbers are proposing. If not, then we are back to asking “did we measure anything real at all?”

It is also at least desirable, if not necessary, that this calibration should operate across ethnic and cultural boundaries. If it doesn’t, then you are only safe to use the assessment within a single ethnic or cultural grouping at a time.

What I mean is that if all Xr56iQs in North America ride Harleys, but all Japanese Xr56iQs sit in quiet contemplation of raked gravel in a garden, you will have a problem using the concept of “Xr56iQ” to explain anything to a mixed group. At the very least, your audience will assume that national stereotypes (or their inverse - perhaps the Japanese all ride Harleys and the Americans are all Zen) have far more impact than whatever it was you thought you had measured with the label Xr56iQ.

So calibrated is good, and calibrated across cultures is best of all. North America and Japan may have different norms and distributions (for example, Japan may have fewer, or more, Xr56iQs than the US); but apples had better always be apples - and never bananas.
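As a toy check of what “calibrated across cultures” would look like in the data (all names and numbers invented, including the label from above):

```python
# Toy illustration: does the same label predict the same behaviour in each region?
import pandas as pd

df = pd.DataFrame({
    "label":     ["Xr56iQ", "Xr56iQ", "Xr56iQ", "Xr56iQ", "Ab12kP", "Ab12kP"],
    "region":    ["US", "US", "JP", "JP", "US", "JP"],
    "behaviour": [7.9, 8.1, 7.8, 8.0, 3.1, 3.0],  # some later-observed rating
})

# For each label, compare mean behaviour across regions. Big gaps between
# regional means for the same label would suggest poor calibration.
print(df.groupby(["label", "region"])["behaviour"].mean().unstack())
```

Here the two labels behave consistently in both regions; apples stay apples.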

9. Social: we can look at people in the context of other people, as a whole.

Even if someone is destined to work as an individual contributor, locked in a back room at your secret facility, they are still going to interact with other people some of the time; and even if they won’t, you can only truly understand them, and the potential they bring, if you can reference them into the map of all human possibilities.

And of course, most of your people are going to be working extensively with other people - not to mention your customers. So an organisational assessment that looks at the individual, but can’t place them in any kind of social context or continuum isn’t going to be a huge help to you.

What really matters, however, is what you get (for free, as it were) from an assessment which truly embraces this social aspect: an explanation of why an individual may be able to work very successfully in one context, and yet struggle terribly in exactly the same role in a different context. As has so often been observed, man is a social animal; but each person relates somewhat differently to their social context. Understanding exactly how these differences work gives you the key to variability in performance within apparently identical roles.

10. Stable: we don’t need to retest people every 18-24 months.

Does this one really matter from the point of view of credibility? Clearly it has potential financial implications, unless the assessment is free - and even then, employee time is not exactly free. But does it matter with regard to quality?

I would argue yes. Many people like to feel that they have changed dramatically over time, but all the evidence is that we don’t, really. How we manage ourselves, and how we work with who we are - yes, that can change enormously (older and wiser, remember?)

So, for example, I may be someone who feels the need to reflect deeply on important decisions, but who learns that, in order to survive, I have to appear more decisive. Therefore I manage myself so as to pre-process as much information as possible before the “we need a quick decision” questions are ever presented to me. As a result, I can (mostly) “make decisions more rapidly”. But I am still dealing with the same fundamental ‘me’, someone who needs to process information deeply before committing to decisions; I am just managing myself better. This ‘fundamental me’ is something that does not change much at all, and generally very little after a person reaches their mid-twenties.

So my real concern here is that if an assessment measures something that can change from week to week, or year to year, is it measuring something substantial enough to build on in an organisational setting; or is it just picking up something ephemeral, such as mood or state of mind?
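Stability is also one of the easier criteria here to check empirically: re-run the assessment after a decent interval and correlate the two sets of scores. A minimal sketch, with invented numbers:

```python
# Toy illustration: test-retest stability for the same eight people.
import numpy as np

first_test  = np.array([62, 45, 71, 38, 55, 80, 49, 66])
second_test = np.array([60, 47, 69, 40, 57, 78, 52, 64])  # two years later

r = np.corrcoef(first_test, second_test)[0, 1]
print(f"test-retest r = {r:.2f}")  # close to 1.0: a stable trait, not a mood
```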

11. Versatile: we don’t need different assessments for different applications.

A credible assessment should be applicable across a wide range of organisational applications. If you have followed this far, you probably understand why: not because there is a rule somewhere that says so, but simply because an assessment that meets the criteria already covered will, as a consequence, prove very versatile.

If the data are valid and applicable to recruitment, for example, then surely they must be valid and applicable to other applications - onboarding, mentoring and coaching, understanding team dynamics, and so on. Otherwise, on what basis is the assessment advising on job fit? “Yes, the candidate is an excellent match for this role, but we can’t say why, or help you use that data in managing them in that role.” It isn’t an option to hide behind statistics at this point (“they are just a very, very good match”); you at least have to be able to say what makes them a good match. Otherwise, what did you actually base that match upon?

12. Scalable: we can look at every assessed person in a single, meaningful, synoptic view.

And last, but not least, a credible assessment ought to deliver insights at different organisational scales. Think of a camera with a zoom lens. Yes, of course you can focus in on a single flamingo. You can zoom back out to take in the whole flock of fifty thousand birds. Or you can zoom in a bit until you can just see the group of twenty or fifty standing on a particular mud-bank.

In the same way, a credible organisational assessment should allow you to look at an individual in isolation and predict how they will perform, given a certain context. It should also allow you to consider teams of any size, and predict how they are likely to work together most effectively, the problems they may experience, and how to address those; and it should allow you to look at whole business units and divisions, and at the organisation or group in its entirety, mapping culture and capacity at that highly aggregated scale, across regions and so on.

On the other hand, if you are using an assessment which doesn’t take you beyond manually comparing two individual reports, you probably don’t want to try scanning 100, let alone 10,000 or more. And of course, if the assessment fails the ‘calibrated’ criterion, there would be no point in doing so anyway.
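For what it is worth, the zoom lens only works if every individual result lives in one consistent, calibrated data set, so the same scores can simply be aggregated at whatever scale you choose. A toy sketch, with invented names and numbers:

```python
# Toy illustration: one table of individual scores, viewed at three scales.
import pandas as pd

people = pd.DataFrame({
    "name":     ["Ana", "Ben", "Chen", "Dee", "Eli", "Fay"],
    "team":     ["Sales", "Sales", "Sales", "Ops", "Ops", "Ops"],
    "division": ["APAC", "APAC", "APAC", "EMEA", "EMEA", "EMEA"],
    "score":    [72, 58, 66, 41, 49, 55],
})

print(people.set_index("name")["score"])                         # one flamingo
print(people.groupby("team")["score"].mean())                    # the mud-bank group
print(people.groupby("division")["score"].agg(["mean", "std"]))  # the whole flock
```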

  *   *   *  


In conclusion…

It is worth noting that I have not talked about what data an assessment should be producing.

The Birkman Method (TBM), with which our team work exclusively, majors on motivational and perceptual data, which are highly predictive of behaviour and performance in roles; TBM can most certainly tick every single one of those boxes. Another tool may well focus on other aspects of organisational behaviour.

Some tools are designed to capture selective data that may well cover a number of these items, with no intention of covering others. For example, a good 360 is a valuable tool, but no 360 I have seen claims to have explanatory power. They aim to describe observed behaviour, and to capture the gaps between how the subject sees themselves and what others see.

But to be a truly useful and credible organisational assessment, I would submit that a tool should be ticking all of the boxes outlined above.

Free template

Click here to access a free PDF template that sets out the 12 criteria for your use. The first column has been pre-filled, using TBM as an example.

Originally posted on LinkedIn - Published April 19, 2017