Channel: JDWolverton

The Dunning-Kruger Effect Got An Update: It's Bunk

The Dunning-Kruger effect is frequently the meme used on social media to shame people. In snark form, the D-K effect translates to something like this: the least skilled people are too unskilled to realize it.

Like anything you see on the internet, if your first reaction to this idea is "Right on!", or maybe a visceral "Yes!", that might be because of a different phenomenon called confirmation bias.

The D-K effect enthralled people. You can read the entire original 1999 D-K paper here. The study's results have been replicated over and over. People cling to this idea because it explains so much about our crazy uncles. Right?

It turns out the study was flawed. The results aren't what D-K thought they were. The real result is:

Debunking the Dunning-Kruger effect – the least skilled people know how much they don’t know, but everyone thinks they are better than average

The problem lies in the basics of the study: the method and the analysis. The original paper included several studies covering humor, logical reasoning, and grammar. (For example, one of these studies gave a 20-question test to 45 people. The subjects were asked to estimate how many of their answers were correct, and also to rate how well they did compared to everyone else in the study on a scale of 0-99.) The authors made no adjustment for a common cognitive bias: anyone with a healthy ego thinks they are better than average on just about every variable you can imagine. The fourth study asked the bottom and top quartiles to come back for a recap. These subjects reviewed their own tests, then reviewed the tests of other participants, then reviewed their own performance again, and were then asked to reassess how well they did relative to the other participants.

Each subject had four factors/values:

  1. The subject's number of correct answers on the test (values 1-20).
  2. The subject's rank in the group by correct answers (split into quartiles, 1-4).
  3. The subject's estimate of the number of questions they got correct (values 1-20).
  4. The subject's assessment of how they did compared to everyone else in the study (percentile, 0-99).
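To make the four factors concrete, here is a hypothetical sketch of how one subject's record could be simulated. The field names and random draws are my own illustration, not anything from the paper:

```python
import random

random.seed(0)

# Hypothetical sketch of one subject's record; field names are mine, not D-K's.
def simulate_subject(n_questions=20):
    correct = random.randint(0, n_questions)      # factor 1: actual number correct
    guessed = random.randint(0, n_questions)      # factor 3: self-estimate of number correct
    percentile_guess = random.randint(0, 99)      # factor 4: perceived rank vs. peers
    return {"correct": correct, "guessed": guessed,
            "percentile_guess": percentile_guess}

subjects = [simulate_subject() for _ in range(45)]   # 45 people, as in the example study

# Factor 2: quartile rank (1-4) after sorting by actual score.
subjects.sort(key=lambda s: s["correct"])
for i, s in enumerate(subjects):
    s["quartile"] = i * 4 // len(subjects) + 1
```

Note that factor 2 is computed entirely from factor 1, which is the heart of the problem discussed below.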

Note that variables 1, 2, and 3 are really the same thing, just looked at in different ways. The second variable is still correct answers, with the values reduced to quartiles 1-4. The third is still correct answers, filtered through the subject's binary confidence (correct/wrong) about each one. The lowest-performing subjects overestimated their correct answers by 20%, and the top performers underestimated their score by 15%. Both the top and lowest performers said they had about 14 correct answers. That's not bad.

Eric C. Gaze, Senior Lecturer of Mathematics at Bowdoin College, explains where the issue is:

The results appear more striking when looking at how students rated themselves against their peers, and here is where the better-than-average effect is on full display. The lowest-scoring students estimated that they did better than 62% of the test-takers, while the highest-scoring students thought they scored better than 68%.

By definition, being in the bottom 25% means that, at best, you will score better than 25% of people and, on average, better than just 12.5%. Estimating you did better than 62% of your peers, while only scoring better than 12.5% of them, gives a whopping 49.5 percentage-point overestimation.

The measure of how students compared themselves to others, rather than to their actual scores, is where the Dunning–Kruger effect arose. It grossly exaggerates the overestimation of the bottom 25% and seems to show, as Dunning and Kruger titled their paper, that the least skilled students were “unskilled and unaware.”
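The arithmetic in that quote is easy to sanity-check. A quick sketch, assuming uniform integer percentile ranks 0-99:

```python
# Average percentile rank of the bottom quartile, assuming uniform ranks 0-99.
bottom_quartile = list(range(25))              # ranks 0 through 24
avg = sum(bottom_quartile) / len(bottom_quartile)
print(avg)          # 12.0 with integer ranks; 12.5 in the continuous limit
print(62 - 12.5)    # the 49.5 percentage-point overestimation from the quote
```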

Gaze redid the original study with a random number generator standing in for roughly 1,154 fictional participants. He split the results into quartiles and did the math, and he got the same results D-K got with real participants. I'm not a stats whiz, but that raised my eyebrow. Soooo.. hmmm…. I had to look for where it went off the rails. When a random number generator spits out the same results as real data, the effect you supposedly measured doesn't differ from randomness. Yeah, I'm repeating myself…. Where's my stats textbook? Nah, that will take too long. Time to go to the math wizards in the family, Mr. Wolverton and Chibi Wolverton. They were faster. Their verdict: the study is comparing the same information, x vs. y − x. Ah! Can you say autocorrelation?
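You can reproduce Gaze's random-number experiment in a few lines. The sketch below is my own simulation, not Gaze's code: actual percentile rank and self-estimated percentile are drawn completely independently, yet binning by quartile of the actual score produces the familiar D-K pattern, with the bottom quartile "overestimating" by nearly 40 points and the top quartile "underestimating":

```python
import random

random.seed(42)

N = 1156  # close to the ~1,154 fictional participants, rounded to a multiple of 4

# Actual percentile rank and self-estimated percentile: two INDEPENDENT uniform draws.
subjects = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(N)]

# Sort by "actual" score and split into quartiles, as D-K did.
subjects.sort(key=lambda s: s[0])
q = N // 4
for k in range(4):
    group = subjects[k * q:(k + 1) * q]
    actual_pct = 100 * (k * q + q / 2) / N        # mean percentile rank of this quartile
    mean_self = sum(s for _, s in group) / q      # mean self-estimated percentile
    print(f"quartile {k + 1}: actual {actual_pct:5.1f}, self-estimate {mean_self:5.1f}")
```

Every quartile's self-estimate hovers near 50 because the draws are pure noise; only the actual percentile changes across quartiles, which is exactly the "effect".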

What is that!?!

Here we go.

The original D-K plots put the actual test scores on the horizontal x-axis, ordered from the lowest to the highest score, left to right. The y-axis is the subjects' guesses of what they got right on the test, which is just another version of the test score. When D-K plotted the gap, it was calculated as y − x, again ordered from lowest to highest, left to right. What D-K didn't do was run the Durbin-Watson test.
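The x vs. y − x trap is easy to demonstrate. In the sketch below (my own simulated data), x and y are drawn completely independently, yet y − x comes out strongly negatively correlated with x; for equal variances the expected correlation is −1/√2 ≈ −0.71:

```python
import random

random.seed(1)

n = 10_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]   # drawn independently of x

def corr(a, b):
    """Pearson correlation, computed from scratch."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

diff = [yi - xi for xi, yi in zip(x, y)]
print(round(corr(x, y), 3))      # near 0: x and y really are unrelated
print(round(corr(x, diff), 3))   # near -0.71: an artifact of plotting x against y - x
```

The "effect" in the second number is built into the algebra, because x appears on both axes.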

[Image: the Durbin-Watson test statistic, d = Σₜ₌₂ᵀ (eₜ − eₜ₋₁)² / Σₜ₌₁ᵀ eₜ²]

It's there on the right, and I still think of statistics as sadistics. This is math on a level where I only do it with the help of a spreadsheet, and I ask my husband or daughter to check my math, because I know someone needs to check my math. The hypotheses for the Durbin-Watson test are:
H0 = no first-order autocorrelation.
H1 = first-order autocorrelation exists.
(For a first-order correlation, the lag is one time unit.) The D-W assumptions are that the errors are normally distributed with a mean of zero and that they are stationary.

Here's a web site where you can do your own autocorrelation test. What you are looking for is a statistic ranging from 0 to 4. A value of 2 means no correlation at the first lag. Values between 0 and 2 indicate positive autocorrelation, and values between 2 and 4 indicate negative autocorrelation. In the D-K case, what this means is you are looking at the same data from two different points of view.
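Since most of us will never compute this by hand, here is a minimal sketch of the Durbin-Watson statistic applied to simulated residuals (function and variable names are mine; libraries like statsmodels also ship a ready-made version):

```python
import random

random.seed(7)

def durbin_watson(residuals):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2); d near 2 means no first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# White noise: no autocorrelation, so d should sit near 2.
white_noise = [random.gauss(0, 1) for _ in range(5000)]

# An AR(1) series with strong positive autocorrelation: d should fall well below 2.
ar1, e = [], 0.0
for _ in range(5000):
    e = 0.95 * e + random.gauss(0, 1)
    ar1.append(e)

print(round(durbin_watson(white_noise), 2))   # near 2
print(round(durbin_watson(ar1), 2))           # well below 2
```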

When autocorrelation is detected in the residuals from a model, it suggests that the model is misspecified (i.e., in some sense wrong). A common cause is that some key variable or variables are missing from the model. Where the data has been collected across space or time and the model does not explicitly account for this, autocorrelation is likely. For example, if a weather model is wrong in one suburb, it will likely be wrong in the same way in a neighboring suburb. The fix is either to include the missing variables or to explicitly model the autocorrelation (e.g., using an ARIMA model).

So what did D-K miss? Gaze thought a colleague could help. Ed Nuhfer looked at it and concluded that the way subjects assessed their own abilities was flawed.

My colleague Ed Nuhfer and his team gave students a 25-question scientific literacy test. After answering each question, the students would rate their own performance on each question as either “nailed it,” “not sure” or “no idea.”

That is, an assessment of the subject's confidence in their own performance. What this showed is that if you separate the results by education level, subjects with more education are better at assessing their abilities. To see the graph, go about two-thirds of the way down the page from here; the entire article is worth reading. I think I'm okay with fair use to reproduce the graph. Note: 1. The green lines represent the correct answers per education-attainment group. 2. The x-axis is separated by level of education attainment. 3. Watch the length of the vertical spreads of dots. The three left-most columns are freshmen through juniors in college. The three right-most columns are seniors, grad students, and, in the last column, professors. The vertical spread shrinks as education advances: the more education you get, the more accurate you get at assessing your abilities. That idea flew right over the heads of D-K, the study's authors.

[Image: redo of the D-K data with the three-value self-assessment, split by education level]

So D-K's main assertion didn't prove the psychological effect they thought it did. The irony of their study's title is enormous: “Unskilled and Unaware of It: How Difficulties in Recognizing One's Own Incompetence Lead to Inflated Self-Assessments”. In other words, they didn't have a stats nerd on the team. Their results showed there is some validity in that title, just not the result they published in 1999.
