There is a growing body of evidence that student evaluations 1) do not measure teaching effectiveness, and may well be negatively correlated with it, and 2) that women and members of visible minorities fare worse on them across the board. In other words, they're ineffective at measuring what they are frequently seen to measure and highly discriminatory in what they do measure.
The current state of the research is nicely summarized by Philip Stark in this post at The Berkeley Blog, which should be read in its entirety. Even having previously known about much of what Stark discusses, I was particularly stunned by the following:
• students’ ratings of instructors can be predicted from the students’ reaction to 30 seconds of silent video of the instructor: first impressions may dictate end-of-course evaluation scores, and physical attractiveness matters
• the genders and ethnicities of the instructor and student matter, as does the age of the instructor
It's enough to make one wonder how we have allowed the practice of conducting these evaluations to go on for so long and why anyone takes them seriously at all.
And yet, everyone who's been on the job market lately will have seen some variant of the phrase "candidate must show evidence of teaching excellence." It's code for 'you have to include some good, recent evaluations in your job packet.'
For many of us, especially adjuncts who've been out of grad school for a few years, the evaluations in question will likely be student evaluations, since many of us, particularly in the adjunct pool, are rarely if ever observed by faculty peers after leaving graduate school.
And of course student evaluations are also a principal tool used by many departments in deciding who gets promoted, who gets rehired, who is assigned to teach which courses, and so on. This is, again, especially true for adjuncts, for whom student evaluation data are often the only criterion.
It's hard to avoid the conclusion that administrative convenience has been allowed to trump two facts: this 'evidence' is not really evidence of what it is usually taken to measure, and it is systematically biased against already marginalized groups within the profession.