WEBVTT

00:00:01.280 --> 00:00:04.500
>> So the two key terms you hear a lot in assessment.

00:00:04.500 --> 00:00:09.610
We often use it to evaluate assessments are validity which we've just talked about

00:00:09.610 --> 00:00:13.990
and reliability which we've talked about actually throughout issues of reliability.

00:00:13.990 --> 00:00:21.020
Reliability is the consistency of what we're measuring and perhaps the consistency

00:00:21.020 --> 00:00:23.850
of the evidence in that way, okay.

00:00:23.850 --> 00:00:30.360
So we want to limit the impact of factors that aren't language so that

00:00:30.360 --> 00:00:33.990
when we collect evidence, we're just finding out what we can about language.

00:00:33.990 --> 00:00:36.980
We can never eliminate all of the factors that impact it.

00:00:36.980 --> 00:00:43.210
So when we were talking about performance assessment, things like anxiety, creativity,

00:00:43.210 --> 00:00:47.890
background knowledge, we can't get rid of those things on students' performances, but we--

00:00:47.890 --> 00:00:51.010
what we try to do is limit them as much as possible.

00:00:51.010 --> 00:00:57.030
And reliability in a very basic sense, if you think about someone taking a test today

00:00:57.030 --> 00:01:01.210
and taking the same test tomorrow assuming there is no practice effect

00:01:01.210 --> 00:01:04.850
and that their language hadn't improved considerably overnight,

00:01:04.850 --> 00:01:08.110
their score should be pretty much the same, okay.

00:01:08.110 --> 00:01:12.420
So in the well-designed test, that's one thing we're trying to achieve.

00:01:12.420 --> 00:01:22.220
In terms of classroom assessment, we want to think about first what we want to assess

00:01:22.220 --> 00:01:25.990
and then find the best ways to gather that evidence

00:01:25.990 --> 00:01:30.610
so that we have some validity in our assessment practices.

00:01:30.610 --> 00:01:36.560
At the same time, we're trying to collect evidence in a consistent way, okay.


00:01:36.560 --> 00:01:41.470
And in a way that will limit, delimit the factors that aren't language ability

00:01:41.470 --> 00:01:44.610
or what we want to assess and that is the reliability, okay.

00:01:44.610 --> 00:01:50.010
Now reliability-- sometimes we talk about internal reliability which is in the test.

00:01:50.010 --> 00:01:53.460
So, if you have a long test with lots of items, there are various ways to look

00:01:53.460 --> 00:01:56.350
at the reliability of those items.

00:01:56.350 --> 00:02:01.750
But what we see in classroom assessment in terms of reliability often has more to do

00:02:01.750 --> 00:02:06.810
with the rubrics that we use, the kind of test tasks that we design,

00:02:06.810 --> 00:02:11.160
the conditions that the students have like how much time to work on it and if they're working

00:02:11.160 --> 00:02:13.620
with other people, things that students often see

00:02:13.620 --> 00:02:17.250
as fairness actually do tie in to reliability.