WEBVTT 00:00:00.880 --> 00:00:11.310 >> So we're going to talk about 3 terms I wanna think about: measures, evidence, and inference, 00:00:11.310 --> 00:00:16.160 okay, and these are 3 terms we're gonna use to talk about validity in language testing. 00:00:16.160 --> 00:00:20.620 So-- but let's leave language testing for a moment, okay. 00:00:20.620 --> 00:00:23.430 And I'm gonna ask you some questions. 00:00:23.430 --> 00:00:30.190 So the first question is how do you know what size shoe to buy when you go to the shoe store? 00:00:30.190 --> 00:00:37.490 You try them on, okay, but do you start with size 1 and then go to 2 and then-- 00:00:37.490 --> 00:00:37.870 [ Laughter ] 00:00:37.870 --> 00:00:39.620 >> You already know. 00:00:39.620 --> 00:00:44.870 You already know what size shoe you wear, usually. 00:00:44.870 --> 00:00:45.660 >> Okay, okay, right. 00:00:45.660 --> 00:00:48.930 Yeah, at our age, we're-- our shoe size is pretty consistent. 00:00:48.930 --> 00:00:51.850 Do you remember when you were a kid and every time you went to buy shoes-- 00:00:51.850 --> 00:00:52.500 [ Inaudible Remark ] 00:00:52.500 --> 00:00:55.080 >> You had to get a different size. 00:00:55.080 --> 00:00:56.770 How did you know that? 00:00:56.770 --> 00:00:57.890 >> You use a ruler. 00:00:57.890 --> 00:01:00.500 >> There was some kind of a ruler, right? 00:01:00.500 --> 00:01:05.360 Did everybody experience this, somebody measuring your foot? 00:01:05.360 --> 00:01:08.710 [Laughter] Did anybody experience anything different? 00:01:08.710 --> 00:01:12.210 >> We were just always trying them on. 00:01:12.210 --> 00:01:13.540 >> Trying them on, okay. 00:01:13.540 --> 00:01:15.710 Okay, so there's sort of trial and error.
00:01:15.710 --> 00:01:19.610 But with shoe size there is actually, you know, a ruler that we can use and say, 00:01:19.610 --> 00:01:26.480 "This is how big my foot is and I can transfer that into a number, 44 or 8 or whatever", 00:01:26.480 --> 00:01:33.980 and it's a very clear measurement based on a rule, a measuring rule. 00:01:33.980 --> 00:01:37.230 Okay, next question, how do you know when you're hungry? 00:01:37.230 --> 00:01:38.410 [ Simultaneously Talking ] 00:01:38.410 --> 00:01:44.390 >> Your stomach, and so there's noise, there's audio. 00:01:44.390 --> 00:01:47.280 Your stomach starts growling. 00:01:47.280 --> 00:01:48.000 Anything else? 00:01:48.000 --> 00:01:48.400 [ Inaudible Remark ] 00:01:48.400 --> 00:01:49.660 >> The time. 00:01:49.660 --> 00:01:52.670 Yes, I'm very time oriented with my meals. 00:01:52.670 --> 00:01:55.960 So yeah, it's 12 o'clock, I must be hungry. 00:01:55.960 --> 00:01:56.540 [Laughter] Other things? 00:01:56.540 --> 00:01:56.610 [ Inaudible Remark ] 00:01:56.610 --> 00:01:56.990 >> Huh? 00:01:56.990 --> 00:01:59.680 >> I can't work. 00:01:59.680 --> 00:02:02.960 >> You can't work, so you start to lose concentration and focus. 00:02:02.960 --> 00:02:04.440 >> I get grumpy. 00:02:04.440 --> 00:02:06.470 >> Grumpy, okay. 00:02:06.470 --> 00:02:08.980 So, short-tempered, any other things? 00:02:08.980 --> 00:02:12.350 >> Other people already have lunch and then-- 00:02:12.350 --> 00:02:16.700 >> Okay, you see people eating and you think you must be, you must be hungry too, okay. 00:02:16.700 --> 00:02:22.190 Now all of these things are not direct measurements, right? 00:02:22.190 --> 00:02:25.050 You're not looking in your stomach and seeing, "Oh, it's pretty low". 00:02:25.050 --> 00:02:28.910 [Laughter] I don't know actually physiologically what you'd be looking for, 00:02:28.910 --> 00:02:35.480 but you realize that you're hungry not from directly measuring something.
00:02:35.480 --> 00:02:39.920 But over the years, you've collected evidence like your stomach growling, 00:02:39.920 --> 00:02:45.550 lack of concentration, the time, and you know that those things-- 00:02:45.550 --> 00:02:50.910 from those things you can infer that you are probably hungry and should eat, okay. 00:02:50.910 --> 00:02:53.420 So, very different from shoe size with this kind of inference. 00:02:53.420 --> 00:02:59.490 And if we think about language testing a bit like the second example, knowing you're hungry, 00:02:59.490 --> 00:03:04.430 this helps us, I think, think about issues like validity in testing, okay. 00:03:04.430 --> 00:03:09.670 So how do you know if your students can have a successful conversation in Spanish? 00:03:09.670 --> 00:03:10.210 [ Inaudible Remark ] 00:03:10.210 --> 00:03:14.940 >> We can't measure it directly. 00:03:14.940 --> 00:03:21.190 >> But you get the idea from your classroom experience, interaction. 00:03:21.190 --> 00:03:22.900 >> You're collecting evidence, aren't you? 00:03:22.900 --> 00:03:25.090 Yes. So, you have interaction with the student. 00:03:25.090 --> 00:03:29.190 You see them do some performances, maybe there's some assessments that you do, 00:03:29.190 --> 00:03:34.740 and you're collecting evidence that you're gonna use to make inferences about their ability 00:03:34.740 --> 00:03:36.650 to have a conversation in Spanish. 00:03:36.650 --> 00:03:40.940 So you can't necessarily go with them to Mexico and see how that all works out. 00:03:40.940 --> 00:03:47.610 But you're building evidence to make inferences about language ability, okay. 00:03:47.610 --> 00:03:50.180 And that is really what validity is. 00:03:50.180 --> 00:03:53.010 You hear the term validity used a lot, that's what it is. 00:03:53.010 --> 00:03:57.850 It's how we collect evidence and then try to make inferences 00:03:57.850 --> 00:04:00.540 about what we want to measure, okay.
00:04:00.540 --> 00:04:05.070 And so those 3 terms are kind of important to think about with validity. 00:04:05.070 --> 00:04:10.330 And what we hope to do is collect the best evidence. 00:04:10.330 --> 00:04:15.680 So if we're using tests as a kind of-- performance on a test as a kind of evidence, 00:04:15.680 --> 00:04:21.670 we want to have the best evidence for what we're trying to measure. 00:04:21.670 --> 00:04:23.890 So we have to know what we wanna measure, okay. 00:04:23.890 --> 00:04:26.040 So there's all this back and forth. 00:04:26.040 --> 00:04:30.540 And then we need to make inferences from that evidence that are appropriate, 00:04:30.540 --> 00:04:33.280 so that's sort of another step, okay. 00:04:33.280 --> 00:04:38.480 So this is what I wanna measure, I collect information, 00:04:38.480 --> 00:04:44.210 evidence of what I think will help me make inferences and then I make inferences 00:04:44.210 --> 00:04:46.090 about what I wanna measure, okay. 00:04:46.090 --> 00:04:52.480 So the appropriateness of those inferences that you make is part of validity, okay. 00:04:52.480 --> 00:04:58.180 And so keeping that in mind, that a test itself is not valid or invalid, we do talk about, oh, 00:04:58.180 --> 00:05:01.100 this test isn't valid or this is a very highly valid test. 00:05:01.100 --> 00:05:06.880 But in fact, the test itself really isn't the key to validity, it's the use of the test, okay. 00:05:06.880 --> 00:05:08.670 And therein lies the validity. 00:05:08.670 --> 00:05:13.550 So for example we looked at fill-in-the-blank-type questions. 00:05:13.550 --> 00:05:16.130 So, imagine you have a test where there are sentences 00:05:16.130 --> 00:05:25.400 and students are putting correct verb forms in those blanks. If I were using that as a measure 00:05:25.400 --> 00:05:32.040 or as evidence for writing ability, for example, that would be a problem for validity, right?
00:05:32.040 --> 00:05:35.980 Because the evidence wouldn't be very useful for me to make inferences 00:05:35.980 --> 00:05:39.540 about whether a student could write a paragraph or something. 00:05:39.540 --> 00:05:43.150 If I were trying to make inferences about their ability 00:05:43.150 --> 00:05:48.730 to conjugate a verb, then it might be a more valid use for that kind of a test.