One of five major sources of validity evidence (as outlined by the Standards)
Does the content on the test represent the targeted content domain?
Are specific areas missing?
Are specific areas over-represented?
Operationally, evidence is often gathered through alignment studies.
Judgments made by panels of experts (educators).
Do the test items align with the content standards?
Extend content-related validity evidence through the use of text mining
What thematic topics are represented in the content standards?
How do individual items map on to these topics (if at all)?
What is the overall coverage of the topics across test items?
Corpus of words split into documents
Latent variables (topics) estimated from word co-occurrence
Each document is a mixture of topics
Each topic is a mixture of words
The fundamental idea
Train a model on the content standards to estimate the latent topics represented therein.
Apply the model to the test items to estimate which topics the items represent (based on the text within the item).
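The two-step idea above can be sketched with an off-the-shelf LDA implementation (scikit-learn here, purely as an illustration; the standards and item texts below are invented placeholders, not actual NGSS performance expectations or test items):

```python
# Minimal sketch: train LDA on the standards, then apply it to item text.
# All texts are hypothetical placeholders for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

standards = [
    "analyze and interpret data on organisms interacting in ecosystems",
    "use evidence to construct explanations of energy transfer in earth systems",
    "develop models to describe how genetic information is passed to offspring",
]
items = [
    "which diagram best shows energy transfer between the two objects",
]

# Step 1: estimate the latent topics from the content standards.
vectorizer = CountVectorizer(stop_words="english")
X_standards = vectorizer.fit_transform(standards)
lda = LatentDirichletAllocation(n_components=2, random_state=42)
lda.fit(X_standards)

# Step 2: apply the trained model to the item text; each row is the
# item's estimated mixture over the standards-derived topics.
X_items = vectorizer.transform(items)
item_topics = lda.transform(X_items)
print(item_topics.round(2))  # one row per item; each row sums to 1
```

The key design point is that the vocabulary and topics come from the standards alone; items are only scored against that trained model, never used to fit it.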
Science NGSS Performance Expectations
Grade 8 statewide Alternate Assessment based on Alternate Achievement Standards (AA-AAS)
Designed for students with the most significant cognitive disabilities.
1% reporting cap
Reduced in depth, breadth, and complexity
Topics estimated using Latent Dirichlet Allocation
Four methods evaluated to determine optimal number of topics (2-25 evaluated)
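As an illustration of one such quantitative criterion, a hedged sketch using model perplexity (lower is better); the corpus, the narrow 2-5 search range, and the single criterion are simplifications of the actual study, which compared four methods over 2-25 topics:

```python
# Hypothetical sketch of choosing the number of topics by perplexity.
# A toy corpus and a narrow range are used here for brevity.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

corpus = [
    "analyze data on organisms and ecosystems",
    "energy transfer in earth systems",
    "genetic information passed to offspring",
    "use evidence to support scientific claims",
    "design solutions to scientific problems",
    "model energy flow in physical systems",
]
X = CountVectorizer(stop_words="english").fit_transform(corpus)

perplexity = {}
for k in range(2, 6):  # the study searched k = 2 through 25
    lda = LatentDirichletAllocation(n_components=k, random_state=0)
    lda.fit(X)
    # Held-out text is preferable in practice; training text is used
    # here only to keep the sketch self-contained.
    perplexity[k] = lda.perplexity(X)

best_k = min(perplexity, key=perplexity.get)
```

In practice, different criteria often disagree on the optimal k, which is why a smaller candidate range is handed to content experts for substantive review.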
Smaller range of topics evaluated by two science content experts for substantive meaning
3-6 topics evaluated for substantive meaning
5-topic solution independently arrived upon
| Topic | Substantive Label |
|---|---|
| 1 | Analyzing data and using evidence to understand organisms and systems |
| 2 | Using scientific evidence to understand Earth systems |
| 3 | Energy |
| 4 | Genetic information |
| 5 | Scientific solutions |
Content validity is a critical component of "overall evaluative judgment" (Messick, 1995) of the validity of test-based inferences
Particularly important within standards-based educational systems
Text-modeling may serve as an additional source of evidence (triangulation)
May be useful as a diagnostic tool
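For example, one hypothetical diagnostic rule would flag items whose estimated topic mixture is diffuse, i.e., no dominant standards-derived topic; the mixtures and the 0.5 cutoff below are purely illustrative:

```python
import numpy as np

# Hypothetical item-by-topic mixtures (rows sum to 1), as produced by
# a topic model trained on the standards and applied to item text.
item_topics = np.array([
    [0.90, 0.05, 0.05],  # clearly dominated by topic 1
    [0.40, 0.35, 0.25],  # diffuse: candidate for expert review
])

# Flag items with no topic probability above an (arbitrary) 0.5 cutoff.
flagged = np.where(item_topics.max(axis=1) < 0.5)[0]
print(flagged.tolist())  # → [1]
```

A flag like this would not imply a bad item, only one whose text does not map cleanly to any one topic and so merits a closer look by content experts.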
Results depend upon chosen topic model - different models may lead to different inferences
Our model is preliminary but publicly available; consensus from the field could help inform models that are useful and provide stronger validity evidence.
Our application was in Science with an AA-AAS
What if an item has no text?
Text-modeling could perhaps be used to help "flag" items for further investigation
Alternative ML procedures (e.g., image recognition) may help
Text mining procedures may provide an additional source of evidence
Evidence could be used diagnostically
Topic modeling itself may be useful in understanding the topics represented in either the standards or a given test, independent of the linkage between the two.