Evaluating Content-Related Validity Evidence Using Text Modeling

Daniel Anderson

Brock Rowley

Sondra Stegenga

P. Shawn Irvin

Joshua M. Rosenberg

Background

  • One of five major sources of validity evidence (as outlined by the Standards)

  • Does the content included in the test represent the targeted content?

    • Are specific areas missing?

    • Are specific areas over-represented?

  • Operationally, evidence is often gathered through alignment studies.

    • Judgments made by panels of experts (educators).

    • Do the test items align with the content standards?

Study purpose

Extend content-related validity evidence through the use of text mining

  • What thematic topics are represented in the content standards?

  • How do individual items map onto these topics (if at all)?

  • What is the overall coverage of the topics across test items?

Topic modeling

  • Corpus of words split into documents

    • We treat each content standard as a document
  • Latent variables (topics) estimated from word co-occurrence

    • Number of topics estimated is determined by the researcher (similar to exploratory factor analysis)
  • Each document is a mixture of topics

    • γ estimates provide the probability that a given topic is represented within a document
  • Each topic is a mixture of words

    • β estimates provide the probability that a given word is represented within a topic (see the sketch below)
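
As a concrete illustration of the γ and β estimates described above, the sketch below fits a toy LDA model with scikit-learn. The miniature "standards," the two-topic setting, and all names are illustrative assumptions, not the study's actual data or code.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus: each "document" is one hypothetical content standard.
standards = [
    "analyze data on the properties of organisms in an ecosystem",
    "use evidence to model the cycling of energy and matter in earth systems",
    "construct an explanation of how genetic information is inherited",
]

# Document-term matrix (common stop words removed), then a 2-topic LDA model.
dtm = CountVectorizer(stop_words="english").fit_transform(standards)
lda = LatentDirichletAllocation(n_components=2, random_state=42)

gamma = lda.fit_transform(dtm)  # document x topic proportions (the γ estimates)
beta = lda.components_ / lda.components_.sum(axis=1, keepdims=True)  # topic x word probabilities (the β estimates)

print(gamma.round(2))  # each row sums to 1: a document is a mixture of topics
print(beta.round(2))   # each row sums to 1: a topic is a mixture of words
```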

The fundamental idea

  1. Train a model on the content standards to estimate the latent topics represented therein.

  2. Apply the model to the test items to estimate which topics the items represent (based on the text within the item).
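
A minimal sketch of these two steps, under the same assumptions as the earlier toy example (hypothetical standards and item stems, with scikit-learn as a stand-in implementation):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-ins: a few content standards and two item stems.
standards = [
    "analyze data on the properties of organisms in an ecosystem",
    "use evidence to model the cycling of energy and matter in earth systems",
    "construct an explanation of how genetic information is inherited",
]
items = [
    "which graph shows how energy moves through the ecosystem",
    "select the model that explains how offspring inherit traits",
]

# Step 1: fit the vocabulary and topic model on the standards only.
vectorizer = CountVectorizer(stop_words="english")
lda = LatentDirichletAllocation(n_components=2, random_state=42)
lda.fit(vectorizer.fit_transform(standards))

# Step 2: score the item text with the same vocabulary and model.
item_gamma = lda.transform(vectorizer.transform(items))  # item x topic probabilities
print(item_gamma.round(2))

# Averaging over items gives one rough view of overall topic coverage.
print(item_gamma.mean(axis=0).round(2))
```

In this sketch, an item whose text contains none of the modeled vocabulary falls back to an essentially uniform, prior-driven topic distribution, which is one way such items might be flagged for review.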

Our application

  • Science NGSS Performance Expectations

  • Grade 8 statewide Alternate Assessment based on Alternate Achievement Standards (AA-AAS)

    • Designed for students with the most significant cognitive disabilities.

    • 1% reporting cap

    • Reduced in depth, breadth, and complexity

Analyses

  • Topics estimated using Latent Dirichlet Allocation

    • Common stop words removed ("and", "of", "the", etc.)
    • Webb's DOK verbs removed ("choose", "describe", "find")
  • Four methods evaluated to determine the optimal number of topics (2-25 considered; see the sketch below)

    • Arun et al. (2010): KL-Divergence
    • Cao et al. (2009): Cosine similarity
    • Deveaud et al. (2014): Jensen-Shannon distance
    • Griffiths & Steyvers (2004): harmonic mean of posterior log-likelihoods
  • Smaller range of topics evaluated by two science content experts for substantive meaning
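
The preprocessing and topic-number search might look roughly like the following. This is a hedged Python approximation rather than the study's pipeline (the slides do not specify an implementation): the DOK verb list is truncated to the three examples above, the documents are hypothetical, and a simple approximate log-likelihood stands in for the four published indices.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

# Hypothetical documents standing in for the content standards.
standards = [
    "analyze data on the properties of organisms in an ecosystem",
    "use evidence to model the cycling of energy and matter in earth systems",
    "construct an explanation of how genetic information is inherited",
    "design a solution that minimizes human impact on earth systems",
]

dok_verbs = {"choose", "describe", "find"}         # subset of Webb's DOK verbs
stop_words = list(ENGLISH_STOP_WORDS | dok_verbs)  # common stop words + DOK verbs

dtm = CountVectorizer(stop_words=stop_words).fit_transform(standards)

# Fit models across a range of topic counts and record a fit criterion;
# approximate log-likelihood here, where higher means better fit.
scores = {
    k: LatentDirichletAllocation(n_components=k, random_state=42).fit(dtm).score(dtm)
    for k in range(2, 26)
}
print(max(scores, key=scores.get))  # candidate number of topics
```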

Results: n topics

  • 3-6 topics evaluated for substantive meaning

  • 5-topic solution arrived at independently by both experts

    • Distinct topics, little redundancy

Topics


Topic   Substantive Label
1       Analyzing data and using evidence to understand organisms and systems
2       Using scientific evidence to understand Earth systems
3       Energy
4       Genetic information
5       Scientific solutions

Mapping topics to standards

  • Most standards represented by a single topic

Mapping words to topics

  • Top 15 words displayed

Predicting items to topics

  • Nine random items displayed

Topic coverage

Discussion

  • Content validity is a critical component of the "overall evaluative judgment" (Messick, 1995) of the validity of test-based inferences

  • Particularly important within standards-based educational systems

  • Text modeling may serve as an additional source of evidence (triangulation)

  • May be useful as a diagnostic tool

Limitations & Future Directions

  • Results depend upon the chosen topic model; different models may lead to different inferences

  • Our model is preliminary, but publicly available. Consensus from the field could help inform models that are useful and provide better validity evidence.

  • Our application was in Science with an AA-AAS

    • Generalizability to other content areas/tests is not known
  • What if an item has no text?

    • Text modeling could perhaps be used to help "flag" items for further investigation

    • Alternative ML procedures (e.g., image recognition) may help

Conclusions

  • Text mining procedures may provide an additional source of evidence

    • Perhaps supplementing formal alignment studies
  • Evidence could be used diagnostically

  • Topic modeling itself may be useful in understanding the topics represented in either the standards or a given test, independent of the linkage between the two.
