101010.pl is one of the many independent Mastodon servers you can use to participate in the fediverse.
101010.pl is the oldest Polish Mastodon server. We support posts of up to 2048 characters.

#psychometrics


We know that the task demands of cognitive tests affect scores: if one version of a problem requires more work (e.g., gratuitously verbose or unclear wording, open response rather than multiple choice), people will perform worse.

Now we have observed as much in Large Language Models: doi.org/10.48550/arXiv.2404.02

The tests included analogical reasoning, reflective reasoning, word prediction, and grammaticality judgments.

Michal Jaskolski and I are excited to invite you to a unique hackathon research project at Brainhack Warsaw (15-17 Mar 2024), exploring the intriguing intersection of AI and human psychology:
🧠 "Can AI diagnose mental conditions within a single conversation?"

In 2013, relatively simple machine learning models could already predict personality from as few as 10 Facebook likes more accurately than a work colleague could, and, given 300 likes, better than a spouse. We seek to discover what is possible with far more sophisticated AI and richer data.

brainhackwarsaw.fuw.edu.pl/

Brainhack Warsaw 2025: an international hackathon on neuroscience and technology, 28-30 March 2025, at the University of Warsaw, Faculty of Physics, Pasteura 5, 02-093 Warsaw, Poland.

New paper out on the Perceived Stress Scale! The PSS is a very commonly (mis)used measure, and our analysis clearly shows the need for changing how it is used.
doi.org/10.1186/s12888-023-051

#openaccess #openscience #opendata #psychometrics #rasch #measurement #stress

First, only the negative items actually assess stress and work together as a unidimensional scale. This means a shorter scale: 7 items from the PSS-14 (6 items from the PSS-10).
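As background (my addition, not from the paper): the unidimensionality claim comes from fitting a Rasch model. In its simplest dichotomous form, the probability that person $p$ endorses item $i$ depends only on the difference between a person location $\theta_p$ and an item location $\delta_i$:

```latex
P(X_{pi} = 1 \mid \theta_p, \delta_i)
  = \frac{\exp(\theta_p - \delta_i)}{1 + \exp(\theta_p - \delta_i)}
```

PSS items are polytomous, so in practice rating-scale or partial-credit extensions of this model are used, but the logic is the same: all items are assumed to reflect one underlying dimension.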

1/2

BMC Psychiatry: A psychometric evaluation of the Swedish translation of the Perceived Stress Scale: a Rasch analysis

Background: Stress reflects physical and psychological reactions to imposing demands and is often measured using self-reports. A widely used instrument is the Perceived Stress Scale (PSS), intended to capture more general aspects of stress. A Swedish translation of the PSS is available but has not previously been examined using modern test theory approaches. The aim of the current study is to apply Rasch analysis to further the understanding of the PSS' measurement properties and, in turn, improve its utility in different settings.

Methods: Data from 793 university students were used to investigate the dimensionality of different versions of the PSS (14, 10, and 4 items) as well as potential response patterns among the participants.

Results: The current study demonstrates that the PSS-14 has two separate factors, divided between negatively worded items (perceived stress) and positively worded items (perceived [lack of] control), with only the negative subscale exhibiting good reliability. Response patterns were analyzed using Differential Item Functioning, which found no influence of gender on any of the items, but did find an age effect for the positive subscale (items 6 and 9). The PSS-10 also demonstrated adequate reliability for the negative subscale, but the PSS-4 was not deemed suitable as a unidimensional scale.

Conclusions: Based on the results, none of the versions of the PSS should be used by sum-scoring all of the items. Only the negative items from the PSS-14 or PSS-10 can be used as unidimensional scales to measure general aspects of stress. As for different response patterns, gender may nevertheless be important to consider, as prior research has found differences on several items. Meanwhile, content validity is discussed, questioning the relevance of anger and being upset when measuring more general aspects of stress.
Finally, a table to convert the PSS-7 (i.e., negative items) ordinal sum scores to interval level scores is provided.
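As a hedged illustration (my addition, not from the paper): such a conversion table is a monotone lookup from ordinal sum scores to Rasch-based interval (logit) values. The sketch below only mimics that shape with a toy logit transform; the function name and numbers are my own, and real analyses should use the published table.

```python
import math

def toy_interval_score(sum_score: int, max_score: int = 28) -> float:
    """Map a PSS-7 ordinal sum score (0-28) to a toy interval-like value.

    This only mimics the shape of a Rasch ordinal-to-interval conversion
    table; real tables are derived from estimated item parameters.
    """
    # Shift by 0.5 so the extreme scores do not produce infinite logits,
    # a common practical adjustment.
    p = (sum_score + 0.5) / (max_score + 1.0)
    return math.log(p / (1.0 - p))

# The mapping is monotone but non-linear: equal ordinal steps correspond
# to larger interval steps near the extremes of the scale.
scores = [toy_interval_score(s) for s in (0, 7, 14, 21, 28)]
```

The practical point of such a table is exactly this non-linearity: summing raw ordinal responses treats every step as equal, which the Rasch analysis shows is not justified.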

New academic year - new #introduction

I am interested in health-related quality of life
#HRQL #Psychometrics
and my account mainly reflects what I work on, e.g. also

#ResearchMethods modules in the #DundeeUni #ProfDoc
#Interdisciplinary #RD62001 #RD62002

I convene our school's #ResearchEthics committee

My role as academic editor of 'Quality of Life Research' is coming to an end, and I am mulling over whether to look for a new role in #AcademicPublishing to replace it
#NightshiftEditor

Replied in thread

@TomJewell

On my way out, links to two texts that I think are very useful for researchers interested in exploring and applying some of these arguments to #HRQL research and practice:

Fit for purpose and modern validity theory in clinical outcomes assessment
rdcu.be/dgE92

Constructing arguments for the interpretation and use of patient-reported outcome measures in research
rdcu.be/dgFae

And with that I am on my way 😉

Replied in thread

@TomJewell

This is all not new and relates to fundamental ideas of #sampling in #Statistics, #Epidemiology, and #Norming in psychometric applications.

In this educational piece I co-wrote with Edwin de Beurs and @admin, we refer to many of these principles and the importance of the sampling process -- and, more importantly, many of the older references we pulled together are good places to start reading up on the historical treatment of this topic:
onlinelibrary.wiley.com/doi/10

Replied in thread

@TomJewell

We have argued that the definition of the target population, #sampling procedure, and #ItemContent of an instrument jointly define the meaning of measures in particular applications (#PsychometricEpidemiology rdcu.be/dgE5g).

We wrote it because we were perplexed that psychologists were surprised that psychometric results change when the population and sampling procedure are changed; I think it illustrates the above from the other side.

SpringerLink: Factors of psychological distress: clinical value, measurement substance, and methodological artefacts

Replied in thread

@TomJewell

There will be plenty of other examples, this is just my particular pocket and the time I could spend on this. There is some more...

5) _Sampling and #RangeRestriction_

Language in the "Standards" often points to range restriction as a reason for changes in the criteria.

This has particular consequences in the case of #FormativeMeasurement discussed in detail here:
journals.sagepub.com/doi/10.11

And with that we move to the combination of #sampling, #epidemiology, and #Psychometrics.

Replied in thread

@TomJewell

I argue that there is some understanding in the #psychometrics and professional communities that use #TestScores that evidence of quality is as tentative as any other statistical result.

And in my personal view, the different methodologies #COSMIN has developed (e.g., link.springer.com/article/10.1) or is currently pushing forward (doi.org/10.1186/s13643-022-019) are tools to aggregate available evidence exactly for this reason: each individual study and estimate offers an incomplete picture.

SpringerLink: COSMIN guideline for systematic reviews of patient-reported outcome measures - Quality of Life Research

Purpose: Systematic reviews of patient-reported outcome measures (PROMs) differ from reviews of interventions and diagnostic test accuracy studies and are complex. In fact, conducting a review of one or more PROMs comprises multiple reviews (i.e., one review for each measurement property of each PROM). In the absence of guidance specifically designed for reviews on measurement properties, our aim was to develop a guideline for conducting systematic reviews of PROMs.

Methods: Based on literature reviews and expert opinions, and in concordance with existing guidelines, the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) steering committee developed a guideline for systematic reviews of PROMs.

Results: A consecutive ten-step procedure for conducting a systematic review of PROMs is proposed. Steps 1-4 concern preparing and performing the literature search and selecting relevant studies. Steps 5-8 concern the evaluation of the quality of the eligible studies, the measurement properties, and the interpretability and feasibility aspects. Steps 9 and 10 concern formulating recommendations and reporting the systematic review.

Conclusions: The COSMIN guideline for systematic reviews of PROMs includes methodology to combine the methodological quality of studies on measurement properties with the quality of the PROM itself (i.e., its measurement properties). This enables reviewers to draw transparent conclusions and make evidence-based recommendations on the quality of PROMs, and supports the evidence-based selection of PROMs for use in research and in clinical practice.

Replied in thread

@TomJewell

Their "#Guidelines on Test Use" stress that one should
"Choose technically sound tests appropriate for the situation", and they offer a fairly extensive definition of what a "test" is, which includes settings, procedures, and other parameters to consider.

They also stress
"...evidence of reliability and validity for their intended purpose"
and that evidence should
"...support the inferences that may be drawn from the scores on the test".

intestcom.org/page/15

Replied in thread

@TomJewell

The degree to which the term "validity" includes all quality criteria and qualities one can consider when describing the use of #TestScores is a matter of heated discussion in fields related to the philosophy of measurement. But it is probably safe to assume this for the stuff covered in the "Standards".

For a look into the discussion, I recommend Kathleen Slaney's book on construct validity (link.springer.com/book/10.1057); several of her papers also deal with this topic.

SpringerLink: Validating Psychological Constructs

Replied in thread

@TomJewell

3) _It's about #TestScores_

Although not always discussed explicitly in relation to all quality indicators, and most frequently in relation to "#validity", the process of validation produces evidence for the validity of #TestScores (as opposed to the validity of the test!).

The "Standards" make that very clear for the term "Validity" [2014, p. 11]:
"...the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests"

Replied in thread

@TomJewell ... [2014, p. 97] "Renorming may be required to maintain the validity of norm-referenced #TestScore interpretations."

Many more places in the "Standards" are likely related to the overall argument.

But this should suffice to illustrate the idea that professional test use acknowledges that a psychometric indicator is estimated on a sample from a population, and that the same epistemological questions apply as anywhere else we use estimates from samples.

Replied in thread

@TomJewell

2) _Professional Standards_
I looked into the AERA-APA-NCME "Standards" (2014)* to see whether this is considered in practice.

Standard [1.8], for example, explicitly considers that "[s]tatistical findings can be influenced by factors affecting the sample" and discusses how evidence for validity may therefore differ by context.

[1.10] highlights that reporting should be detailed enough that applicants can judge the relevance of results for their setting.

#Psychometrics
*en.wikipedia.org/wiki/Standard

en.wikipedia.org: Standards for Educational and Psychological Testing - Wikipedia