I would argue that the #psychometrics community and the professional communities that use #TestScores share some understanding that evidence of quality is as tentative as any statistical result.
And in my personal view, the different methodologies #COSMIN has developed (e.g., https://link.springer.com/article/10.1007/s11136-018-1798-3) or is currently pushing forward (https://doi.org/10.1186/s13643-022-01994-5) are tools to aggregate available evidence exactly for this reason: each individual study and estimate offers an incomplete picture.