Item Design Notes

How assessment items translate constructs into answerable questions, and why multiple items, similar items, and boundaries matter.

Source
science-contentpage-en-review-draft-2026-06-09/pages/02-item-design-content-en-01.md

How an Idea Becomes a Question

The central challenge in item design is turning an abstract idea into a question that a user can answer. Constructs such as career interest, conscientiousness, or information preference cannot be observed directly. An assessment therefore needs concrete items that ask users to report their typical responses to specific tasks and situations, or their preferences.

This process can be understood as first defining the construct to be observed, then designing a group of items that cover different sides of that construct. Items are not meant to read minds. They collect a user's self-description in situations that are understandable enough to answer.

Why a Single Item Is Not Enough

A single item is easily affected by interpretation bias, mood, and temporary experience. A user may answer one item differently because of something that happened recently, even if that answer does not reflect a longer-term tendency. For that reason, an assessment should usually not rely on one item for interpretation.

A more stable approach is to design a group of items around the same dimension. Different items observe a similar tendency from different situations, and the final interpretation looks at the overall pattern. This still cannot remove all error, but it is more careful than single-item judgment.
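As a minimal sketch of the idea above, the following averages several items that belong to one dimension and compares the result with a single item taken alone. The item IDs, the 1 to 5 agreement scale, and the use of a plain mean are all assumptions made for illustration; they do not describe the FermatMind scoring model.

```python
from statistics import mean

# Hypothetical responses on an assumed 1-5 agreement scale.
# Item IDs such as "plan_1" are illustrative, not real item-bank identifiers.
responses = {"plan_1": 4, "plan_2": 5, "plan_3": 2, "plan_4": 4}


def dimension_score(responses, item_ids):
    """Average the answered items of one dimension; return None if none were answered."""
    answered = [responses[item] for item in item_ids if item in responses]
    if not answered:
        return None
    return mean(answered)


planning_items = ["plan_1", "plan_2", "plan_3", "plan_4"]

single_item = responses["plan_3"]                       # 2: one low answer on its own
overall = dimension_score(responses, planning_items)    # 3.75: the pattern across four items

print(single_item, overall)
```

Read alone, the third item would suggest a low planning tendency, while the four items together point to a moderately high one. The average still carries error; it only reduces how much one unusual answer can dominate.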

Why Similar and Reverse-Scored Items May Appear

Users sometimes feel that certain items are asking similar things. This may happen because the items observe the same dimension from different situations. For example, "I like to plan ahead" and "Disorganized processes drain me" may both relate to order and planning, but the first is closer to an active preference and the second to a situational response.

Some assessments may also use items written in the opposite direction to reduce mechanical answering or one-way response patterns. Whether reverse-scored items are used, how they are scored, and how consistency is checked should depend on the specific item-bank design. Current public documentation does not provide the complete FermatMind item-bank structure.
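One common convention for reverse-scored items, shown here only as a sketch, is to flip the response around the scale midpoint so that all items in a dimension point in the same direction before they are combined. The 1 to 5 scale, the item IDs, and the `reversed_items` set are hypothetical; the actual item-bank rules are not given in the current public documentation.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # assumed five-point scale


def reverse_score(raw, scale_min=SCALE_MIN, scale_max=SCALE_MAX):
    """Flip a response so that high agreement becomes low and vice versa."""
    if not scale_min <= raw <= scale_max:
        raise ValueError(f"response {raw} is outside {scale_min}-{scale_max}")
    return scale_min + scale_max - raw


def align_direction(responses, reversed_items):
    """Return a copy of the responses with reverse-keyed items flipped."""
    return {
        item: reverse_score(value) if item in reversed_items else value
        for item, value in responses.items()
    }


# Hypothetical items: "improvise_1" is worded against the planning direction.
raw = {"plan_1": 4, "improvise_1": 2}
aligned = align_direction(raw, reversed_items={"improvise_1"})

print(aligned)  # {'plan_1': 4, 'improvise_1': 4}
```

After alignment, disagreeing with the oppositely worded item counts in the same direction as agreeing with the regular one. A user who agrees strongly with both would stand out as inconsistent, which is one way such items can support a consistency check.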

How Response Options Shape Results

An item includes not only text but also response options. Whether the options are forced choice, a five-point scale, an agreement scale, or a situational choice affects how users express themselves. Too few options may compress differences. Too many options may increase hesitation.
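The compression effect of too few options can be illustrated with a small sketch. It assumes two hypothetical users answering on a five-point scale and an arbitrary rule for collapsing that scale into two options; neither comes from the item bank.

```python
def collapse_to_two(value):
    """Map a 1-5 rating onto two options. Treating 4 and 5 as 'agree' is an arbitrary assumption."""
    return "agree" if value >= 4 else "not agree"


# Hypothetical five-point answers to the same three items.
user_a = [4, 4, 4]  # moderate agreement throughout
user_b = [5, 5, 5]  # strong agreement throughout

coarse_a = [collapse_to_two(v) for v in user_a]
coarse_b = [collapse_to_two(v) for v in user_b]

print(user_a == user_b)      # False: the five-point scale separates the two users
print(coarse_a == coarse_b)  # True: the two-option format no longer does
```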

When answering, users do not need to search for the "correct" answer. A better approach is to answer according to their usual state, rather than the ideal self or the image others may expect.

Response Bias Must Be Acknowledged

Assessment cannot fully remove response bias. Users may present themselves more positively, aim for a certain result, be affected by mood, or interpret item language differently. Item design can try to reduce these issues, but it cannot guarantee that they disappear.

Result interpretation should therefore preserve boundaries: a result reflects the pattern in the current responses to the current items, not a final judgment about the person.

Unknown Fields in Current Public Documentation

If item-bank version, item count, item development process, sample information, or validation data are not publicly available, the page should not claim that item design has been validated in a specific way. Where current public documentation does not provide these numbers, they should remain Unknown.
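One way to apply this rule in a page template is sketched below: every field that has no published value is rendered as Unknown instead of being filled in. The class name, field names, and `render` helper are hypothetical and chosen only to mirror the fields listed above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ItemBankFacts:
    """Hypothetical container for publicly documented item-bank facts."""
    version: Optional[str] = None
    item_count: Optional[int] = None
    development_process: Optional[str] = None
    sample_information: Optional[str] = None
    validation_data: Optional[str] = None


def render(facts):
    """Return display strings, using 'Unknown' for anything not publicly documented."""
    rendered = {}
    for field in fields(facts):
        value = getattr(facts, field.name)
        rendered[field.name] = "Unknown" if value is None else str(value)
    return rendered


# No values are supplied here because none are publicly documented.
for name, shown in render(ItemBankFacts()).items():
    print(f"{name}: {shown}")
```

Defaulting every field to `None` means a value appears on the page only when someone supplies it deliberately, which keeps an undocumented number from being shown by accident.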

Related method pages include /science and /method-boundaries. For reliability and validity, see /reliability-validity.

Why do some items look similar?

They may observe the same kind of tendency from different situations. Similarity is not automatically duplication, but the item-bank design must justify why each item is needed.

Should I answer as I am or as I want to be?

Answer according to your usual state. Answering as an ideal self may make the result look more like a goal image than a current observable tendency.

Are reverse-scored items trying to catch me lying?

They should not be understood simply as lie detection. Reverse-scored items may help observe consistency or reduce mechanical answering, but their exact use depends on the item design.

Does a longer assessment automatically mean a better result?

Not necessarily. Item count, item quality, model design, and response fatigue all affect results. More items do not automatically mean better measurement.

Does FermatMind publish item-bank validation data?

Current public documentation does not provide complete item-bank validation numbers. Any specific data would need science review before publication.