OpenTag.Studio — Research

THEME & GSEQ Setup Guide

A practical, end-to-end guide for designing an observational study that flows cleanly into lag sequential analysis (GSEQ) and T-pattern detection (THEME). It bridges the gap the methodology papers assume away: how to set up a code window so the export actually produces meaningful sequential data.

1. What is sequential analysis — and why does data format matter?

Lag sequential analysis (GSEQ) tests whether behaviour B follows behaviour A significantly more (or less) often than chance, at lag +1, +2, +3 and so on. The key statistic is the adjusted residual (z > 1.96 = significant positive association; z < −1.96 = significant inhibition). It needs events coded as a time-ordered sequence. Reference: Bakeman & Quera (2011).
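To make the adjusted residual concrete, here is a minimal Python sketch that counts lag-1 transitions in a toy sequence and computes the residual for one criterion/target pair. The function names and the toy sequence are illustrative assumptions. The formula is the standard adjusted residual for a contingency table; GSEQ's own output remains the reference, particularly when codes cannot repeat.

```python
from collections import Counter
from math import sqrt

def lag_counts(sequence, lag=1):
    """Count (given, target) pairs that are `lag` positions apart."""
    return Counter(zip(sequence, sequence[lag:]))

def adjusted_residual(counts, given, target):
    """Adjusted residual for one cell of the lag transition table."""
    n = sum(counts.values())
    row = sum(c for (g, _), c in counts.items() if g == given)
    col = sum(c for (_, t), c in counts.items() if t == target)
    expected = row * col / n
    variance = expected * (1 - row / n) * (1 - col / n)
    if variance == 0:
        return float("nan")  # undefined when a row or column is empty or total
    return (counts[(given, target)] - expected) / sqrt(variance)

# Toy sequence (hypothetical): B follows A every time A occurs.
seq = list("ABABABCABABCABAB")
counts = lag_counts(seq, lag=1)
z = adjusted_residual(counts, "A", "B")
print(f"A -> B at lag 1: z = {z:.2f}")
```

On this toy sequence the residual is well above 1.96, so A followed by B would be reported as a significant positive association at lag 1.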

T-pattern detection (THEME) finds recurring temporal structures — chains of events that consistently occur in the same order within relatively invariant time windows. Its defining concept is the critical interval [t+d1, t+d2]: after event A at time t, this interval contains event B more often than chance (Magnusson 1996, 2000). Unlike GSEQ, THEME uses real timestamps, so the timing matters, not just the order.
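The critical-interval idea can be sketched in a few lines of Python. This is a simplified illustration, not THEME's algorithm: it checks one fixed window [t+d1, t+d2] rather than searching for the best one, and it assumes a null model in which B is spread at random over the observation period. All names, timestamps and window bounds are hypothetical.

```python
from math import comb

def hits_in_window(a_times, b_times, d1, d2):
    """Count A occurrences followed by at least one B in [t + d1, t + d2]."""
    return sum(any(t + d1 <= b <= t + d2 for b in b_times) for t in a_times)

def chance_tail(n_a, n_hits, n_b, total_time, d1, d2):
    """P(at least n_hits of n_a windows contain a B) if B fell at random."""
    # Chance that a window of (d2 - d1 + 1) time units holds at least one B.
    p_window = 1 - (1 - n_b / total_time) ** (d2 - d1 + 1)
    return sum(
        comb(n_a, k) * p_window**k * (1 - p_window) ** (n_a - k)
        for k in range(n_hits, n_a + 1)
    )

# Hypothetical frame numbers for two event types in a 1000-frame observation.
a_times = [10, 120, 260, 400, 530]
b_times = [22, 131, 273, 410, 700]

hits = hits_in_window(a_times, b_times, d1=8, d2=15)
p = chance_tail(len(a_times), hits, len(b_times), total_time=1000, d1=8, d2=15)
print(hits, p)
```

Here four of the five A events are followed by a B inside the window, which is very unlikely under the random baseline. Shuffling the event order would not change the codes, but it would destroy this timing relationship.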

They converge. Lapresa et al. (2013) analysed the same data with both methods and found every GSEQ sequential pattern had a corresponding THEME T-pattern. The two are complementary, not competing — GSEQ gives statistical significance for transitions; THEME gives the temporal structure of the same patterns. Using both on one dataset is explicitly validated.

Why your code-window design decides what analysis is possible

In Sportscode, only code-button presses generate timestamped instances with their own start time, duration and CSV row. Label-group selections do not — they are attributes of the enclosing button press and share its timestamp. This is the single most consequential decision: if you capture behavioural dimensions as label groups, all dimensions within one press share one timestamp and cannot be analysed as individually timed sequential events.

OpenTag works the same way: each event is a timestamped instance; attributes are descriptors that share the event's time. Coding each action as its own event gives you genuine inter-action timing; bundling actions as attributes on one event does not.
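The difference is easy to see in data terms. The sketch below uses a hypothetical `Event` record (not OpenTag's or Sportscode's actual export schema) to show that attributes bundled on one event yield no inter-action intervals, whereas separate events do.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    code: str
    start: float  # seconds of video time
    attributes: dict = field(default_factory=dict)

# One press with label-style attributes: both dimensions share 12.0 s.
bundled = [
    Event("Actor", 12.0, {"Key Technique": "NaeryeoAxe",
                          "Opponent Reaction": "GuardRaises"}),
]

# One event per action: each action carries its own timestamp.
separate = [
    Event("P1_NaeryeoAxe", 12.0),
    Event("P2_GuardRaises", 12.6),
]

def gaps(events):
    """Intervals in seconds between consecutive event start times."""
    times = [e.start for e in events]
    return [round(b - a, 3) for a, b in zip(times, times[1:])]

print(gaps(bundled))   # no intervals to analyse
print(gaps(separate))  # one real interval between action and reaction
```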

2. Understanding your data type

Before choosing a design, decide which of two published, valid paradigms fits your research question.

                   Paradigm 1: Multicode event                       Paradigm 2: Real-time event/state
Unit               A complete episode (e.g. one attacking sequence)  Each individual action within an exchange
Timestamps         None (ordinal order only)                         Real (video time of each action)
Duration           None (a constant is assigned)                     Real (actual duration per event)
T-pattern meaning  Ordinal / sequential structure                    Genuine temporal structure (real critical intervals)
Validated by       Lapresa et al. (2013)                             Borrie (2002), Camerino (2012), Diana (2017)
Converter input    Label-based CSV / single event + attributes       All-code-buttons CSV / one event per action

If you only want to know what kind of episode follows what, Paradigm 1 is enough and you can assign a constant duration (use the converter's constant-duration option; the methods text in Section 11 shows how to report it). If you want the temporal structure of action–reaction chains, you need Paradigm 2: one timestamped event per action.

3. Three code-window designs

Design 1: Label-based single button (Paradigm 1)

One code button (e.g. Actor) pressed once per episode. Label groups capture the dimensions: Attack Initiator, Key Technique, Opponent Reaction, Outcome. The converter joins the dimension values into one code, e.g. CutKick_NaeryeoAxe_GuardRaises_Counter3Pt.
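As a rough illustration of that join, the sketch below builds the multicode from one episode's label values. The underscore separator and the dimension order come from the example above; the function name and dictionary layout are assumptions, not the converter's internals.

```python
DIMENSIONS = ["Attack Initiator", "Key Technique", "Opponent Reaction", "Outcome"]

def multicode(labels, order=DIMENSIONS):
    """Join one episode's label values into a single event code."""
    return "_".join(labels[name] for name in order)

episode = {
    "Attack Initiator": "CutKick",
    "Key Technique": "NaeryeoAxe",
    "Opponent Reaction": "GuardRaises",
    "Outcome": "Counter3Pt",
}
code = multicode(episode)
print(code)
```

Because every distinct combination becomes its own code, the number of unique codes grows quickly with the number of label values, which matters for the frequency floor in Section 8.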

Analysis: GSEQ tells you which episode types follow each other; THEME finds which episode types cluster in time. Individual action timing within an episode is not recoverable.

Be honest about its limit. "Opponent Reaction" and "Key Technique" may belong to different fighters, and the instrument can't say who did what within the episode. Name the group Key Technique, not Follow-up Action.

Design 2: All code buttons (Paradigm 2)

Every action type has its own code button; boundary buttons (Start, End) delimit sequences; no label groups.

Start                       — referee reset / whistle
P1_CutKick, P1_NaeryeoAxe…  — Player 1 techniques
P2_CutKick, P2_NaeryeoAxe…  — Player 2 techniques
P1_GuardRaises, P2_Retreat… — reactions
P1_Win_1Pt … P2_Win_5Pt     — scoring
NoScore                     — reset, no point
End                         — whistle after resolution

The button name is the complete event code — no labels, no joining. The converter reads each row as a code and splits on Start/End into one GSEQ session per sequence. THEME receives genuine timed events. This is the recommended design for new studies about the sequential/temporal structure of individual actions.
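The Start/End split can be sketched as follows. This is an illustrative approximation of the behaviour described above, not the converter's code. It assumes that events outside any Start/End block, and blocks that are never closed, are dropped.

```python
def split_sessions(codes, start="Start", end="End"):
    """Group a flat, time-ordered list of codes into one list per Start/End block."""
    sessions, current = [], None
    for code in codes:
        if code == start:
            current = []                  # open a new block
        elif code == end:
            if current is not None:
                sessions.append(current)  # close the block
            current = None
        elif current is not None:
            current.append(code)          # event inside a block
        # events outside any block are ignored (assumption)
    return sessions

codes = [
    "Start", "P1_CutKick", "P2_Retreat", "NoScore", "End",
    "Start", "P2_CutKick", "P1_GuardRaises", "P2_Win_1Pt", "End",
]
sessions = split_sessions(codes)
print(sessions)
```

Each inner list then becomes one GSEQ session, so no lag transition is counted across a referee reset.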

Design 3 — Hybrid code buttons + labels

Code buttons for primary actions (P1Attack, P2Attack, P1Win, Start, End), with label groups on the attack buttons for technique and reaction. Each press gets one timestamp; technique/reaction labels share it, but between-action timing is real. Use when there are too many technique combinations to warrant individual buttons but you still need per-action timestamps.

In OpenTag: Design 2 = one event per action. Design 1/3 = one event (e.g. an Activator attribute) per episode with attributes for the dimensions. The converter reads OpenTag JSON/CSV directly, so either maps straight through.

4. Naming rules (all designs)

These apply to every button name and label value. Violations produce sanitised codes that are unreadable or that collide.

Rule                           Problem                               Fix
No spaces                      Cut Kick → CutKick                    Design names that read without spaces
No hyphens                     Actor-2pt → Actor2pt                  Actor2Pt, GamjeomMinus
No brackets                    Naeryeo(Axe) → NaeryeoAxe             Write NaeryeoAxe directly
No commas in a value           Dollyo, Sewo = one token              Two buttons/labels, or a combo name
One dimension per label group  Mixed groups → one ambiguous column   One named group per analytical dimension
No duplicate group names       Renamed group leaves an empty column  Delete the old group before creating the new

THEME reserved tokens. Avoid an event literally named action, because after lowercasing it collides with the class declaration in vvt.vvt. Punctuation-only names (&, :, *) sanitise to empty and fall back to Event. The converter's preview panel shows exactly what codes GSEQ and THEME will receive; check it before downloading.
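You can pre-check a list of names before building the code window. The sketch below is a guess at the kind of sanitising described above, not the converter's actual rules. It assumes that letters, digits and underscores are kept (names such as P1_CutKick appear in this guide), that everything else is stripped, and that collisions are compared after lowercasing.

```python
import re
from collections import defaultdict

RESERVED = {"action"}  # collides with the class declaration in vvt.vvt

def sanitise(name, fallback="Event"):
    """Strip characters outside A-Z, a-z, 0-9 and underscore (assumed rule)."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "", name)
    return cleaned or fallback

def collisions(names):
    """Map each lowercased code to its source names when more than one maps to it."""
    seen = defaultdict(list)
    for name in names:
        seen[sanitise(name).lower()].append(name)
    return {code: originals for code, originals in seen.items() if len(originals) > 1}

def is_reserved(name):
    return sanitise(name).lower() in RESERVED

names = ["Cut Kick", "CutKick", "Actor-2pt", "Naeryeo(Axe)", "&"]
print([sanitise(n) for n in names])
print(collisions(names))
```

Any name reported by `collisions` or `is_reserved` is worth renaming before coding starts, since renaming later means recoding.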

5. Coding protocol & timestamp precision

Designs 1 & 3 (label-based)

Set the Actor button's lead time to 8–10 s so pressing it rewinds to cover the full sequence. Pause, rewind and select labels in any order while the button is held; close the instance on release (or via a deactivation link from the Outcome label group).

Design 2 (all code buttons) — post-hoc frame-stepping

  1. Watch the full sequence through without pressing anything.
  2. Note what happened (techniques, players, outcome).
  3. Rewind using frame-step keys.
  4. Park the video on the exact frame each action begins.
  5. Press the button at that frame; release when the action ends.
  6. Move to the next action and repeat.

Slower (~3–4 min per sequence) but it produces precise timestamps — the precision that makes THEME genuinely temporal rather than merely sequential. In OpenTag, use the frame-step controls and tag each action as its own event at its true frame.

6. Combination techniques

Combinations (two techniques in rapid succession by the same player) are common in combat sports. Three options:

  • A — Ignore structure: code the sequence once, label the primary scoring technique. Loses co-occurrence detail; fine when combinations are rare.
  • B — Composite atomic labels (best for Design 1): add explicit combination labels (AxeDollyo, AxeSewo). Watch a sample first to find all attested combinations before building the window — adding labels mid-project means recoding.
  • C — Two presses, identical window (Design 2 only): press the player's button twice over the same window, once per technique. Equal durations in %SS = lag-0 co-occurrence = correct simultaneity.

7. Sequence boundaries & GSEQ sessions

GSEQ analyses patterns within sessions. Without boundaries, the last event of sequence 17 and the first of sequence 18 look like a lag transition — a meaningless pairing.

  • Design 2: the converter splits each Start-to-End block into one GSEQ session automatically, which is the cleanest approach.
  • Design 1: use the converter's sequence-assignment panel. Auto-detect uses a gap threshold (default 4 s, suited to the taekwondo reset/bow). Review the editable table and override as needed.
  • Post-hoc: if data was collected without boundaries, add a sequence-ID column in your spreadsheet; the converter reads it.
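For data without boundary events, a gap-threshold assignment along the lines of the auto-detect option can be sketched like this. It assumes the gap is measured from the latest end time seen so far to the next start time, and that a new sequence begins only when the gap exceeds the threshold; the converter's exact rule may differ.

```python
def assign_sequences(events, gap_threshold=4.0):
    """Return a sequence ID per event; `events` is a start-sorted list of (start, end) seconds."""
    ids = []
    current = 1
    latest_end = None
    for start, end in events:
        if latest_end is not None and start - latest_end > gap_threshold:
            current += 1  # pause longer than the threshold: new sequence
        ids.append(current)
        latest_end = end if latest_end is None else max(latest_end, end)
    return ids

# Hypothetical instance times in seconds.
events = [(0.0, 3.0), (3.5, 6.0), (11.0, 14.0), (14.5, 16.0), (25.0, 27.0)]
sequence_ids = assign_sequences(events)
print(sequence_ids)
```

The resulting IDs can be written to a sequence-ID column and reviewed by hand, which mirrors the editable table in the converter.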

8. Data volume requirements

Adjusted residuals are unreliable when observed frequencies are very low. A widely used working floor is 20 instances per event code (criterion or target).

The converter's frequency audit panel surfaces this before you download — codes under 20 are flagged amber, under 5 red. If a code is too sparse: code more matches, combine rare codes with GSEQ's LUMP command, or redesign with fewer/broader codes.

lump P1Attack = P1_CutKick P1_FeintBody P1_StepIn P1_DirectAttack

Worked example: 3 matches × ~30 sequences × ~4 events per sequence ≈ 360 events; spread across 40 unique codes, that is ≈ 9 per code, which is below the floor. Lump or broaden.
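The same audit can be run on your own export before you open the converter. The sketch below applies the amber and red thresholds named above, shows the effect of lumping, and reproduces the worked example. It assumes roughly 4 events per sequence, which is what 360 events over 90 sequences implies. The function names and toy counts are hypothetical.

```python
from collections import Counter

def audit(codes, amber=20, red=5):
    """Flag each code as 'red' (< red), 'amber' (< amber) or 'ok'."""
    return {
        code: "red" if n < red else "amber" if n < amber else "ok"
        for code, n in Counter(codes).items()
    }

def lump(codes, mapping):
    """Replace sparse codes with a broader code; unmapped codes pass through."""
    return [mapping.get(code, code) for code in codes]

# Toy data: two sparse attack codes and one well-populated reaction code.
codes = ["P1_CutKick"] * 3 + ["P1_FeintBody"] * 12 + ["P2_Retreat"] * 25
before = audit(codes)
after = audit(lump(codes, {"P1_CutKick": "P1Attack", "P1_FeintBody": "P1Attack"}))

# Worked example: expected events per code.
matches, sequences_per_match, events_per_sequence, unique_codes = 3, 30, 4, 40
per_code = matches * sequences_per_match * events_per_sequence / unique_codes
print(before, after, per_code)
```

Lumping the two attack codes lifts the combined code out of the red band, but it is still below 20, so more matches would be needed as well.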

9. Step-by-step: importing into THEME

  1. Open THEME (PatternVision; free for academic use). File → New Project.
  2. File → Import observation → select observation.txt from the converter's THEME ZIP.
  3. File → Import category table → select vvt.vvt from the same ZIP.
  4. Verify the event list shows all codes in lowercase.
  5. Search → T-pattern detection.

Detection settings: minimum occurrences 3–5; significance p = .005 (THEME default, used across the published studies); minimum pattern length 2; disable the "fast" requirement (per Lapresa 2013, so valid slower patterns aren't rejected).

Read the dendrogram: each node = one event; a branch = a temporal relationship (critical interval); longer chains = more complex recurring structures. To generalise across matches, look for patterns recurring in multiple sessions — Diana et al. (2017) used ≥ 80% of matches as the filter.
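If you record which T-patterns were detected in each match, the cross-match filter is a short computation. The sketch below assumes you have already transcribed THEME's results into a mapping from match name to a set of patterns, each written as a tuple of event codes. That representation, the match names and the patterns are hypothetical; THEME's own export format is not assumed here.

```python
def recurring_patterns(patterns_by_match, min_share=0.8):
    """Return patterns detected in at least `min_share` of the matches."""
    n_matches = len(patterns_by_match)
    all_patterns = set().union(*patterns_by_match.values())
    return sorted(
        p for p in all_patterns
        if sum(p in found for found in patterns_by_match.values()) / n_matches >= min_share
    )

x = ("p1_cutkick", "p2_retreat")
y = ("p2_cutkick", "p1_guardraises", "p2_win_1pt")
patterns_by_match = {
    "match1": {x, y},
    "match2": {x},
    "match3": {x, y},
    "match4": {x, y},
    "match5": set(),
}
kept = recurring_patterns(patterns_by_match)
print(kept)
```

Pattern x appears in four of five matches and passes the 80% filter; pattern y appears in three and is dropped.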

10. Step-by-step: importing into GSEQ 5

  1. Open GSEQ 5 (Bakeman & Quera — free download). File → Open → the .mds from the converter. (Commercial alternative: GSEQ via Mangold INTERACT.)
  2. GSEQ compiles the .mds to .mex automatically. Compile errors indicate illegal characters in code names; check the converter preview.
  3. Type FREQ to see frequencies; flag anything below 20.
  4. Type the table command:
    table CriterionCode given all codes lag 1 to 5
  5. Type STATS to compute adjusted residuals.

Interpreting: z > +1.96 → target follows criterion significantly more than chance (p < .05); z < −1.96 → significant inhibition. Report criterion, target, lag, adjusted residual and direction. Use LUMP (Section 8) before TABLE when individual codes are too sparse.

11. Methods text & citations

A ready-to-adapt methods paragraph:

Lag sequential analysis was conducted using GSEQ 5 (Bakeman & Quera, 2011). Adjusted residuals were the primary statistic, with z > 1.96 (p < .05) indicating a significant positive association and z < −1.96 significant inhibition at the specified lag. Only criterion and target codes with a minimum frequency of [n] occurrences were included.

T-pattern detection was conducted using THEME [version] (Magnusson, 2000). T-patterns are combinations of events occurring in the same order with relatively invariant real-time distances between consecutive components, relative to a null hypothesis of independent random distribution (Magnusson, 1996). Detection parameters were: minimum occurrences ≥ [n], significance p = .005, fast requirement disabled. [If no duration: following Lapresa et al. (2013), a constant duration of [x] frames was conventionally assigned to each occurrence to enable T-pattern detection on event data without a temporal parameter.]

The combined use of GSEQ and THEME followed the approach validated by Lapresa et al. (2013). Observational data were prepared using the OpenTag LSA Converter (Callaway, 2026; opentag.studio/tools/lsa-converter).

References

  • Bakeman, R., & Quera, V. (2011). Sequential analysis and observational methods for the behavioral sciences. Cambridge University Press. https://doi.org/10.1017/CBO9781139017343
  • Borrie, A., Jonsson, G. K., & Magnusson, M. S. (2002). Temporal pattern analysis and its applicability in sport. Journal of Sports Sciences, 20(10), 845–852. https://doi.org/10.1080/026404102320675675
  • Camerino, O., Chaverri, J., Anguera, M. T., & Jonsson, G. K. (2012). Dynamics of the game in soccer: Detection of T-patterns. European Journal of Sport Science, 12(3), 216–224. https://doi.org/10.1080/17461391.2011.566362
  • Diana, B., Zurloni, V., Elia, M., Cavalera, C. M., Jonsson, G. K., & Anguera, M. T. (2017). How game location affects soccer performance: T-pattern analysis of attack actions. Frontiers in Psychology, 8, 1415. https://doi.org/10.3389/fpsyg.2017.01415
  • Lapresa, D., Arana, J., Anguera, M. T., & Garzón, B. (2013). Comparative analysis of sequentiality using SDIS-GSEQ and THEME. Journal of Sports Sciences, 31(15), 1687–1695. https://doi.org/10.1080/02640414.2013.796061
  • Magnusson, M. S. (1996). Hidden real-time patterns in intra- and inter-individual behavior. European Journal of Psychological Assessment, 12(2), 112–123.
  • Magnusson, M. S. (2000). Discovering hidden time patterns in behavior: T-patterns and their detection. Behavior Research Methods, Instruments, & Computers, 32(1), 93–110. https://doi.org/10.3758/BF03200792