Matrize News Communications · Political Research & Media Intelligence · Gauging the Mood of the Nation
Political Research · Public Opinion · Media Intelligence
§ Methodology · Section B · The full apparatus
How the field becomes a defensible figure

How a figure
becomes defensible.

In this dossier ↙
  • The standard ··· § B0
  • Sampling frame & allocation ··· § B1
  • Instrument & interview ··· § B2
  • Weighting & calibration ··· § B3
  • Sentiment classifier ··· § B4
  • Seat projection model ··· § B5
  • Exit calibration & error log ··· § B6
  • Replication pack ··· § B7
§ B0 · The standard

Triangulate, calibrate,
disclose the margin.

Every wave is built from six instruments. We do not run a single instrument and call it a poll. We triangulate three independent measurements of the same quantity, calibrate against frames the reader can audit, and publish the margin a reader can use against us.

Principle · 01
Independence of measurements.

Probability sample, narrative stream, and exit operation use independent frames. Agreement across three is the test — not agreement within one.

Principle · 02
Margins are non-optional.

Every figure leaves the building with its sampling margin, field dates and a published 95 % interval. A point estimate without an interval is not a finding.

Principle · 03
Open error log.

Calls outside the published band are written up, after every count we publish, in the open error log (Section B6 of this dossier).

§ B1 · Instrument One

Stratified probability sampling.

Primary Sampling Units (PSUs) are drawn with Probability Proportional to Size (PPS), with size measured by recent turnout. Two-stage cluster design: PSU → household → respondent. The frame is the Election Commission booth list, not a metro-only panel.

The Matrize sampling frame is the Election Commission of India's booth list (the latest final electoral roll), reissued state-wise to our field offices. Strata are defined on the cross of region × locality (urban / rural / tribal) × turnout decile. PSUs are drawn within each cell with PPS by turnout, and small cells are pooled with the next adjacent cell to keep the design effect in check.

Households within a PSU are selected by the right-hand rule from a random-walk start. The respondent is selected by Kish grid. No respondent is replaced; non-contacts are revisited before substitution. The marginal Lok Sabha constituencies (historically the seats that decide a general election) are over-sampled and re-weighted back to their population share at the calibration stage.

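\[ n_{\text{cell}} = N_{\text{cell}} \cdot \pi_{\text{cell}} \cdot f_{\text{marginal}} \]
where
\[ \pi_{\text{cell}} = \frac{\sum \text{turnout}_{2024,\,\text{booth}}}{\sum \text{turnout}_{2024,\,\text{national}}} \]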
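The PPS draw and the turnout share in the allocation formula above can be illustrated with a minimal sketch. It assumes systematic PPS selection, which is one common way to implement PPS; the dossier does not say which variant is used. The function names, PSU ids and turnout figures are hypothetical.

```python
import random


def cell_share(cell_booth_turnout, national_turnout):
    """pi_cell: the cell's share of national turnout (sum over its booths)."""
    return sum(cell_booth_turnout) / national_turnout


def pps_systematic(sizes, n_draws, seed=0):
    """Draw n_draws units with probability proportional to size.

    sizes maps a unit id (here, a hypothetical PSU id) to its size measure
    (here, recent turnout). A unit larger than the sampling interval can be
    selected more than once.
    """
    rng = random.Random(seed)
    step = sum(sizes.values()) / n_draws      # sampling interval
    start = rng.random() * step               # random start in [0, step)
    targets = [start + k * step for k in range(n_draws)]

    chosen, cumulative, i = [], 0.0, 0
    for unit, size in sizes.items():
        cumulative += size
        # Every target that falls inside this unit's size band selects it.
        while i < n_draws and targets[i] < cumulative:
            chosen.append(unit)
            i += 1
    return chosen


psu_turnout = {"PSU-A": 100, "PSU-B": 300, "PSU-C": 600}  # illustrative numbers
print(pps_systematic(psu_turnout, n_draws=2))
print(cell_share([100, 300, 600], national_turnout=10_000))
```

With a sampling interval of 500, the second draw always lands in PSU-C, which is the point of PPS: larger units are proportionally more likely to be selected.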
  • Stage 01 · [n] · Strata cells (region × locality × turnout decile)
  • Stage 02 · [n] · Primary Sampling Units drawn PPS
  • Stage 03 · [n] · Households contacted, Kish grid
  • Stage 04 · [n] · Completed interviews · response rate
  • Stage 05 · [-] · Final margin at 95 % confidence
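The dossier does not state the formula behind the final margin at 95 % confidence. As a rough sketch only, the textbook approximation for a proportion is shown below, inflated by Kish's approximate design effect from unequal weights. Both function names are hypothetical, and the sketch ignores the additional clustering effect of a PSU-based design.

```python
import math


def kish_deff(weights):
    """Kish's approximate design effect from unequal weights: n * sum(w^2) / (sum w)^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def margin_95(p, n, deff=1.0):
    """Half-width of a 95 % interval for a proportion p from n interviews."""
    return 1.96 * math.sqrt(deff * p * (1 - p) / n)


weights = [1.0, 1.0, 3.0, 3.0]           # illustrative weights only
deff = kish_deff(weights)                # 1.25 for these weights
print(round(margin_95(0.5, 1000), 4))    # simple random sample, about 0.031
print(round(margin_95(0.5, 1000, deff), 4))
```

The design effect scales the variance, so a deff of 1.25 widens the margin by a factor of about 1.12.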
§ B2 · Instrument Two

Native-language CAPI interview.

Computer-Assisted Personal Interviewing on handheld tablets, in respondents' own languages by interviewers from the same region. GPS-stamped, audio-validated, with a back-check baseline on every wave.

Interviewers are recruited and trained in the field office of the state they will operate in. They are paid a flat daily rate (not a per-interview rate) to remove the incentive to rush. Our in-house panel app collects responses with real-time GPS validation, instrument-level skip logic, and continuous upload to the central office on any reachable network. A submitted interview cannot be edited after upload.

A share of interviews is randomly selected for full-audio review, and a further share is subject to a back-check telephone call from a separate team. Interviewers whose back-check delta exceeds the threshold on the verifiable items (age, locality, household composition, voter ID) are removed from the roster for the wave and their work is re-collected. The questionnaire is published with each wave note.

\[ \delta_{\text{back-check}} = \frac{\lVert x_{\text{field}} - x_{\text{backcheck}} \rVert}{n_{\text{verifiable items}}} \]
Roster threshold: \( \delta > 0.05 \)
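A minimal sketch of the back-check delta follows. The dossier does not specify the norm, so this assumes it is a simple count of verifiable items on which the field record and the back-check record disagree, pooled over all of an interviewer's back-checked interviews. The field names and function names are hypothetical.

```python
VERIFIABLE = ("age", "locality", "household_composition", "voter_id")
THRESHOLD = 0.05  # roster threshold from the dossier


def backcheck_delta(pairs):
    """Share of verifiable items that differ between field and back-check.

    pairs is a list of (field_record, backcheck_record) dicts for one
    interviewer. Assumption: the norm is a count of mismatched items.
    """
    mismatches = 0
    items = 0
    for field, back in pairs:
        for key in VERIFIABLE:
            items += 1
            if field[key] != back[key]:
                mismatches += 1
    return mismatches / items if items else 0.0


def exceeds_threshold(pairs):
    """True when the interviewer's delta is above the roster threshold."""
    return backcheck_delta(pairs) > THRESHOLD


record = {"age": 41, "locality": "rural", "household_composition": 5, "voter_id": True}
wrong_age = dict(record, age=29)
pairs = [(record, record)] * 7 + [(record, wrong_age)] * 3   # 3 mismatches in 40 items
print(backcheck_delta(pairs), exceeds_threshold(pairs))
```

Pooling across interviews matters here: with four verifiable items, a single mismatch in a single interview would already give 0.25, far above a 0.05 threshold.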
§ B3 · Instrument Three

Weighting & calibration.

Raking on known frames: Census age × sex × locality, recent turnout, and vote recall. Weights are trimmed to control the design effect, and the likely-voter screen is a published battery.

Each completed interview is assigned an initial probability weight equal to the inverse of its inclusion probability under the sampling design. Those base weights are then iteratively raked to the frames listed below until each marginal converges to its target. The procedure is the classic Deming-Stephan algorithm. Weights are trimmed to prevent any single respondent from disproportionately influencing the estimate.

The likely-voter screen is a five-question battery, scored 0 — 10, capturing self-reported intent, past behaviour, registration, salience, and external corroboration. Respondents scoring < 6 are excluded from the headline figure but retained in the universe for non-voter analysis. The battery is identical across waves so a respondent's score is comparable over time.

\[ w_i^{(k+1)} = w_i^{(k)} \cdot \frac{T_j}{S_j^{(k)}} \]
subject to \( w_i \le 4.0 \cdot \operatorname{mean}(w) \) (trim)
  • Frame 01 · Age × sex × locality · 60 cells · Census 2021
  • Frame 02 · Turnout decile by constituency · 543 cells · ECI 2024 GE
  • Frame 03 · Vote recall, party × constituency · 543 × 7 · ECI 2024 GE
  • Frame 04 · Education × locality · 12 cells · PLFS 2023-24
  • Frame 05 · Caste-group share by state · state-level · SECC anchor · Lokniti 2024
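The raking update and the trim rule above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the firm's calibration script: the function name, the two toy margins, the tolerance and the choice to trim once per pass are all hypothetical. Here T_j is the target total for a category and S_j the current weighted total.

```python
def rake(rows, base_weights, targets, max_iter=50, tol=1e-6, trim=4.0):
    """Iterative proportional fitting with a weight cap.

    rows:     list of dicts giving each respondent's category on every margin
    targets:  {margin: {category: target_total T_j}}
    Returns the final weights and the number of passes used.
    """
    w = list(base_weights)
    for iteration in range(1, max_iter + 1):
        max_gap = 0.0
        for margin, target in targets.items():
            # S_j: current weighted total in each category of this margin
            totals = {cat: 0.0 for cat in target}
            for row, wi in zip(rows, w):
                totals[row[margin]] += wi
            for cat in target:
                gap = abs(totals[cat] - target[cat]) / target[cat]
                max_gap = max(max_gap, gap)
            # w_i <- w_i * T_j / S_j for the category j the respondent is in
            w = [wi * target[row[margin]] / totals[row[margin]]
                 for row, wi in zip(rows, w)]
        # Trim: cap every weight at trim * mean(w)
        cap = trim * sum(w) / len(w)
        w = [min(wi, cap) for wi in w]
        if max_gap < tol:   # every margin was already on target in this pass
            break
    return w, iteration


rows = [
    {"sex": "M", "locality": "urban"}, {"sex": "M", "locality": "rural"},
    {"sex": "F", "locality": "urban"}, {"sex": "F", "locality": "rural"},
]
targets = {"sex": {"M": 60, "F": 40}, "locality": {"urban": 30, "rural": 70}}
weights, passes = rake(rows, [1.0] * 4, targets)
print([round(x, 3) for x in weights], passes)
```

When the cap binds, the margins can no longer all be matched exactly, so a production script would typically log the per-iteration gap against the tolerance rather than assume convergence.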
§ B4 · Instrument Four

Sentiment & narrative classifier.

Outlets ingested across Indian languages, with low end-to-end latency. A transparent classifier (published architecture, published training corpus, regular human audit) assigns sentiment, salience and narrative tag.

The stream is built on a fine-tuned multilingual transformer trained on a hand-labelled corpus. Labels are three-class sentiment (positive / neutral / negative), four-class salience (national / state / constituency / non-political), and a free-form narrative tag against a fixed taxonomy. The training corpus, label sheet and confusion matrix are published with the wave note.

Each week, a random sample of articles from the prior week's ingestion is re-labelled blind by an audit panel. The classifier's prediction is compared to the panel majority, disagreements are adjudicated, and the corpus is updated. When the audited F1 drops below the floor on consecutive cycles, the model is retrained.

  • Precision · [-] · Predicted-positive that is actually positive
  • Recall · [-] · Actually-positive that we predicted
  • F1 (mean) · [-] · Weekly blind audit
  • Latency · median · [-] · Ingestion → tagged → dashboard
  • Outlets live · [-] · Across Indian languages
  • Retrains · since launch · [-] · Triggered by audited F1 below floor
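The audit metrics and the retrain trigger described above can be sketched as follows. The dossier gives neither the F1 floor nor the number of consecutive cycles, so `floor` and `consecutive` are hypothetical parameters, and reading "F1 (mean)" as a macro average over the three sentiment classes is an assumption.

```python
def precision_recall_f1(predicted, audited, positive):
    """Per-class precision, recall and F1, treating the audit-panel majority as truth."""
    pairs = list(zip(predicted, audited))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(predicted, audited, classes):
    """Unweighted mean of the per-class F1 scores."""
    return sum(precision_recall_f1(predicted, audited, c)[2] for c in classes) / len(classes)


def needs_retrain(weekly_f1, floor, consecutive=2):
    """True when the last `consecutive` audited F1 scores are all below the floor."""
    recent = weekly_f1[-consecutive:]
    return len(recent) == consecutive and all(score < floor for score in recent)


predicted = ["positive", "positive", "negative", "neutral", "negative", "positive"]
audited   = ["positive", "negative", "negative", "neutral", "positive", "positive"]
print(precision_recall_f1(predicted, audited, "positive"))
print(macro_f1(predicted, audited, ["positive", "neutral", "negative"]))
print(needs_retrain([0.90, 0.70, 0.60], floor=0.80))
```

Requiring consecutive misses before retraining keeps a single noisy audit week from triggering a retrain.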
§ B5 · Instrument Five

Seat projection & model uncertainty.

A constituency-level swing model, Monte Carlo simulated with full state-pair correlation. Reported as a central seat estimate plus a 90 % interval, never as a single number.

Each of the 543 Lok Sabha constituencies is modelled separately. The vote share for each party in each constituency is a function of: (i) the prior election's actual result; (ii) the state-level swing computed from the current wave; (iii) a constituency-specific deviation drawn from the historical distribution of deviations from state swing; (iv) a candidate-strength adjustment when an incumbent or a known high-profile entrant is on the ballot.

The model is then simulated ten thousand times. State-level swings are drawn jointly from a multivariate normal calibrated to the historical state-pair correlation matrix: Maharashtra and Madhya Pradesh swings are correlated, U.P. and Bihar are correlated, and Kerala swings independently. This avoids the trap of treating state outcomes as independent and over-narrowing the seat interval. Across the ten thousand runs, the 50th percentile is reported as the central seat estimate, and the 5th and 95th percentiles bound the 90 % interval.

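\[ V_{p,c} = V_{p,c}^{\text{prior}} + \sigma_{\text{state}} + \varepsilon_c + \alpha_{\text{candidate}} \]
Joint distribution: \( \sigma_{\text{state}} \sim \mathrm{MVN}(\mu_{\text{wave}}, \Sigma_{\text{historical}}) \)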
[Figure: Seat distribution, Monte Carlo simulation. Seat axis ticks at 240, 272, 320 and 352, with the median and the 90 % interval marked. Available on request]
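A toy version of the simulation above is sketched here to show why joint draws matter. It is a deliberate simplification, not the firm's model: it tracks one focal party in a two-way contest (a seat is won when the simulated share exceeds 0.5), uses three hypothetical states and six hypothetical seats, and builds correlated swings with a hand-written Cholesky factor. All names and numbers are illustrative.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular L with L @ L^T == matrix (matrix must be positive definite)."""
    n = len(matrix)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                L[i][j] = (matrix[i][j] - s) / L[j][j]
    return L


def simulate_seats(seats, mean_swing, cov, eps_sd, runs=10_000, seed=14):
    """seats: list of (state_index, prior_share, candidate_adjustment).

    Returns the 5th, 50th and 95th percentile of seats won across the runs.
    """
    rng = random.Random(seed)            # pinned seed so a run can be reproduced
    L = cholesky(cov)
    n_states = len(mean_swing)
    totals = []
    for _ in range(runs):
        z = [rng.gauss(0.0, 1.0) for _ in range(n_states)]
        # Correlated state swings: mu_wave + L z  ~  MVN(mu_wave, cov)
        swing = [mean_swing[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                 for i in range(n_states)]
        won = 0
        for state, prior, alpha in seats:
            share = prior + swing[state] + rng.gauss(0.0, eps_sd) + alpha
            won += share > 0.5
        totals.append(won)
    totals.sort()
    pick = lambda q: totals[min(runs - 1, int(q * runs))]   # nearest-rank percentile
    return pick(0.05), pick(0.50), pick(0.95)


seats = [(0, 0.52, 0.0), (0, 0.47, 0.01), (1, 0.55, 0.0),
         (1, 0.49, 0.0), (2, 0.44, 0.0), (2, 0.51, -0.01)]
cov = [[0.0009, 0.00054, 0.0],      # states 0 and 1 correlated at 0.6
       [0.00054, 0.0009, 0.0],
       [0.0, 0.0, 0.0009]]          # state 2 swings independently
print(simulate_seats(seats, mean_swing=[0.01, 0.01, -0.02], cov=cov, eps_sd=0.02))
```

Setting the off-diagonal covariance terms to zero in this sketch narrows the interval, which is the over-narrowing the text warns about.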
§ B6 · Instrument Six

Exit calibration & open error log.

Every wave is back-tested against the prior election's known outcomes. Calls outside the published interval are written up in the open error log after the count.

Exit operations run with a separate field force from the panel: different interviewers, different PSUs, drawn on the morning of polling. The exit instrument is intentionally shorter to maintain response in the booth queue. Exit waves are not used to forecast; they are used to calibrate the panel model after the fact. The delta between exit and panel projection, by state, is the primary diagnostic in the post-mortem.

The open error log is published on this page after every count we publish. Calls that landed inside the 90 % interval are marked "in-band"; calls outside are marked "out-of-band" and accompanied by the named researcher's note on what the model missed. We do not retroactively widen the band to capture a miss.
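The in-band rule described above reduces to a comparison against the interval that was published before the count. A minimal sketch, assuming the interval bounds are inclusive and that Δ is actual minus projected; the function name, the wave label and the seat numbers are hypothetical, not taken from any published wave.

```python
def log_entry(wave, projected, actual, lower, upper):
    """Build one error-log row from the interval as published before the count."""
    in_band = lower <= actual <= upper      # assumption: bounds are inclusive
    return {
        "wave": wave,
        "projected": projected,
        "actual": actual,
        "delta": actual - projected,
        "status": "in-band" if in_band else "out-of-band",
    }


# Hypothetical numbers, for illustration only.
print(log_entry("Example wave", projected=296, actual=330, lower=272, upper=320))
```

Because the function takes the published bounds as inputs and never recomputes them, a miss stays a miss: there is no path by which the band is widened after the fact.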

§ B6 · The open error log

What we missed,
in our own words.

Calls outside the 90 % interval, accompanied by the partner's note. We do not retroactively widen the band; the published interval is what we go to print with, and an out-of-band call is an out-of-band call.

Wave | Call | Projected | Actual | Δ | Status | Partner note
2022 · UP | Uttar Pradesh Assembly · with Republic TV | Available on request | Available on request | | Reported 100% accurate | In partnership with Republic TV. Available on request
2023 · Karnataka | Karnataka Assembly | Available on request | Available on request | | Accurate call | Available on request
2023 · North-East | Tripura · Meghalaya · Nagaland | Available on request | Available on request | | Accurate calls | Available on request

Our record, in full, including the calls we missed: Available on request
§ B7 · Replication

Replicate any figure
we publish.

Every wave ships with a replication pack: questionnaire, raw weights, model code and the run log. Available on request under a standard non-redistribution clause.

Replication is the test. A figure that cannot be reproduced from the same raw data by an independent analyst is not a finding. The Matrize replication pack contains every artefact a researcher needs to take Wave 14 from raw interview file to published seat interval and reproduce the result to the seat.

Packs are made available to commissioning clients on request, and to academic researchers under a standard non-redistribution clause that is shared with the request form.

The model code is reviewed by an external statistician, and the review note is published with the wave.

Replication pack · per wave

What ships with every wave.

  • 01 · Questionnaire master: EN · HI · BN · TA · TE · MR · KN · ML · GU · OR · AS (PDF, 3.2 MB)
  • 02 · Raw weights, per-respondent: CSV · per-respondent weights · frame columns (CSV)
  • 03 · Calibration script: R · rake() · 9-iteration log (R, 14 KB)
  • 04 · Seat-projection model: Python · Monte Carlo · 10,000 runs · seed pinned (PY, 38 KB)
  • 05 · State-pair Σ matrix: 1999 to 2024 historical · 28 × 28 (CSV, 22 KB)
  • 06 · Run log & convergence trace: JSON · iter-by-iter delta vs tolerance (JSON, 124 KB)
  • 07 · External review note: Available on request (PDF)
§ B8 · Standards & oversight

The bodies we stand under.

External standards exist so a firm cannot grade its own homework.

  • Research standard · Available on request
  • Membership · Available on request
  • Principles · Available on request
  • External review · Independent review of the projection model, weighting and diagnostics · Available on request

Request a replication pack.
Or commission a wave.

Replication packs are made available to commissioning clients and academic researchers under a standard non-redistribution clause. Bespoke waves can be commissioned at the constituency, state or national level.

§ Coda · The standard

We do not predict
the news. We measure
what produces it.

Margin disclosure

Every figure published with its sampling margin, field dates and confidence interval. Without exception.

Method open

Our weighting, calibration and modelling notes are published with every wave. Replication on request.

Client agnostic

We publish what the field returns. Findings are not adjusted for the political colour of who commissioned them.