Sufoniq

Methodology

Last updated: N/A • Data source: World Bank (WDI & related)

Reproducible • Grounded • LLM-safe

1) Data ingestion & scope

  • API: World Bank v2 JSON endpoints (/sources, /indicators, /country, /country/all/indicator/{code}).
  • Coverage: All indicators, countries & aggregates; years 1960–2025 where available.
  • Storage: MongoDB collection worldbank_raw, one document per indicator × country × year; key {indicator|country|year}.
  • Versioning: Every dump stamped with dump_as_of (UTC); derived tables reference the same snapshot.
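A minimal sketch of the ingestion step, for illustration only. It builds a paged request URL for /country/all/indicator/{code} and maps one observation to a stored document. The field names of the stored document, the use of the ISO3 code in the key, and the sample value are assumptions, not the pipeline's actual schema; the observation shape is assumed to follow the World Bank v2 JSON response.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.worldbank.org/v2"

def indicator_url(code: str, page: int = 1, per_page: int = 1000) -> str:
    """Build the URL for one page of /country/all/indicator/{code} as JSON."""
    query = urlencode({"format": "json", "page": page, "per_page": per_page})
    return f"{BASE_URL}/country/all/indicator/{code}?{query}"

def to_document(obs: dict, dump_as_of: str) -> dict:
    """Map one API observation to a worldbank_raw document (hypothetical schema)."""
    indicator = obs["indicator"]["id"]
    country = obs["countryiso3code"]  # assumption: the key uses the ISO3 code
    year = int(obs["date"])
    return {
        "_id": f"{indicator}|{country}|{year}",  # key {indicator|country|year}
        "indicator": indicator,
        "country": country,
        "year": year,
        "value": obs["value"],       # None when the API reports no value
        "dump_as_of": dump_as_of,    # UTC stamp shared by the whole dump
    }

# Illustrative observation; the value is a placeholder, not real data.
sample = {
    "indicator": {"id": "NY.GDP.PCAP.KD", "value": "GDP per capita"},
    "country": {"id": "BR", "value": "Brazil"},
    "countryiso3code": "BRA",
    "date": "2022",
    "value": 1234.5,
}
doc = to_document(sample, "2025-01-01T00:00:00+00:00")
print(indicator_url("NY.GDP.PCAP.KD"))
print(doc["_id"])
```

Using the composite key as the document identifier makes re-ingestion of the same snapshot idempotent, which is one plausible reason for the key format described above.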

2) Data quality & governance

  • Metadata: We keep id, name, source_id, unit, notes/definitions per indicator.
  • Aggregates: World/regions/income groups are tagged and excluded from country-only rankings by default.
  • Missing values: Shown as N/A. Short gaps (≤2 years) may be linearly interpolated and flagged as filled.
  • Outliers: Winsorized per indicator-year across countries (default P5–P95).
  • Units: We never mix unit families (e.g., current USD vs constant USD vs PPP).
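The gap-filling and winsorization rules above can be sketched as follows. This is an illustration under stated assumptions: only interior gaps are filled (no extrapolation at the ends), and the P5/P95 cut-offs use linear interpolation between order statistics, which the methodology does not specify. Function names are hypothetical.

```python
import math
from typing import Optional

def fill_short_gaps(series: dict[int, Optional[float]], max_gap: int = 2):
    """Linearly interpolate interior gaps of at most max_gap years.

    Returns (values, filled_years) so that filled points can be flagged.
    """
    known = sorted(y for y, v in series.items() if v is not None)
    out = dict(series)
    filled = set()
    for left, right in zip(known, known[1:]):
        gap = right - left - 1
        if 0 < gap <= max_gap:
            for year in range(left + 1, right):
                weight = (year - left) / (right - left)
                out[year] = series[left] + weight * (series[right] - series[left])
                filled.add(year)
    return out, filled

def _percentile(sorted_vals: list[float], q: float) -> float:
    """Percentile of a sorted, non-empty list (linear interpolation, q in 0..100)."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def winsorize(values: list[float], lower: float = 5, upper: float = 95) -> list[float]:
    """Clip one indicator-year cross-section to its [P_lower, P_upper] range."""
    ordered = sorted(values)
    lo, hi = _percentile(ordered, lower), _percentile(ordered, upper)
    return [min(max(v, lo), hi) for v in values]

series = {2018: 10.0, 2019: None, 2020: None, 2021: 16.0,
          2022: None, 2023: None, 2024: None, 2025: 20.0}
values, filled = fill_short_gaps(series)
print(sorted(filled))  # the 3-year gap 2022-2024 is left as missing
```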

3) Transformations & features

  • Per-capita: x_pc = x / POP; % of GDP: x_%GDP = 100 * x / GDP.
  • Log transform (levels): y = ln(x + ε). Logit (bounded %): z = ln(p/(1−p)), with p = x/100.
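  • Growth: YoY% = 100*(x_t/x_{t−1} − 1); log-growth = ln(x_t + ε) − ln(x_{t−1} + ε); CAGR = (x_1/x_0)^(1/n) − 1, where x_0 and x_1 are the start and end values n years apart.
  • Rolling: mean and standard deviation (e.g., 3-year window) and OLS slope over the last 5 years.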
  • Polarity: indicators marked higher- or lower-is-better (e.g., unemployment, CO₂ are lower-better).
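As a rough illustration, the transformations listed above translate into a few small functions. The value of ε and all function names are assumptions made for this sketch; the OLS slope regresses the values on the time index 0, 1, 2, ...

```python
import math

EPS = 1e-9  # hypothetical epsilon; the methodology does not state its value

def per_capita(x: float, pop: float) -> float:
    return x / pop

def pct_of_gdp(x: float, gdp: float) -> float:
    return 100 * x / gdp

def log_level(x: float, eps: float = EPS) -> float:
    return math.log(x + eps)

def logit_pct(x: float) -> float:
    """Logit of a bounded percentage, with p = x / 100 (requires 0 < x < 100)."""
    p = x / 100
    return math.log(p / (1 - p))

def yoy_pct(x_t: float, x_prev: float) -> float:
    return 100 * (x_t / x_prev - 1)

def log_growth(x_t: float, x_prev: float, eps: float = EPS) -> float:
    return math.log(x_t + eps) - math.log(x_prev + eps)

def cagr(x_start: float, x_end: float, n_years: int) -> float:
    return (x_end / x_start) ** (1 / n_years) - 1

def ols_slope(values: list[float]) -> float:
    """Least-squares slope of values against the index 0..n-1."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

print(yoy_pct(110.0, 100.0))
print(ols_slope([1.0, 3.0, 5.0, 7.0, 9.0]))  # slope over a 5-year window
```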

4) Normalization & ranks

  • Percentile (world): empirical percentile per year across countries (after winsorization), scaled 0–100.
  • Robust z-score: (x − median) / (1.4826 * MAD).
  • Polarity handling: for lower-better we invert percentile (100 − s). Raw values are never inverted.
  • Optional scopes: Region and Income-group percentiles for benchmarking.
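The normalization rules above might look like this in code. The exact definition of the empirical percentile is not stated; this sketch assumes the mid-rank convention (ties count half). Returning None when MAD is zero is also an assumption, chosen to avoid a division by zero.

```python
import statistics
from typing import Optional

def percentile_score(x: float, peers: list[float]) -> float:
    """Empirical percentile of x among peers on a 0-100 scale (mid-rank for ties)."""
    below = sum(v < x for v in peers)
    equal = sum(v == x for v in peers)
    return 100 * (below + 0.5 * equal) / len(peers)

def apply_polarity(score: float, higher_is_better: bool) -> float:
    """Invert the percentile (not the raw value) for lower-is-better indicators."""
    return score if higher_is_better else 100 - score

def robust_z(x: float, peers: list[float]) -> Optional[float]:
    """(x - median) / (1.4826 * MAD); None when MAD is zero."""
    med = statistics.median(peers)
    mad = statistics.median([abs(v - med) for v in peers])
    if mad == 0:
        return None
    return (x - med) / (1.4826 * mad)

peers = [2.0, 4.0, 6.0, 8.0, 10.0]  # one indicator-year cross-section
score = percentile_score(8.0, peers)
print(score, apply_polarity(score, higher_is_better=False))
print(robust_z(10.0, peers))
```

Region and income-group percentiles would use the same functions with `peers` restricted to the relevant group.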

5) Headline KPIs (final)

Code | Label | Unit | Polarity | YoY | Notes
NY.GDP.PCAP.KD | GDP per capita (constant USD) | USD/person | Higher ↑ | % | Real, chained USD
FP.CPI.TOTL.ZG | Inflation, consumer prices | % (annual) | Lower ↓ | Δ pp | Rate; clip extremes in charts
NE.EXP.GNFS.ZS | Exports of goods & services | % of GDP | Higher ↑ | Δ pp | Openness proxy
SL.UEM.TOTL.ZS | Unemployment, total | % of labor force | Lower ↓ | Δ pp | Youth unemployment tracked in personas
SL.TLF.CACT.ZS | Labor force participation | % of 15+ | Higher ↑ | Δ pp |
SP.POP.TOTL | Population | persons | | % | Level; show YoY %
SP.DYN.LE00.IN | Life expectancy at birth | years | Higher ↑ | Δ years |
SE.TER.ENRR | Tertiary enrollment (gross) | % | Higher ↑ | Δ pp |
SE.ADT.LITR.ZS | Adult literacy (15+) | % | Higher ↑ | Δ pp | If missing, show latest only
IT.NET.USER.ZS | Individuals using the Internet | % of population | Higher ↑ | Δ pp |
IT.CEL.SETS.P2 | Mobile cellular subscriptions | per 100 people | Higher ↑ | Δ | Connectivity proxy
EN.ATM.CO2E.PC | CO₂ emissions | t/person | Lower ↓ | % | Environmental pressure proxy

6) Persona indices

Composite scores (0–100) are built from per-indicator percentiles (polarity applied), with equal weights across pillars and within pillars. A score is published if ≥60% of its indicators are present; otherwise it is flagged low_coverage.
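A sketch of the aggregation, under assumptions the text leaves open: missing indicators are dropped and the remaining ones in a pillar are re-weighted equally, pillars with no data are skipped, and coverage is counted over all indicators of the persona. The percentile values in the example are placeholders.

```python
from typing import Optional

def persona_score(pillars: dict[str, dict[str, Optional[float]]],
                  min_coverage: float = 0.6) -> dict:
    """Equal-weight composite of polarity-adjusted percentiles (0-100).

    pillars maps pillar name -> {indicator code -> percentile or None}.
    """
    all_values = [v for inds in pillars.values() for v in inds.values()]
    present = [v for v in all_values if v is not None]
    coverage = len(present) / len(all_values)
    if coverage < min_coverage:
        return {"score": None, "coverage": coverage, "flag": "low_coverage"}

    pillar_means = []
    for inds in pillars.values():
        vals = [v for v in inds.values() if v is not None]
        if vals:  # assumption: a pillar with no data is skipped
            pillar_means.append(sum(vals) / len(vals))
    score = sum(pillar_means) / len(pillar_means)
    return {"score": score, "coverage": coverage, "flag": None}

# Placeholder percentiles for the Job Seeker persona (not real data).
job_seeker = {
    "Employment health": {"SL.UEM.TOTL.ZS": 60.0, "SL.UEM.1524.ZS": None},
    "Participation & skills": {"SL.TLF.CACT.ZS": 80.0, "SE.TER.ENRR": 40.0},
    "Momentum": {"NY.GDP.MKTP.KD.ZG": 50.0, "NE.EXP.GNFS.ZS": 70.0},
    "Digital access": {"IT.NET.USER.ZS": 90.0, "IT.CEL.SETS.P2": 70.0},
}
result = persona_score(job_seeker)
print(result)
```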

Job Seeker

Employment health

  • SL.UEM.TOTL.ZS: Unemployment, total (Lower ↓)
  • SL.UEM.1524.ZS: Youth unemployment (optional) (Lower ↓)

Participation & skills

  • SL.TLF.CACT.ZS: Labor force participation (Higher ↑)
  • SE.TER.ENRR: Tertiary enrollment (Higher ↑)

Momentum

  • NY.GDP.MKTP.KD.ZG: Real GDP growth (Higher ↑)
  • NE.EXP.GNFS.ZS: Exports % GDP (Higher ↑)

Digital access

  • IT.NET.USER.ZS: Internet users % (Higher ↑)
  • IT.CEL.SETS.P2: Mobile subs per 100 (Higher ↑)

Entrepreneur

Regulatory & legal

  • IC.LGL.CRED.XQ: Strength of legal rights (Higher ↑)
  • IC.BUS.NDNS.ZS: New business density (Higher ↑)

Access to finance

  • FS.AST.PRVT.GD.ZS: Credit to private sector % GDP (Higher ↑)
  • FB.AST.NPER.ZS: NPLs % of total (if present) (Lower ↓)

Infrastructure & power

  • EG.ELC.ACCS.ZS: Access to electricity (Higher ↑)
  • EG.ELC.RNEW.ZS: Renewable electricity output (Higher ↑)

Innovation & high-tech trade

  • TX.VAL.TECH.MF.ZS: High-tech exports share (Higher ↑)
  • IP.JRN.ARTC.SC: Sci/tech journal articles (Higher ↑; log before percentile)

Digital Nomad

Connectivity

  • IT.NET.USER.ZS: Internet users % (Higher ↑)
  • IT.NET.BBND.P2: Fixed broadband per 100 (if present) (Higher ↑)
  • IT.CEL.SETS.P2: Mobile subs per 100 (Higher ↑)

Affordability & stability

  • PA.NUS.PPPC.RF: Price level ratio (Lower ↓)
  • FP.CPI.TOTL.ZG: Inflation % (Lower ↓)

Livability & safety

  • SP.DYN.LE00.IN: Life expectancy (Higher ↑)
  • EN.ATM.PM25.MC.M3: PM2.5 exposure (Lower ↓)
  • SH.STA.HOMIC.ZS: Homicide rate (if present) (Lower ↓)

Expat Family

Health

  • SP.DYN.LE00.IN: Life expectancy (Higher ↑)
  • SH.XPD.CHEX.PC.CD: Health expend. per capita (Higher ↑; log before percentile)
  • SH.IMM.MEAS.ZS: Measles immunization (Higher ↑)

Education

  • SE.SEC.ENRR: Secondary enrollment (Higher ↑)
  • SE.TER.ENRR: Tertiary enrollment (Higher ↑)
  • SE.ADT.LITR.ZS: Adult literacy (Higher ↑)

Safety & environment

  • SH.STA.HOMIC.ZS: Homicide rate (Lower ↓)
  • EN.ATM.PM25.MC.M3: PM2.5 exposure (Lower ↓)
  • EN.ATM.CO2E.PC: CO₂ per capita (Lower ↓)

7) Country profiles & comparisons

  • Profiles: latest value + year + unit + YoY + world percentile; 10–20y trends; benchmarks (World/Region/Income).
  • Comparisons: side-by-side KPIs, percentile bars, trend overlays; ranks exclude aggregates by default.

8) Opportunity mapping

  • Level percentile L ∈ [0,100] and Trend percentile T ∈ [0,100] (YoY or 5y CAGR).
  • Score: geometric mean O = √(L · T), minus a small volatility penalty.
  • Eligibility: coverage ≥70%, latest observation ≤2 years old, inflation within bounds.
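A sketch of the scoring rule. The form of the volatility penalty is not specified in this methodology, so the linear penalty (`penalty_weight * volatility`, floored at zero) is purely an assumption, as are the function and parameter names. The inflation bound is omitted because its limits are not stated.

```python
import math

def opportunity_score(level_pct: float, trend_pct: float,
                      volatility: float = 0.0,
                      penalty_weight: float = 0.0) -> float:
    """Geometric mean of level and trend percentiles, less a hypothetical penalty."""
    base = math.sqrt(level_pct * trend_pct)
    return max(0.0, base - penalty_weight * volatility)

def is_eligible(coverage: float, latest_year: int, as_of_year: int) -> bool:
    """Coverage of at least 70% and a latest observation at most 2 years old."""
    return coverage >= 0.70 and (as_of_year - latest_year) <= 2

print(opportunity_score(81.0, 49.0))
print(opportunity_score(81.0, 49.0, volatility=5.0, penalty_weight=0.5))
print(is_eligible(coverage=0.8, latest_year=2022, as_of_year=2025))
```

The geometric mean rewards balance: a country at the 81st level percentile with a trend percentile of 0 scores 0, whereas an arithmetic mean would still give it 40.5.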

9) Forecasts (projections)

  • Scope: smooth, well-covered annual series (GDP pc, inflation, unemployment, internet users, life expectancy, CO₂ pc).
  • Transforms: log for levels; logit for bounded %.
  • Models: random walk with drift (RWD), ARIMA(0,1,1) with drift, or ETS; the model is chosen per series via rolling-origin cross-validation (sMAPE/MASE).
  • Uncertainty: 80/95% intervals; forecasts shown as dashed and labeled “Projection” with as_of_year.
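To make the model-selection idea concrete, here is a small sketch of the simplest candidate, a random walk with drift fitted on the log scale, scored by one-step-ahead rolling-origin sMAPE. It omits ARIMA, ETS, MASE and the prediction intervals; `min_train` and the synthetic series are assumptions for illustration.

```python
import math

def rwd_forecast(series: list[float], horizon: int) -> list[float]:
    """Random walk with drift on ln(x); drift is the mean log difference."""
    logs = [math.log(v) for v in series]
    drift = (logs[-1] - logs[0]) / (len(logs) - 1)
    return [math.exp(logs[-1] + h * drift) for h in range(1, horizon + 1)]

def smape(actual: list[float], forecast: list[float]) -> float:
    """Symmetric MAPE in percent."""
    terms = [2 * abs(f - a) / (abs(a) + abs(f)) for a, f in zip(actual, forecast)]
    return 100 * sum(terms) / len(terms)

def rolling_origin_smape(series: list[float], min_train: int = 5) -> float:
    """Average one-step-ahead sMAPE, refitting at each forecast origin."""
    errors = []
    for origin in range(min_train, len(series)):
        pred = rwd_forecast(series[:origin], 1)[0]
        errors.append(smape([series[origin]], [pred]))
    return sum(errors) / len(errors)

# Synthetic series growing 2% per year (not real data).
series = [100 * 1.02 ** t for t in range(10)]
print(rwd_forecast(series, 2))
print(rolling_origin_smape(series))
```

In a fuller version the same `rolling_origin_smape` loop would be run for each candidate model and the lowest-error model kept for that series.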

10) Alerts

  • Triggers: threshold/percentile crossings, large YoY/log-growth, trend breaks, anomalies (robust z), forecast breaches.
  • Noise control: hysteresis (2 consecutive observations), cooldown windows, recency & coverage checks.
  • Audit: each alert stores indicator code, year, value, percentile, and the rule that fired.
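The noise-control rules can be illustrated with a simple threshold trigger. This sketch assumes the alert fires on the second consecutive observation above the threshold and that the cooldown is measured in years; the cooldown length is a hypothetical default. The percentile field of the audit record is left out for brevity.

```python
def threshold_alerts(indicator: str, observations: list[tuple[int, float]],
                     threshold: float, confirm: int = 2,
                     cooldown: int = 3) -> list[dict]:
    """Fire when value > threshold for `confirm` consecutive observations.

    observations is a list of (year, value) sorted by year. After firing,
    further alerts are suppressed for `cooldown` years.
    """
    alerts = []
    streak = 0
    last_fired = None
    for year, value in observations:
        streak = streak + 1 if value > threshold else 0
        in_cooldown = last_fired is not None and year - last_fired < cooldown
        if streak >= confirm and not in_cooldown:
            alerts.append({
                "indicator": indicator,
                "year": year,
                "value": value,
                "rule": f"value > {threshold} for {confirm} consecutive observations",
            })
            last_fired = year
    return alerts

# Placeholder inflation readings (not real data).
obs = [(2018, 4.0), (2019, 11.0), (2020, 12.0), (2021, 13.0),
       (2022, 9.0), (2023, 12.0), (2024, 14.0)]
alerts = threshold_alerts("FP.CPI.TOTL.ZG", obs, threshold=10.0)
print([a["year"] for a in alerts])
```

The single crossing in 2019 does not fire on its own, and the 2021 reading is suppressed by the cooldown that starts when the 2020 alert fires.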

11) LLM grounding

  • Evidence bundle: compact JSON of numbers/years/units/sources from Mongo; the model only narrates from this evidence.
  • Discipline: every numeric claim includes year + unit + source code (e.g., NY.GDP.PCAP.KD, 2023, WDI); no on-the-fly calculations.
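An evidence bundle along these lines could be assembled and validated before it reaches the model. The JSON layout, field names and the numeric value below are assumptions for illustration; only the requirement that every fact carries year, unit and source code comes from the text above.

```python
import json

REQUIRED_FIELDS = ("code", "year", "value", "unit", "source")

def make_bundle(country: str, dump_as_of: str, facts: list[dict]) -> str:
    """Serialize facts as compact JSON, rejecting any fact lacking a required field."""
    for fact in facts:
        missing = [k for k in REQUIRED_FIELDS if fact.get(k) is None]
        if missing:
            raise ValueError(f"fact {fact.get('code')!r} is missing {missing}")
    payload = {"country": country, "dump_as_of": dump_as_of, "facts": facts}
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))

facts = [{
    "code": "NY.GDP.PCAP.KD",
    "year": 2023,
    "value": 1234.5,  # placeholder, not a real observation
    "unit": "USD/person",
    "source": "WDI",
}]
bundle = make_bundle("BRA", "2025-01-01T00:00:00+00:00", facts)
print(bundle)
```

Because derived figures such as YoY changes or percentiles are precomputed and placed in the bundle as their own facts, the model never needs to calculate anything itself.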

12) Reproducibility & versioning

  • Each release references a single dump_as_of and pipeline commit.
  • Derived tables (features, country_views, persona_scores, opportunities, forecasts, alerts) are rebuilt end-to-end.
  • Any page/report can be reproduced by (dump_as_of, country, year).

13) Limitations & ethics

  • Some indicators have gaps or long lags; we surface the latest year explicitly.
  • Method changes may affect comparability across time; versioning exposes changes.
  • Aggregates (e.g., WLD/EUU) are benchmarks; rankings default to countries only.
  • Forecasts are scenarios with uncertainty; they should complement expert judgement.

14) Contact

Questions or feedback? Email support@sufoniq.com.