Methodology
Last updated: N/A • Data source: World Bank (WDI & related)
Reproducible • Grounded • LLM-safe
1) Data ingestion & scope
- API: World Bank v2 JSON endpoints (/sources, /indicators, /country, /country/all/indicator/{code}).
- Coverage: All indicators, countries & aggregates; years 1960–2025 where available.
- Storage: MongoDB (worldbank_raw), one row per indicator × country × year; key {indicator|country|year}.
- Versioning: Every dump is stamped with dump_as_of (UTC); derived tables reference the same snapshot.
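To make the storage key and the snapshot stamp concrete, here is a minimal Python sketch that flattens one page of a v2 indicator response into records keyed {indicator|country|year}. It assumes the usual v2 JSON layout (a two-element array of page metadata followed by observations) and the `format`, `per_page` and `page` query parameters. The helper names, the ISO3-with-fallback country key, and the sample values are illustrative assumptions, not the production pipeline.

```python
from urllib.parse import urlencode

BASE = "https://api.worldbank.org/v2"

def indicator_url(code, page=1, per_page=20000):
    """Build the URL for one page of /country/all/indicator/{code} in JSON."""
    query = urlencode({"format": "json", "per_page": per_page, "page": page})
    return f"{BASE}/country/all/indicator/{code}?{query}"

def parse_indicator_page(payload, dump_as_of):
    """Flatten a v2 response [page_metadata, observations] into raw records.

    Hypothetical helper: the record layout mirrors the worldbank_raw
    description (one record per indicator x country x year).
    """
    observations = payload[1] if len(payload) > 1 and payload[1] else []
    records = []
    for obs in observations:
        code = obs["indicator"]["id"]
        # Assumption: prefer the ISO3 code, fall back to the API's own id if blank.
        country = obs.get("countryiso3code") or obs["country"]["id"]
        year = int(obs["date"])
        records.append({
            "_id": f"{code}|{country}|{year}",  # key {indicator|country|year}
            "indicator": code,
            "country": country,
            "year": year,
            "value": obs["value"],              # None = no observation (shown as N/A)
            "dump_as_of": dump_as_of,           # UTC snapshot stamp
        })
    return records

# Made-up sample shaped like a v2 response; the values are not real data.
sample = [
    {"page": 1, "pages": 1, "per_page": 20000, "total": 2},
    [
        {"indicator": {"id": "SP.POP.TOTL", "value": "Population, total"},
         "country": {"id": "XX", "value": "Example"}, "countryiso3code": "XXX",
         "date": "2022", "value": 1000, "unit": "", "obs_status": "", "decimal": 0},
        {"indicator": {"id": "SP.POP.TOTL", "value": "Population, total"},
         "country": {"id": "XX", "value": "Example"}, "countryiso3code": "",
         "date": "2023", "value": None, "unit": "", "obs_status": "", "decimal": 0},
    ],
]

records = parse_indicator_page(sample, dump_as_of="2024-01-01T00:00:00Z")
print(records[0]["_id"])  # SP.POP.TOTL|XXX|2022
```

Using the composite key as `_id` makes re-ingesting the same snapshot idempotent: an upsert on `_id` overwrites rather than duplicates.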
2) Data quality & governance
- Metadata: We keep id, name, source_id, unit, notes/definitions per indicator.
- Aggregates: World/regions/income groups are tagged and excluded from country-only rankings by default.
- Missing values: Shown as N/A. Short gaps (≤2 years) may be linearly interpolated and are flagged as filled.
- Outliers: Winsorized per indicator-year across countries (default P5–P95).
- Units: We never mix unit families (e.g., current USD vs constant USD vs PPP).
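The gap rule and the P5–P95 winsorization can be sketched in plain Python as below. Assumptions: a series is a year-to-value dict with None for missing years, only interior gaps of at most two years are filled, and percentile cut-offs use linear interpolation between order statistics (the same convention as NumPy's default). The helper names are hypothetical.

```python
def fill_short_gaps(series, max_gap=2):
    """Linearly interpolate interior gaps of <= max_gap years.

    series: {year: value or None}. Returns {year: (value, filled_flag)}.
    """
    out = {y: (v, False) for y, v in sorted(series.items())}
    observed = sorted(y for y, v in series.items() if v is not None)
    for a, b in zip(observed, observed[1:]):
        if 0 < b - a - 1 <= max_gap:
            va, vb = series[a], series[b]
            for y in range(a + 1, b):
                # Straight line between the two neighbouring observations, flagged as filled.
                out[y] = (va + (y - a) / (b - a) * (vb - va), True)
    return dict(sorted(out.items()))

def linear_percentile(sorted_vals, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def winsorize(values, lower=5, upper=95):
    """Clip one indicator-year cross-section {country: value} to [P_lower, P_upper]."""
    vals = sorted(v for v in values.values() if v is not None)
    if not vals:
        return dict(values)
    lo_cut = linear_percentile(vals, lower)
    hi_cut = linear_percentile(vals, upper)
    return {c: None if v is None else min(max(v, lo_cut), hi_cut) for c, v in values.items()}

print(fill_short_gaps({2000: 1.0, 2001: None, 2002: None, 2003: 4.0}))
```

Gaps longer than two years stay missing (and unflagged), so they still surface as N/A.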
3) Transformations & features
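- Per-capita: $x_{pc} = x / \mathrm{POP}$; % of GDP: $x_{\%GDP} = 100 \cdot x / \mathrm{GDP}$.
- Log transform (levels): $y = \ln(x + \varepsilon)$. Logit (bounded %): $z = \ln\left(\frac{p}{1 - p}\right)$, with $p = x / 100$.
- Growth: YoY % $= 100 \cdot (x_t / x_{t-1} - 1)$; log-growth $= \ln(x_t + \varepsilon) - \ln(x_{t-1} + \varepsilon)$; CAGR $= (x_1 / x_0)^{1/n} - 1$, where $x_0$ and $x_1$ are the start and end values $n$ years apart.
- Rolling: mean/std (e.g., 3y) and OLS slope over the last 5y.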
- Polarity: each indicator is marked higher-is-better or lower-is-better (e.g., unemployment and CO₂ are lower-is-better).
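The transformations above translate almost one-to-one into Python. A minimal sketch follows. The value of ε is not stated in the methodology, so `EPS` is an assumed placeholder, and clamping p away from 0 and 1 before the logit is an added assumption to keep it finite.

```python
import math
import statistics

EPS = 1e-6  # assumed epsilon for log transforms; not specified in the methodology

def per_capita(x, pop):
    return x / pop

def pct_of_gdp(x, gdp):
    return 100 * x / gdp

def log_level(x, eps=EPS):
    return math.log(x + eps)

def logit_pct(x, eps=EPS):
    p = min(max(x / 100, eps), 1 - eps)  # assumption: clamp so 0% and 100% stay finite
    return math.log(p / (1 - p))

def yoy_pct(x_t, x_prev):
    return 100 * (x_t / x_prev - 1)

def log_growth(x_t, x_prev, eps=EPS):
    return math.log(x_t + eps) - math.log(x_prev + eps)

def cagr(x0, x1, n_years):
    return (x1 / x0) ** (1 / n_years) - 1

def rolling_mean_std(values, window=3):
    """Trailing mean and sample std; None until the window is full."""
    out = []
    for i in range(len(values)):
        win = values[max(0, i - window + 1): i + 1]
        out.append((statistics.mean(win), statistics.stdev(win)) if len(win) == window else None)
    return out

def ols_slope(years, values):
    """Least-squares slope in units per year, e.g. over the last 5 observations."""
    my, mv = statistics.mean(years), statistics.mean(values)
    num = sum((y - my) * (v - mv) for y, v in zip(years, values))
    den = sum((y - my) ** 2 for y in years)
    return num / den

print(cagr(100, 121, 2))  # about 0.1, i.e. 10% per year
```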
4) Normalization & ranks
- Percentile (world): empirical percentile per year across countries (after winsorization), scaled 0–100.
- Robust z-score: $(x - \mathrm{median}) / (1.4826 \cdot \mathrm{MAD})$.
- Polarity handling: for lower-is-better indicators we invert the percentile score $s$ to $100 - s$. Raw values are never inverted.
- Optional scopes: Region and Income-group percentiles for benchmarking.
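A minimal Python sketch of the three normalization steps for one indicator-year cross-section. The methodology does not pin down the exact percentile convention, so this assumes one common choice: average rank among ties, rescaled so the lowest country scores 0 and the highest 100. Returning None when MAD is zero is also an assumption.

```python
import statistics
from bisect import bisect_left, bisect_right

def world_percentiles(values):
    """{country: value} -> {country: percentile 0-100}; missing values are skipped."""
    obs = {c: v for c, v in values.items() if v is not None}
    n = len(obs)
    if n == 1:
        return {c: 50.0 for c in obs}  # assumption: a lone observation sits mid-scale
    ordered = sorted(obs.values())
    out = {}
    for c, v in obs.items():
        lo, hi = bisect_left(ordered, v), bisect_right(ordered, v)
        avg_rank = (lo + hi - 1) / 2  # 0-based rank, averaged over ties
        out[c] = 100 * avg_rank / (n - 1)
    return out

def apply_polarity(scores, lower_is_better):
    """Invert percentiles (100 - s) for lower-is-better indicators; raw values untouched."""
    return {c: 100 - s for c, s in scores.items()} if lower_is_better else dict(scores)

def robust_z(values):
    obs = [v for v in values.values() if v is not None]
    if not obs:
        return {c: None for c in values}
    med = statistics.median(obs)
    mad = statistics.median([abs(v - med) for v in obs])
    if mad == 0:
        return {c: None for c in values}  # assumption: z is undefined when MAD is 0
    return {c: None if v is None else (v - med) / (1.4826 * mad) for c, v in values.items()}

print(world_percentiles({"A": 1.0, "B": 2.0, "C": 3.0}))  # {'A': 0.0, 'B': 50.0, 'C': 100.0}
```

Because percentiles depend only on ranks, they are unaffected by any monotone rescaling of the raw values; the robust z-score, by contrast, keeps the distance information.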
5) Headline KPIs (final)
| Code | Label | Unit | Polarity | YoY | Notes |
|---|---|---|---|---|---|
| NY.GDP.PCAP.KD | GDP per capita (constant USD) | USD/person | Higher ↑ | % | Real (constant-price USD) |
| FP.CPI.TOTL.ZG | Inflation, consumer prices | % (annual) | Lower ↓ | Δ pp | Rate; clip extremes in charts |
| NE.EXP.GNFS.ZS | Exports of goods & services | % of GDP | Higher ↑ | Δ pp | Openness proxy |
| SL.UEM.TOTL.ZS | Unemployment, total | % of labor force | Lower ↓ | Δ pp | Youth unemployment tracked in personas |
| SL.TLF.CACT.ZS | Labor force participation | % of population 15+ | Higher ↑ | Δ pp | |
| SP.POP.TOTL | Population | persons | — | % | Level; show YoY % |
| SP.DYN.LE00.IN | Life expectancy at birth | years | Higher ↑ | Δ years | |
| SE.TER.ENRR | Tertiary enrollment (gross) | % | Higher ↑ | Δ pp | |
| SE.ADT.LITR.ZS | Adult literacy (15+) | % | Higher ↑ | Δ pp | If missing, show latest only |
| IT.NET.USER.ZS | Individuals using the Internet | % of population | Higher ↑ | Δ pp | |
| IT.CEL.SETS.P2 | Mobile cellular subscriptions | per 100 people | Higher ↑ | Δ per 100 | Connectivity proxy |
| EN.ATM.CO2E.PC | CO₂ emissions | t/person | Lower ↓ | % | Environmental pressure proxy |
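The YoY column follows a simple rule: level series (GDP per capita, population, CO₂ per person) show a percentage change, while rates and shares show an absolute difference in their own unit. A minimal Python sketch of that rule for the headline codes; the mode mapping restates the table, and the function names and formatting choices are illustrative.

```python
# YoY display mode per headline KPI, restating the table's YoY column.
YOY_MODE = {
    "NY.GDP.PCAP.KD": "pct", "SP.POP.TOTL": "pct", "EN.ATM.CO2E.PC": "pct",
    "FP.CPI.TOTL.ZG": "pp", "NE.EXP.GNFS.ZS": "pp", "SL.UEM.TOTL.ZS": "pp",
    "SL.TLF.CACT.ZS": "pp", "SE.TER.ENRR": "pp", "SE.ADT.LITR.ZS": "pp",
    "IT.NET.USER.ZS": "pp", "SP.DYN.LE00.IN": "years", "IT.CEL.SETS.P2": "per100",
}
SUFFIX = {"pct": "%", "pp": " pp", "years": " years", "per100": " per 100"}

def yoy_change(code, prev, curr):
    """Percent change for levels; plain difference (in the indicator's unit) otherwise."""
    if prev is None or curr is None:
        return None
    if YOY_MODE[code] == "pct":
        return 100 * (curr / prev - 1)
    return curr - prev

def format_yoy(code, prev, curr):
    change = yoy_change(code, prev, curr)
    return "N/A" if change is None else f"{change:+.1f}{SUFFIX[YOY_MODE[code]]}"

print(format_yoy("SL.UEM.TOTL.ZS", 5.0, 4.5))  # -0.5 pp
print(format_yoy("SP.POP.TOTL", 100, 102))     # +2.0%
```

Showing a rate's change in percentage points avoids the misleading "relative change of a percentage" (a move from 2% to 3% inflation is +1.0 pp, not +50%).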
6) Persona indices
Composite scores (0–100) are built from per-indicator percentiles (with polarity applied), using equal weights across pillars and within pillars. A score is published if at least 60% of its indicators are present; otherwise it is flagged low_coverage.
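The aggregation rule can be sketched as follows, assuming pillar inputs are already polarity-adjusted percentiles. How missing indicators are handled is not stated, so this sketch assumes they are dropped and the remaining weights renormalized within the pillar, that every listed indicator (optional ones included) counts toward coverage, and that a low-coverage score is withheld rather than shown. The function name is hypothetical.

```python
def persona_score(pillars, min_coverage=0.6):
    """pillars: {pillar: {indicator_code: percentile or None}} -> score dict."""
    listed = [s for indicators in pillars.values() for s in indicators.values()]
    present = [s for s in listed if s is not None]
    coverage = len(present) / len(listed) if listed else 0.0

    pillar_means = []
    for indicators in pillars.values():
        vals = [s for s in indicators.values() if s is not None]
        if vals:
            pillar_means.append(sum(vals) / len(vals))  # equal weights within a pillar

    low = coverage < min_coverage or not pillar_means
    score = None if low else sum(pillar_means) / len(pillar_means)  # equal across pillars
    return {"score": score, "coverage": coverage, "low_coverage": low}

# Illustrative percentiles, not real data.
example = {
    "Employment health": {"SL.UEM.TOTL.ZS": 80.0, "SL.UEM.1524.ZS": None},
    "Participation & skills": {"SL.TLF.CACT.ZS": 60.0, "SE.TER.ENRR": 40.0},
}
print(persona_score(example))  # {'score': 65.0, 'coverage': 0.75, 'low_coverage': False}
```

Averaging within pillars first means a pillar with three indicators does not outweigh a pillar with two.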
Job Seeker
Employment health
- SL.UEM.TOTL.ZS: Unemployment, total (Lower ↓)
- SL.UEM.1524.ZS: Youth unemployment (optional) (Lower ↓)
Participation & skills
- SL.TLF.CACT.ZS: Labor force participation (Higher ↑)
- SE.TER.ENRR: Tertiary enrollment (Higher ↑)
Momentum
- NY.GDP.MKTP.KD.ZG: Real GDP growth (Higher ↑)
- NE.EXP.GNFS.ZS: Exports % GDP (Higher ↑)
Digital access
- IT.NET.USER.ZS: Internet users % (Higher ↑)
- IT.CEL.SETS.P2: Mobile subs per 100 (Higher ↑)
Entrepreneur
Regulatory & legal
- IC.LGL.CRED.XQ: Strength of legal rights (Higher ↑)
- IC.BUS.NDNS.ZS: New business density (Higher ↑)
Access to finance
- FS.AST.PRVT.GD.ZS: Credit to private sector % GDP (Higher ↑)
- FB.AST.NPER.ZS: Nonperforming loans % of gross loans (if present) (Lower ↓)
Infrastructure & power
- EG.ELC.ACCS.ZS: Access to electricity (Higher ↑)
- EG.ELC.RNEW.ZS: Renewable electricity output (Higher ↑)
Innovation & high-tech trade
- TX.VAL.TECH.MF.ZS: High-tech exports share (Higher ↑)
- IP.JRN.ARTC.SC: Sci/tech journal articles (Higher ↑; log before percentile)
Digital Nomad
Connectivity
- IT.NET.USER.ZS: Internet users % (Higher ↑)
- IT.NET.BBND.P2: Fixed broadband per 100 (if present) (Higher ↑)
- IT.CEL.SETS.P2: Mobile subs per 100 (Higher ↑)
Affordability & stability
- PA.NUS.PPPC.RF: Price level ratio (Lower ↓)
- FP.CPI.TOTL.ZG: Inflation % (Lower ↓)
Livability & safety
- SP.DYN.LE00.IN: Life expectancy (Higher ↑)
- EN.ATM.PM25.MC.M3: PM2.5 exposure (Lower ↓)
- SH.STA.HOMIC.ZS: Homicide rate (if present) (Lower ↓)
Expat Family
Health
- SP.DYN.LE00.IN: Life expectancy (Higher ↑)
- SH.XPD.CHEX.PC.CD: Health expenditure per capita (Higher ↑; log before percentile)
- SH.IMM.MEAS.ZS: Measles immunization (Higher ↑)
Education
- SE.SEC.ENRR: Secondary enrollment (Higher ↑)
- SE.TER.ENRR: Tertiary enrollment (Higher ↑)
- SE.ADT.LITR.ZS: Adult literacy (Higher ↑)
Safety & environment
- SH.STA.HOMIC.ZS: Homicide rate (Lower ↓)
- EN.ATM.PM25.MC.M3: PM2.5 exposure (Lower ↓)
- EN.ATM.CO2E.PC: CO₂ per capita (Lower ↓)
7) Country profiles & comparisons
- Profiles: latest value + year + unit + YoY + world percentile; 10–20y trends; benchmarks (World/Region/Income).
- Comparisons: side-by-side KPIs, percentile bars, trend overlays; ranks exclude aggregates by default.
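Excluding aggregates from ranks is a filter applied before sorting. The sketch below assumes each code has a metadata record from the /country endpoint and treats records whose region value is "Aggregates" as aggregates, which is how we understand the v2 API tags them (worth verifying against a live response). Tie-breaking by code is an arbitrary assumption, and the sample metadata is hypothetical.

```python
def is_aggregate(country_record):
    # Assumption: v2 /country lists aggregates (WLD, EUU, income groups, ...)
    # with region.value == "Aggregates".
    return country_record["region"]["value"].strip() == "Aggregates"

def rank_countries(scores, country_meta, include_aggregates=False):
    """scores: {code: polarity-adjusted percentile}; returns [(rank, code, score)]."""
    rows = [
        (code, s) for code, s in scores.items()
        if s is not None and (include_aggregates or not is_aggregate(country_meta[code]))
    ]
    rows.sort(key=lambda r: (-r[1], r[0]))  # best first; ties broken by code
    return [(i + 1, code, s) for i, (code, s) in enumerate(rows)]

# Hypothetical metadata and scores for illustration only.
meta = {
    "AAA": {"region": {"value": "Region 1"}},
    "BBB": {"region": {"value": "Region 2"}},
    "WLD": {"region": {"value": "Aggregates"}},
}
print(rank_countries({"AAA": 70.0, "BBB": 90.0, "WLD": 95.0}, meta))
# [(1, 'BBB', 90.0), (2, 'AAA', 70.0)]
```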
8) Opportunity mapping
- Level percentile $L \in [0, 100]$ and trend percentile $T \in [0, 100]$ (trend measured by YoY or 5y CAGR).
- Score: geometric mean $O = \sqrt{L \cdot T}$, with a small volatility penalty.
- Eligibility filters: coverage ≥70%, latest observation ≤2y old, inflation within bounds.
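A minimal sketch of the opportunity score. The methodology does not define the volatility penalty or the inflation bounds, so this assumes a multiplicative penalty proportional to a volatility percentile (`penalty=0.1`) and placeholder bounds of −5% to 25%; both are hypothetical parameters, as is the function name.

```python
import math

def opportunity_score(level_pct, trend_pct, vol_pct, coverage, latest_year, as_of_year,
                      inflation, penalty=0.1, inflation_bounds=(-5.0, 25.0)):
    """Geometric mean of level and trend percentiles, lightly penalized for volatility.

    Returns None when a filter fails. penalty and inflation_bounds are assumed values.
    """
    lo, hi = inflation_bounds
    if coverage < 0.70 or as_of_year - latest_year > 2:
        return None
    if inflation is None or not lo <= inflation <= hi:
        return None
    base = math.sqrt(level_pct * trend_pct)  # O = sqrt(L * T), stays within [0, 100]
    return base * (1 - penalty * vol_pct / 100)

print(opportunity_score(64, 100, 0, coverage=0.8, latest_year=2023, as_of_year=2024, inflation=3.0))
# 80.0
```

The geometric mean rewards balance: L = 100 with T = 25 gives 50, whereas L = T = 62.5 gives 62.5, even though both pairs have the same arithmetic mean.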
9) Forecasts (projections)
- Scope: smooth, well-covered annual series (GDP pc, inflation, unemployment, internet users, life expectancy, CO₂ pc).
- Transforms: log for levels; logit for bounded %.
- Models: random walk with drift (RWD), ARIMA(0,1,1) with drift, or ETS; chosen via rolling-origin CV (sMAPE/MASE).
- Uncertainty: 80% and 95% prediction intervals; forecasts are drawn as dashed lines and labeled “Projection” with as_of_year.
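To illustrate model selection by rolling-origin CV, here is a minimal pure-Python sketch that compares a random walk with drift against simple exponential smoothing (whose point forecasts match ETS(A,N,N)) on one-step-ahead sMAPE, plus normal-approximation intervals for RWD. ARIMA(0,1,1) is left out to keep it dependency-free. The fixed `alpha=0.5`, `min_train=5`, and the interval formula (which ignores drift-estimation uncertainty) are simplifying assumptions, and series are assumed to be already on the transformed scale (log or logit).

```python
import math
import statistics

def rwd_forecast(y, h):
    """Random walk with drift: last value plus k times the average historical change."""
    drift = (y[-1] - y[0]) / (len(y) - 1)
    return [y[-1] + k * drift for k in range(1, h + 1)]

def ses_forecast(y, h, alpha=0.5):
    """Simple exponential smoothing: flat forecast of the final smoothed level."""
    level = y[0]
    for v in y[1:]:
        level = alpha * v + (1 - alpha) * level
    return [level] * h

def smape(actual, pred):
    terms = [2 * abs(p - a) / (abs(a) + abs(p)) for a, p in zip(actual, pred) if abs(a) + abs(p) > 0]
    return 100 * sum(terms) / len(terms) if terms else 0.0

def rolling_origin_smape(y, model, h=1, min_train=5):
    """Refit on y[:origin] for each origin and score the next h points."""
    scores = [smape(y[o:o + h], model(y[:o], h)) for o in range(min_train, len(y) - h + 1)]
    if not scores:
        raise ValueError("series too short for rolling-origin CV")
    return sum(scores) / len(scores)

def select_model(y, models, h=1):
    return min(models, key=lambda name: rolling_origin_smape(y, models[name], h))

def rwd_intervals(y, h, z):
    """Normal-approximation bands; z = 1.2816 (80%) or 1.96 (95%)."""
    diffs = [b - a for a, b in zip(y, y[1:])]
    sigma = statistics.stdev(diffs)
    return [(p - z * sigma * math.sqrt(k), p + z * sigma * math.sqrt(k))
            for k, p in enumerate(rwd_forecast(y, h), start=1)]

MODELS = {"RWD": rwd_forecast, "SES": ses_forecast}
print(select_model([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], MODELS))  # RWD
```

On a trending series the flat SES forecast lags behind every origin, so RWD wins; on a noisy, trendless series the comparison can go the other way, which is why the choice is made by cross-validation rather than fixed in advance.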
10) Alerts
- Triggers: threshold/percentile crossings, large YoY/log-growth, trend breaks, anomalies (robust z), forecast breaches.
- Noise control: hysteresis (2 consecutive observations), cooldown windows, recency & coverage checks.
- Audit: each alert stores indicator code, year, value, percentile, and the rule that fired.
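A minimal sketch of a threshold rule with the noise controls above, taking "hysteresis" in the document's sense of requiring two consecutive observations beyond the threshold. The cooldown length (3 years), the `Alert` record, and the function name are hypothetical; the record carries the audit fields listed above.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    indicator: str
    year: int
    value: float
    percentile: float
    rule: str  # audit trail: which rule fired

def threshold_alerts(indicator, observations, threshold, consecutive=2, cooldown_years=3):
    """observations: [(year, value, percentile)] sorted by year.

    Fires only after `consecutive` observations above `threshold` (hysteresis),
    then stays quiet for `cooldown_years` (cooldown window).
    """
    rule = f"{indicator} > {threshold} for {consecutive} consecutive observations"
    alerts, streak, last_fired = [], 0, None
    for year, value, pct in observations:
        streak = streak + 1 if value is not None and value > threshold else 0
        cooling = last_fired is not None and year - last_fired < cooldown_years
        if streak >= consecutive and not cooling:
            alerts.append(Alert(indicator, year, value, pct, rule))
            last_fired = year
    return alerts

# Illustrative values only.
obs = [(2018, 5.0, 40.0), (2019, 9.0, 70.0), (2020, 9.5, 75.0),
       (2021, 10.0, 80.0), (2022, 10.5, 82.0), (2023, 11.0, 85.0)]
print([a.year for a in threshold_alerts("FP.CPI.TOTL.ZG", obs, threshold=8.0)])  # [2020, 2023]
```

A single spike above the threshold never fires, and a persistent breach re-fires only once per cooldown window instead of every year.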
11) LLM grounding
- Evidence bundle: compact JSON of numbers/years/units/sources from MongoDB; the model narrates only from this evidence.
- Discipline: every numeric claim includes year + unit + source code (e.g., NY.GDP.PCAP.KD, 2023, constant USD, WDI); no on-the-fly calculations.
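One way to enforce "no on-the-fly calculations" is a post-check that flags any number in the narrative that does not appear in the evidence bundle. The sketch below is a hypothetical illustration, not the described pipeline: it assumes a number is grounded when it equals an evidence value or year after rounding to the precision it was written with, ignores signs (matching by magnitude), and strips indicator codes first because codes such as IT.CEL.SETS.P2 contain digits.

```python
import json
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")

def build_evidence(records, source="WDI"):
    """Compact JSON evidence bundle; in the real pipeline records come from MongoDB."""
    items = [{"code": r["indicator"], "country": r["country"], "year": r["year"],
              "value": r["value"], "unit": r["unit"], "source": source} for r in records]
    return json.dumps(items, separators=(",", ":"), sort_keys=True)

def ungrounded_numbers(text, evidence_json):
    """Return numbers in `text` that match no evidence value or year at the written precision."""
    evidence = json.loads(evidence_json)
    for item in evidence:
        text = text.replace(item["code"], " ")  # codes such as IT.CEL.SETS.P2 contain digits
    allowed = [float(i["year"]) for i in evidence]
    allowed += [abs(float(i["value"])) for i in evidence if i["value"] is not None]
    flagged = []
    for token in NUMBER.findall(text.replace(",", "")):  # drop thousands separators
        decimals = len(token.split(".")[1]) if "." in token else 0
        if not any(round(a, decimals) == float(token) for a in allowed):
            flagged.append(token)
    return flagged

# Hypothetical record for illustration only.
ev = build_evidence([{"indicator": "NY.GDP.PCAP.KD", "country": "XXX", "year": 2023,
                      "value": 1234.567, "unit": "constant USD"}])
print(ungrounded_numbers("It grew 7.5% in 2023.", ev))  # ['7.5']
```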
12) Reproducibility & versioning
- Each release references a single dump_as_of and pipeline commit.
- Derived tables (features, country_views, persona_scores, opportunities, forecasts, alerts) are rebuilt end-to-end.
- Any page/report can be reproduced from (dump_as_of, country, year).
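One way to turn the (dump_as_of, country, year) triple into a concrete lookup key is to hash its canonical JSON form, so identical inputs always resolve to the same cached page or report. The manifest layout, the helper names, and the placeholder commit hash below are hypothetical illustrations.

```python
import hashlib
import json

def release_manifest(dump_as_of, pipeline_commit):
    """Hypothetical manifest pinning one snapshot and one pipeline commit."""
    return {
        "dump_as_of": dump_as_of,
        "pipeline_commit": pipeline_commit,
        "derived_tables": ["features", "country_views", "persona_scores",
                           "opportunities", "forecasts", "alerts"],
    }

def report_key(dump_as_of, country, year):
    """Deterministic id for a page/report: same inputs, same key."""
    canonical = json.dumps({"dump_as_of": dump_as_of, "country": country, "year": year},
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

manifest = release_manifest("2024-01-01T00:00:00Z", "abc123")  # placeholder commit
print(report_key(manifest["dump_as_of"], "XXX", 2023))
```

Sorting keys and fixing separators makes the serialized form, and therefore the hash, independent of argument order or whitespace.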
13) Limitations & ethics
- Some indicators have gaps or long lags; we surface the latest year explicitly.
- Method changes may affect comparability across time; versioning exposes changes.
- Aggregates (e.g., WLD/EUU) are benchmarks; rankings default to countries only.
- Forecasts are scenarios with uncertainty; they should complement expert judgement.
14) Contact
Questions or feedback? Email support@sufoniq.com.