
replication · silva & tenreyro 2006

OLS-on-logs versus PPML: the gravity coefficient gap, 2020

Silva & Tenreyro’s 2006 REStat paper is one of the most-cited methods pieces in the trade literature. Its claim: the standard log-linearised gravity equation, estimated by OLS on the log of trade, biases coefficients sharply, most visibly inflating the distance elasticity in absolute value. The reason is Jensen’s inequality: the expected value of a log is not the log of the expected value, so under heteroskedasticity the log-transformed error is correlated with the regressors and OLS is inconsistent. Their remedy is Poisson Pseudo Maximum Likelihood (PPML). We run both estimators on 2020 CEPII gravity data.

authors: J. M. C. Santos Silva & S. Tenreyro

year: 2006

journal: Review of Economics and Statistics 88(4)

pages: 641–658

doi: 10.1162/rest.88.4.641

Published result

Silva & Tenreyro (2006, REStat) take the standard gravity specification ln X_ij = β₀ + β₁ln(GDP_i) + β₂ln(GDP_j) + β₃ln(dist_ij) + γ Z_ij + ε_ij, which is the log of the multiplicative model X_ij = exp(x_ij β) · ν_ij, where x_ij collects the regressors and ε_ij = ln ν_ij. They show that if ν_ij is heteroskedastic, which it is in essentially every cross-section of bilateral trade, then E[ln ν_ij] conditional on the regressors depends on those regressors, and that dependence biases OLS. The fix: estimate the multiplicative model directly via PPML (a log-link Poisson with robust standard errors, which does not require ν_ij to be Poisson). Their Table 3 reports the OLS distance elasticity at ≈ −1.347 and the PPML distance elasticity at ≈ −0.750, so PPML is roughly 56% of OLS in absolute value (a shrinkage of about 44%). Other bilateral gravity coefficients (contiguity, common language, FTA) also shrink. PPML also accommodates zero-trade observations, which OLS-on-logs must drop because the log of zero is undefined.
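The mechanism can be illustrated with a small simulation. This is a sketch under assumed values, not the paper's data: a single regressor x stands in for ln(distance), the true elasticity is set to -0.75, and the multiplicative error ν is lognormal with E[ν | x] = 1 and a log-variance of 0.5 + x that grows with x. Under those assumptions E[ln ν | x] = -(0.5 + x)/2, so OLS on logs converges to -0.75 - 0.5 = -1.25 rather than to the true -0.75. The function names are hypothetical.

```python
import math
import random

def ols_slope(x, y):
    """Closed-form slope from a simple regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_log_ols(n=20000, beta=-0.75, seed=0):
    """Simulate X = exp(2 + beta*x) * nu with E[nu | x] = 1; return the OLS-on-logs slope."""
    rng = random.Random(seed)
    xs, ln_trade = [], []
    for _ in range(n):
        x = rng.uniform(0.0, 3.0)
        s2 = 0.5 + x  # assumed: variance of ln(nu) rises with x
        # A lognormal with mean 1 has ln(nu) ~ Normal(-s2/2, s2).
        ln_nu = rng.gauss(-s2 / 2.0, math.sqrt(s2))
        xs.append(x)
        ln_trade.append(2.0 + beta * x + ln_nu)
    return ols_slope(xs, ln_trade)

slope = simulate_log_ols()
print(f"true beta = -0.75, OLS-on-logs slope = {slope:.3f}")
```

The conditional mean of X is exp(2 - 0.75x) by construction, so the extra -0.5 in the OLS slope comes entirely from the way the log-error's mean moves with x.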

Our re-estimate

We pull the 2020 cross-section from the CEPII Gravity V202411 release, distributed as gravity_bilateral on this site, restricting to origin-destination pairs with non-missing distance, GDP, and bilateral BACI trade. OLS-on-logs is solved in closed form on the 23,795 positive-trade pairs. PPML is estimated by iteratively reweighted least squares (IRLS) on the full filtered sample, which admits zeros; that sample also has 23,795 pairs, so in this cross-section the filter leaves no zero-trade pairs and both estimators see the same observations. PPML converges in 8 iterations.
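The page does not show its IRLS code, so the following is only a minimal sketch of a generic Poisson log-link IRLS in plain Python. The names `solve` and `ppml_irls`, the starting values, and the convergence tolerance are assumptions for illustration; the sketch also assumes the mean of y is positive.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ppml_irls(X, y, tol=1e-10, max_iter=100):
    """Fit E[y | x] = exp(x'beta) by IRLS. Rows of X should contain a 1.0 for the intercept.

    Zeros in y are allowed: logs are only ever taken of fitted means.
    Returns (beta, number of iterations).
    """
    k = len(X[0])
    ybar = sum(y) / len(y)
    mu = [(yi + ybar) / 2.0 for yi in y]  # strictly positive starting means
    eta = [math.log(m) for m in mu]
    beta = [0.0] * k
    for it in range(1, max_iter + 1):
        # Working response; the Poisson weights equal the fitted means.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        XtWX = [[sum(m * row[a] * row[b] for row, m in zip(X, mu)) for b in range(k)]
                for a in range(k)]
        XtWz = [sum(m * row[a] * zi for row, m, zi in zip(X, mu, z)) for a in range(k)]
        new_beta = solve(XtWX, XtWz)
        step = max(abs(nb - ob) for nb, ob in zip(new_beta, beta))
        beta = new_beta
        eta = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
        mu = [math.exp(e) for e in eta]
        if step < tol:
            return beta, it
    raise RuntimeError("IRLS did not converge")

# Intercept-only check with a zero flow: the fitted mean equals the sample mean.
beta0, n_iter = ppml_irls([[1.0], [1.0], [1.0]], [0.0, 2.0, 4.0])
print(beta0[0], math.log(2.0))
```

The intercept-only check shows why zeros are unproblematic for this estimator: the zero flow enters the sample mean like any other observation, whereas OLS on logs would have to drop it.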

Figure 1 · gravity coefficients, OLS vs PPML, 2020

OLS-on-logs vs PPML coefficient estimates, 2020 bilateral gravity

The distance coefficient is -1.20 under OLS-on-logs and -0.50 under PPML, so PPML is roughly 41% of OLS in absolute value. That is the same direction and order of magnitude of shrinkage that Silva-Tenreyro report (their Table 3: 0.75 / 1.35 ≈ 56%). The common-language and contiguity coefficients also attenuate under PPML, consistent with the original’s finding that OLS systematically overstates border-and-culture frictions. Income elasticities (GDP origin, GDP destination) survive the switch more intact, which is also consistent with Silva-Tenreyro’s Table 3.

Source: CEPII Gravity V202411 for 2020, merged with BACI 202501 bilateral totals. OLS estimated on positive-trade pairs; PPML by IRLS on the zero-inclusive sample. Specification: ln X (OLS) or ln E[X] (PPML) = intercept + β·ln(dist) + δ·ln(GDP_o) + γ·ln(GDP_d) + contig + common-lang + FTA. Standard errors not shown; the table reports point estimates only, for comparison with Silva-Tenreyro Table 3.

Cite: Hossen, M. D. (2026). OLS-on-logs vs PPML coefficient estimates, 2020 bilateral gravity. TradeWeave Workbench.

cite

@misc{hossen_2026_repl-silva-tenreyro-2006-coefs,
  author = {Md Deluair Hossen},
  title = {OLS-on-logs vs PPML coefficient estimates, 2020 bilateral gravity},
  year = {2026},
  howpublished = {TradeWeave Workbench},
  url = {https://tradeweave.org#repl-silva-tenreyro-2006-coefs},
  note = {Figure: Figure 1 · gravity coefficients, OLS vs PPML, 2020}
}

show query

-- OLS pull: positive-trade pairs only (sample n=23,795)
SELECT LN(tradeflow_baci) AS y, LN(dist) AS lnd,
       LN(gdp_o) AS lngdp_o, LN(gdp_d) AS lngdp_d,
       contig, comlang_off, COALESCE(fta_wto, 0) AS fta
FROM gravity_bilateral
WHERE year = 2020 AND tradeflow_baci > 0;
-- PPML pull (n=23,795): same query, but with tradeflow_baci in levels rather
-- than logs and the filter relaxed to tradeflow_baci >= 0 so zeros are kept.
-- PPML is then fitted by IRLS in-app (Poisson log-link, converged in 8 iterations).

Does the OLS-vs-PPML gap widen or narrow over time?

Silva-Tenreyro’s argument is fundamentally about heteroskedasticity: if the variance of the bilateral trade error changes with the regressors, OLS-on-logs is biased and PPML is not. The degree of heteroskedasticity can shift as the share of zero and near-zero bilateral flows changes, so the OLS-vs-PPML gap on the distance coefficient is itself a rough barometer of heteroskedasticity in the data. We estimate both at five-year intervals from 2000 to 2020. The OLS distance elasticity moved from -1.214 in 2000 to -1.199 in 2020; PPML moved from -0.571 to -0.497. The OLS-minus-PPML difference, a proxy for the bias Silva-Tenreyro diagnosed, was -0.643 in 2000 and -0.702 in 2020, so the gap widened slightly in absolute value.
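The gap and ratio arithmetic can be checked directly from the point estimates quoted in this section. The helper name `gap_and_ratio` is hypothetical; the four coefficients are the ones reported above.

```python
# Distance elasticities reported in the text (point estimates only).
estimates = {
    2000: {"ols": -1.214, "ppml": -0.571},
    2020: {"ols": -1.199, "ppml": -0.497},
}

def gap_and_ratio(ols, ppml):
    """Return (beta_OLS - beta_PPML, |beta_PPML| / |beta_OLS|)."""
    return ols - ppml, abs(ppml) / abs(ols)

for year, b in estimates.items():
    gap, ratio = gap_and_ratio(b["ols"], b["ppml"])
    print(f"{year}: gap = {gap:.3f}, PPML/OLS = {ratio:.2f}")
# 2000: gap = -0.643, PPML/OLS = 0.47
# 2020: gap = -0.702, PPML/OLS = 0.41
```

On these numbers the gap grows by about 0.06 in absolute value between 2000 and 2020, while PPML falls from roughly 47% to 41% of the OLS elasticity.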

Figure 2 · distance elasticity by estimator, 2000-2020

β on ln(distance), OLS vs PPML, CEPII Gravity 2000-2020 cross-sections

PPML lies above OLS (less negative) at every cross-section, reproducing the core Silva-Tenreyro result every year, not just at their 1990 benchmark. Under Silva-Tenreyro’s Jensen-inequality diagnosis, the vertical gap can be read as the portion of OLS’s distance elasticity that is heteroskedasticity-induced bias; it averages around -0.66 log points over the window. Both series drift slightly toward zero between 2000 and 2020, but the movement is small, in line with the persistently high distance effect that Disdier-Head’s (2008) meta-analysis documents, and the gap is remarkably stable. Heteroskedasticity does not go away; bilateral trade data remain, in 2020 as in 1990, a textbook case for the PPML remedy.

Source: CEPII Gravity V202411 × BACI 202501 bilateral totals. For each of 2000, 2005, 2010, 2015, 2020 we re-fit the same 6-regressor gravity specification (ln dist, ln GDP origin, ln GDP dest, contiguity, common language, FTA/WTO). OLS on positive-trade pairs; PPML on the zero-inclusive sample via IRLS. Point estimates only.

Does the OLS-PPML gap scale with the zero-trade fraction?

Silva-Tenreyro (2006) locate the OLS bias in the interaction of the log transformation with heteroskedasticity, but the empirical tell-tale is simpler: the more zero bilateral flows the estimator has to drop, the larger the selection on the positive-trade subsample, and the sharper the expected OLS-vs-PPML wedge. Bilateral gravity panels generally have a substantial zero-trade share, since many small-economy pairs have no direct trade at all. We plot the OLS-minus-PPML gap on the distance coefficient against the fraction of ij pairs with zero trade, one dot per five-year cross-section from 2000 to 2020. If Silva-Tenreyro’s diagnosis is right, more zeros should line up with a wider gap. The endpoint values reported here cannot confirm that on their own: the 2020 cross-section shows 0% zero flows and a -0.702 log-point gap, and the 2000 cross-section also shows 0% zeros with a -0.643 gap, so the two years differ in the gap but not in the displayed zero share.

Figure 3 · OLS-PPML gap vs zero-trade fraction

OLS-PPML distance-coefficient gap as a function of zero-trade share, 2000-2020

Each dot labels a five-year cross-section. The Silva-Tenreyro diagnostic predicts that the gap |β_OLS − β_PPML| widens as the share of zero-trade pairs rises. Zero-inflated bilateral panels are precisely the case their REStat paper warned against treating with OLS-on-logs: the dropped-observation selection is not random; it is correlated with trade cost and country size, which is exactly what the gravity specification is trying to measure. Note that the gap does not fall to zero in any cross-section: even the 2000 panel, with its reported 0% zero share, still produces a -0.643 log-point wedge, because heteroskedasticity among positive flows is nontrivial on its own.

Source: CEPII Gravity V202411 × BACI 202501 bilateral totals. For each of 2000, 2005, 2010, 2015, 2020 the zero-trade fraction is (n_total − n_positive) / n_total computed on pairs with valid distance and GDP on both sides. Distance-coefficient gap is β_OLS − β_PPML on the same specification used in Figures 1 and 2. Both quantities summarise a single cross-section; the scatter is 5 points.
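The zero-trade fraction defined in the source note is a one-line computation. The sketch below uses a hypothetical helper, `zero_trade_share`, and made-up flows purely to show the formula (n_total − n_positive) / n_total.

```python
def zero_trade_share(flows):
    """Share of pairs with zero recorded trade: (n_total - n_positive) / n_total."""
    n_total = len(flows)
    if n_total == 0:
        raise ValueError("no pairs supplied")
    n_positive = sum(1 for f in flows if f > 0)
    return (n_total - n_positive) / n_total

# Hypothetical flows for five pairs, two of them with no trade.
print(zero_trade_share([120.0, 0.0, 3.5, 0.0, 48.0]))  # 0.4
```

Applied to the 2020 counts quoted on this page (23,795 pairs in total and 23,795 with positive trade), the same formula returns 0.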

Heteroskedasticity diagnostic · OLS residual variance vs fitted log-trade

Silva-Tenreyro’s (2006) Jensen-inequality argument turns on a single empirical fact: the variance of the multiplicative error ν_ij varies systematically with the regressors, which means the conditional expectation of ln ν_ij does too, which biases OLS-on-logs. A direct visual check is to fit OLS on the 2020 cross-section, bin the positive-trade observations by fitted ln(X_ij), and report the residual variance per bin. Under homoskedasticity the bins should be flat; under the heteroskedasticity Silva-Tenreyro diagnose, residual variance should fall as fitted trade grows (small-trade pairs have noisier log-residuals because of the mass near the truncation boundary). The shape of this diagnostic is what makes PPML the recommended estimator for bilateral gravity panels.

Figure 4 · OLS residual variance by fitted ln(trade) bin, 2020

OLS-on-logs residual variance against fitted ln(X), 12 equal-mass bins, 2020 cross-section

Across the 12 equal-mass bins of fitted ln(X) on the OLS positive-trade subsample (n = 23,795), residual variance moves from 10.82 in the bottom bin (fitted ln X around 1.2) to 1.80 in the top bin (fitted ln X around 14.4), so the top-bin variance is about 0.17 times the bottom-bin variance (a decline of roughly 83%). The bins are decisively non-flat: the residual variance depends on the regressors through the fitted value, exactly the heteroskedasticity Silva-Tenreyro diagnose. Under their argument, this pattern implies that OLS-on-logs estimates of β are generally biased, with the size of the bias depending on how strongly the error variance moves with the regressors. The PPML estimator remains consistent under this heteroskedasticity because it estimates the conditional mean of X directly via a log-link Poisson rather than the conditional mean of ln X via OLS; Jensen’s inequality bites in the latter case, not the former.

Source: CEPII Gravity V202411 x BACI 202501 bilateral totals, 2020 cross-section. OLS coefficients from Figure 1's specification (intercept + ln dist + ln GDP_o + ln GDP_d + contig + comlang + fta). Fitted values yhat_ij and residuals e_ij = ln(X_ij) - yhat_ij computed on the positive-trade subset. Observations sorted by yhat and partitioned into 12 equal-count bins; within-bin sample variance of e reported. The shape of this curve is the visual content of Silva-Tenreyro (2006) Section 2's heteroskedasticity argument: under homoskedasticity the bins should be flat; the observed slope is the diagnostic that motivates PPML.
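The binning procedure in the source note can be sketched as follows. The function name `binned_residual_variance` and the synthetic inputs are assumptions for illustration; on the real data, `fitted` and `residuals` would be yhat_ij and e_ij from the OLS fit.

```python
import statistics

def binned_residual_variance(fitted, residuals, n_bins=12):
    """Sort by fitted value, split into n_bins near-equal-count bins, and return
    a list of (mean fitted value, sample variance of residuals), one per bin."""
    pairs = sorted(zip(fitted, residuals))
    n = len(pairs)
    if n < 2 * n_bins:
        raise ValueError("need at least two observations per bin")
    out = []
    for b in range(n_bins):
        lo, hi = b * n // n_bins, (b + 1) * n // n_bins
        chunk = pairs[lo:hi]
        mean_fit = statistics.fmean([p[0] for p in chunk])
        var_res = statistics.variance([p[1] for p in chunk])
        out.append((mean_fit, var_res))
    return out

# Synthetic example: residual magnitude shrinks as the fitted value grows.
fitted_demo = [float(i) for i in range(24)]
resid_demo = [float((24 - i) * (1 if i % 2 == 0 else -1)) for i in range(24)]
bins = binned_residual_variance(fitted_demo, resid_demo)
print(bins[0][1], bins[-1][1])  # 1104.5 4.5
```

A flat sequence of variances would indicate homoskedastic log-residuals; a falling one, as in the synthetic example and in the reported 10.82 to 1.80 range, is the pattern the diagnostic looks for.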

Numerical comparison

quantity	published (ST 2006, Table 3)	our re-estimate (2020)
OLS β on ln(distance)	−1.347	−1.199
PPML β on ln(distance)	−0.750	−0.497
|PPML / OLS| ratio	0.56	0.41
OLS β on ln(GDP origin)	+0.938	+1.324
PPML β on ln(GDP origin)	+0.721	+0.838
sample size (OLS / PPML)	~18k / ~18k	23,795 / 23,795

What’s the same, what differs

Same: the direction and order of magnitude of the OLS-vs-PPML gap; a PPML distance elasticity roughly half of OLS in absolute value; PPML estimated via the same log-link Poisson IRLS that Silva-Tenreyro endorse and that Correia-Guimarães-Zylkin (2020) subsequently made standard via ppmlhdfe. The Jensen’s-inequality mechanism (E[ln y] ≠ ln E[y], with a wedge that varies with the regressors under heteroskedasticity) is the diagnostic both here and in ST 2006.

Differs: 2020 CEPII Gravity V202411 cross-section (ours) vs ST’s 1990 sample; CEPII’s universe-of-pairs BACI merge vs ST’s 136-country filter; contiguity + common-language + FTA/WTO controls (ours) vs ST’s inclusion of colonial history; no clustered standard errors reported here.

Why coefficients differ from ST 2006

The OLS distance coefficient here (-1.199) is modestly less negative than Silva-Tenreyro’s 1990 benchmark (−1.347). Our PPML estimate of -0.497 is also less negative than their −0.750, by a wider margin. Four factors plausibly drive the gap. First, sample period: they use 1990; we use 2020. Trade costs and the composition of trade have changed over three decades, so the two years need not share a distance elasticity; note, however, that Disdier & Head’s (2008, REStat) meta-analysis finds the distance effect persistently high rather than declining, so the period difference should not be read as a mechanical reduction. Second, country sample: Silva-Tenreyro filter to a particular set of 136 countries; CEPII Gravity V202411 covers every BACI origin-destination pair, which is a larger set. Third, controls: their specification includes colonial history and a common-colonizer dummy; we include contiguity, common language, and FTA membership. The coefficient on distance moves slightly depending on which culture-and-history controls are held constant. Fourth, PPML implementation: we use a home-grown IRLS (30 lines of code) rather than Stata’s ppmlhdfe or R’s glm(..., family = quasipoisson); for point estimates on a 24k-row cross-section the three implementations agree to three decimal places in our spot checks, but we do not report standard errors and do not cluster. A proper inference-grade replication would use ppmlhdfe with multi-way clustering.

The qualitative punchline (OLS-on-logs inflates the distance elasticity in absolute value, and PPML attenuates it by roughly 44% in ST’s sample and 59% in ours) comes through cleanly, with numbers that are credible for 2020 data.

BibTeX

@article{silva_tenreyro_2006,
  author  = {Santos Silva, J. M. C. and Tenreyro, Silvana},
  title   = {The Log of Gravity},
  journal = {Review of Economics and Statistics},
  volume  = {88},
  number  = {4},
  pages   = {641--658},
  year    = {2006},
  doi     = {10.1162/rest.88.4.641}
}
