Imperfect data: Most studies start with imperfect data; few datasets cover the entire population of interest.
Typically, the data has been gathered by others for their own purposes and may therefore have built-in biases or representational problems. As a consumer of analytical research, you should check whether the authors properly describe the source of their data and any limitations that source imposes. Surveys of populations will frequently report their confidence intervals. At the national, economy-wide, or sectoral level of analysis, data often has relatively small confidence intervals across space and over time.
As the data is subdivided to represent subsets of the source population (e.g., the Labour Force Survey unemployment rate in manufacturing in Saskatchewan vs. the unemployment rate for Canada as a whole), the confidence intervals widen significantly, potentially to the point where differences of ±10% to 20% may not be statistically significant. Authors should carefully consider the provenance and reliability of their data.
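The widening described above follows directly from the arithmetic of sampling: the half-width of a confidence interval shrinks only with the square root of the sample size, so a small subsample has a much wider interval than the full survey. A minimal sketch of that relationship, using entirely hypothetical sample sizes and an illustrative 6% unemployment rate (these numbers are assumptions, not Labour Force Survey figures):

```python
import math

def ci_halfwidth(p, n, z=1.96):
    """95% confidence-interval half-width for a proportion p
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical sizes: a large national sample vs. a small
# province-by-industry subsample (illustrative numbers only).
national = ci_halfwidth(p=0.06, n=50_000)
subset = ci_halfwidth(p=0.06, n=400)

print(f"national sample: +/- {national:.3%}")  # narrow interval
print(f"small subsample: +/- {subset:.3%}")    # far wider interval
```

Because the rate is the same in both calls, the subsample's interval is wider by exactly the square root of the ratio of sample sizes (here about 11x), which is why provincial or sectoral breakdowns can turn an apparently meaningful difference into statistical noise.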
A second problem is that authors quite often report that they have “cleaned” a dataset – e.g., dropped outliers in panel data or lopped off the tops or tails of longitudinal data. Any time you see this, your antennae should go up. Cleaning data should be done very carefully, and any changes to the data should be fully discussed and analyzed rather than simply accepted.
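The reason undocumented cleaning deserves scrutiny is that even a small amount of trimming can move a headline statistic substantially. A minimal sketch with invented numbers (the data values are hypothetical, chosen only to illustrate the effect):

```python
# Hypothetical panel values; two observations are much larger than the rest.
data = [3.1, 2.8, 3.4, 2.9, 3.3, 3.0, 9.7, 10.2]

mean_all = sum(data) / len(data)

# A "cleaned" version that silently drops the two largest observations,
# as an author might do without disclosure.
trimmed = sorted(data)[:-2]
mean_trimmed = sum(trimmed) / len(trimmed)

print(f"mean, all observations:   {mean_all:.2f}")
print(f"mean, outliers dropped:   {mean_trimmed:.2f}")
```

Here the trimmed mean is markedly lower than the full-sample mean; whether that is a correction or a distortion depends entirely on why those observations were dropped, which is exactly what a careful author should report.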