Data validation
Data validation ensures correct and consistent data is processed and aggregated for use in MagicBox. The validated data is used by implementations of the MagicBox API, such as magicbox-maps.
This article explains how we verify values, identify duplicates, and merge multiple records into one.
How we do it

How magicbox-latlong-admin-server validates coordinate pairs
Data validation is performed by magicbox-latlong-admin-server. The server verifies that each pair of latitude / longitude coordinates is located within the country indicated by the beginning of the file name.
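To make the check concrete, here is a minimal sketch of the two pieces it involves: reading a country code from the start of a file name and testing whether a point falls inside a boundary. Both helpers are hypothetical. The two-letter prefix format (for example `BR_schools.csv`) is an assumption, and ray casting is only one common way to test containment; this article does not say how the server's engine does it.

```javascript
// Hypothetical helper: assumes file names start with a two-letter country
// code, e.g. "BR_schools.csv". The real naming scheme may differ.
function countryCodeFromFileName(fileName) {
  const match = /^([A-Za-z]{2})(?![A-Za-z])/.exec(fileName);
  return match ? match[1].toUpperCase() : null;
}

// Ray-casting point-in-polygon test. `ring` is an array of [lon, lat]
// vertices describing a single boundary with no holes.
function pointInRing(lon, lat, ring) {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    // Does a horizontal ray from the point cross the edge (j -> i)?
    const crosses =
      (yi > lat) !== (yj > lat) &&
      lon < ((xj - xi) * (lat - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}
```

A real country boundary is usually a multi-polygon with holes, so a production check would need to handle more than the single ring shown here.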
The data validation program runs every ten minutes as a cron job. It runs a programmatic version of this query:
// Do a select on all schools that have not been geo validated
"SELECT id, lat, lon, country_code FROM schools WHERE date_geo_validated IS null AND lat IS NOT null AND lon IS NOT null"

Adding new attributes
After a record is validated, magicbox-latlong-admin-server sets the following attributes on it. The attributes are added only if the engine returns an admin area ID for the target country (see Administrative boundaries).
date_geo_validated
: To CURRENT_TIMESTAMP
coords_within_country
: true or false
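One validation pass, combining the query and the attributes above, might look like the following sketch. The `db.query(sql, params)` interface, the `lookup` callback, and the `UPDATE` statement are all assumptions made for illustration, not the server's actual code. In particular, the sketch assumes `lookup` resolves to `true` or `false`, or to `null` when the engine returns no admin area ID, in which case the row is left untouched.

```javascript
// The SELECT from this article, split across lines for readability.
const SELECT_UNVALIDATED =
  "SELECT id, lat, lon, country_code FROM schools " +
  "WHERE date_geo_validated IS null AND lat IS NOT null AND lon IS NOT null";

// Hypothetical update statement; the $1/$2 placeholders assume a
// PostgreSQL-style client.
const UPDATE_VALIDATED =
  "UPDATE schools SET date_geo_validated = CURRENT_TIMESTAMP, " +
  "coords_within_country = $1 WHERE id = $2";

// `db` and `lookup` are assumed interfaces injected by the caller.
// Returns the number of rows that received the new attributes.
async function runValidationPass(db, lookup) {
  const { rows } = await db.query(SELECT_UNVALIDATED, []);
  let updated = 0;
  for (const school of rows) {
    const within = await lookup(school.lat, school.lon, school.country_code);
    // No admin area ID from the engine: skip, so the row is retried later.
    if (within === null) continue;
    await db.query(UPDATE_VALIDATED, [within, school.id]);
    updated += 1;
  }
  return updated;
}
```

Because skipped rows keep `date_geo_validated` as null, the same query picks them up again on the next run.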
Why we do it
Some CSV files we import and aggregate contain tens of thousands of records.
Data validation is an expensive operation.
For this reason, we chose not to verify each school’s coordinates in the before_batch_import hook.
Instead, the data validation program runs as a cron job every ten minutes.