Ingesting data into MagicBox
Importing new data sets into MagicBox expands its coverage and gives us new data to draw insights from. This document explains how we ingest new data into MagicBox.
School data ingestion
School data arrives as CSV files that follow a specific naming scheme. To ingest this data, we developed Project Connect, a Ruby on Rails-based CRUD (Create, Read, Update, Delete) application.
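The naming scheme itself is not spelled out here. As a purely hypothetical illustration, suppose files were named <provider>_<organization>_<YYYY-MM-DD>.csv. A minimal pure-Ruby sketch that pulls those parts out of such a name might look like the following. The pattern, the parse_school_file_name helper, and the example file name are all assumptions, not Project Connect's actual code.

```ruby
require "date"

# Hypothetical naming scheme (an assumption, not the documented one):
#   <provider>_<organization>_<YYYY-MM-DD>.csv
# e.g. "ministry-of-education_unicef_2020-03-15.csv"
FILE_NAME_PATTERN =
  /\A(?<provider>[a-z0-9-]+)_(?<organization>[a-z0-9-]+)_(?<date>\d{4}-\d{2}-\d{2})\.csv\z/i

# Returns a hash of metadata parsed from the file name, or raises
# ArgumentError if the name does not follow the hypothetical scheme.
def parse_school_file_name(file_name)
  match = FILE_NAME_PATTERN.match(File.basename(file_name))
  raise ArgumentError, "unexpected file name: #{file_name}" unless match

  {
    provider: match[:provider],
    organization: match[:organization],
    date: Date.iso8601(match[:date])
  }
end

meta = parse_school_file_name("uploads/ministry-of-education_unicef_2020-03-15.csv")
puts meta[:provider]
puts meta[:organization]
puts meta[:date]
```

Rejecting names that do not match, instead of guessing, keeps a mislabeled upload from silently entering the database with the wrong provider or organization.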
We evaluated two admin interfaces for Rails applications:
- Rails Admin
- Active Admin
Active Admin has an import plugin for uploading CSVs with
We chose Active Admin for this reason.
School data database
MagicBox needs a database that can store large amounts of school data and process it quickly. For this, MagicBox uses PostgreSQL, a relational database. We chose PostgreSQL because a tutorial for a use case similar to ours used it, so we are not wedded to a relational database.
All school data is stored in a single table named
This keeps things simple at the expense of some repeated values in the database.
When a file is imported, additional values are stored alongside each imported record. Four types of metadata are added:
- Provider or owner of the school CSV data
- Organization that received the school data
- Identity of the uploader
- Whether the uploader chooses to remain private
These values are not part of the CSV contents themselves.
We use the before_batch_import hook in Active Admin to parse the file names and assign these values to the imported records.
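As a rough illustration of what that hook does with the four values listed above, here is a minimal pure-Ruby sketch that runs outside Rails. The UploadMetadata struct, its field names, and the add_upload_metadata helper are hypothetical. The actual hook receives the object supplied by Active Admin's import plugin rather than plain hashes.

```ruby
# Hypothetical container for the four values that are not part of the CSV
# contents. Field names are illustrative assumptions, not MagicBox's schema.
UploadMetadata = Struct.new(
  :provider, :organization, :uploader, :private_upload,
  keyword_init: true
)

# Returns new row hashes with the upload metadata merged in. This roughly
# mirrors what a before_batch_import callback does before a batch is written
# to the school table. The input rows are left unmodified.
def add_upload_metadata(rows, metadata)
  extra = metadata.to_h
  rows.map { |row| row.merge(extra) }
end

rows = [
  { school_name: "School A", lat: 1.23 },
  { school_name: "School B", lat: 4.56 }
]

meta = UploadMetadata.new(
  provider: "ministry-of-education",
  organization: "unicef",
  uploader: "jane@example.org",
  private_upload: true
)

add_upload_metadata(rows, meta).each { |row| p row }
```

Each row carries its own copy of the metadata. This matches the single-table design, which trades some repeated values for simplicity.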