As part of migrating Applab from Firebase to MySQL, we have to migrate existing data: a terabyte of Firebase JSON (see: code-dot-org/code-dot-org#55084). Initial versions of this tool were written in Ruby (and JavaScript), but they were so slow that the migration would have taken many days, possibly more than a week. The current version of the tool is written in C++ using the RapidJSON (https://rapidjson.org/) library, conveniently the same JSON parser that MySQL uses internally.
The tool:
- Does a streaming "SAX-like" parse because the JSON will not all fit into memory.
- Supports uploading data as a row-per-student-record, row-per-student-table or row-per-student-project format.
- Detects stock datasets and optionally deduplicates them.
- Uses a configurable number of background threads for uploading data to MySQL, which frees the main thread for JSON parsing (parsing is currently the bottleneck).
- Validates JSON record rows before inserting them and drops invalid rows.
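The background-upload design in the list above can be sketched with standard-library threads. This is a minimal illustration under stated assumptions, not the tool's actual code: `RowQueue`, `upload_all`, the queue capacity of 64, and the `upload` callback (a stand-in for the MySQL insert) are all hypothetical names chosen for the example.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded queue: the parsing thread pushes rows,
// uploader threads pop them.
class RowQueue {
public:
    explicit RowQueue(std::size_t capacity) : capacity_(capacity) {}

    // Blocks while the queue is full, so a fast producer cannot
    // buffer an unbounded number of rows in memory.
    void push(std::string row) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return rows_.size() < capacity_; });
        rows_.push(std::move(row));
        not_empty_.notify_one();
    }

    // Returns false once the queue has been closed and drained.
    bool pop(std::string& row) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return closed_ || !rows_.empty(); });
        if (rows_.empty()) return false;
        row = std::move(rows_.front());
        rows_.pop();
        not_full_.notify_one();
        return true;
    }

    // Signals that no more rows will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        not_empty_.notify_all();
    }

private:
    std::size_t capacity_;
    std::queue<std::string> rows_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    bool closed_ = false;
};

// Starts `num_workers` uploader threads, feeds them `rows` from the
// calling thread (standing in for the parser), and returns how many
// rows were handed to `upload`. `upload` is called concurrently from
// several threads, so it must be thread-safe.
std::size_t upload_all(const std::vector<std::string>& rows,
                       std::size_t num_workers,
                       const std::function<void(const std::string&)>& upload) {
    assert(num_workers > 0);
    RowQueue queue(64);  // assumed capacity, chosen arbitrarily
    std::atomic<std::size_t> uploaded{0};

    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < num_workers; ++i) {
        workers.emplace_back([&queue, &uploaded, &upload] {
            std::string row;
            while (queue.pop(row)) {
                upload(row);
                ++uploaded;
            }
        });
    }

    for (const auto& row : rows) queue.push(row);
    queue.close();
    for (auto& worker : workers) worker.join();
    return uploaded.load();
}
```

The bounded queue is a design choice for this sketch: when uploads fall behind, `push` blocks the producer instead of letting buffered rows grow without limit.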
First, edit the mysql-connector-c++ Homebrew formula. (UGH! The stock formula doesn't include the JDBC bits, which are required for an AWS RDS instance because RDS doesn't support the X Plugin used by the newer connector APIs 😥. The edits are based on a parallel change in FreeBSD ports: https://cgit.freebsd.org/ports/commit/?id=adcb80f3fa92f9f25c3aa84fc4b1e1e79919acc0)
See: https://docs.brew.sh/FAQ#can-i-edit-formulae-myself
- export HOMEBREW_NO_INSTALL_FROM_API=1
brew edit mysql-connector-c++
- Find this line:
system "cmake", "-S", ".", "-B", "build", "-DINSTALL_LIB_DIR=lib", *std_cmake_args
- Change it to:
system "cmake", "-S", ".", "-B", "build", "-DINSTALL_LIB_DIR=lib", "-DWITH_JDBC=ON", *std_cmake_args
brew reinstall --build-from-source mysql-connector-c++
- Verify the jdbc.h header is now installed:
ls /opt/homebrew/include/mysql/jdbc.h
- For docs on using the legacy jdbc.h API see: https://dev.mysql.com/doc/dev/connector-cpp/latest/jdbc_ref.html
brew install rapidjson boost
apt-get install clang rapidjson-dev libboost-iostreams-dev build-essential libboost-system-dev libboost-thread-dev
git clone https://github.com/mysql/mysql-connector-cpp.git
- cd mysql-connector-cpp
cmake -DCMAKE_INSTALL_LIBDIR=lib -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX:PATH=/usr/local -DWITH_JDBC=ON .
cmake --build .
sudo cmake --build . --target install
- This is super annoying, but I haven't figured out how to fix it with proper build args to cmake, and this tool is limited use, so here's the hack: move the libraries from /usr/local/lib64 to /usr/local/lib.
sudo mv /usr/local/lib64/libmysqlcppconn* /usr/local/lib
sudo ldconfig
You can use the get-latest-firebase-data.sh script to fetch the latest backup from Firebase.
- make
- ./firebase-migration-tool lil-prod.json