Mongo Sharding and GridFS

Data Engineering

April 30, 2015

A couple of days ago I gave a talk in my data engineering class about Mongo and how I use it together with GridFS at work. If you want to dive into the slides, here they are: Mongo Presentation. To navigate the presentation, use the arrow keys, or press the spacebar to go to the next slide.

I have been using Mongo sharding in conjunction with GridFS to build a new database for all the APKs and files our team uses to scan for malware. We need to move all of these files to the new database because the old storage is a huge RAID array of hard disks, and it is failing more and more often. The new database consists of seven 10 TB shards, for a total of 70 TB.
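To make the setup a bit more concrete, here is a minimal sketch of the admin commands typically involved in sharding a GridFS store. The database name `malware_files` and the helper `gridfs_sharding_commands` are hypothetical, and the shard key `{files_id: 1, n: 1}` is the commonly documented choice for the chunks collection, not necessarily the one used in this deployment. The function only builds the command documents, so it runs without a cluster.

```python
def gridfs_sharding_commands(db_name, bucket="fs"):
    """Build the admin commands that shard a GridFS chunks collection.

    db_name and bucket are illustrative; "fs" is the default GridFS bucket name.
    """
    chunks_namespace = f"{db_name}.{bucket}.chunks"
    return [
        # Step 1: allow collections in this database to be sharded.
        {"enableSharding": db_name},
        # Step 2: shard the chunks collection. Keying on (files_id, n)
        # keeps the chunks of one file in contiguous key ranges.
        {"shardCollection": chunks_namespace, "key": {"files_id": 1, "n": 1}},
    ]


if __name__ == "__main__":
    for command in gridfs_sharding_commands("malware_files"):
        print(command)
```

With pymongo connected to a mongos router, each document could then be sent with `client.admin.command(command)`. That step is left out here because it needs a running sharded cluster.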

What I covered in the presentation:

  • What is sharding?
    • Why do we need sharding?
    • How sharding works in terms of MongoDB
  • Mongo Sharding Tutorial
    • Config Server
    • Mongos - Routing server
    • Creating DBs and Shards
    • How to shard on DB/Collection level
  • GridFS
    • How GridFS stores Files
    • GridFS Implementation
    • Example: C# Driver
  • Resources
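For the "How GridFS stores Files" item above, the core idea can be sketched in a few lines of plain Python: GridFS writes one metadata document per file and splits the file's bytes into numbered chunk documents, 255 KiB each by default. The helper names `to_gridfs_documents` and `reassemble` and the sample file name are made up for illustration. This is a simplified model of the layout, not the driver's actual code, and it leaves out the `_id` field that real chunk documents carry.

```python
from datetime import datetime, timezone

DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default chunk size, in bytes


def to_gridfs_documents(file_id, filename, data, chunk_size=DEFAULT_CHUNK_SIZE):
    """Model how a file is split into an fs.files doc and fs.chunks docs."""
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),       # total size in bytes
        "chunkSize": chunk_size,   # size of every chunk except possibly the last
        "uploadDate": datetime.now(timezone.utc),
    }
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs


def reassemble(chunk_docs):
    """Rebuild the original bytes by concatenating chunks in order of n."""
    ordered = sorted(chunk_docs, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


if __name__ == "__main__":
    payload = bytes(600 * 1024)  # a 600 KiB dummy file
    files_doc, chunk_docs = to_gridfs_documents(1, "sample.apk", payload)
    print(files_doc["length"], len(chunk_docs))
    assert reassemble(chunk_docs) == payload
```

Because every chunk carries `files_id` and `n`, the chunks of a file can live on any shard and still be put back together in order.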

Note: I used React to create the presentation. Here is the GitHub source if you want to see how I built it.