
Mongo Boston 2011

Below are the notes I took during the Oct 3, 2011 Mongo Boston Conference. If you notice any mistakes in my notes of what the excellent speakers said, please let me know.

Welcome and What's New in MongoDB 2.0

Key objectives with MongoDB


  • JSON

  • General purpose

  • Scaling

  • Few knobs


What's new in 2.0


  • Journaling is on by default for 64-bit builds (getLastError is enhanced with a flag to ACK that the write reached the journal; see the sketch after this list)

  • Compact can now be run in parallel

  • Replica sets now support tags (useful for data center awareness in queries and replication)

  • Improved concurrency: many locks are now yielded

  • Smaller, faster indexes
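
A minimal sketch of the journal acknowledgement, assuming the 10gen Java driver; the blog database and posts collection names are made up for illustration.

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.Mongo;

    public class JournalAck {
        public static void main(String[] args) throws Exception {
            DB db = new Mongo("localhost").getDB("blog");
            db.getCollection("posts").insert(new BasicDBObject("title", "hello"));

            // j:true asks getLastError to wait until the write is in the journal
            System.out.println(
                db.command(new BasicDBObject("getlasterror", 1).append("j", true)));
        }
    }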


Morphia: Easy Java Persistence

Morphia

Built on top of the 10gen Java driver.
Type-safe.
Uses JPA like annotations (@Entity, @Id, @Reference, @Serialized, @Transient, @Property, @Index, @Indexed, @NotSaved).
Speaker made heavy use of lombok library for automatic get/set and toString method handling.
Can handle ObjectId (Mongo's default) or custom ids on objects.
Fluent-style query interface (see the sketch below).
Hooks for (@PrePersist, @PreSave, @PostPersist, @PreLoad, @PostLoad, @EntityListeners).
Does add className field to every saved object.
Supports loading a document into a different class and won't blow up on missing fields.
Can lazy load via Ref or Key (Key is better to use).
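
A small sketch of what this looks like in practice, assuming Morphia and the 10gen Java driver are on the classpath; the BlogPost class and its fields are made up for illustration.

    import com.google.code.morphia.Datastore;
    import com.google.code.morphia.Morphia;
    import com.google.code.morphia.annotations.Entity;
    import com.google.code.morphia.annotations.Id;
    import com.google.code.morphia.annotations.Indexed;
    import com.mongodb.Mongo;
    import org.bson.types.ObjectId;

    @Entity
    public class BlogPost {
        @Id ObjectId id;          // Mongo's default ObjectId (a custom id also works)
        @Indexed String author;   // secondary index on author
        String title;

        public static void main(String[] args) throws Exception {
            Morphia morphia = new Morphia();
            morphia.map(BlogPost.class);
            Datastore ds = morphia.createDatastore(new Mongo("localhost"), "blog");

            BlogPost post = new BlogPost();
            post.author = "jane";
            post.title = "Mongo Boston 2011";
            ds.save(post);   // the stored document also gets a className field

            // Fluent query interface
            BlogPost found = ds.find(BlogPost.class)
                               .field("author").equal("jane")
                               .get();
            System.out.println(found.title);
        }
    }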

Journal and Storage Engine

Separate file per DB, uses pre-allocation so there is always a spare file, largest file is 2GB.
Can get details of the extents a collection is using through flags to validate.
Uses memory-mapped files, which means the virtual size of the MongoDB process is data size + overhead. With journaling it is overhead + 2 x data size.
By default files are fsynced every 60 seconds (tunable with --syncdelay); the flush does nothing if there is nothing to write.
If you have lots of inserts it is better to sync more frequently, to reduce the chance that the OS forces a flush.
db.serverStatus() includes fsync behavior (see the sketch below).
A future version of MongoDB should add support for storing data and indexes in separate files (i.e. the ability to put them on different spindles/SSDs).
Best to put the journal files on a separate spindle.
Use ext4 or xfs for the file system (zfs is possible but not 100% recommended yet).
A journal on the same spindle as data/indexes leads to a 5-30% slowdown, versus only about 3% on a separate spindle; this is especially true when running on EC2/EBS.
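
A rough sketch, using the 10gen Java driver, of pulling the flush/journal counters out of serverStatus and the extent details out of validate; the blog database and posts collection are made-up names.

    import com.mongodb.BasicDBObject;
    import com.mongodb.CommandResult;
    import com.mongodb.DB;
    import com.mongodb.Mongo;

    public class StorageStats {
        public static void main(String[] args) throws Exception {
            DB db = new Mongo("localhost").getDB("blog");

            // serverStatus reports background flushing and (with journaling) "dur" stats
            CommandResult status = db.command("serverStatus");
            System.out.println(status.get("backgroundFlushing"));
            System.out.println(status.get("dur"));

            // validate with full:true reports the extents a collection is using
            System.out.println(
                db.command(new BasicDBObject("validate", "posts").append("full", true)));
        }
    }
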
Things to watch out for:


  • transaction log wrapping due to large data transactions (can tune)

  • file fragmentation due to document size changes
    Changes to free-list management are coming in a future version of MongoDB, which should help tremendously.
    The compact process is to run it on the secondaries and then step down the primary


Schema Design at Scale

Three key things to consider:


  • embedding

  • indexes

  • sharding


Embedding (used example of a blog post with comments)


  • Embedded documents great for reads and reducing round-trips

  • Non-Embedded great for writes

  • Hybrid: non-embedded with a bucket structure using $push, $inc, and $pull (see the sketch after this list)

  • The hybrid model leads to variable bucket sizes due to race conditions and deletes (not horrible to work around)

  • Can speed things up by having the first bucket embedded in the main document

  • Use semantic key as id whenever possible

  • GUID for bucket objects since array ordering will vary
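
A sketch of the hybrid bucket approach, assuming the 10gen Java driver; the post_comments collection, the 100-comment bucket size, and the field names are all made up.

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.Mongo;
    import org.bson.types.ObjectId;

    public class CommentBuckets {
        public static void main(String[] args) throws Exception {
            DBCollection buckets =
                new Mongo("localhost").getDB("blog").getCollection("post_comments");

            // Give each comment its own id since array ordering will vary
            BasicDBObject comment = new BasicDBObject("_id", new ObjectId())
                .append("author", "jane")
                .append("text", "Nice post!");

            // Push into a non-full bucket for this post and bump its counter;
            // upsert creates a fresh bucket when none matches.
            buckets.update(
                new BasicDBObject("post", "mongo-boston-2011")
                    .append("count", new BasicDBObject("$lt", 100)),
                new BasicDBObject("$push", new BasicDBObject("comments", comment))
                    .append("$inc", new BasicDBObject("count", 1)),
                true,    // upsert
                false);  // multi
        }
    }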


Indexing


  • Want right-balanced tree insertion (using an auto-increment or time-based key)

  • Covered indexes rock (given an index on A,B you don't need a separate index on A; queries that return only A+B touch nothing but the index; see the sketch after this list)

  • Always specify the fields to return, and explicitly exclude _id if it is not needed

  • explain() is your friend; log bugs in JIRA if an index isn't being used
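
A sketch of a covered query with the 10gen Java driver: given a compound index on a,b, a query on a that returns only a and b (and excludes _id) can be answered from the index alone. The events collection and field names are made up.

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.DBCursor;
    import com.mongodb.Mongo;

    public class CoveredQuery {
        public static void main(String[] args) throws Exception {
            DBCollection events =
                new Mongo("localhost").getDB("blog").getCollection("events");

            events.ensureIndex(new BasicDBObject("a", 1).append("b", 1));

            // Return only a and b, explicitly excluding _id
            DBCursor cursor = events.find(
                new BasicDBObject("a", 42),
                new BasicDBObject("a", 1).append("b", 1).append("_id", 0));

            // explain() shows whether the query was served from the index alone
            System.out.println(cursor.explain());
        }
    }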


Sharding


  • Chunks try to be 64MB or 100,000 objects

  • The shard key is hard to change, but ranges can change more easily

  • Shard key options:

    • Auto incrementing: leads to hot spots

    • MD5(data): good read/write distribution, bad index working-set size

    • yearmonth + md5(data): leads to good distribution and good indexes (see the sketch after this list)
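
A sketch of declaring the yearmonth + md5 shard key through the admin database with the Java driver; the analytics database, events collection, and field names are made up.

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.Mongo;

    public class ShardKeySetup {
        public static void main(String[] args) throws Exception {
            // Run against a mongos
            DB admin = new Mongo("mongos.example.com").getDB("admin");

            admin.command(new BasicDBObject("enablesharding", "analytics"));

            // Compound key: coarse time prefix plus a hash of the data
            admin.command(new BasicDBObject("shardcollection", "analytics.events")
                .append("key", new BasicDBObject("yearmonth", 1).append("md5", 1)));
        }
    }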




High-throughput data analysis

Worked on a project that was processing 5GB/hour on average, 24 hours a day.
Built using the Lean Startup approach: Idea -> Build -> Code -> Measure -> Data -> Learn

Listener


  • Written in C

  • Data sources stream updates to a listener that buckets data into 1-second chunks

  • Used one socket per data type (12 in total)

  • Calculates the delta between events before writing into Mongo

  • Writes to a different collection to indicate the latest chunk written


Map Reduce


  • Using Mongo's map/reduce (see the sketch after this list)

  • Map Reduce runs every second

  • Process each chunk in parallel
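
A sketch of kicking off Mongo's map/reduce over a single one-second chunk from the Java driver; the metrics database, events collection, field names, and output collection are made up, and the map/reduce bodies are plain JavaScript strings.

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.DBObject;
    import com.mongodb.MapReduceCommand;
    import com.mongodb.MapReduceOutput;
    import com.mongodb.Mongo;

    public class ChunkMapReduce {
        public static void main(String[] args) throws Exception {
            DBCollection events =
                new Mongo("localhost").getDB("metrics").getCollection("events");

            String map = "function() { emit(this.source, this.delta); }";
            String reduce = "function(key, values) { return Array.sum(values); }";

            // Restrict the job to the latest one-second chunk
            BasicDBObject query = new BasicDBObject("chunk", 1317690000);

            MapReduceCommand cmd = new MapReduceCommand(
                events, map, reduce, "deltas",
                MapReduceCommand.OutputType.MERGE, query);

            MapReduceOutput out = events.mapReduce(cmd);
            for (DBObject doc : out.results()) {
                System.out.println(doc);
            }
        }
    }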


Replication and Replica Sets

Can configure a slave to apply a delayed oplog (helps against fat-finger mistakes)
Manage through: rs.initiate, rs.add, rs.remove, rs.reconfig, rs.stepDown
Oplog application is idempotent
Currently limited to 12 members per replica set (of which at most 7 can vote; the rest are non-voting)
Heartbeats fire between all replica set members approximately every 200ms, which means a 0.5-2 second lag if a new primary is needed
Priority determines the chance of being elected; however, the secondary with the newest oplog takes precedence
2.0 adds tags, which can be used with getLastError() to ensure things like cross-datacenter replication (given tags like dc.ny, dc.sf); see the sketch below
Tag rules can be fairly complex
Smarter geographically distributed replication is coming in the future
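
A sketch of the 2.0 tag feature, assuming the Java driver: tag each member with a data center, define a getLastErrorModes rule that requires both data centers, and then ask getLastError for that mode after a write. The host names, the dc tag, and the bothDCs rule are all made up.

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.Mongo;
    import java.util.Arrays;

    public class TaggedReplicaSet {
        public static void main(String[] args) throws Exception {
            Mongo mongo = new Mongo("ny1.example.com");

            // Replica set config with per-member tags and a custom error mode
            BasicDBObject cfg = new BasicDBObject("_id", "rs0")
                .append("version", 2)
                .append("members", Arrays.asList(
                    new BasicDBObject("_id", 0).append("host", "ny1.example.com")
                        .append("tags", new BasicDBObject("dc", "ny")),
                    new BasicDBObject("_id", 1).append("host", "sf1.example.com")
                        .append("tags", new BasicDBObject("dc", "sf"))))
                .append("settings", new BasicDBObject("getLastErrorModes",
                    new BasicDBObject("bothDCs", new BasicDBObject("dc", 2))));

            mongo.getDB("admin").command(new BasicDBObject("replSetReconfig", cfg));

            // A write followed by getLastError with the custom mode
            DB blog = mongo.getDB("blog");
            blog.getCollection("posts").insert(new BasicDBObject("title", "hello"));
            blog.command(new BasicDBObject("getlasterror", 1).append("w", "bothDCs"));
        }
    }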

Optimizing MongoDB: Lessons Learned at Localytics

Tips:


  • Use short names for fields (recommend single letter)

  • Store GUIDS/long IDs with BinData()

  • Override _id with a semantic key

  • Pre-aggregate with $inc (see the sketch after this list)

  • Prefix indexes with suffix buckets

  • Sparse indexes: define with sparse: true when creating the index

  • Keep index working set in RAM
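
A sketch that combines several of the tips above with the Java driver: a semantic _id, a short counter field bumped with $inc, and a sparse index; the stats database, daily collection, and field names are made up.

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.Mongo;

    public class PreAggregate {
        public static void main(String[] args) throws Exception {
            DBCollection daily =
                new Mongo("localhost").getDB("stats").getCollection("daily");

            // Sparse index: only documents that actually have "c" are indexed
            daily.ensureIndex(new BasicDBObject("c", 1),
                              new BasicDBObject("sparse", true));

            // Semantic _id (app id + yyyymmdd) instead of a generated ObjectId;
            // $inc pre-aggregates the counter in place, upsert creates the doc.
            daily.update(
                new BasicDBObject("_id", "app42:20111003"),
                new BasicDBObject("$inc", new BasicDBObject("c", 1)),
                true, false);
        }
    }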


Indexes:


  • Use indexes

  • Only query for what you want, use limit() and findOne()

  • Specify fields to return

  • Covering indexes are awesome

  • Exclude _id if you don't need it or it is not part of the index

  • To avoid lock wait time (less of an issue in 2.0), do a fetch and then an update


Fragmentation:


  • Set up inserts so they are right-balanced in the indexes

  • Manual padding (with BinData) if you know the document will expand in the future (see the sketch after this list)

  • Beware that compact will remove padding
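
A sketch of the manual padding trick with the Java driver: insert a throwaway BinData pad sized to the expected growth, then $unset it so the record keeps its allocated space; the sessions collection, pad size, and field names are made up.

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.Mongo;
    import java.util.ArrayList;

    public class ManualPadding {
        public static void main(String[] args) throws Exception {
            DBCollection sessions =
                new Mongo("localhost").getDB("stats").getCollection("sessions");

            // byte[] is stored as BinData; size it to the expected growth
            BasicDBObject doc = new BasicDBObject("_id", "s1")
                .append("events", new ArrayList<Object>())
                .append("pad", new byte[1024]);
            sessions.insert(doc);

            // Removing the pad leaves the record's allocated space in place
            sessions.update(new BasicDBObject("_id", "s1"),
                            new BasicDBObject("$unset", new BasicDBObject("pad", 1)));
        }
    }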


Performance:


  • A hashed key leads to access across the whole B-tree

  • ObjectId has temporal prefix

  • Can turn off the automatic balancer and do manual chunk assignment (see the sketch after this list)

  • In EC2 running on an m1 or m2 will most likely give you an isolated box

  • Using m2 for high memory mongods

  • Using micro for arbiters and configs

  • Using EBS and RAID0 for writes and RAID10 for reads

  • Use at least 4-8 EBS volumes, 8-16 recommended

  • Determined by testing with a 10:1 read:write ratio with a working set size larger than RAM

  • Need the shard key in every query
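
A sketch of turning the automatic balancer off by flipping the stopped flag in the config database's settings collection (run against a mongos); the host name is made up. Chunks can then be moved by hand with the moveChunk admin command.

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.Mongo;

    public class StopBalancer {
        public static void main(String[] args) throws Exception {
            DB config = new Mongo("mongos.example.com").getDB("config");

            config.getCollection("settings").update(
                new BasicDBObject("_id", "balancer"),
                new BasicDBObject("$set", new BasicDBObject("stopped", true)),
                true, false);  // upsert the settings document if it is missing
        }
    }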


Text Analysis Using MongoDB

Creating features from unstructured data and storing features in Mongo
Running Hadoop process on same machine as MongoDB
Map/Reduce written in MongoReduce
Using Thrift
Have low-latency process running on same machine as MongoDB
Use Mongo configuration to determine what servers to talk to (single source of truth)