Why 3?

Did you ever wonder why in hadoop three is the default replication factor?

I heard this question, but without answer, on an episode of Software Engineering Daily (I think the one with Sumo Logic). I was curious, found this article – https://www.mapr.com/blog/revealed-why-hadoop-uses-three-replicas. The article is short, but if you in a hurry etc., just remember this:

1 replica in a cluster gives <1 year MTDL, 2 gives about 10 years or so, and 3 replicas in a cluster provides >100 years MTDL

MTDL –  mean time between data loss

So with less than 3 replicas there is actually quite high probability of loosing data. This is a whole mystery 🙂

Leave a Reply

Your email address will not be published.