Did you ever wonder why three is the default replication factor in Hadoop?
I heard this question asked, but left unanswered, on an episode of Software Engineering Daily (I think the one with Sumo Logic). I was curious and found this article – https://www.mapr.com/blog/revealed-why-hadoop-uses-three-replicas. The article is short, but if you're in a hurry, just remember this:
1 replica in a cluster gives <1 year MTDL, 2 replicas give about 10 years, and 3 replicas provide >100 years MTDL
MTDL – mean time to data loss
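To get a feel for why each extra replica buys roughly two orders of magnitude, here is a toy back-of-the-envelope model (my own sketch, not from the article): assume a block is lost only if all of its replicas fail within the same short recovery window, before the cluster can re-replicate. The per-replica failure probability `p` and the window length are made-up illustrative numbers.

```python
# Toy MTDL model, NOT the article's actual math: a block is lost only
# if every one of its r replicas fails within the same recovery window.
# With independent per-replica failure probability p per window, the
# loss probability per window is p**r, so the expected time to the
# first loss scales like 1 / p**r.

def mtdl_years(p_window, replicas, windows_per_year=365):
    """Rough mean time to data loss in years, assuming one
    recovery window per day and independent replica failures."""
    p_loss_per_window = p_window ** replicas
    return 1.0 / (p_loss_per_window * windows_per_year)

# With an assumed 1% chance that a replica dies in any given day:
for r in (1, 2, 3):
    print(f"{r} replica(s): ~{mtdl_years(0.01, r):.1f} years MTDL")
```

Even with these invented numbers, the pattern matches the article's figures: each additional replica multiplies MTDL by roughly 1/p, turning months into decades and decades into centuries.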
So with fewer than 3 replicas there is actually quite a high probability of losing data. Mystery solved 🙂