Facebook is frequently in the press, sometimes for data leaks, and last October for a possible $1.6 billion fine for violating GDPR after a recent data breach. From an IT engineering perspective, they have done some amazing things, which I’ll cover, but when all is considered, where does Facebook fall on the scale of benefit vs. harm?
Jeffrey Hammerbacher was an early Facebook employee while at Harvard. Although he left in early 2008, he left us with a very important quote. "The best minds of my generation are thinking about how to make people click ads," he told Businessweek. "That sucks." It's not just Facebook; Google is in the same boat. When I think about this, Jurassic Park comes to mind: the T-Rex is running around eating people and Jeff Goldblum's character says, "Yeah, but your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should." Perhaps the fact that Facebook has a market value over $500B answers part of the question.
So here we are in 2019. 1.7 billion devices use Facebook, and people voluntarily give it their personal information: what they like, pictures, friends, political views and more. People believe the content enough that the Russians smartly used the platform to manipulate the masses for their own political ends. In my blog post, The Security Considerations of Social Media, I warned about how your Facebook information can be (and is) used by hackers to compromise your financial information. I could continue to write on this subject, but let's go in a different direction.
If you listen to Mark Zuckerberg, he will talk about creating an open and connected society, but that is more about their aggressive growth strategy. Let's forget the Facebook marketing talk and turn to technology in general.
Think about what Facebook has had to do to create an interactive user platform that allows these 1.7 billion devices to connect and interact in real time, 24x7. The network, database and software architecture needed to do this at scale is a problem that very few have solved. If you think your office server is slow, imagine how it would respond if that many people tried to use it. It would melt. Facebook relies on many significant algorithmic decisions to keep your data, feeds and user experience optimized, as well as a massive investment in infrastructure to make this happen. If you read about and watch videos from the Facebook Developer conference, you will see that they have had to continually change their assumptions as their user base grew. They consistently pushed the platform to the edge and then had to engineer some major changes to be able to get beyond its limitations. Facebook has done a good job of sharing this information with the public.
When Facebook started, it used MySQL in a very clever architecture of master-slave databases: the application would read from a load-balanced pool of slaves and write to a different set of nodes that would replicate to the slaves. At some point this could not scale for every workload, so they turned to NoSQL and wrote Cassandra, now Apache Cassandra. They built it to solve a large scaling problem and released it to the open source community for anyone to use.
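To make the read/write split concrete, here is a minimal sketch in Python. It is not Facebook's code; the class name `ReadWriteRouter`, the node names, and the simple "anything that is not a SELECT is a write" rule are all assumptions for illustration. It only shows the routing idea: writes go to one node, reads rotate across the replicas.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical router: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary, replicas):
        if not replicas:
            raise ValueError("at least one replica is required")
        self.primary = primary
        # Round-robin is the simplest form of load balancing across replicas.
        self._replica_cycle = itertools.cycle(replicas)

    def node_for(self, sql):
        # Simplifying assumption: any statement that is not a SELECT is a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replica_cycle) if is_read else self.primary


# Illustrative node names only; a real setup would hold database connections.
router = ReadWriteRouter("primary-1", ["replica-1", "replica-2", "replica-3"])
for statement in ("SELECT * FROM posts", "INSERT INTO posts VALUES (1)", "SELECT 1"):
    print(statement, "->", router.node_for(statement))
```

One thing a real deployment has to handle that this sketch ignores: replication is usually asynchronous, so a read sent to a replica right after a write may not see that write yet.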
It is not really called LAMP++, but they took LAMP to the next level. Facebook was developed using the common LAMP stack (Linux, Apache, MySQL, and PHP) with Memcached. We already spoke about their changes to the database component, but at their scale and growth they needed more out of PHP, so they wrote HHVM, a virtual machine with a just-in-time compiler for Hack and PHP that keeps the flexibility of PHP but delivers much higher performance. They also released this as open source for the world to use.
The last technical contribution that deserves some attention is Facebook’s contribution to the Open Compute Project initiative. Imagine building a compute and software architecture that may need tens of thousands, if not hundreds of thousands, of compute nodes, at a scale that nobody has seen. It is likely that the power, cooling and performance offered by the standard server manufacturers are not going to meet your needs. Facebook then invested their time and effort to define what they wanted and again shared it with the world. If you dig into the specs of the design, it is, in my opinion, a reference for all the important elements of what makes a compute platform reliable and scalable.
If you want to know more about what is available from Facebook, there are many more topics that I have not even approached, but you can check it out at Facebook’s Open Source page.
While all of their contributions have been very valuable, one could argue they still do not make up for a business model in which the company endlessly mines data from people and exploits it for profit. However, if you want to make billions of dollars, it takes very technical people to make sure it happens quickly and reliably. Maybe you can benefit from some of the technical innovations they’ve shared.
If you would like to learn more, contact me.