
Facebook data centers

Facebook is the world’s third most popular website after Google and YouTube, and also the name of the company that owns it.[1] The site launched in 2004 and soon began collecting a massive amount of valuable data. During its first years, Facebook ran its infrastructure entirely on data center space leased from third-party providers.[2] However, the rapid growth of its user base led the company to design and build its own data centers. The first self-owned and operated data center opened in 2011, and in the following years Facebook greatly enlarged its network infrastructure, opening several new facilities. The hardware and software used in Facebook data centers are constantly updated to stay ahead of the storage needed to handle the growing volume of data, including the photos and (possibly 360-degree) videos viewed daily. Facebook's newer data centers are modular, disaggregated systems: because servers, storage and networking are disaggregated, hardware and software can be replaced quickly and easily whenever new components become available. In 2011, Facebook also launched the Open Compute Project (OCP), an industry-wide initiative whose aim is to share data center product innovations. Facebook's network is composed of data centers as well as edge points of presence (POPs) and a global backbone. This article covers the data center domain, as publicly shared by the company through its commitment to the OCP.[3]

Locations[edit]

Facebook owns data centers in several locations; those mentioned in this article include:

United States: Prineville, Oregon; Altoona, Iowa; Papillion, Nebraska (due online by 2020)

Europe: Luleå, Sweden

Architecture[edit]

Initially, data centers operated by Facebook were based on a hierarchical network architecture, known as the 4-post cluster design, which presented the company's engineers with several issues. Hence, in 2014, Facebook started to migrate to a next-generation, high-performance architecture referred to as the Fabric.

4-post Cluster design[edit]

In the 4-post cluster architecture, machines are organized into racks in the usual way and connected to a top-of-rack switch (RSW) via 10-Gbps Ethernet links. Each RSW has up to forty-four 10G downlinks and four or eight 10G uplinks (typically a 10:1 oversubscription ratio), one to each aggregation switch, called a cluster switch (CSW). A cluster consists of a set of four CSWs together with the corresponding server racks and RSWs. The number of machines per rack varies across clusters, and machines in the same cluster may or may not be devoted to the same purpose.[4] CSWs are connected to each other via a further layer of aggregation switches: each CSW has one 40G uplink (4×10G) to each “FatCat” (FC) switch (typically a 4:1 oversubscription ratio). The four CSWs in each cluster are connected in an 80G protection ring (8×10G), and the FC switches are connected in a 160G protection ring (16×10G). Intra-rack cabling uses SFP+ direct-attach copper; elsewhere, multi-mode fiber (10GBASE-SR) is used.
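
The oversubscription figures above can be checked with a quick back-of-the-envelope calculation. The following Python sketch is purely illustrative: the port counts come from the description above, while the 64-rack cluster size is only inferred from the quoted 4:1 CSW ratio and is not stated in the article.

  def oversubscription(down_gbps, up_gbps):
      """Aggregate downlink capacity divided by aggregate uplink capacity."""
      return down_gbps / up_gbps

  # Rack switch (RSW): up to 44 x 10G downlinks, 4 x 10G uplinks (one per CSW).
  rsw = oversubscription(44 * 10, 4 * 10)
  print(f"RSW oversubscription ~ {rsw:.0f}:1")    # ~11:1 fully populated, "typically 10:1"

  # Cluster switch (CSW): one 10G downlink per rack, 4 x 40G = 160G of uplinks
  # towards the FatCat tier. Inverting the quoted 4:1 ratio suggests clusters
  # on the order of 64 racks -- an inference, not a figure from the article.
  racks_per_cluster = 64
  csw = oversubscription(racks_per_cluster * 10, 4 * 40)
  print(f"CSW oversubscription ~ {csw:.0f}:1")    # 4:1 as quoted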

Prior to the introduction of the 4-post cluster design, network failures were one of the main causes of service interruptions. The redundancy built into the 4-post design has greatly limited such outages, and the FC tier has considerably reduced the traffic crossing between clusters. Growth can also be handled simply by adding new clusters. So, while the 4-post architecture has important desirable features, it also has significant drawbacks, most of which follow from the need for large clusters and therefore very large CSWs and FCs. First, an aggregation switch failure has a considerable impact: a CSW failure reduces intra-cluster capacity by 25%, and an FC failure likewise reduces inter-cluster capacity by 25%. Second, such large switches are built by only a few vendors; they tend to be expensive, hard to manage and debug, and impossible to customize, and they frequently have oversubscribed switching fabrics, meaning that not all ports can be used at full rate simultaneously. Finally, the size of a cluster is determined by that of its CSWs, so the architecture results in a small number of massive clusters, complicating resource allocation.[5]

In light of these considerations, the company introduced a next generation architecture, which overcomes the problems of the cluster architecture, while retaining its best features.

The Fabric[edit]

Facebook rethought the idea of a data center, moving from the old hierarchical system of clusters to treating an entire building as a single high-performance network composed of three levels.

“Keep it simple, stupid” is the principle the Facebook engineers embraced when building the Fabric. Despite the complexity of the overall system, the team tried to keep the principal components as simple and robust as possible. To build this new kind of structure, the company worked in the opposite way to what it had done before: instead of large devices and clusters, it broke the network up into smaller pieces called server pods and established uniform high-performance connectivity between them. A pod is a layer-3 micro-cluster built around four fabric switches (fewer than in the clusters); each of its 48 top-of-rack (TOR) switches has 4×40G (upgradable) uplinks, providing up to 160G of bandwidth for a rack of 10G-connected servers. This structure allows a smaller port density and a simpler, more modular architecture. Modularity is the fabric's greatest advantage over the clusters: because there are many different paths between servers, the new architecture is far more robust to incidents, crashes and simultaneous failures. The structure is completed by four independent planes of 48 spine switches each, which perform forwarding; together with the pods, they create a modular network topology able to connect hundreds of thousands of 10G-connected servers.

Furthermore, the structure communicates with the outside world through a flexible number of edge pods, each capable of providing 7.68 Tbps to the back-end inter-building fabrics of the data center. The modular structure is revolutionary because it can be adapted to demand: if more compute capacity is needed, server pods can be added; if more intra-fabric capacity is desired, spine switches, each with equal performance, can be included; and, similarly, edge pods can be introduced to obtain more extra-fabric connectivity. Such needs could not be easily met with a cluster-based system, whereas the fabric allows the company to react appropriately to any of them.[6]
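
The per-pod numbers quoted above can be tallied directly. The short Python sketch below is illustrative only: the per-rack and per-pod figures follow from the text, while the pod count at the end is an arbitrary placeholder, since the article only says the design scales to hundreds of thousands of 10G-connected servers.

  TORS_PER_POD = 48          # top-of-rack switches per pod (one rack each)
  TOR_UPLINKS  = 4           # one 40G uplink per spine plane
  UPLINK_GBPS  = 40

  uplink_per_rack_gbps = TOR_UPLINKS * UPLINK_GBPS                   # 160G per rack
  uplink_per_pod_tbps  = TORS_PER_POD * uplink_per_rack_gbps / 1000  # 7.68 Tbps,
                                                                     # matching the edge-pod figure
  example_pods = 48                                                  # placeholder, not from the article
  racks = example_pods * TORS_PER_POD

  print(f"Uplink per rack: {uplink_per_rack_gbps} Gbps")
  print(f"Uplink per pod : {uplink_per_pod_tbps:.2f} Tbps")
  print(f"{example_pods} pods -> {racks} racks, {example_pods * uplink_per_pod_tbps:.0f} Tbps of pod uplink capacity")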

Technology[edit]

In introducing the fabric, Facebook adopted a “top-down” approach, first thinking about the overall network and then developing the individual devices needed. The most relevant elements are the following:

  • Routing protocol used: BGP4.
  • Network type (from the TOR uplinks to the edges): layer 3, with support for both IPv4 and IPv6.
  • Behavior under heavy traffic: equal-cost multi-path routing (ECMP) with flow-based hashing over multi-speed links: 40G between the switches and 10G ports on the TORs (see the sketch after this list).[7]
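
To illustrate what flow-based hashing means in practice, here is a minimal Python sketch of ECMP next-hop selection. It is not Facebook's implementation: the hash function and the plane names are assumptions, chosen only to show that every packet of a given flow follows the same path while different flows spread across all equal-cost uplinks.

  import hashlib

  def pick_next_hop(src_ip, dst_ip, src_port, dst_port, proto, next_hops):
      """Map a flow's 5-tuple onto one of several equal-cost next hops."""
      key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
      digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
      return next_hops[digest % len(next_hops)]   # same flow -> same uplink, no reordering

  # Four equal-cost 40G uplinks from a TOR, one per spine plane (names are illustrative).
  uplinks = ["plane-1", "plane-2", "plane-3", "plane-4"]
  print(pick_next_hop("10.0.1.5", "10.0.9.7", 49152, 443, "tcp", uplinks))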

Physical infrastructure[edit]

Despite the thousands upon thousands of fiber strands in the fabric, the physical structure is not complex.

The physical topology consists of a data hall MDF (for data hall Y) connected, at the center of the building, to two BDF rooms, each containing two spine planes and two backbones, positioned above the fabric at the core of the building so that short vertical trunks can be used. The BDFs are in turn linked to the MDF of another data hall (X), which connects everything down to the TOR switches. From a data hall standpoint, an MDF is very similar to a server rack, and it can be connected to the BDFs as soon as they are built, which allows the company to considerably reduce network deployment time. The connections between different parts of the building are therefore simple, and the whole complexity of the fabric is located inside the BDFs: there, each spine plane is treated as a failure domain that can be safely taken out of service whenever needed, without impacting production. Finally, cabling within each spine plane follows repetitive port layouts, with the corresponding maps generated automatically by Facebook software.

This kind of data center, first built in Altoona, Iowa, provides an example of how networking requirements may influence the design of the whole building and reduce the construction time.[8]

Hardware innovations[edit]

Over the years, Facebook has developed and shared several innovative data center hardware components.

Storage platform[edit]

At the Open Compute Summit 2013, Facebook showed off the Open Vault, a JBOD (just a bunch of disks) storage array developed with its design partner Wiwynn. The Open Vault, also known as Knox, is a cold-storage system that uses shingled magnetic recording to increase the capacity available for archiving objects such as photos and videos. Each rack draws 2 kilowatts of power and houses up to 2 PB of storage. This kind of cold storage data center was also an innovation in terms of expenditure, since it cut costs to roughly one third of those of a conventional facility.[9][10] In 2015, the Honey Badger was added to the Open Vault, turning Knox into a lightweight storage server; its modular design made it easy to upgrade the computing power as well as the storage. In 2016, the policy of disaggregating hardware and software within the data center, in an attempt to maximize the amount of flash available to applications while minimizing the number of hardware and software components, led Facebook to introduce Lightning, the first NVMe JBOF (just a bunch of flash) array. Lightning complements the earlier Open Vault and, most notably, supports a large variety of SSDs and multiple switch configurations, making different SSD configurations possible.[11]

To further optimize the storage server design and stay ahead of increasing workloads, in 2017 Facebook began designing a new high-density storage device, Bryce Canyon, created to cope with the growing amount of video that needs to be stored. It improves on the performance of the Open Vault by 20 percent, includes NVMe SSD slots for caching and metadata acceleration, and supports more powerful processors and more memory. Furthermore, by taking in air underneath the chassis, Bryce Canyon improves thermal and power efficiency.[12]

Networking devices[edit]

In 2015, Facebook started to develop its own networking hardware. The Wedge (16×40G) TOR switch was Facebook's first foray into designing networking devices. It isolates device management from the switching technology, increasing flexibility, and incorporates the OCP micro-server, which makes it possible to take advantage of the so-called Group Hug architecture for motherboards.[13] Since 40 Gbps switches turned out to be inadequate for scaling purposes, the technology was updated in the following years. In 2016, the Wedge 100 specification was accepted into the OCP; it is a 32×100G switch that can handle 100 Gbps per port.[14] One year later, Facebook shared, via the OCP, the Wedge 100S (32×100G), which offers several advantages over its predecessor, including improved system thermal design, increased link speed and compute density, and a lower oversubscription ratio.[15] In 2015, a modular switch platform called 6-pack was introduced; it was replaced the following year by a second-generation modular switch named Backpack, which has a fully disaggregated architecture implementing a complete separation of the data, control and management planes.[16] The company also deployed and shared CWDM4-OCP, a single-mode 100G optical transceiver optimized for large-scale data centers.[17]

Software[edit]

In a Facebook data center, most machines have a specific role.[18] Facebook's Web servers run Linux and Apache. Content delivered to users is written in PHP, a server-side scripting and general-purpose programming language, and compiled by means of HipHop for PHP (HPHPc), a now-discontinued transpiler created by Facebook engineers that converts PHP into C++ code. In addition, complex core applications are developed in other programming languages, such as C++, Java and Python. The company created Thrift, an application framework that lets programs written in different languages cooperate. Furthermore, in 2014, the company introduced a new programming language, called Hack, intended to simplify programmers' jobs.[19] MySQL servers (DB) store user data; however, given the large amount of information Facebook handles every day, they are not enough on their own, so data are temporarily held in caching servers running Linux and Memcached, an open-source implementation of an in-memory hash table.[20] While there are a number of other roles, Web, MySQL and caching servers account for a large portion of the total number of machines.
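
The division of labour between Web, caching and MySQL servers follows the classic look-aside caching pattern. The Python sketch below is a generic illustration of that pattern, not Facebook's actual code: the in-memory dictionaries merely stand in for real Memcached and MySQL clients, and the key scheme is invented for the example.

  cache = {}                                                 # stand-in for a Memcached client
  database = {42: {"name": "example user", "friends": 123}}  # stand-in for MySQL

  def get_user(user_id):
      """Serve reads from the cache, falling back to the database on a miss."""
      key = f"user:{user_id}"
      if key in cache:
          return cache[key]               # cache hit: no database round trip
      row = database.get(user_id)         # cache miss: query the database
      if row is not None:
          cache[key] = row                # repopulate for later readers
      return row

  def update_user(user_id, fields):
      """Write to the database, then invalidate the stale cached copy."""
      database.setdefault(user_id, {}).update(fields)
      cache.pop(f"user:{user_id}", None)  # the next read repopulates the cache

  print(get_user(42))                     # first call misses and fills the cache
  print(get_user(42))                     # second call is served from the cache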

Other software packages used by Facebook, include:

  • Haystack: a high-performance photo storage and retrieval system.
  • BigPipe: a dynamic web page serving system, exploited for performance optimization purposes.
  • Apache HBase: a non-relational database, used for Inbox search. It replaced Apache Cassandra.[21]
  • Scribe: a flexible logging system, serving several purposes.
  • Apache Hadoop: a framework for the distributed processing of large amounts of data across clusters of machines.
  • Apache Hive: an SQL-like interface for Hadoop.
  • Corona: a scheduling framework which improves the efficiency, scalability and availability of the Apache Hadoop MapReduce implementation by separating cluster resource management from job coordination.[22]
  • Varnish: an HTTP accelerator, operating as a load balancer and cache.

Except for Apache Hadoop, HBase and Varnish, the listed software packages were developed by Facebook.[23] Furthermore, all but Haystack and BigPipe are released under open-source licenses.

Sustainability[edit]

The first data center built by Facebook, in Prineville, Oregon, was 38% more energy efficient, cost 24% less to build and run, and used 50% less water than a traditional one. The Facebook facilities in Luleå, Sweden, and Altoona, Iowa, are powered by 100% renewable energy: hydro in Luleå and wind in Altoona. Unconventional cooling systems also play a role in reducing Facebook's carbon footprint.[24] In 2015, the company reached its goal of 25% clean and renewable energy in its data center electricity supply mix, and it subsequently committed to doubling this percentage by the end of 2018.[25] Facebook was the first major internet company to commit to reaching 100% renewable energy across its network infrastructure, followed by other large internet companies such as Apple.[26] Facebook collaborates with important partners, including Greenpeace[27] as well as wind and solar farm developers. The data center due to come online by 2020 in Papillion, Nebraska, is an example of this: Facebook is working closely with the power utility Omaha Public Power District (OPPD) to source energy from wind farms not far from Papillion. Facebook is also part of the Renewable Energy Buyers Alliance (REBA), an organization of large power users that collaborates with suppliers and policymakers to find solutions based on renewable energy.[28]

Cooling[edit]

The cooling system in a data center is necessary to prevent servers from overheating, and one key to Facebook's success has been its cooling strategy. Facebook has traditionally built most of its facilities in cooler regions, where the environmental conditions allow its servers and systems to be cooled at minimal cost. Recently, however, the company has developed a new strategy that can efficiently handle different and more extreme environmental conditions.

Direct Cooling (Penthouse)[edit]

The cool climate in Prineville, where the first self-owned data center is located, allows Facebook to operate without the chillers normally needed to refrigerate water, relying entirely on outside air; on hot days, the data center is designed to use evaporative cooling. In its cooling design, Facebook opted for a two-tier structure, separating the servers from the cooling infrastructure: the upper floor of the facility administers the cooling supply, and cool air enters the data hall from overhead, exploiting the fact that cold air tends to fall and hot air to rise. This eliminates the need to use air pressure to force cool air up through a raised floor. The air passes through a mixing room, where cold winter air and server exhaust heat are combined to regulate the temperature, then through a sequence of air filters and a misting chamber, where a fine spray further regulates temperature and humidity. The air continues through another filter that absorbs the mist and finally through a fan wall that pushes it into the rack area. This direct cooling system has since been updated to lower costs and improve efficiency; in particular, the misting chamber has been replaced by evaporative media.[29] The design has also been adapted to several subsequent Facebook data centers.[30] For instance, in Luleå, Sweden, near the Arctic Circle, winter temperatures average -20°C (-4°F), so freezing outside air can readily be exploited.[31] This approach drastically reduces energy consumption, and thus operating expenses, compared with more traditional cooling systems; however, it is tailored to specific climate regimes.
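
The role of the mixing room can be made concrete with a simple energy balance: the supply temperature is a weighted blend of outside air and server exhaust. The Python sketch below is a simplification, not Facebook's control logic; the temperatures are illustrative and humidity effects are ignored.

  def outside_air_fraction(t_outside_c, t_exhaust_c, t_target_c):
      """Fraction of outside air so that the blend reaches t_target_c.

      Linear mixing: t_target = f * t_outside + (1 - f) * t_exhaust.
      """
      if t_exhaust_c == t_outside_c:
          return 1.0
      f = (t_exhaust_c - t_target_c) / (t_exhaust_c - t_outside_c)
      return min(max(f, 0.0), 1.0)        # clamp to a physically meaningful fraction

  # Example: -5 C winter air, 35 C server exhaust, 20 C target supply air.
  print(f"outside-air fraction ~ {outside_air_fraction(-5, 35, 20):.2f}")   # ~0.38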

Indirect Cooling (SPLC)[edit]

In 2018, Facebook announced the development, in partnership with Nortek Air Solutions, of a new evaporative method for cooling its data centers, based on the idea of using water to produce cool air for the server rooms. Work began in 2015. The system could allow the company to build water- and energy-efficient data centers in places where direct cooling is not suitable. The new technology is called StatePoint Liquid Cooling (SPLC). The SPLC unit is installed on top of the data center and sends chilled water down into the facility beside the data hall, where it is used to produce cool air for the servers. The water, now warm, is returned to the SPLC unit, where it is cooled again by the membrane layer, and the cycle begins anew. Three operating modes are available, depending on external conditions, each using a different procedure to obtain cold water. When outside temperatures are sufficiently low, the system uses outside cold air directly; when temperatures rise, the SPLC operates in adiabatic mode, based on a heat exchanger and a recovery coil; and in remarkably hot and humid climates, the system is set to super-evaporative mode, which relies on a pre-cooling coil. The method is most efficient in cooler locations, but it is efficient enough to be used in hot and humid climates, as well as in areas with high levels of dust or elevated salinity, with reasonable power consumption. After multiple tests, the company believes SPLC will let it reduce water consumption, compared with existing indirect cooling systems, by 20 percent in hot climates and by as much as 90 percent in cooler climates.[32] Facebook thermal engineer Veerendra Mulay said that SPLC opens up a wide range of new possibilities, but that the company will continue to use its traditional direct cooling technology in most of its data centers.[33] He also noted that, as Nortek owns a patent on the SPLC system, licences can be granted to enable other companies to take advantage of the technology.[34]
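
The three operating modes can be summarised as a simple decision rule. The Python sketch below is only an illustration of that logic: the article names the modes but not the switchover thresholds, so the temperature and humidity limits here are invented for the example.

  def splc_mode(outside_temp_c, relative_humidity):
      """Pick an operating mode for the StatePoint Liquid Cooling unit (illustrative thresholds)."""
      if outside_temp_c < 15:                                # cool enough: use outside air directly
          return "direct"
      if outside_temp_c < 30 and relative_humidity < 0.60:   # moderate: heat exchanger + recovery coil
          return "adiabatic"
      return "super-evaporative"                             # hot and humid: pre-cooling coil

  for conditions in [(5, 0.50), (25, 0.40), (35, 0.80)]:
      print(conditions, "->", splc_mode(*conditions))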

References[edit]

  1. "Alexa Top 500 Global Sites". www.alexa.com.
  2. "The Facebook Data Center FAQ". datacenterknowledge.com. 21 July 2017. Retrieved 1 November 2018.
  3. "Facebook Updates Data Centers with New OCP Hardware".
  4. https://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p123.pdf
  5. http://nathanfarrington.com/papers/facebook-oic13.pdf
  6. "Facebook Fabric: An innovative network topology for data centers".
  7. Condon, Stephanie. "Facebook explains Fabric Aggregator, its distributed network system - ZDNet".
  8. "What is a Data Center Fabric? - Definition -".
  9. "Facebook Loads Up Innovative Cold Storage Datacenter". 25 October 2013.
  10. "Facebook puts some brains in Open Vault JBOD storage".
  11. "Facebook Debuts Data Center Fabric Aggregator". 20 March 2018.
  12. "Facebook Refreshes Its Server Hardware Fleet - StorageReview.com - Storage Reviews". www.storagereview.com. 8 March 2017.
  13. "Facebook's Wedge: A novel approach to Top-of-Rack switches". techrepublic.com. Retrieved 1 November 2018.
  14. King, Rachel. "Facebook is designing a new version of its Wedge network switch - ZDNet". zdnet.com. Retrieved 1 November 2018.
  15. "ownCloud". files.opencompute.org. Retrieved 1 November 2018.
  16. https://www.networkcomputing.com/data-centers/facebook-debuts-backpack-switch-platform/326143808
  17. "ownCloud". files.opencompute.org. Retrieved 1 November 2018.
  18. https://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p123.pdf
  19. "Facebook Introduces 'Hack,' the Programming Language of the Future". wired.com. Retrieved 1 November 2018.
  20. Zeichick, Alan. "How Facebook Works". technologyreview.com. Retrieved 1 November 2018.
  21. "Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month - High Scalability -". highscalability.com. Retrieved 1 November 2018.
  22. Harris, Derrick (8 November 2012). "Facebook open sources Corona — a better way to do webscale Hadoop". gigaom.com. Retrieved 1 November 2018.
  23. "Exploring the software behind Facebook, the world's largest site - Pingdom Royal". pingdom.com. 18 June 2010. Retrieved 1 November 2018.
  24. Kepes, Ben. "It's Not So Complicated--Facebook And Sustainability".
  25. "Switch renewable energy remains firmly on - AXA IM Global". www.axa-im.com.
  26. https://www.greenpeace.org/usa/wp-content/uploads/legacy/Global/usa/planet3/PDFs/2015ClickingClean.pdf
  27. https://www.greenpeace.org/archive-international/Global/international/publications/climate/2011/Cool%20IT/Facebook/Facebook_Statement.pdf
  28. "Facebook's Quest for Clean Energy and a Greener Grid". americanbuildersquarterly.com.
  29. "Facebook Revises its Data Center Cooling System". datacenterknowledge.com. 16 July 2012. Retrieved 1 November 2018.
  30. "The Facebook Data Center FAQ. (Page 4)". datacenterknowledge.com. 21 July 2017. Retrieved 1 November 2018.
  31. Harding, Luke (25 September 2015). "The node pole: inside Facebook's Swedish hub near the Arctic Circle". the Guardian. Retrieved 1 November 2018.
  32. "Facebook's new cooling system might make it easier to put data centers in hot places". geekwire.com. 5 June 2018. Retrieved 1 November 2018.
  33. "Facebook has built a new data center cooling system for hot, arid climates - SiliconANGLE". siliconangle.com. 5 June 2018. Retrieved 1 November 2018.
  34. "Facebook's New Data Center Cooling Design Means It Can Build in More Places". datacenterknowledge.com. 5 June 2018. Retrieved 1 November 2018.
