The Future of Networking & Telecom
Exploring themes and trends from the industries that revolve around the transfer of data and highlighting relevant companies within them.
Executive Summary
Overview of Article
Introduction
History
AI Bottlenecks - Inter GPU Bandwidth
NVLink & Infiniband, Google’s Data Center Efforts
ISPs / Carriers, Starlink
Edge Computing, Dojo
Networking Companies & Component Providers
Data Centers
Conclusion
One goal of these thematic pieces is to provide sufficient context for a generalist to get up to speed with what’s happening in an industry or with a trend while also hopefully providing some useful information that someone close to the industry would appreciate. These longform pieces are the result of my own personal research into industries and trends with the goal of approaching them like a scientist might, hoping to form hypotheses and test them in the market in hopes that we may get ever closer to discovering what may be “inevitable” in the future. Even then we must conduct careful analysis because as history has shown, just discovering a prominent trend isn't enough, you need to understand the underlying mechanics in order to grasp which companies are truly poised to benefit. For instance, if someone had told you in the 1990s the number of cellphones that would be in use in 2023, you might have bought Ericsson and Nokia. The purpose of this research is to arm investors with a framework for thinking about companies, industries and trends and hopes to over time, develop a process for searching for what might be inevitable. I aim to write the piece I wish that I had when doing research and ramping up on a new industry and while this is generally a high level overview of the different industries (and companies) within the broader networking/telecom sector, I plan on exploring these topics individually, in a more in depth manner in the future.
Introduction
Before computers, the internet or even telephones, sending information in various forms has been a core part of human existence. In this piece, we will take a look at the current state of various industries and companies within them that traffic in the transfer of data. We’ll explore some of the history of the networking and telecommunications industry, examine the current state of the industry and try to get a sense of where companies are investing their capital with the hope of discovering what might be “inevitable” in the future. We’ll take a look at AI bottlenecks (inter GPU bandwidth) and other current networking challenges and investments ,Edge Computing, Internet Service Providers & Carriers, Networking Companies, Data Centers, and Component Providers amongst other topics and highlight some relevant companies in each sector.
History
These are some of the major scientific and technological breakthroughs that paved the way for the modern computer networking and telecommunications industries:
1. 1837 - Telegraph and Morse Code:
Samuel Morse and Alfred Vail developed the telegraph system, which allowed messages to be transmitted over long distances using electrical signals. Morse code, a system of dots and dashes, was used to encode and decode messages.
2. 1876 - Telephone Invention:
Alexander Graham Bell and Elisha Gray independently developed the telephone, enabling voice communication over electrical wires. Bell is widely credited with being the first to patent the telephone.
3. 1895 - Wireless Telegraphy (Radio):
Guglielmo Marconi conducted experiments in wireless telegraphy, which led to the development of radio communication. He sent the first radio signal across the English Channel in 1899.
4. 1920s - Early Broadcasting:
The 1920s saw the beginning of radio broadcasting, allowing people to receive news, music, and entertainment over the airwaves.
5. 1960s - ARPANET and Packet Switching:
The Advanced Research Projects Agency Network (ARPANET) was created by the U.S. Department of Defense. It used packet switching, a method of breaking down data into packets for efficient transmission. ARPANET laid the foundation for the modern internet.
6. 1973 - Ethernet Invention:
Robert Metcalfe and his team at Xerox PARC invented Ethernet, a technology for connecting computers in local networks. Ethernet became the standard for wired local area networks (LANs).
7. 1983 - TCP/IP Standardization:
The Transmission Control Protocol (TCP) and Internet Protocol (IP) were standardized, providing a common language for computers to communicate over networks. This was crucial for the growth of the internet.
8. Late 1980s - World Wide Web:
Tim Berners-Lee developed the World Wide Web, a system of interconnected hypertext documents accessed via the internet. This marked the beginning of user-friendly internet browsing.
9. 1990s - Commercialization of the Internet:
The 1990s saw a surge in internet adoption and the rise of commercial internet service providers (ISPs). This period marked the transition of the internet from a research tool to a global communication and commerce platform.
10. Late 1990s - Mobile Communication and GSM:
The Global System for Mobile Communications (GSM) standard enabled digital mobile communication. This paved the way for the widespread use of cell phones and text messaging.
11. 2000s - Broadband Internet and Wi-Fi:
Broadband internet, offering high-speed connectivity, became widely available. Wi-Fi technology allowed wireless internet access within short ranges, transforming how people connected to the internet.
12. 2010s - 4G and Smartphones:
The deployment of 4G (fourth-generation) mobile networks enabled faster data speeds and facilitated the widespread use of smartphones, which became essential devices for communication and accessing online services.
13. 2020s - 5G and IoT:
The rollout of 5G (fifth-generation) networks began, promising even higher data speeds and lower latency. This is expected to support innovations like the Internet of Things (IoT), where various devices are interconnected for data sharing and automation.
These breakthroughs, spanning over centuries, collectively shaped the computer networking and telecommunications industries into what we know today, enabling seamless global communication, information sharing, and technological advancements.
Inevitably, AI training is constrained by inter GPU bandwidth
One area of interest in modern networking and telecommunications is how the training of large AI models is inevitably constrained by inter GPU bandwidth. Say you’re a new AI startup who has just been funded and you plan on training a foundation model. You spin up some ND H100 virtual machine instances on Azure to train your mode, which is 8 H100s initially but can scale up to thousands with an upper limit of 3.2 Tb/s of interconnect bandwidth per virtual machine (each GPU in a v5 instance has a 400 Gb/s Infiniband connection.) As one of Microsoft’s flagship cloud offerings for AI, we can understand how the upper limit of bandwidth in these cloud instances represents an inevitable bottleneck for training AI models and an area of interest for the companies participating in this arms race to invest in.
AI training involves training large neural networks on vast amounts of data to learn patterns and make accurate predictions or classifications. This process is computationally intensive and requires significant processing power. Graphics Processing Units (GPUs) are commonly used for this purpose due to their parallel processing capabilities, which allow them to handle the massive amounts of matrix calculations involved in neural network training.
When training neural networks, data is divided into batches, and these batches are processed simultaneously on the GPU's many cores. However, during this process, there is a need for frequent communication between the cores or GPUs. This communication involves exchanging intermediate results, gradients, weights, and other information to update the model effectively. This communication happens through what's called the inter-GPU communication or inter-GPU bandwidth.
Inter-GPU bandwidth refers to the speed at which data can be transferred between different GPUs within a system. If this bandwidth is low, it can lead to performance bottlenecks and slower training times. Here's why inter-GPU bandwidth is a constraint in AI training:
1. Data Parallelism: In many AI training scenarios, a technique called data parallelism is used. This involves splitting the dataset into multiple parts and processing each part on a separate GPU simultaneously. The GPUs then need to communicate their updates to the model's parameters and gradients. If the inter-GPU bandwidth is limited, this communication can slow down the overall training process.
2. Gradient Descent: Neural networks are typically trained using optimization algorithms like stochastic gradient descent. In each iteration, gradients are computed with respect to the loss function for each batch of data. These gradients need to be synchronized across GPUs to update the model's parameters properly. A slow inter-GPU communication can delay this synchronization and increase the time it takes to reach convergence.
3. Model Parallelism: In certain cases, especially when dealing with very large models, model parallelism might be employed. This involves dividing the neural network itself into segments and distributing these segments across different GPUs. Again, efficient communication between these GPUs is essential for synchronization and proper functioning of the model.
4. Large Models: Deep learning models are becoming increasingly large and complex, with billions of parameters. Training these models involves even more data exchange between GPUs, putting additional strain on inter-GPU bandwidth.
5. Multi-Node Training: In distributed training scenarios, where multiple machines are connected to work together, communication between nodes is crucial. Inter-GPU bandwidth becomes inter-node bandwidth in this case. Slow communication between nodes can lead to poor scaling and reduced efficiency.
Efficient communication between GPUs is essential for maintaining parallel processing and avoiding idle times where GPUs are waiting for data from others. Researchers and engineers often strive to optimize inter-GPU communication, either through hardware improvements or software techniques, to minimize the impact of this constraint on AI training speed and efficiency.
Considering the arms race that is AI training, companies providing the “picks and shovels” are investing heavily in innovating in the space. Nvidia, for instance, is investing in further developing its Infiniband and NVlink technologies.
NVIDIA's InfiniBand and NVLink are two different technologies that are used to improve interconnectivity and communication between GPUs and other components in high-performance computing (HPC) and data-intensive applications. Both technologies aim to provide high-speed and low-latency connections, but they have different design principles and use cases.
1. InfiniBand:
InfiniBand is a high-speed networking technology designed for connecting servers and high-performance computing systems. It's not exclusive to NVIDIA and is used across the industry. InfiniBand provides a hh-speed, low-latency interconnect solution that is commonly used in cluster computing and supercomputing environments.
Key features of InfiniBand:
High Bandwidth: InfiniBand supports very high data transfer rates, ranging from gigabits per second (Gbps) to terabits per second (Tbps), depending on the specific implementation.
Low Latency: InfiniBand offers low communication latency, which is crucial for applications that require rapid data exchange, such as scientific simulations and real-time analytics.
Message Passing Interface (MPI) Support: Many high-performance applications, especially those used in scientific research, utilize the MPI standard for distributed computing. InfiniBand provides efficient support for MPI communication.
Clustering and Supercomputing: InfiniBand is often used in cluster computing and supercomputing environments to connect multiple nodes and GPUs for high-performance computations.
2. NVLink:
NVLink is a proprietary high-speed interconnect technology developed by NVIDIA specifically for connecting their GPUs together. It's designed to provide faster and more efficient communication between GPUs than traditional PCIe (Peripheral Component Interconnect Express) connections.
Key features of NVLink:
High Bandwidth and Low Latency: NVLink offers higher bandwidth and lower latency compared to traditional PCIe connections. This is particularly beneficial for scenarios where GPUs need to exchange a significant amount of data, such as in deep learning model training.
Direct GPU-to-GPU Communication: NVLink allows GPUs to communicate directly with each other without passing through the CPU or host system, reducing communication bottlenecks.
Unified Memory Space: NVLink enables GPUs to share memory more seamlessly, making it easier to distribute and manage data across multiple GPUs.
Multi-GPU Scalability: NVLink facilitates building large-scale multi-GPU systems for tasks like AI training, simulation, and scientific computing.
Data-Intensive Workloads: NVLink is particularly advantageous for workloads that involve heavy data exchange between GPUs, such as in deep learning training and complex simulations.
Both InfiniBand and NVLink technologies are designed to improve interconnectivity and communication between components in high-performance computing environments. InfiniBand is a general-purpose networking technology used for connecting servers and clusters, while NVLink is NVIDIA's proprietary technology optimized for connecting GPUs together with a focus on high-speed and low-latency communication, making it especially valuable for data-intensive tasks like deep learning and scientific simulations. Infiniband (a Mellanox product, a company that Nvidia acquired in 2019 for $6.9B) is designed for somewhat longer distances than NVLink as in any physical medium, there’s a natural and inevitable trade off between bandwidth and length. Whether to use copper or fiber is a critical point that researchers in the industry are focused on and being able to put optics actually “on the board” is a sort of holy grail because not only would it create incredibly fast speeds compared to copper, but it would also reduce power consumption per bit (saving energy in data centers) and expand bandwidth by achieving higher channel densities. On board optics is still extremely expensive to produce but it is certainly an area worth watching (an perhaps a topic for a follow up piece), especially as the AI arms race accelerates investment in the space, as it would be an extraordinary (and potentially revolutionary) leap forward for optical technology.
Google’s Project Apollo
Google has been investing heavily in building custom data center networking infrastructure (Project Apollo) to support its massive scale of operations and to provide the necessary performance, reliability, and efficiency for its services. One of the key components of Google's data center networking approach is its use of software-defined networking (SDN) principles (something that Arista Networks is also using to win market share). Here's an overview of Google's efforts:
1. Jupiter Network Architecture:
Google's custom networking infrastructure is based on a design called the "Jupiter" network architecture. The Jupiter architecture is designed to handle the enormous amount of data traffic that Google's services generate. It employs a two-tier Clos network topology that provides high bisection bandwidth and minimizes the number of hops data needs to travel between servers.
2. Software-Defined Networking (SDN):
Google heavily relies on SDN principles to manage and control its data center networks. SDN separates the network's control plane from the data plane, enabling centralized control and dynamic management of network resources. Google uses an SDN controller to dynamically allocate and manage network resources based on traffic patterns and demand.
3. B4 Network Infrastructure:
B4 is Google's global backbone network that connects its data centers around the world. It uses high-capacity links to provide low-latency and high-speed connectivity between data centers. The B4 network is designed to handle both user-facing traffic and backend communication between Google's services.
4. Andromeda Virtual Network Stack:
Google developed the Andromeda network virtualization stack to provide networking capabilities to virtual machines (VMs) running on Google Cloud Platform. Andromeda allows Google to offer features like software-defined firewalls, load balancing, and network performance monitoring to its cloud customers.
5. Custom Networking Hardware:
Google designs its own networking hardware, including switches and routers, to match its specific requirements. By designing its hardware, Google can optimize for performance, energy efficiency, and cost-effectiveness based on its data center workloads. Specifically, they are designing optical circuit switches to replace the traditional electronic packet switches they would buy from Broadcom.
6. OpenFlow and OpenConfig:
Google has been involved in initiatives like OpenFlow and OpenConfig, which aim to standardize and open up network control and configuration protocols. OpenConfig, for instance, provides vendor-neutral network configuration models that align with Google's SDN approach.
7. Data Center Interconnect (DCI) Innovations:
Google has explored innovative ways to interconnect its data centers, including using technologies like optical networking for long-distance, high-capacity connections between data centers. This helps Google maintain high-speed, low-latency communication between its data centers.
8. Network Resilience and Redundancy:
Google's network architecture is designed for redundancy and fault tolerance. It incorporates mechanisms to handle hardware failures, reroute traffic, and maintain service availability even in the face of network disruptions.
Google's efforts to build custom data center networking infrastructure align with its need to deliver reliable and performant services to its users, whether it's search, cloud computing, or other services. These efforts allow Google to have more control over its network, optimize it for its workloads, and ensure that its services are delivered with the quality and scale required by its global user base. Google claims they are able to save $3B vs comparable data center infrastructures and also that their custom network improves throughput by 30%, uses 40% less power, incurs 30% less Capex, reduces flow completion by 10%, and delivers 50x less downtime across their network. It also eliminates the need for them to purchase Broadcom switches for the spine of their networking stack. Google is using this network infrastructure to train its advanced large language models. Google is using internally developed optical circuit switches to replace the traditional electronic packet switches as they offer lower latency (due to not needing to decode packets), ease of upgrading (you no longer need to replace the entire spine) and interoperability between optical switches of different speeds as well as lower power consumption.
There is a fantastic semianalysis article that covers this in more detail.
Relevant Companies
Nvidia
Microsoft
Google
Amazon
Meta
Arista
Broadcom
Cisco
Juniper
Marvell
OpenAI (Private)
ISPs & Carriers - 5G, Fiber, FWA, Streaming
Most people are probably familiar with the process of signing up for a mobile plan and home internet. They have essentially become requirements for participating in the modern world (a trend that was accelerated by work from home policies during Covid.) Historically, there was greater delineation between the companies that provided your mobile internet and the companies that provide your cable and home internet. However, as consumer preferences shift, the competition for providing internet access and video entertainment services has heated up and companies are scrambling to innovate to not be left behind. There are some key trends happening in the telecom space
Fixed wireless access vs fiber
Bundling wireless and home internet (somewhat of a race to zero)
Streaming and video becoming the main traffic over the internet
Telecom companies divesting media assets (AT&T spinning out Warner Brothers) and a sort of war between the cable providers and the media companies (some of whom are also entering the streaming game, Charter vs Disney, for instance. Also perhaps relevant from a few years back was SK Telecom vs Netflix, when Netflix traffic surged due to people watching Squid Game, SK Telecom was on the hook for the increase in cost. Netflix (Media/Tech Company) benefitted at the expense of SK Telecom (Internet Service Provider.)
Alternative Internet Access solutions such as SpaceX’s Starlink
The story of the current telecommunications industry seems to be one where consumers are benefitting at the expense of companies margins as cable companies and wireless carriers are engaged in somewhat of a “race to zero,” as they fight one another to offer lower bundled services to consumers in the face of declining average revenue per user (ARPUs.) There are battles on many fronts - cable companies and wireless providers are trying to win customers from each other by bundling home and mobile internet services, cable providers are battling with media companies (most of whom have created their own streaming service, the profitability of which remains to be seen as at the current moment Netflix appears to be the leader in running a profitable streaming company. Charter vs Disney is the most recent example, SK Telecom vs Netflix is another relevant example.) In the Chartr vs Disney Battle, Disney pulled its programming from Charter’s Spectrum TV service the day before college football started. While this causes consumers to get mad at the cable provider (Charter) all the cable company does is provide the programming and pass through the costs. Historically this has been profitable for both entities but as media owners have raised prices, it has eaten away at the cable companies margins. Charter finally had enough and doesn’t want to negotiate with Disney on their terms, which is potentially bad timing for Disney as they are also looking to divest their major media asset (ESPN) rumored to be worth as much as $40B (Disney’s market cap is ~$150B). It will be interesting to see how this plays out as it may come to be influential in the future of both the cable and media industries.
Also, with video now representing 65% of internet traffic (Streaming, TikTok, Youtube etc.) and the rise of mixed & virtual reality (Meta Quest, Apple Vision Pro) how will the demand for ultra fast internet speeds inevitably shape how telecom companies choose to invest in their infrastructure. One of the interesting battles on the technology side of things is the race to build more internet access capacity. Some companies are going the route of building and delivering Fixed Wireless Access while others are investing in Fiber Optics to deliver internet services. After that we’ll also briefly explore SpaceX’s ambitious attempt to deliver the internet via Starlink, their low earth orbit satellite constellation.
FWA vs Fiber
Fixed Wireless Access:
Fixed wireless access (FWA) delivers internet connectivity using wireless signals, typically utilizing radio frequencies. A fixed antenna on your property communicates with a nearby base station to establish a connection.
Pros:
1. Rapid Deployment: FWA can be set up relatively quickly compared to laying physical cables, making it an attractive option for areas where infrastructure development might be challenging or time-consuming.
2. Cost-Effective: Installing wireless infrastructure can be more affordable than laying extensive fiber optic cables, especially in rural or remote areas.
3. Scalability: FWA networks can be expanded easily by adding more base stations to cover larger areas or more customers.
4. No Physical Cables: No need to dig up streets or lay cables, which can be disruptive and costly in urban environments.
5. Decent Speeds: FWA can offer respectable speeds, particularly in areas with good signal quality and lower user density.
Cons:
1. Signal Interference: Wireless signals can be affected by weather conditions, obstructions, and other electronic devices, leading to potential fluctuations in performance.
2. Limited Bandwidth: Shared wireless spectrum can result in reduced performance during peak usage times, causing slower speeds.
3. Lower Speed Potential: FWA speeds might not match the blazing fast speeds achievable with fiber optics, especially over longer distances.
4. Line-of-Sight Required: In some cases, a clear line-of-sight between the antenna and base station is needed for optimal performance.
5. Security Concerns: Wireless signals are susceptible to interception, making them potentially less secure than fiber optic connections.
Fiber Optics:
Fiber optics uses thin strands of glass or plastic to transmit data as pulses of light. It offers extremely high-speed internet connectivity through dedicated physical cables.
Pros:
1. High Speeds: Fiber optics provides some of the fastest internet speeds available, making it ideal for activities like streaming, online gaming, and large file transfers.
2. Stability and Reliability: Fiber is less susceptible to environmental factors like interference or electromagnetic interference, leading to consistent and reliable performance.
3. Symmetrical Speeds: Fiber often offers symmetrical upload and download speeds, which is crucial for applications like video conferencing and cloud-based services.
4. Unlimited Bandwidth: Fiber networks have high bandwidth capacity, making them well-suited for households with multiple devices and high data demands.
5. Future-Proof: Fiber's capacity for very high speeds positions it well for future technological advancements and increasing data demands.
Cons:
1. Deployment Challenges: Laying fiber optic cables requires significant infrastructure development, making it expensive and time-consuming, especially in areas with existing urban infrastructure.
2. Installation Complexity: Fiber optic cables need careful installation, which might involve digging and disruptions to existing infrastructure.
3. Cost: Fiber optic internet can be more expensive due to the high upfront costs of building the infrastructure.
4. Limited Availability: Fiber optic networks might not be available in all areas, particularly in remote or less densely populated regions.
5. Vulnerability to Physical Damage: Since fiber cables are physical, they can be damaged by construction work or accidents, potentially leading to service disruptions.
In summary, fixed wireless access is advantageous for its quick deployment and cost-effectiveness, while fiber optics excels in offering incredibly high speeds, stability, and future-proof capabilities. The choice between the two depends on factors like your location, desired speed, budget, and availability of infrastructure. While Fixed Wireless Access has gained popularity as an option for accessing the internet during the era of 5G rollouts, it appears at the moment that customers are benefitting at the expense of the internet service providers (over $100B has been spent on FWA deployments with very little of the costs passed through to the end customers.) It remains to be seen if the unit economics of FWA are durable in the long run and this is definitely a theme that is worth watching and will be important to listen to what various management teams have to say with respect to the economic viability of continuing to build fixed wireless access internet services. As demand increases for video games with higher quality graphics (as well as trends like cloud based gaming and richer multiplayer experiences) as well as the rise of immersive mixed and virtual reality devices like the Meta Quest and Apple Vision Pro, it will be interesting to observe how consumer preferences shift and how internet service providers invest in order to provide the best experience at the most competitive prices. An interesting point to note when it comes to reviewing history with the goal of better understanding the future - over the last five years, it has been somewhat more profitable to own the companies that own the cell towers and other telecom infrastructure (American Tower, SBA Communications) than it has been to own the Telecom Companies themselves (AT&T & Verizon) with the notable exception of TMobile. (Comcast being somewhere in the middle.)
Starlink
SpaceX's "Starlink" project is an audacious endeavor to revolutionize global internet access. As we discuss this project, we'll also briefly explore the history of satellite internet attempts and the physical challenges inherent in delivering signals via satellites.
The dream of satellite-based internet connectivity has fascinated innovators for decades. Some background, key milestones and challenges that define this industry:
1. Early Experiments:
In the 1970s, experiments like the ALOHAnet demonstrated the feasibility of satellite-based data communication.
In the 1990s, Teledesic, a project backed by Bill Gates and Craig McCaw, aimed to create a global satellite-based internet network but faced funding and technical challenges.
2. Geostationary Satellites:
Traditional geostationary satellites orbit at 36,000 kilometers above the equator, providing global coverage but suffering from latency issues due to signal travel time.
3. Low Earth Orbit (LEO) Satellites:
LEO satellites, at altitudes of 160-2,000 kilometers, offer lower latency but require a vast constellation due to their lower coverage area.
4. Regulatory Hurdles:
Licensing, frequency allocation, and international coordination are complex challenges for global satellite internet projects.
5. Financial Viability:
High launch and satellite production costs have hindered satellite internet projects' economic feasibility.
SpaceX's Starlink project, launched in 2018, aims to address these historical challenges with a new approach:
1. LEO Satellite Constellation:
Starlink deploys thousands of small satellites in LEO, reducing latency and enabling faster data transmission. The system relies on satellite-to-satellite laser links for rapid data transfer within the constellation.
2. Cost Efficiency:
SpaceX manufactures satellites in-house, reducing production costs and reusable Falcon 9 rockets significantly cut launch expenses.
3. Regulatory Progress:
Starlink has secured regulatory approvals in numerous countries, streamlining the path to global coverage.
4. User Terminals:
Starlink user terminals (antennas) are compact and designed for easy installation. The phased-array antenna automatically aligns with the satellites, simplifying user experience.
5. Rapid Deployment:
Starlink's "Better Than Nothing Beta" began offering services in late 2020 to select regions, expanding rapidly to users worldwide.
6. Expanding Coverage:
The goal is to provide high-speed internet access to underserved and remote areas globally.
7. Constellation Growth:
SpaceX continues to launch batches of satellites, with plans to deploy thousands more in the coming years.
While Starlink represents a groundbreaking approach to satellite internet, it grapples with physical challenges:
1. Space Debris:
LEO satellites risk collisions with space debris, necessitating constant monitoring and collision avoidance maneuvers.
2. Satellite Lifespan:
LEO satellites have a limited lifespan due to atmospheric drag, requiring ongoing replacement.
3. Latency:
Although lower than geostationary satellites, LEO-based systems still face latency challenges compared to terrestrial networks.
4. Regulatory Compliance:
Navigating international regulations and spectrum allocation can be complex and time-consuming.
5. Competition:
Starlink competes with other satellite internet ventures and terrestrial broadband providers.
6. Environmental Impact:
The sheer number of satellite launches raises concerns about space debris and environmental effects.
SpaceX's Starlink project represents a remarkable leap forward in satellite-based internet access. By deploying a vast constellation of LEO satellites, addressing latency issues, and focusing on cost efficiency, Starlink aims to connect the unconnected and reshape global internet accessibility. Despite the physical challenges and regulatory hurdles, Starlink's progress promises to usher in a new era of connectivity, potentially bridging the digital divide for millions worldwide. As history has shown, the path to satellite internet innovation is marked by persistence and innovation, and Starlink is a compelling testament to this journey.
Relevant Companies
AT&T
Verizon
TMobile
Charter
Comcast
Netflix
Disney
SpaceX(Private)
American Tower
Crown Castle
SBA Communications
SK Telecom
China Mobile
Nippon Telegraph & Telephone
Vodafone
Deutsche Telekom
American Movil
China Telecom
Orange
Swisscom
Saudi Telecom
Singel
BCE
Telstra
(most countries have one or more big telecom providers)
Edge Computing
The “Edge” is where all these various companies meet - Hyperscalers, Communication Service Providers (carriers/teclos), infrastructure equipment providers and Edge Cloud Management Platforms amongst others. This confluence is driving lots of corporate spending and innovation around computing at the edge.
Imagine you're considering investing in the real estate market. You have two options: you can either invest in a massive, centralized skyscraper in the heart of the city, or you can invest in multiple strategically located townhouses across the city. Let's relate this analogy to the data center world and explain edge computing in a similar manner:
Edge computing is like investing in those strategically located townhouses. In the world of data centers and networking, it refers to a decentralized approach to processing and analyzing data. Instead of sending all the data to a central data center (the skyscraper), edge computing brings the data processing closer to where it's generated, right at the "edge" of the network – in this case, the townhouses.
Just as these townhouses are placed in key neighborhoods to serve the local residents efficiently, edge computing involves placing smaller computing resources – like servers and data processing devices – in proximity to where the data is being produced. This can be at the site of an industrial machine, a retail store, a connected vehicle, or even in smart home devices.
From an investor's perspective, edge computing has several advantages:
1. Reduced Latency: Since data doesn't need to travel all the way to a central data center and back, processing happens faster. This is crucial for applications that require real-time responses, like autonomous vehicles or industrial automation.
2. Bandwidth Efficiency: By processing data closer to the source, edge computing reduces the need to transmit large amounts of data over long distances. This can lead to cost savings on network infrastructure.
3. Data Privacy: Some data might be sensitive or subject to regulations, making it more secure to process and analyze at the edge rather than sending it to a distant central location.
4. Reliability: If there's a network outage, edge computing can still function locally, ensuring uninterrupted operation for critical systems.
5. Scalability: Edge computing allows for modular expansion. New "townhouses" can be added to the network easily as demand increases.
6. Diverse Applications: Edge computing benefits a wide range of industries, from manufacturing and healthcare to retail and entertainment, making it a versatile market to invest in.
Edge computing is significant in our research as it represents the convergence of networking/telecom/hyperscalers/internet companies etc. and is a key area to watch in the coming years.
“Billions of devices connect to the internet: smartphones, computers, security cameras, machine sensors, and many more. Devices like these generate massive amounts of data, most of which travels over the internet to applications running in the cloud. The cloud, in turn, is powered by enormous, centralized data centers and platforms operated and offered by a few organizations.The problem with this is that, as the number of connection points explodes to 150 billion devices generating 175 zettabytes of data by 2025, sending all that data to faraway clouds for processing will become increasingly inefficient and expensive. Moreover, this model may not be able to deliver the real-time data and response times demanded by newer applications. Consequently, more organizations are considering a hybrid cloud model that augments existing cloud strategies with edge computing. Edge computing distributes the cloud’s scalable and elastic computing capabilities closer to where devices generate and consume data. These locations can be as varied as an enterprise’s on-premise server, a communication service provider’s central office or cell tower, a hyperscaler’s regional data center, an end-user device, or any point in between.Since data doesn’t have to travel as far, using edge computing can help reduce network resources, cut transit costs, improve reliability, reduce latency, and, perhaps most importantly, enhance enterprise control over data and applications. For example, edge computing can help organizations meet increasingly stringent data sovereignty, privacy, and security requirements by keeping sensitive data on premise. What’s more, when edge computing is combined with advanced connectivity options—especially 5G—it can deliver flexible, near real-time response times for data-heavy, artificial intelligence–driven, time-sensitive, or mission-critical applications. The combination of low latency, advanced connectivity, and enhanced data control makes many IoT use cases, such as the video analytics and computer vision used in security and quality control, immersive mixed reality training, autonomous vehicles, and precision robotics, much more feasible.The developing edge computing ecosystem is highly diverse. While chipset makers, device manufacturers, application developers, security specialists, and system integrators also feature prominently, we’ll focus on four categories of companies that are active in the edge computing market: public cloud hyperscalers, communications service providers (CSPs), infrastructure equipment vendors, and cloud management platforms.” - Deloitte TMT Predictions 2023
Dojo: Custom Supercomputer Infrastructure Built For the Edge
Tesla's "Dojo" supercomputer project represents a pivotal step towards achieving the company's ambitious goals in autonomous driving and artificial intelligence.
Autonomous driving relies on the processing of vast amounts of data from sensors, cameras, and radar systems in real-time. The complexity of analyzing this data, making split-second decisions, and ensuring safety demands computing power beyond the capabilities of conventional computers. Tesla recognized this need and initiated the Dojo project to address it.
Understanding Tesla's Dojo Supercomputer
1. Purpose:
Dojo is designed to accelerate the training of Tesla's neural networks, which are the core of the company's Full Self-Driving (FSD) technology.
2. Impressive Specs:
Tesla aims to make Dojo one of the most powerful AI supercomputers in the world. It's expected to deliver exaflop-level computing, which translates to a quintillion calculations per second.
3. Custom Hardware:
Tesla is developing custom hardware, including AI accelerators and training chips, tailored for Dojo's specific requirements. This hardware is designed to optimize neural network training while minimizing power consumption. In a world where the demand for Nvidia GPUs greatly exceeds supply, this may prove to be a valuable investment over the long term.
4. Data Efficiency:
Dojo focuses on data efficiency, aiming to train neural networks with less labeled data, reducing the need for extensive manual labeling.
5. Neural Network Training:
The supercomputer is intended to significantly accelerate the training of neural networks, allowing for rapid iterations and improvements in Tesla's self-driving algorithms.
6. Potential Applications Beyond Autonomous Driving:
While Dojo's primary focus is on autonomous driving, its capabilities can extend to other AI-intensive applications, such as natural language processing and computer vision.
Significance in the Autonomous Driving Industry
Tesla's Dojo project holds immense significance for the automotive industry and beyond:
1. Competitive Advantage:
Dojo can potentially give Tesla a competitive edge in the race for autonomous driving supremacy by accelerating its development and deployment of FSD features.
2. Enhanced Safety:
Faster neural network training can contribute to safer autonomous vehicles by improving real-time decision-making and reaction times.
3. Cost-Efficiency:
Dojo's data efficiency could reduce the need for extensive data labeling, which can be a costly and time-consuming process.
4. Potential Revenue Streams:
Tesla could potentially leverage Dojo's capabilities for AI-related projects beyond automotive, creating additional revenue streams.
5. Industry Influence:
The success of Dojo could set industry standards and influence the development of AI supercomputers in autonomous driving.
Tesla's Dojo project faces several challenges, including the development of custom hardware, software optimization, and ensuring data privacy and security. Additionally, the practical implementation and integration of Dojo into Tesla's vehicles and infrastructure remain key milestones.
The future of the Dojo project holds the promise of faster advancements in autonomous driving technology. As Tesla continues to refine its neural networks and expands its FSD capabilities, Dojo's role as a critical enabler of self-driving cars is likely to become increasingly apparent.
Tesla's Dojo supercomputer project is an ambitious effort to reshape the landscape of autonomous driving and AI. By harnessing the power of exaflop-level computing, custom hardware, and data efficiency, Tesla aims to accelerate the development and deployment of Full Self-Driving technology. As the project matures and overcomes its challenges, it has the potential to not only elevate Tesla's position in the automotive industry but also influence the broader AI and autonomous driving landscape.
Relevant Companies
Amazon
Google
Microsoft
Tesla
Verizon
TMobile
Orange
SK Telecom
KDDI
Telenor
Telefonica
AT&T
Vodafone
Telstra
Dell
Nokia
Cisco
Ericsson
JMA Wireless
Mavenir
Hewlett Packard
Red Hat
VMWare
Nutanix
MobiledgeX
Amdocs
Component Providers
As is true in any industry, “where this is a gold rush, sell the picks and shovels” or to frame the quote from the lens of an investor - buy the companies that are providing the picks and shovels for the modern “gold rushes.” In terms of networking and telecommunications, there are many companies that provide the mission critical components that companies need in order to build the most advanced and capable network infrastructure.
Routers (core, edge, wireless, etc)
Switches
Wireless Access Points
Network Products
Transceivers
Optical Components
1. Routers:
Routers direct data packets between networks. They manage traffic, determine the best path for data, and enable communication between devices.
Companies: Cisco, Juniper Networks, Huawei, Arista Networks, NETGEAR.
2. Switches:
Switches connect devices within a local network (LAN). They forward data to the intended recipient, improving efficiency and reducing network congestion.
Companies: Cisco, Hewlett Packard, Juniper Networks, D-Link, Extreme Networks.
3. Access Points:
Access points (APs) provide wireless connectivity to devices within a certain range. They are used to create Wi-Fi networks.
Companies: Cisco, Aruba Networks (HPE), Ubiquiti Networks, Ruckus Networks (CommScope), TP-Link.
4. Modems:
Modems (modulator-demodulators) convert digital data from computers into analog signals for transmission over phone lines or cable systems, and vice versa.
Companies: ARRIS (CommScope), Motorola Solutions, NETGEAR, Technicolor.
5. Firewalls:
Firewalls protect networks from unauthorized access and potential threats by monitoring and controlling incoming and outgoing traffic.
Companies: Palo Alto Networks, Fortinet, Check Point Software, Cisco (ASA), SonicWall.
6. Load Balancers:
Load balancers distribute network traffic across multiple servers to prevent overloading, optimize resource use, and ensure high availability.
Companies: F5 Networks, Citrix Systems, Radware, Kemp Technologies.
7. Network Security Appliances:
These appliances provide advanced security functions such as intrusion prevention, malware detection, and content filtering.
Companies: Fortinet, Palo Alto Networks, Sophos, Cisco (FirePower), WatchGuard.
8. Transceivers:
Transceivers are modules that convert data into a format suitable for optical or electrical transmission over network cables.
Companies: Cisco, Juniper Networks, HPE (Aruba), Finisar (now part of Coherent), Broadcom.
9. Network Cables and Connectors:
Ethernet cables and connectors are used to physically connect devices within a network.
Companies: Belden, Panduit, CommScope, Black Box, Legrand.
10. Network Monitoring and Management Software:
Software solutions that monitor network performance, analyze data, and manage network devices.
Companies: SolarWinds, Cisco (Prime Infrastructure), PRTG Network Monitor, ManageEngine OpManager.
11. Telecommunication Infrastructure:
This includes equipment for mobile and fixed-line communication, such as base stations, antennas, and switches.
Companies: Ericsson, Nokia, Huawei, ZTE, Samsung Electronics.
12. Satellite Communication Equipment:
Equipment for communication via satellites, including satellite dishes, transponders, and ground stations.
Companies: Hughes Network Systems, ViaSat (Viasat), Gilat Satellite Networks, Iridium Communications, SpaceX(Starlink)
13. Optical Components
Optical components are essential parts of networking infrastructure that enable the transmission and reception of optical signals (light) in optical fiber communication systems. They play a crucial role in modern high-speed, long-distance, and high-capacity data transmission. Here are some key optical components used in networking infrastructure:
1. Optical Transceivers: Optical transceivers are integrated devices that transmit and receive optical signals. They convert electrical signals from network devices like switches and routers into optical signals for transmission over optical fibers and vice versa. Common types include SFP (Small Form-Factor Pluggable), QSFP (Quad Small Form-Factor Pluggable), and CFP (C form-factor pluggable) transceivers.
2. Optical Amplifiers: Optical amplifiers boost the power of optical signals without converting them to electrical signals. This is essential for long-distance optical transmission where signals may attenuate (weaken) over the fiber. Erbium-Doped Fiber Amplifiers (EDFAs) are a common type used in long-haul networks.
3. Optical Splitters and Combiners: Optical splitters divide an optical signal into multiple signals, while optical combiners merge multiple signals into one. These components are crucial for distributing signals in passive optical networks (PONs) and for creating redundancy in network architectures.
4. Optical Filters: Optical filters allow specific wavelengths of light to pass through while blocking others. They are used for wavelength division multiplexing (WDM) to combine and separate multiple optical signals on a single fiber.
5. Optical Couplers: Optical couplers are passive devices used to combine or split optical signals. They can distribute optical signals among multiple fibers or combine signals from different sources onto a single fiber.
6. Optical Isolators and Circulators: These components control the direction of optical signals within a fiber. Optical isolators allow light to travel in one direction only, while circulators can route light among multiple ports.
7. Optical Attenuators: Optical attenuators are used to reduce the power of optical signals. They are crucial for adjusting signal strength and avoiding overloading optical receivers.
8. Optical Switches: Optical switches enable the rerouting of optical signals to different paths or destinations without the need for conversion to electrical signals. They are used for network redundancy and protection.
9. Optical Connectors and Adapters: These components provide the physical interface for connecting optical fibers and transceivers. Common types include LC, SC, MTP/MPO, and ST connectors and various adapters to facilitate different fiber connections.
10. Optical Reflectors: Optical reflectors are used to redirect optical signals, often in optical testing and measurement applications.
11. Optical Fiber: While not a component per se, optical fibers are the core medium for transmitting optical signals. Various types of optical fibers with different characteristics are used depending on the specific application.
These optical components are the building blocks of modern optical networks, enabling the high-speed data transmission and connectivity that underpin telecommunications, data centers, and internet services. They continue to evolve and improve to meet the growing demands of data-intensive applications and the need for faster, more efficient, and reliable communication infrastructure.
Companies: Cisco, Coherent, Infinera, Fabrinet, NeoPhotonics, Mellanox (now part of Nvidia), Acacia, Ciena, Juniper, Applied Optoelectronics
Data Centers
Jim Chanos, a renowned hedge fund manager and founder of Kynikos Associates, has been known for his skeptical view of certain sectors and companies, as he has risen to prominence from his success as a short seller. One area where Chanos has recently expressed a bearish view is on data center Real Estate Investment Trusts (REITs). His skepticism centers around the belief that these companies may not generate a return on invested capital (ROIC) that exceeds their cost of capital.
Data center REITs are companies that own and operate data centers, which are facilities housing computer systems and networking equipment. They provide essential infrastructure for cloud computing, data storage, and internet connectivity. These REITs lease data center space to technology companies, enterprises, and cloud service providers.
Chanos argues that there is essentially three ways for companies to maintain their data:
On premise - servers owned by the company that an IT department manages
Third Party Colocation Services - you keep your servers at a third party location (this is the area that he believes is in secular decline, earning ROICs lower than their cost of capital.) These are companies like Digital Realty and Equinix.
Hyperscalers - you don’t own servers at all but rent them from the major cloud providers (Amazon, Google, Microsoft)
Chanos believes these companies are in secular decline, are earning ROICs lower than their cost of capital and are understating their maintenance capex requirements using somewhat misleading accounting that records some necessary capital expenditures (without which the data centers would lose competitiveness or in some cases cease to operate) as growth capex rather than as maintenance. He points out that Digital Realty is burning $230mm/month on a $30B market cap and is capitalized at roughly 9x leverage (net debt + preferred of ~$19B vs $2.2B of EBITDA.) He believes these companies are overvalued and are facing numerous headwinds, both competitive and financially as their growth slows and the cost of their debt rises as interest rates go up.
Chanos's Key Arguments
1. Intense Competition: Chanos believes that the data center industry is highly competitive, with many players vying for market share. This competition can result in pricing pressures and reduced profitability for individual data center REITs.
2. Capital Intensity: Data centers require significant capital investments to build and maintain. According to Chanos, the capital expenditures required to keep up with technological advancements and security demands may erode the ROIC.
3. Cyclicality: Data center REITs are not immune to economic cycles. During downturns, demand for data center space may soften, impacting occupancy rates and rental income.
4. Lease Structure: Data center leases typically have long durations, often ten years or more. Chanos argues that this long-term commitment can be a double-edged sword, as it may lock in rental rates that become less competitive over time.
5. Technological Risk: The fast-paced nature of technology can pose risks to data center operators. Outdated facilities or infrastructure can lead to obsolescence and reduced competitiveness.
6. ROIC vs. Cost of Capital: The core of Chanos's argument is that data center REITs may not generate an ROIC that exceeds their cost of capital over the long term. This means that the companies might struggle to create value for shareholders.
ROIC vs. Cost of Capital
To understand Chanos's viewpoint, it's essential to grasp the concept of ROIC relative to the cost of capital. This is of course obvious to many readers but just wanted to explain it briefly for anyone unfamiliar with the concept.
ROIC (Return on Invested Capital): ROIC measures how effectively a company generates profits from its invested capital. It's a critical metric for assessing a company's ability to create value for shareholders.
Cost of Capital: The cost of capital represents the return expected by a company's investors in exchange for providing capital. It includes the cost of debt and the cost of equity. If a company's ROIC consistently falls below its cost of capital, it may not be generating enough return to justify the capital invested in it.
Jim Chanos's skepticism about data center REITs revolves around his belief that the competitive and capital-intensive nature of the industry may prevent these companies from consistently earning an ROIC that exceeds their cost of capital. For instance, he calculates that it costs Digital Realty $11 to generate $1 of revenue and on a 50% EBITDA margin that means they need to invest $11 to earn $0.50 of gross cash flow. With inflated valuations, growth slowing or even the business shrinking, rising debt service costs and lower returns on investment, Chanos believes that colocation data centers are bad investments and is putting his money where his mouth is by being short these companies in size.
There is a Business Breakdowns podcast where he discusses the thesis in more detail.
This piece is already getting a bit long, but interesting data center developments that are worth mentioning (and perhaps exploring in a future piece) are CoreWeave’s efforts to build a special purpose GPU cloud for AI workloads (a great Odd Lots podcast with one of the founders here) and Nvidia’s DGX, a platform that combines their infrastructure, software and expertise and was built from the ground up to be the best choice for enterprise AI workloads (AI training as a service.)
Conclusion
Putting it all together
So we have cable companies, wireless network providers, some of these companies do both and also provide internet services, hyperscalers that provide cloud computing, network infrastructure companies and the component manufacturers that make products for them. Data center owners and operators and major internet companies (who are also the major hyperscalers). Network capacity and bandwidth are a major bottleneck for AI workloads. Nvidia and Google are spending massively to innovate in this space. How do we view this from an investment lens? Companies with large capex (sometimes in excess of their ROIC) and declining ARPUs (the telcos) should be losers while the companies that are best positioned to benefit from these tailwinds (the internet companies/hyperscalers, the networking companies, the component manufacturers (both at the equipment and semiconductor level.) However, it’s hardly as easy as looking at a high level trend and then selecting the winners from companies that have historically been the best investments. In future articles, we will dive into these industries and the sub sectors in much greater detail and do our best to construct thematic long/short baskets from inevitable the winners and losers of the theme we are exploring. However if I took a shot at a basket after writing this it might look something like
Thematic L/S Basket
Long
TMobile
Nvidia
Google
Microsoft
Amazon
Arista
Broadcom
Marvell
Comcast
American Tower
SBA Communications
Fabrinet
Meta
Netflix
Dell
HPE
Palo Alto Networks
Juniper
Cisco
Fortinet
Short
AT&TCharter
Verizon
Disney
Digital Realty
Equinix
Cogent
Altice
Lumen
Frontier
Orange
Vodafone
Nokia
Ericsson
BCE
Deutsche Telekom
I hope you enjoyed this piece and I will be working on more of these for the future. If you have any thoughts, please feel free to reach out to me on Twitter X @netcapgirl. If you liked this write up, please consider subscribing.
great job sophie I use a variant of this quote all the time when I talk about long run investing:
"For instance, if someone had told you in the 1990s the number of cellphones that would be in use in 2023, you might have bought Ericsson and Nokia."
I enjoy your scientific approach - reminds me of the Alchemy of markets by George Soros. Hope you have read Karl Popper; great books and mental models which Soros used quite a bit.