Our transformative people and products
Technical Systems & Infrastructure is a transformative space that attracts experts whose passion is to evolve our technology, enabling Google to thrive in a rapidly changing industry. Here you’ll discover more about our talented people and their purposeful developments.
Insights from lead engineers...
At Google, we have many products with users across the world. I can pick up my phone and talk to one of my TPU chips in a data center and use my chips to do a Google search. I can actually see improvements as new hardware is deployed in the field. There’s quite a bit of instant gratification.
Norm Jouppi, VP, Engineering Fellow
We are always pushing to improve performance and efficiency. We are constantly striving to integrate new technology so that we can continue to scale the bandwidth performance of optical interconnects. When I started here at Google, the bandwidth was 10 Gb/sec. Fast-forward 16 years and it has increased to 1.60 Tb/sec. You’re seeing a substantial increase in bandwidth, and we still have room to go.
Hong Liu, Google Fellow
What excites me most about my job is the opportunity to make changes across the stack. Our approach to research and development is simply not possible in an academic setting (where you’d be running benchmarks against yourself and it would be completely theoretical).
Paul Turner, Principal Software Engineer
No matter what part of the organization you’re in, you can meet people who are interested in solving the problem by moving the needle in a fundamental way (as opposed to putting in hacks). The enthusiasm of your colleagues and the positive environment of “anything is possible” has a multiplicative effect on us.
Nandita Dukkipati, Principal Software Engineer
Systems, Machine Learning and Artificial Intelligence
Google has been a leader in the creation of ML/AI technology and applications, from our state-of-the-art ML hardware (TPUs and TPU Pods), to our innovations impacting the advance of science (AlphaFold), to our work on Large Language Models (BERT, GLaM, LaMDA, PaLM, Bard), to ML programming support (XLA, TensorFlow). In TS&I we design and deploy both hardware and software to support AI applications that are impacting Google’s systems, its cloud customers, and the world. We are interested in both ML for systems (using ML to make our systems more efficient or performant) and systems for ML (building HW/SW systems that support ML-based applications).
Hyperscaler-focused hardware/computer architecture
Google data center and hardware infrastructure underlies the global-scale computing that powers Google services and, through Google Cloud, many other organizations and services around the world. Google advances in hardware and computer architecture have enabled many of our innovations. From new custom hardware accelerators for machine learning (TPU), video encoding and processing (VCU), security (OpenTitan), network offloading (SmartNICs/IPUs), and other emerging domains, to rethinking hardware design for modularity (new “multi-brained” servers) and optionality (open-source hardware), we are reinventing hardware.
Programming languages/compilers for demanding applications
Google has created new languages like Go (systems programming), XLS (accelerated hardware synthesis), and P4 (packet processing) to adapt to new paradigms, and has influenced the direction of C++ and other languages to help address the needs of data center development. Google's code spans C++, Java, Kotlin, Go, Python, TensorFlow (for ML/AI), and more. We are continually developing automation to help refactor and evolve programs over time so the codebase can get cleaner, healthier, and faster. Google leverages C++ for data center compute, so we invest heavily in developing new technologies for compiler optimization and for runtime libraries such as garbage collection, memory allocation, and other performance-critical code. The foundation of our performance efforts is a data-driven approach to support incremental improvement, driven by Google-Wide Profiling and similar internal tools to understand the software and hardware characteristics of the fleet at scale.
Global and data center network technology
Reliable high-performance networking is central to modern computing, from WANs connecting cell phones to massive cloud data stores, to the low-latency datacenter-internal interconnects that deliver high-speed access to storage and fine-grained distributed computing, to under-ocean cables that connect Google’s data centers around the globe. Because our distributed computing infrastructure is a key strategy for the company, Google has long focused on building network infrastructure to support its scale, availability, and performance needs, and to apply our expertise and infrastructure to solve similar problems for Cloud customers. This infrastructure requires constant redesign, as the demands on networking continue to increase exponentially.
Systems software to manage massive scale
Google has continuously invested in the development of highly performant, fault-tolerant, and efficient Cloud computing environments at scale. Google’s Technical Infrastructure Engineers have pioneered approaches such as containers (resource isolation), Borg (cluster-wide resource management), Kubernetes (serverless distributed computing), the Andromeda virtual network stack, the Anthos multi-cloud environment, global-scale consensus, user-level messaging, and more. Our research and development helps define the systems-level abstractions that allow new technologies and specialized resources, such as ML engines and TPUs, to be introduced into a reliable global cloud infrastructure.
Data storage, management, and analytics
There are perhaps no larger focused-purpose distributed systems than today’s hyperscaler storage and data management systems, which are driven by enormous increases in data production and collection, and the need to process and learn from that data. At Google, we manage every end of this pipeline, from data storage in enormous farms of hard disks and SSDs, providing both cheap bulk storage and fast, high-throughput storage, to global database systems such as BigQuery and Spanner. The scale, heterogeneity, and complexity of our data systems present challenges and opportunities for innovation across the board. We are focused on expanding Google’s technical stewardship in helping to provide safe, reliable, secure, and efficient data storage, management, and processing, while exploring new and disruptive technologies.
Started in 2021, SystemsResearch@Google (SRG) is a research team positioned in the heart of Google’s technical systems and infrastructure engineering organization. SRG’s mission is to shape the future of hyperscaler systems design for Google by inventing, incubating, and infusing new concepts, designs, and technologies into Google applications, systems, and data centers. The team’s position allows integrated engagement with engineering and product teams, enabling joint exploration in concert with transformative workloads. The SRG team is also forging strong relationships with external research communities working on pressing systems-research problems.