How to Hire SREs: A Look at How Google Does It

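I came across an article by 左耳朵耗子, 程序员如何把控自己的职业 ("How Programmers Can Take Control of Their Careers"), which mentions Google's SRE scorecard. Software engineers use it during hiring to self-assess their skill level. It has 11 levels in total: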

  0. You are unfamiliar with the subject area.
  1. You can read/understand the most fundamental aspects of the subject area.
  2. Ability to implement small changes, understand basic principles and able to figure out additional details with minimal help.
  3. Basic proficiency in a subject area without relying on help.
  4. You are comfortable with the subject area and all routine work on it:
    • For software areas: ability to develop medium programs using all basic language features without a book, and awareness of more esoteric features (with a book).
    • For systems areas: understanding of many fundamentals of networking and systems administration, ability to run a small network of systems including recovery, debugging and nontrivial troubleshooting that relies on the knowledge of internals.
  5. An even lower degree of reliance on reference materials. Deeper skills in a field or specific technology in the subject area.
  6. Ability to develop large programs and systems from scratch. Understanding of low level details and internals. Ability to design / deploy most large, distributed systems from scratch.
  7. You understand and make use of most lesser known language features, technologies, and associated internals. Ability to automate significant amounts of systems administration.
  8. Deep understanding of corner cases, esoteric features, protocols and systems including “theory of operation”. Demonstrated ability to design, deploy and own very critical or large infrastructure, build accompanying automation.
  9. Could have written the book about the subject area but didn’t; works with standards committees on defining new standards and methodologies.
  10. Wrote the book on the subject area (there actually has to be a book). Recognized industry expert in the field, might have invented it.

The SRE self-assessment mainly covers the following technical areas:

– TCP/IP Networking (OSI stack, DNS etc)
– Unix/Linux internals
– Unix/Linux Systems administration
– Algorithms and Data Structures
– C/C++
– Python
– Java
– Perl
– Go
– Shell Scripting (sh, Bash, ksh, csh)
– SQL and/or Database Admin
– Scripting language of your choice (not already mentioned)
– People Management
– Project Management

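Looking at this self-assessment and comparing it against my own skills, I immediately felt how long the road ahead is. There is so much I still need to learn and improve.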

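SRE stands for Site Reliability Engineering. It is a set of processes and methods, originated at Google, for operating large-scale distributed systems, and it is somewhat similar to DevOps in cloud service operations. Many large companies, such as Facebook and Netflix, now use it.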

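Speaking of SRE, there is an article well worth reading that outlines what SRE is and how Google hires SREs: what kind of people Google needs, how interviews are run, and how hiring decisions are made. The article covers many methods and processes, all aimed at hiring people who meet Google's bar and can drive the company's growth. Google holds to principles such as "better not to hire at all than to hire the wrong person" and "a new hire should be stronger than the existing team members", and it invests heavily in building its talent pool. Having read the whole article, I can't help admiring the care a top internet company puts into its people. It is this strict, high bar in hiring that keeps pushing Google's business forward.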

Original article: Hiring Site Reliability Engineers

References