Performance and Capacity Engineering - Facebook
Experience with Cloud Computing and Capacity
Data Lake Connector that democratizes data to all engineers.
General Infrastructure Engineering
N/A
##
Checked out Bill Jia’s talk at “Performance @Scale 2018”.
“The highest throughput, with acceptable latency, in the smallest footprint” – Performance and Capacity Engineering @ Facebook
“Without data, and with analysis of data, you’re not working on performance. You’re working on something else” – Bill Jia, Facebook
Consider: architecture, code base, infrastructure, hardware
Looked at the Data Center Performance adn Capacity Engineer job post
- Provide deep visibility into power, performance and health.
- Help optimize capacity usage (run simulations to determine utilization parameters)
- Identify bottle necks
- Develop simulation models and tools to monitor data center capacity performance and utilization. Write monitoring, reporting, data-mining tools to do performance and capacity-related tests and analysis.
- MySQL, Hadoop.
Read Bill Jia’s Publication, Machine Learning at Facebook: Understanding Inference at the Edge.
- “Machine Learning is used by most Facebook services”
- Ranking posts for News Feed, content understanding, AR/VR, speech recognition.
- RNNs, decision trees, logistic regression
- Desire to bring that to edge
- Optimizations include: model architecture search, weight compression, quantization, algorithmic complexity reduction, and microarchitecture.
- Optimizations enable edge inference to run on mobil CPUs. Only a small fraction of inference currently run on mobile GPU’s.
- Use of PyTorch and Caffe2
- Two internal packages NNPACK and QNNPACK
Learned about Flame Charts by Brendan Gregg
- Way of seeing what parts are consuming the most resources.
- Hierarchical; within that part what is consuming the most.
Some relevant notes:
- CloudComputing
- CapacityEngineering
- OpenShift
- Kubernetes
- Hadoop
- APIs