Thoughts on Architecture Design

by Lin Shiding (a top architect at Baidu.com)
Online view (in Chinese): http://wenku.baidu.com/view/8f7a79dcad51f01dc281f127.html

Translated & transcribed by fcicq

Page 2
Begin with Examples:

  • Storage
  • Distributed
  • Service Architecture
  • Computation Models

Page 3
Storage(1):

  • Structure: File, Object, Table
  • Characteristics of Data
    • mutable or not
    • size
    • data layout
  • Access Pattern
    • Realtime Read(Query)/Write
    • Batched Write, Realtime Query
    • Stream Read
    • Scan / Range Query
  • "Realtimeness"
    • Realtimeness
    • Freshness
    • Consistency

Page 4
Storage(2):

  • Conflicts (fcicq: or trade-offs)
    • Latency / Throughput
    • Random / Sequential
    • Scale / Freshness (Realtimeness / Latency)
  • Model
    • B+ Tree (Realtime, Random)
    • Log-based (Batched, Sequential)
  • Solve the conflicts:
    • Weaken the requirements
    • Exploit the locality
    • Combine models

Page 5
Storage Model: B+ tree
(pic)

Page 6
Storage Model: Log-based structure
(pic)

Page 7
Storage (Model): Combined(/Hybrid) Model
(S: high sequential-read performance, high storage capacity)
(R: high random-read performance, low storage capacity)
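The figure for this slide is not reproduced here. One common way to combine the two models, assumed here for illustration rather than taken from the slide, is an LSM-tree-style store: writes land in a small in-memory table (the R tier: fast random access, low capacity) and are flushed as immutable sorted runs written in one sequential pass (the S tier: high capacity). The class name `HybridStore` and the `memtable_limit` parameter are hypothetical; deletes and compaction are left out.

```python
import bisect


class HybridStore:
    """Toy LSM-style store: a small mutable memtable (random access, low capacity)
    in front of immutable sorted runs (written sequentially, high capacity).
    Deletes and compaction are omitted for brevity."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}                    # the "R" tier: in-memory, random writes
        self.runs = []                        # the "S" tier: sorted runs, oldest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Turn many random writes into one sorted, sequentially written run.
        items = sorted(self.memtable.items())
        self.runs.append(([k for k, _ in items], [v for _, v in items]))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Check newer runs first so later writes shadow older ones.
        for keys, values in reversed(self.runs):
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None

    def scan(self, lo, hi):
        # Range query over [lo, hi): merge runs oldest-to-newest, then the memtable.
        merged = {}
        for keys, values in self.runs:
            start, end = bisect.bisect_left(keys, lo), bisect.bisect_left(keys, hi)
            merged.update(zip(keys[start:end], values[start:end]))
        merged.update((k, v) for k, v in self.memtable.items() if lo <= k < hi)
        return sorted(merged.items())


store = HybridStore(memtable_limit=2)
for i, key in enumerate(["b", "a", "c", "a"]):
    store.put(key, i)
print(store.get("a"), store.scan("a", "c"))
```

Point lookups pay for checking several runs, which is the price of turning random writes into sequential ones; real systems add compaction and per-run filters to bound that cost.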

Page 8
Distributed

  • Goal
    • Scaling (capacity): scalability
    • Fault Tolerance: availability
  • Methods
    • Partition
    • Replication

(P = p^k => P = 1-(1-p)^k)

  • Key points
    • Protocol Design
    • Debugging
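A worked instance of the formula above. Assume (the slide does not say) that p is the probability that one machine is available and that machines fail independently. Splitting data into k partitions on k machines needs all k machines to be up, while keeping k replicas needs only one of them:

```latex
P_{\text{partition}} = p^k, \qquad P_{\text{replication}} = 1 - (1 - p)^k

p = 0.99,\; k = 3: \quad P_{\text{partition}} = 0.99^3 = 0.970299, \qquad P_{\text{replication}} = 1 - 0.01^3 = 0.999999
```

Under these assumptions, partitioning alone lowers availability as k grows while replication raises it, which suggests why the two methods are often used together.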

Page 9
Distributed: Partition

  • Static hashing
    • can't be modified/tuned after the system is built
  • Consistent hashing
    • only about K/n keys (K keys, n nodes; a fraction of the total data) are remapped when a node joins or leaves
  • Mapping
    • Split and Combine/Merge
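The contrast between static hashing and consistent hashing can be checked with a small experiment. This is a minimal sketch, assuming MD5 as the hash function, 100 virtual nodes per server, and hypothetical node names; it counts how many of 10,000 keys change owner when a fifth node joins a four-node cluster.

```python
import bisect
import hashlib


def stable_hash(s):
    # Stable 64-bit hash (Python's built-in hash() of str is randomized per process).
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")


def static_owner(key, nodes):
    # Static hashing: hash modulo node count; changing len(nodes) remaps most keys.
    return nodes[stable_hash(key) % len(nodes)]


class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node is placed at `vnodes` points on the ring to even out the load.
        self.ring = sorted(
            (stable_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    def owner(self, key):
        # The first ring point clockwise from the key's hash owns the key.
        i = bisect.bisect(self.points, stable_hash(key)) % len(self.ring)
        return self.ring[i][1]


def moved_fraction(before, after, keys):
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"key-{i}" for i in range(10_000)]
old_nodes = ["node-a", "node-b", "node-c", "node-d"]
new_nodes = old_nodes + ["node-e"]

static_moved = moved_fraction(
    lambda k: static_owner(k, old_nodes), lambda k: static_owner(k, new_nodes), keys
)
old_ring, new_ring = ConsistentHashRing(old_nodes), ConsistentHashRing(new_nodes)
ring_moved = moved_fraction(old_ring.owner, new_ring.owner, keys)
print(f"static: {static_moved:.0%} moved, consistent: {ring_moved:.0%} moved")
```

With modulo hashing, a key keeps its owner only when h mod 4 equals h mod 5, so roughly 80% of keys should move. With the ring, roughly K/n of the keys (about one fifth here) should move, and every one of them moves to the new node.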

Page 10
Distributed: Replication

  • Granularity
    • Machine
    • Record
    • Group

(Pic Translation:
粒度: Granularity, 开销: Cost,
并行度: Degree of parallelism, 可靠性: Reliability)

Page 11
Replication is not omnipotent (fcicq: it can't solve every problem)

(Pic Translation:
故障率: Failure Rate)

Page 12
(Pic Translation:
时间: Time, 吞吐: Throughput, 输入: Load, 极限: Designed maximum throughput,
文艺模型: Well-designed / Optimal Model, 普通模型: Typical / Poorly designed Model)

Page 13
Service Architecture

  • Goal
    • High throughput
    • Stable throughput/serving under extreme load
  • Model
    • Basic: threadpool + queue
    • Complex/Advanced: event-driven
  • Ensuring stability
    • Finer-grained resource allocation, active scheduling
    • Flow control
      • Load (data/metrics) feedback, throttling (fcicq: under high load)
      • (Response) latency deadlines, multi-level queues (fcicq: looks like QoS)
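The flow-control ideas above (a bounded queue that rejects work under overload, per-request latency deadlines so stale work is dropped instead of served late, and multi-level queues) can be sketched in a few lines. This is a single-threaded toy that assumes an injected clock so it runs deterministically; `BoundedServer`, `submit`, `serve_one` and their parameters are hypothetical names, and a real server would call `serve_one` from a thread pool or event loop.

```python
import time
from collections import deque


class BoundedServer:
    """Toy admission control: bounded multi-level queue plus per-request deadlines."""

    def __init__(self, capacity, levels=2, clock=time.monotonic):
        self.capacity = capacity                        # max queued requests overall
        self.queues = [deque() for _ in range(levels)]  # level 0 = highest priority
        self.clock = clock                              # injectable time source
        self.rejected = 0
        self.expired = 0

    def queued(self):
        return sum(len(q) for q in self.queues)

    def submit(self, request, level, timeout):
        # Throttling: refuse new work instead of letting the queue grow without bound.
        if self.queued() >= self.capacity:
            self.rejected += 1
            return False
        self.queues[level].append((self.clock() + timeout, request))
        return True

    def serve_one(self, handler):
        # Serve the highest-priority non-empty level, dropping requests past deadline.
        for q in self.queues:
            while q:
                deadline, request = q.popleft()
                if self.clock() > deadline:
                    self.expired += 1  # the caller has given up; skip the work
                    continue
                return handler(request)
        return None


now = [0.0]
server = BoundedServer(capacity=3, clock=lambda: now[0])
server.submit("batch-1", level=1, timeout=10.0)
server.submit("query-1", level=0, timeout=0.5)
server.submit("query-2", level=0, timeout=5.0)
accepted = server.submit("query-3", level=0, timeout=5.0)  # queue full: rejected
now[0] = 1.0                                                # query-1 is now stale
served = [server.serve_one(str.upper) for _ in range(3)]
```

Rejecting early and skipping expired requests keeps the workers busy only with work that can still meet its deadline, which is one way to approach the flat "well-designed" throughput curve on Page 12 rather than letting queueing delay grow until most responses arrive too late.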

Page 14
Computation

  • Data Intensive
    • MapReduce
    • Scan-Filter
  • Compute Intensive (CPU-Bound)
    • SETI@home
  • Communication Intensive (Traditional HPC)
    • Machine Learning
    • Matrix computations

Page 15
Scan-Filter

  • Example: compute the intersection of two sets.
  • Input: list1 + list2, |list1| >> |list2|
  • Output: { } (the set of elements that appear in both lists)
  • MapReduce
    • Sort + Partition + Reduce
  • Scan-Filter (Model)
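For the intersection example, a minimal scan-filter sketch, assuming list2 is small enough to fit in memory as a hash set: build the set from the small list once, then stream over the large list and keep the matches. A single-machine imitation of the sort-and-reduce plan is included for comparison; both function names are hypothetical, and the partition step is omitted because everything runs in one process.

```python
from itertools import groupby


def scan_filter_intersection(big, small):
    # Build: load the small list into an in-memory hash set.
    lookup = set(small)
    # Scan + filter: one sequential pass over the big list, O(1) lookup per element.
    return {x for x in big if x in lookup}


def sort_reduce_intersection(big, small):
    # Map: tag each distinct element with the list it came from.
    tagged = [(x, "big") for x in set(big)] + [(x, "small") for x in set(small)]
    # Sort (shuffle): equal keys become adjacent; partitioning across reducers is omitted.
    tagged.sort()
    # Reduce: keep keys that were emitted by both lists.
    return {
        key
        for key, group in groupby(tagged, key=lambda t: t[0])
        if len({source for _, source in group}) == 2
    }


big = range(0, 100_000, 3)    # stands in for the huge list1
small = [5, 6, 7, 9, 10, 12]  # the small list2
print(scan_filter_intersection(big, small), sort_reduce_intersection(big, small))
```

The scan-filter plan reads list1 once and never sorts it, so its cost is dominated by one sequential scan; the trade-off is that list2 must fit in memory (in each worker's memory, if list1 is split across machines).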

Page 16
How do you build a storage system?
How do you build high-performance services?
How do you build a data warehouse?

What is architecture?
What does an architect need?

Pages 17-19
Three methods for an architect

Understand the requirements

  • Tradeoff
    • You can't satisfy all the requirements
    • You don't have to treat all requirements equally. (fcicq: the important ones get higher priority)

(Yellow text on the right side:
Say no to unreasonable requirements! But still provide end-to-end solutions)

  • Find the root requirements
    • Divide (break down the problem), Abstract, Dimensional (or scale) Reduction

(fcicq: does "Dimensional Reduction" here mean solving a simplified and/or scaled-down problem?)

    • Define primitives and composition rules.
  • Understand how the requirements change over time

Choose the methods

  • Estimation, Simulation, Implementation

(Yellow text:
Back-of-the-Envelope Calculation
Monte-Carlo Simulation
Discrete Event Simulation
Emulation)

  • Divide vs Iteration
  • Design Patterns
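To show how a back-of-the-envelope estimate and a Monte Carlo simulation can cross-check each other, the sketch below simulates the replication formula from the Distributed slide, assuming independent machine failures with per-machine availability p (an assumption; the slide does not define p). The closed form and the simulated frequency should agree to within sampling error; `estimate_availability` and `simulate_availability` are hypothetical helpers.

```python
import random


def estimate_availability(p, k):
    # Back-of-the-envelope: data stays reachable unless all k replicas are down.
    return 1 - (1 - p) ** k


def simulate_availability(p, k, trials=100_000, seed=42):
    # Monte Carlo: sample each replica's up/down state independently per trial.
    rng = random.Random(seed)
    up = sum(any(rng.random() < p for _ in range(k)) for _ in range(trials))
    return up / trials


p, k = 0.9, 2
print(estimate_availability(p, k), simulate_availability(p, k))
```

A fairly unreliable p = 0.9 is used so that failures actually show up in the samples: at p = 0.99 and k = 3 the failure probability is about one in a million, and simulation would need many millions of trials to observe it, which is where the cheap estimate is the better tool.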

Maintain an appropriate pace

  • Plan a reachable path
  • Produce/deliver periodically, on a regular basis

(Pic Translation:
迭代: Iteration)