

Is "Reliability" the Same as "Availability"?

嘉峪檢測網 2021-07-20 21:56

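Readers who opened this article have probably come across terms such as "high reliability" and "high availability". Yet such discussions tend either to be vague or to merely list terms (MTBF, MTTR ...). So how should we describe a system's reliability and availability quantitatively, and what do these impressive-sounding terms actually mean?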

 

First, let's look at how failures are defined:

 

Failure definitions

 

Hardware failure

 

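Industry commonly uses the "bathtub curve" to describe hardware failures, as shown in the figure below. Specifically, the hardware life cycle is usually divided into three periods: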

 

1)  The first part is a decreasing failure rate, known as early failures

2)  The second part is a constant failure rate, known as random failures

3)   The third part is an increasing failure rate, known as wear-out failures

 


Figure 1. Bathtub curve
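The three regions can be illustrated numerically. The sketch below is not from the article: it assumes a Weibull hazard function, one common modelling choice, in which a shape parameter below 1 gives a decreasing failure rate, exactly 1 a constant rate, and above 1 an increasing rate. The shape and scale values are made up for illustration.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def trend(values):
    """Classify a sequence as decreasing, constant, increasing, or mixed."""
    pairs = list(zip(values, values[1:]))
    if all(a > b for a, b in pairs):
        return "decreasing"
    if all(a < b for a, b in pairs):
        return "increasing"
    if all(a == b for a, b in pairs):
        return "constant"
    return "mixed"


# Hypothetical shape parameters, one per region of the bathtub curve.
PHASES = {
    "early failures": 0.5,
    "random failures": 1.0,
    "wear-out failures": 3.0,
}

for name, shape in PHASES.items():
    # Sample the hazard rate at three points in time (hours, scale = 1000 h).
    rates = [weibull_hazard(t, shape, 1000.0) for t in (100.0, 500.0, 900.0)]
    print(f"{name}: failure rate is {trend(rates)}")
```

A real bathtub curve is usually modelled as a combination of such regimes over a single timeline; the sketch only shows how each regime behaves on its own.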

 

Software failure

 

Software failures can be measured by the number of defects per thousand lines of code (Defects/KLOC), known as defect density:

Defect Density = Number of Defects / KLOC
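As a minimal sketch of the formula above, the function below computes defect density from a defect count and a raw line count. The function name and the sample numbers are hypothetical.

```python
def defect_density(defects: int, lines_of_code: int) -> float:
    """Return defects per thousand lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    kloc = lines_of_code / 1000
    return defects / kloc


# Hypothetical project: 30 known defects in 60,000 lines of code.
print(defect_density(30, 60_000))  # 0.5 defects per KLOC
```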

 

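The main factors that affect defect density are:

1) Software process (code review, unit testing, etc.)

2) Software complexity

3) Software size

4) Experience of the development team

5) Proportion of reused (battle-tested) code

6) Testing before product delivery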

 

Metrics

 

Mean time between failures (MTBF)

 

Mean Time Between Failures: as the name suggests, the average operating time between two consecutive failures. It is a measure of a product's reliability.

 

Failure rate

 

The following text is quoted directly from Wikipedia, to avoid losing meaning in translation:

Failure rate is the frequency with which an engineered system or component fails, expressed, for example, in failures per hour. It is often denoted by the Greek letter λ (lambda) and is important in reliability engineering.

 

The failure rate of a system usually depends on time, with the rate varying over the life cycle of the system. For example, an automobile's failure rate in its fifth year of service may be many times greater than its failure rate during its first year of service. One does not expect to replace an exhaust pipe, overhaul the brakes, or have major transmission problems in a new vehicle.

 

In practice, the mean time between failures (MTBF, 1/λ) is often reported instead of the failure rate. This is valid and useful if the failure rate may be assumed constant – often used for complex units / systems, electronics – and is a general agreement in some reliability standards (Military and Aerospace). It does in this case only relate to the flat region of the bathtub curve, also called the "useful life period". Because of this, it is incorrect to extrapolate MTBF to give an estimate of the service life time of a component, which will typically be much less than suggested by the MTBF due to the much higher failure rates in the "end-of-life wear out" part of the "bathtub curve".

 

To make this concrete: if 100 hard disks are in operation and 2 failures occur within one year, the failure rate is 0.02 failures per disk per year.
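The arithmetic of this example can be written as a small helper. This is a sketch that assumes every unit runs for the whole observation period; the function name is hypothetical.

```python
def failure_rate(failures: int, units: int, years: float) -> float:
    """Return failures per unit-year, assuming all units ran for the full period."""
    unit_years = units * years
    if unit_years <= 0:
        raise ValueError("units * years must be positive")
    return failures / unit_years


# The hard-disk example: 100 disks, 2 failures, observed for 1 year.
print(failure_rate(failures=2, units=100, years=1.0))  # 0.02
```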

 

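The relationship between MTBF and failure rate described above deserves careful thought. In practice, hardware vendors do prefer to label their products with MTBF (my personal guess is that MTBF figures, often in the hundreds of thousands or even millions of hours, are eye-catching). The failure rate changes over the product life cycle, so the following relationship holds only in the flat bottom of the bathtub curve described earlier (informally, the product's "prime of life"):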

 

MTBF = 1/λ
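As a sketch of this reciprocal relationship, the helpers below convert between a constant failure rate per year and an MTBF in hours. The 8,760 hours per year figure assumes continuous operation and ignores leap years; the function names are hypothetical.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours, continuous operation assumed


def mtbf_hours_from_rate(failures_per_year: float) -> float:
    """MTBF = 1/lambda, converted from years to hours. Valid only for a constant rate."""
    return HOURS_PER_YEAR / failures_per_year


def rate_from_mtbf_hours(mtbf_hours: float) -> float:
    """lambda = 1/MTBF, expressed in failures per year."""
    return HOURS_PER_YEAR / mtbf_hours


# The hard-disk example: 0.02 failures per disk per year.
mtbf = mtbf_hours_from_rate(0.02)
print(f"MTBF is about {mtbf:,.0f} hours ({mtbf / HOURS_PER_YEAR:.0f} years)")
```

For the 0.02 failures per year in the hard-disk example, this gives an MTBF of about 438,000 hours, or 50 years, which already hints at why MTBF should not be read as a service life.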

 

Mean time to repair (MTTR)

 

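Mean Time To Repair: as the name suggests, the average repair time needed to bring a product from a failed state back to a working state. In engineering, MTTR is a measure of a product's maintainability; it is common in maintenance contracts, where it serves as a basis for service fees.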
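In the simplest case, MTTR can be estimated as the arithmetic mean of recorded repair durations. The sketch below assumes such a log exists; the sample durations are invented.

```python
from statistics import mean


def mttr_hours(repair_durations_hours):
    """Return the mean repair time in hours from a list of recorded repair durations."""
    if not repair_durations_hours:
        raise ValueError("at least one repair record is required")
    return mean(repair_durations_hours)


# Hypothetical repair log: three incidents that took 2, 4, and 6 hours to fix.
print(mttr_hours([2.0, 4.0, 6.0]))  # 4.0
```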

 


Figure 2. Hardware MTTR estimation


Figure 3. Software MTTR estimation

 

Availability

 

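GB/T 3187-97 defines availability as the ability of a product to be in a state to perform a required function under given conditions, at a given instant or over a given time interval, assuming that the required external resources are provided. It is a combined reflection of the product's reliability, maintainability, and maintenance support.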

 

Availability = MTBF / (MTBF + MTTR)

 

The Availability formula is straightforward and needs no further explanation. Availability is usually expressed as a number of nines, for example 99.9% (3-nines availability) or 99.999% (5-nines availability).
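As a minimal sketch, assuming the widely used steady-state definition Availability = MTBF / (MTBF + MTTR), the function below computes availability from the two metrics introduced earlier. The sample MTBF and MTTR values are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical system: fails on average every 999 hours and takes 1 hour to repair.
a = availability(mtbf_hours=999.0, mttr_hours=1.0)
print(f"{a:.3%}")  # 99.900%, i.e. 3-nines availability
```

The formula makes it clear that availability can be raised either by failing less often (higher MTBF) or by recovering faster (lower MTTR).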

 

Downtime

 

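As the name suggests, downtime is the time a machine is out of service because of a failure. Downtime is mentioned here because measuring system availability by downtime per year is more intuitive and easier to understand.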

 


Figure 4. Relationship between availability and downtime
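The correspondence between availability and yearly downtime can be recomputed directly. The sketch below assumes a 365-day year; the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime allowed by a given availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for pct in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% availability allows about {minutes:,.1f} minutes of downtime per year")
```

This works out to roughly 3.65 days per year at 99%, 8.76 hours at 99.9%, 52.6 minutes at 99.99%, and 5.3 minutes at 99.999%.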

 

Further thoughts

 

Is MTBF unreliable?

 

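Generally, vendors rate the MTBF of a server's main components at a million hours or more. For example: 1,000,000 hours each for the motherboard, CPU, and hard disk, and 4,000,000 hours per memory module (so 4 modules together come to about 1,000,000 hours). From this, the overall server MTBF works out to about 250,000 hours (roughly 30 years), with an annual failure rate of about 3%. In other words, out of 100 servers, a few will fail every year.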
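The calculation can be reproduced as follows. The sketch assumes the server fails as soon as any single component fails (a series system) and that every component has a constant failure rate, so the component failure rates simply add up. The function name is hypothetical.

```python
HOURS_PER_YEAR = 8760  # continuous operation, ignoring leap years


def series_mtbf_hours(component_mtbfs_hours):
    """MTBF of a system that fails when any one component fails.

    With constant failure rates, the system rate is the sum of component rates.
    """
    total_rate = sum(1.0 / mtbf for mtbf in component_mtbfs_hours)
    return 1.0 / total_rate


# Nominal values from the text: motherboard, CPU, hard disk, and 4 memory modules.
components = [1_000_000, 1_000_000, 1_000_000] + [4_000_000] * 4

server_mtbf = series_mtbf_hours(components)
print(f"Server MTBF: about {server_mtbf:,.0f} hours")
print(f"In years: about {server_mtbf / HOURS_PER_YEAR:.1f}")
print(f"Approximate annual failure rate: {HOURS_PER_YEAR / server_mtbf:.1%}")
```

The result is about 250,000 hours, or roughly 28.5 years, with an annual failure rate of about 3.5%, in line with the rounded figures in the text.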

 

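The theoretical calculation above looks fine and even seems quite credible. But from another angle, something feels off: with an MTBF of about 30 years, can we really expect the server to stay in service for 30 years? Let's first see how an engineer at ** explains it: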

 

It is common to see MTBF ratings between 300,000 to 1,200,000 hours for hard disk drive mechanisms, which might lead one to conclude that the specification promises between 30 and 120 years of continuous operation. This is not the case! The specification is based on a large (statistically significant) number of drives running continuously at a test site, with data extrapolated according to various known statistical models to yield the results.

 

Based on the observed error rate over a few weeks or months, the MTBF is estimated and not representative of how long your individual drive, or any individual product, is likely to last. Nor is the MTBF a warranty - it is representative of the relative reliability of a family of products. A higher MTBF merely suggests a generally more reliable and robust family of mechanisms (depending upon the consistency of the statistical models used). Historically, the field MTBF, which includes all returns regardless of cause, is typically 50-60% of projected MTBF.

 

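Having read this far, and recalling the earlier discussion of failure rate, I wonder whether readers have worked out what is going on. Put simply, these vendors actually measure the failure rate of a product during its healthy "prime of life" and then, using the reciprocal relationship with MTBF, arrive at MTBF figures in the millions of hours. In the real world, the failure rate of these products rises quickly in "middle and old age", so these MTBF figures cannot reflect the real service life of the product. The source also notes that ** has recognized the shortcomings of MTBF and switched to AFR (Annualized Failure Rate).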
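To see what a nominal MTBF implies as an AFR, one commonly used conversion is AFR = 1 - exp(-hours per year / MTBF). This formula is not taken from the article; it assumes a constant failure rate and continuous operation, and the function name is hypothetical.

```python
import math

HOURS_PER_YEAR = 8760  # continuous operation, ignoring leap years


def afr_from_mtbf(mtbf_hours: float, powered_hours_per_year: float = HOURS_PER_YEAR) -> float:
    """Probability that a unit fails within one year, assuming a constant failure rate."""
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    return 1.0 - math.exp(-powered_hours_per_year / mtbf_hours)


# The range of nominal MTBF ratings quoted above for hard disk drives.
for mtbf in (300_000, 1_000_000, 1_200_000):
    print(f"MTBF {mtbf:>9,} h -> AFR {afr_from_mtbf(mtbf):.2%}")
```

A nominal MTBF of 1,000,000 hours corresponds to an AFR of a little under 0.9%, which is a far more tangible number than "114 years".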

 

In fact, as early as 2007, Google and CMU both published papers at FAST '07 that discussed hard disk failures in detail:

 

CMU: "Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?"

 

Google: "Failure Trends in a Large Disk Drive Population"

 

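Google collected data on more than 100,000 consumer-grade HDDs in its fleet (SATA and PATA, 5400 rpm and 7200 rpm, 7 different vendors, 9 different models, capacities from 80 GB to 400 GB) and arrived at the following figures: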

 

Google found that disks had an annualized failure rate (AFR) of 3% for the first three months, dropping to 2% for the first year. In the second year the AFR climbed to 8% and stayed in the 6% to 9% range for years 3-5.
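These observed AFR values can be turned back into an implied MTBF for comparison with vendor ratings. The sketch below assumes a constant failure rate within each period and continuous operation, and inverts the common relation AFR = 1 - exp(-hours per year / MTBF); the function name is hypothetical.

```python
import math

HOURS_PER_YEAR = 8760  # continuous operation, ignoring leap years


def implied_mtbf_hours(afr: float) -> float:
    """MTBF that a constant-rate model would need to produce the given AFR."""
    if not 0 < afr < 1:
        raise ValueError("afr must be strictly between 0 and 1")
    return -HOURS_PER_YEAR / math.log(1.0 - afr)


# AFR values reported in the quoted summary of Google's findings.
observed = {"first year": 0.02, "second year": 0.08}

for period, afr in observed.items():
    print(f"{period}: AFR {afr:.0%} implies an MTBF of about {implied_mtbf_hours(afr):,.0f} hours")
```

An 8% AFR implies an MTBF of only about 105,000 hours, roughly a tenth of a nominal 1,000,000-hour rating.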

 
