Security+ Series Part 7: Risk Calculation

Welcome back to the Security+ series. In this post we are going to explore some techie and non-techie terms that will help us argue with management for the funding we need to get the security ball rolling.

We all know how important it is to keep our stuff safe and available, but sometimes that feeling alone is not enough to convince our stakeholders to give us the money to make it happen.

That is why it is important to provide some real numbers. And this is the purpose of this post: putting risk and math together.

Risk Calculation

Risk describes the likelihood that a weakness in the system will be successfully exploited, combined with the damage that would follow. For example, Heartbleed and ShellShock are vulnerabilities with very high risk, simply because so many systems were vulnerable and the impact is high. Companies would definitely invest time and money to fix an issue like this ASAP, otherwise they could lose a lot of reputation and money. If you speak to management, always quantify in numbers (meaning $$$); they will listen to you more closely.

One example would be justifying the build of a disaster recovery site in case the primary data center fails. The capital and operational expense might be high, but if the primary DC fails the loss of service, and therefore the financial loss, can be even higher, not to mention losing customers.

Alain Robert the real life spiderman has risk under his control

Likelihood 

Likelihood is the probability that a vulnerability will be exploited. For example, the likelihood of someone stealing data through SQL injection is much higher than that of someone physically compromising the database server. Simply because everybody on the internet can play with your web app, but not many of those folks have the guts to pull off social engineering tactics to get onto your premises physically.

There is a likelihood of not walking away alive from this game

Impact

Do you remember a game called Space Impact, which was epic on the Nokia 3310? You were in a spaceship shooting down aliens, and at the end of each level a big boss would appear.

When you destroyed a few of those small alien ships nothing fancy would happen, but when you defeated the boss, boy, that was a huge impact for the aliens.

The same is true with security. If your DB gets compromised you are in big trouble, much bigger than if someone rooted your Counter-Strike server, because such servers usually do not hold sensitive data and only provide a presentation layer.

Space impact helps you understand the impact

SLE

Single Loss Expectancy is the cost associated with a single occurrence of a certain type of unwanted event. For example, if your hard drive fails and you do not have a backup, the cost may be higher than just the price of a new drive. The cost will include any lost data, which in the best case you need to re-create and in the worst case is lost forever. The SLE is expressed in cash.

ARO

Annualized Rate of Occurrence, as the name implies, describes how often the unwanted event occurs per year. Does your HDD fail twice a year or once every 5 years? It is important to know, because sometimes the cost of the risk may be lower than the cost of eliminating it. For example, if all your important files are already backed up and only system files could be lost, then installing a 2nd HDD and enabling RAID in every client machine would not be cost effective. ARO is usually expressed as events per year: if the event occurs twice a year the ARO is 2, and if it occurs once every 5 years the ARO is 0.2.

ALE

Annualized Loss Expectancy is the number you get when you multiply Single Loss Expectancy by Annualized Rate of Occurrence. It tells you how much a certain risk is expected to cost you per year, which gives you a better idea of how much it is worth spending to mitigate it.

For example, say you lose main power to your production gear twice a year and each outage costs you $10000 in lost revenue. The ALE would be $10000 * 2 = $20000.

In such a case it would be wise to invest in a UPS device or a 2nd power feed.
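
To make the arithmetic concrete, here is a minimal Python sketch of the ALE formula using the power outage numbers above. The function names and the UPS cost of $5,000 per year are my own assumptions for illustration, not figures from any real quote.

```python
def annualized_loss_expectancy(sle: float, aro: float) -> float:
    """ALE = SLE * ARO (cost of one event times events per year)."""
    if sle < 0 or aro < 0:
        raise ValueError("SLE and ARO must be non-negative")
    return sle * aro


def mitigation_is_worth_it(ale_before: float, ale_after: float, annual_cost: float) -> bool:
    """True when the yearly loss avoided is larger than the yearly cost of the control."""
    return (ale_before - ale_after) > annual_cost


# Power outage example from the text: $10,000 per outage, twice a year.
ale_power = annualized_loss_expectancy(sle=10_000, aro=2)
print(ale_power)  # 20000

# Hypothetical: a UPS costing $5,000 per year that removes the outages entirely.
print(mitigation_is_worth_it(ale_power, ale_after=0, annual_cost=5_000))  # True
```

The second function is the same comparison you would present to management: if the control costs more per year than the loss it prevents, it is hard to justify.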

MTTR

Mean Time To Repair describes how long it takes on average to restore the service to the way it was. For example, if you run out of toner, how long will it take to install a new one? If it takes just a few minutes because you have spares on site, that is perfectly fine. But if you have no spare and need to get a quote and order one, it may take a week to get the printer up and running. Execs would not be happy that they need to wait a week to print a financial report for a meeting.

MTBF

Everything fails, do not argue about that. The better question is: how often does it fail? Mean Time Between Failures can give you an estimate. Vendors usually list a value with their product. For example, Cisco states an MTBF of 221,432 hours for their Catalyst 2960G-48TC-L. What fails most often is usually the power supply, therefore for critical devices aim for at least 2 PS units.
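
A number like 221,432 hours is easier to reason about in years, or as expected failures across a fleet. The sketch below does that conversion. It assumes a constant failure rate, which is a simplification on my part, and the fleet size of 100 switches is a made-up figure. Treat the result as a statistical average over many units, not a prediction for a single device.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def mtbf_years(mtbf_hours: float) -> float:
    """Convert an MTBF given in hours to years."""
    return mtbf_hours / HOURS_PER_YEAR


def expected_failures_per_year(mtbf_hours: float, units: int = 1) -> float:
    """Rough fleet-wide failures per year, assuming a constant failure rate."""
    return units * HOURS_PER_YEAR / mtbf_hours


# MTBF figure quoted in the text for the Catalyst 2960G-48TC-L.
print(round(mtbf_years(221_432), 1))  # 25.3

# Hypothetical fleet of 100 such switches.
print(round(expected_failures_per_year(221_432, units=100), 2))  # 3.96
```

The per-year figure can be used as a rough ARO when you calculate an ALE for hardware failures.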

MTTF

Mean Time To Failure is very similar to MTBF; the difference is that MTTF relates to products that are not usually repairable. For example, a micro component of a larger system, such as a capacitor, may only handle a certain number of cycles over its lifetime.

Quantitative vs. qualitative (ALE) 

The Annualised Loss Expectancy can be approached in two ways. Quantitative means you have the numbers in pounds and can relate them to actual costs. To put it simply, you have the data backing you up when you speak to shareholders.

The qualitative representation is your gut feeling, which likely comes from your previous experience. You just know that hard drive will not last forever.
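
One common way to make a gut feeling a bit more repeatable is to rate likelihood and impact on a simple low/medium/high scale and combine them. The sketch below is only an illustration of that idea: the three-level scale, the scoring and the thresholds are my own assumptions, not something prescribed by the exam or by any standard.

```python
# Hypothetical three-level scale; adjust to whatever your organisation uses.
LEVELS = {"low": 1, "medium": 2, "high": 3}


def qualitative_risk(likelihood: str, impact: str) -> str:
    """Combine likelihood and impact ratings into a single risk rating."""
    score = LEVELS[likelihood] * LEVELS[impact]  # possible scores: 1, 2, 3, 4, 6, 9
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


# SQL injection against the database: likely, and the impact is severe.
print(qualitative_risk("high", "high"))   # high

# Physical break-in to reach the DB server: unlikely, but the impact is severe.
print(qualitative_risk("low", "high"))    # medium

# Someone rooting the game server: plausible, but little sensitive data at stake.
print(qualitative_risk("medium", "low"))  # low
```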

Vulnerabilities

Vulnerabilities are kind of a favorite topic in the security world. You can find them everywhere, and everybody talks about them. What is a vulnerability exactly? Well, to put it simply, it is a weakness in a system. A weakness can be introduced by the design itself, by the implementation, or by not following best practices. To put some meat into the discussion, the ShellShock vulnerability in Bash was present in the code for more than 20 years before it was disclosed to the public.

Offensive Security runs a website called Exploit-DB, which collects publicly disclosed exploits and the vulnerabilities they target.

Stuxnet, one of the most advanced computer worms, was capable of exploiting four zero-day vulnerabilities. Its mission was to slowly destroy centrifuges in an industrial facility. The term zero day refers to a vulnerability that has not yet been revealed to the public or the vendor, so no patch is available.

Well done presentation about Stuxnet

Threat vector

A threat vector is the path or method an attacker uses to get at a system, and together the available vectors make up its attack surface. For example, a web service exposes a different surface than a print server. A web application can be hit by web-based attacks such as SQL Injection or XSS, or through a vulnerability in the daemon. More services means a bigger attack surface.

For example, a router with locked-down SSH and minimal services running has a smaller attack surface than an Internet-facing web server.

Probability 

Probability describes how likely it is that the vulnerability will be exploited. As I mentioned, a SQL Injection is much more likely to occur than social engineering at your corporate premises.

Risk-avoidance, transference, acceptance, mitigation

Sometimes introducing a new service brings such a high risk that the company decides not to implement it at all. This is typical for new software releases: companies often wait a few months after the initial release just to avoid bugs and vulnerabilities in new code.

Risk can also be transferred to a third party, for example by buying insurance that covers the loss.

Other times, companies may accept the risk associated with a service. For example, BYOD (Bring Your Own Device) may open new attack vectors for the company, but the value of the service outweighs the risk.

Mitigation refers to how we reduce risks. Following best practices, regularly patching and reviewing system configuration, and performing vulnerability scanning all help reduce the risk of being exploited.

Risks associated with Cloud Computing and Virtualization 

With new trends come new risks. Cloud computing can provide a number of great benefits, but it is important to understand the risks as well. For example, if one customer of a multitenant cloud gets compromised, how well has the cloud provider isolated the contaminated environment so that other customers are safe?

What if an attacker finds a way to compromise the hypervisor and gain access to all the virtual machines running on top of it?

RTO 

The Recovery Time Objective describes the maximum time you can afford to take to bring a failed system back online. If your e-commerce site generates a ton of money, you obviously want to have it up and running in no time.

Ma’am restoring these backups will take ages.

RPO

Recovery Point Objective is usually related to storage, and describes how much recent data you can afford to lose. How often do you run a full backup, for example every night? In that case you can only recover up to that point, and you lose any data written during the day. In practice you usually back up on a daily or hourly basis, but you also keep track of the transactions that happened in between, so you can restore to the most recent point in time. Obviously a shorter RPO will cost you more money.
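
The relationship between backup frequency and RPO can be checked with a few lines of Python. This is a simplified sketch that assumes periodic backups only, with no transaction log in between; the function names and the one-hour RPO are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """With periodic backups only, a failure just before the next run loses up to one full interval."""
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True when the worst-case loss window fits inside the RPO."""
    return worst_case_data_loss(backup_interval) <= rpo


rpo = timedelta(hours=1)  # hypothetical target agreed with the business

print(meets_rpo(timedelta(days=1), rpo))   # False: a nightly backup can lose a whole day
print(meets_rpo(timedelta(hours=1), rpo))  # True: an hourly backup fits a 1 hour RPO
```

Keeping a transaction log between backups shrinks the loss window further, which is exactly why it is used when the RPO is tight.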

And with that, my friends, we are closing this section on risk calculation. I hope that you learned something new today. In the next one, we will be exploring the risks associated with connecting our infrastructure to third parties.
