In a previous blog post, we explained how to Install IIS Dynamic IP Restrictions in an Azure Web Role. In this article, we provide guidelines for collecting and analyzing log data to detect potential DoS attacks, along with tips to protect against such attacks. While the article focuses on web applications hosted in an Azure Web Role (PaaS), most of its content also applies to IIS hosted on premises or on IaaS VMs.
I – Archive your logs
Without any history of IIS logs, there is no way to know whether your web site has been attacked or hacked, or when a potential threat started. Unfortunately, many customers do not keep any history of their logs. This is a real issue when the application is hosted as an Azure Web Role (PaaS), because PaaS VMs are "stateless" and can be reimaged or deleted during operations such as scaling or a new deployment.
A comprehensive list of Azure logs is described in the following documents:
- Windows Azure PaaS Compute Diagnostics Data (see "Diagnostic Data Locations" section)
- Microsoft Azure Security and Audit Log Management (whitepaper)
To keep a history of your logs, the Windows Azure platform provides everything needed with Windows Azure Diagnostics (WAD). All you have to do is turn the feature on (see Configuring Windows Azure Diagnostics), and your IIS logs will be automatically replicated to a central location in blob storage. One caveat is that a bad WAD configuration can prevent log replication and log scavenging/cleanup, which in the worst case may cause IIS logging to stop (see IIS Logs stops writing in cloud service). You also need to consider that keeping a history of logs in Azure storage affects your Azure bill; one "trick" is to Zip your IIS log files before transferring with Windows Azure Diagnostics. For on-premises IIS, there are many resources describing how to archive IIS logs, and you may be interested in this script: Compress and Remove Log Files (IIS and others).
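To illustrate the compression idea for an archive job you run yourself, here is a minimal Python sketch. It is not the mechanism used by the WAD article or the script linked above: the helper name compress_logs, the 24-hour age threshold (meant to leave alone the log file IIS is still writing) and the one-archive-per-log layout are all assumptions made for the example.

```python
import time
import zipfile
from pathlib import Path


def compress_logs(log_dir, min_age_hours=24, remove_originals=False):
    """Zip each *.log file in log_dir that has not been modified for
    min_age_hours, writing <name>.zip next to it.

    Returns the list of archives created. Recent files are skipped because
    IIS may still be writing to them; files that already have a .zip are
    skipped so the function can safely run on a schedule.
    """
    cutoff = time.time() - min_age_hours * 3600
    created = []
    for log_file in sorted(Path(log_dir).glob("*.log")):
        if log_file.stat().st_mtime > cutoff:
            continue  # still recent: possibly the active log file
        archive = log_file.with_suffix(".zip")
        if archive.exists():
            continue  # already compressed on a previous run
        with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.write(log_file, arcname=log_file.name)  # store the bare file name
        if remove_originals:
            log_file.unlink()  # only after the archive was written successfully
        created.append(archive)
    return created
```

For example, compress_logs(r"C:\inetpub\logs\LogFiles\W3SVC1", remove_originals=True) would compress the finished daily logs of the first site; keep remove_originals off until you have checked that the archives are being copied somewhere durable.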
With an Azure Web Role, there are situations where you need to gather all logs manually and immediately, for example if you have not set up WAD or cannot wait for the next log replication. In that case, you can gather all logs with minimal effort using the procedure described in Windows Azure PaaS Compute Diagnostics Data (see "Gathering The Log Files For Offline Analysis and Preservation"). The main limitation of this manual procedure is that you need RDP access to all VM instances.
Now that you have your logs handy, let's see how to analyze them.
II – Analyze your logs
LOGPARSER is a very powerful tool for analyzing almost any kind of log. If you don't like the command line, you can use LOGPARSER Studio (LPS) and read this great blog post from my colleague Sylvain: How to analyse IIS logs using LogParser / LogParser Studio. In this section, we provide very simple LOGPARSER queries on IIS and HTTPERR logs to spot potential DoS attacks.
Before running any LOGPARSER query, take a quick look at the log file sizes and check whether they are stable day after day or show unexpected "spikes". Typically, a DoS attack that tries to "flood" a web application translates into a significant increase in the size of HTTPERR and IIS logs. To check log sizes, you can use Explorer, but you can also use LPS/LOGPARSER, which provides a file system provider (FSLOG). In LPS, you can use the built-in query "FS / IIS Log File Sizes" to list log file sizes:
SELECT Path, Size, LastWriteTime FROM '[LOGFILEPATH]' ORDER BY Size DESC
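If you prefer scripting to LPS, the same size check can be sketched in Python. The helper names and the "three times the median size" threshold below are assumptions chosen for illustration; a spike only tells you which days deserve a closer look, not that an attack actually happened.

```python
from pathlib import Path
from statistics import median


def log_file_sizes(log_dir, pattern="*.log"):
    """Return (file name, size in bytes) pairs, largest first, similar to
    the 'FS / IIS Log File Sizes' query above."""
    sizes = [(p.name, p.stat().st_size) for p in Path(log_dir).glob(pattern) if p.is_file()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


def size_spikes(sizes, factor=3.0):
    """Return the names of files larger than `factor` times the median size.

    The default factor is an arbitrary illustration; tune it to your traffic.
    """
    if not sizes:
        return []
    typical = median(size for _, size in sizes)
    return [name for name, size in sizes if size > factor * typical]
```

For example, size_spikes(log_file_sizes(r"D:\Windows\System32\LogFiles\HTTPERR")) lists the HTTPERR files that stand out from the usual daily volume. Using the median rather than the mean keeps one huge file from hiding itself by inflating the baseline.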
This first step helps filter out "normal" logs and keep only "suspicious" ones. The next step is the actual log analysis. For IIS/Web Role analysis, there are two main log types to use:
- HTTPERR logs (default location: C:\Windows\System32\LogFiles\HTTPERR; location on a Web Role: D:\Windows\System32\LogFiles\HTTPERR)
- IIS logs (default location: C:\inetpub\logs\LogFiles; location on a Web Role: C:\Resources\Directory\{DeploymentID}.{Rolename}.DiagnosticStore\LogFiles\Web)
II.1 Analyzing HTTPERR log
HTTPERR logs are generally small, and this is expected (see Error logging in HTTP APIs for details). Common errors are HTTP 400 (Bad Request), Timer_MinBytesPerSecond and Timer_ConnectionIdle. Timer_ConnectionIdle is not really an error: it simply indicates that an inactive client was disconnected after the HTTP keep-alive timeout was reached (see Http.sys's HTTPERR and Timer_ConnectionIdle). Note that the default HTTP keep-alive timeout in IIS is 120 seconds, whereas a browser like Internet Explorer uses a keep-alive timeout of 60 seconds. In this scenario, IE always disconnects first, so it shouldn't cause any Timer_ConnectionIdle entry in HTTPERR. A very high number of Timer_ConnectionIdle entries may indicate a DoS/DDoS attack in which an attacker tries to consume all available connections, but it can also come from a non-IE client or a proxy that uses a long keep-alive timeout (> 120 s). Similarly, a large number of Timer_MinBytesPerSecond errors may indicate malicious clients trying to tie up connections by sending "slow requests", but it can also simply mean that some clients have poor or slow network connections.
For log analysis, I generally use a WHAT/WHO/WHEN approach:
WHAT (top errors):
SELECT s-reason, Count(*) as Errors FROM '[LOGFILEPATH]' GROUP BY s-reason ORDER BY Errors DESC
WHO (top client IPs):
SELECT c-ip, Count(*) as Errors FROM '[LOGFILEPATH]' GROUP BY c-ip ORDER BY Errors DESC
WHEN (errors per hour):
SELECT QUANTIZE(TO_TIMESTAMP(date, time), 3600) AS Hour, COUNT(*) AS Total FROM '[LOGFILEPATH]' GROUP BY Hour ORDER BY Hour
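If you want to post-process logs outside LOGPARSER, the same WHAT/WHO/WHEN counts can be sketched in Python. This assumes the W3C-style layout used by HTTPERR (and IIS) logs, where a #Fields: directive names the space-separated columns; the helpers parse_w3c and what_who_when are hypothetical names, and the hourly bucket is a plain string rather than LOGPARSER's QUANTIZE timestamp.

```python
from collections import Counter


def parse_w3c(lines):
    """Yield one dict per log entry from a W3C-style log (HTTPERR or IIS).

    Column names come from the most recent '#Fields:' directive; other '#'
    directives (#Software, #Version, #Date) are skipped.
    """
    fields = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            yield dict(zip(fields, line.split()))


def what_who_when(entries, what_field):
    """Return (WHAT, WHO, WHEN) Counters for the given entries.

    what_field is 's-reason' for HTTPERR logs or 'cs-uri-stem' for IIS logs.
    WHEN buckets entries by hour ('YYYY-MM-DD HH:00'), similar in spirit to
    QUANTIZE(TO_TIMESTAMP(date, time), 3600).
    """
    what, who, when = Counter(), Counter(), Counter()
    for entry in entries:
        what[entry.get(what_field, "-")] += 1
        who[entry.get("c-ip", "-")] += 1
        when[f"{entry.get('date', '-')} {entry.get('time', '')[:2]}:00"] += 1
    return what, who, when
```

Usage: open a log file with encoding="utf-8" and errors="replace", call what_who_when(parse_w3c(f), "s-reason"), then print what.most_common(10) and who.most_common(10). Because the parser follows each #Fields: directive, several log files can be chained into a single pass.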
The three queries above let you quickly see WHAT the top errors are, WHO triggered them (client IPs) and WHEN they occurred. Depending on the results, further filtering may be needed. For example, if the number of Timer_ConnectionIdle errors is very high, you can check which client IPs are involved in this specific error:
SELECT c-ip, Count(*) as Errors FROM '[LOGFILEPATH]' WHERE s-reason LIKE '%Timer_ConnectionIdle%' GROUP BY c-ip ORDER BY Errors DESC
We can also filter on a suspicious IP to check when the suspicious access occurred:
SELECT QUANTIZE(TO_TIMESTAMP(date, time), 3600) AS Hour, COUNT(*) AS Total FROM '[LOGFILEPATH]' WHERE c-ip='x.x.x.x' GROUP BY Hour ORDER BY Hour
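The two follow-up filters can be expressed the same way in Python. This sketch assumes the log has already been parsed into one dict per entry keyed by W3C field names ('date', 'time', 'c-ip', 's-reason'); the function names are hypothetical. Pass a list rather than a one-shot generator if you call both functions on the same entries.

```python
from collections import Counter


def ips_for_reason(entries, reason):
    """Count client IPs for entries whose s-reason contains `reason`
    (case-insensitive substring match, in the spirit of LIKE '%...%')."""
    needle = reason.lower()
    return Counter(
        e.get("c-ip", "-") for e in entries if needle in e.get("s-reason", "").lower()
    )


def hourly_for_ip(entries, ip):
    """Count entries per hour ('YYYY-MM-DD HH:00') for a single client IP."""
    return Counter(
        f"{e.get('date', '-')} {e.get('time', '')[:2]}:00"
        for e in entries
        if e.get("c-ip") == ip
    )
```

A typical sequence is to call ips_for_reason(entries, "Timer_ConnectionIdle").most_common(5) and then hourly_for_ip(entries, ip) for the top address, which mirrors the two queries above.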
If these queries point to a suspicious IP, we can then look up the client IP using a reverse DNS/whois tool (http://whois.domaintools.com/).
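A reverse DNS lookup can also be scripted. The Python sketch below (the helper name reverse_dns is hypothetical) uses the standard socket.gethostbyaddr call to fetch the PTR host name of an IP. It is not a whois query, and a missing or generic PTR record proves nothing on its own, so treat the result as one hint among others.

```python
import socket


def reverse_dns(ip):
    """Return the PTR host name for `ip`, or None if the lookup fails.

    socket.herror and socket.gaierror are both subclasses of OSError,
    so a single except clause covers "no PTR record" and resolver errors.
    """
    try:
        host, _aliases, _addresses = socket.gethostbyaddr(ip)
        return host
    except OSError:
        return None
```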
II.2 Analyzing IIS logs
For the IIS logs, I use the same WHAT/WHO/WHEN approach as above:
WHAT (most requested URLs):
SELECT cs-uri-stem, Count(*) AS Hits FROM '[LOGFILEPATH]' GROUP BY cs-uri-stem ORDER BY Hits DESC
WHO (top client IPs):
SELECT c-ip, count(*) as Hits FROM '[LOGFILEPATH]' GROUP BY c-ip ORDER BY Hits DESC
WHEN (requests per hour):
SELECT QUANTIZE(TO_TIMESTAMP(date, time), 3600) AS Hour, COUNT(*) AS Total FROM '[LOGFILEPATH]' GROUP BY Hour ORDER BY Hour
The IIS queries above are deliberately simple. Depending on the results, you will need to refine them by adding filtering, grouping, etc. There are already many excellent articles covering this topic, so I won't reinvent the wheel:
- Inside Microsoft.com - Analyzing Denial of Service Attacks
- Log Parser Example Queries
- Recommended LogParser queries for IIS monitoring?
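As one example of such a refinement: hourly totals can hide short bursts, so it can help to look at each client's busiest single minute. The Python sketch below is a hypothetical helper, assuming IIS entries already parsed into dicts keyed by 'date', 'time' and 'c-ip'. A high one-minute peak from a single IP can reveal a burst that its hourly or daily total does not.

```python
from collections import Counter


def peak_rate_per_ip(entries, top=10):
    """Return [(c-ip, peak_requests_in_one_minute, minute)], highest peak first.

    `entries` are dicts keyed by W3C field names ('date', 'time', 'c-ip'),
    e.g. rows parsed from an IIS log.
    """
    per_minute = Counter()
    for e in entries:
        minute = f"{e['date']} {e['time'][:5]}"  # 'YYYY-MM-DD HH:MM'
        per_minute[(e["c-ip"], minute)] += 1
    peaks = {}  # ip -> (hits in busiest minute, that minute)
    for (ip, minute), hits in per_minute.items():
        if ip not in peaks or hits > peaks[ip][0]:
            peaks[ip] = (hits, minute)
    ranked = sorted(peaks.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(ip, hits, minute) for ip, (hits, minute) in ranked[:top]]
```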
III – What can I do to harden/protect my web application against DoS attacks?
Security guidelines for IIS/Azure Web Roles are described in the Windows Azure Network Security Whitepaper (see the sections "Security Management and Threat Defense" and "Guidelines for Securing Platform as a Service"). While Azure implements sophisticated DoS/DDoS defenses against large-scale attacks targeting an Azure datacenter (DC) or initiated from within the DC itself, the whitepaper clearly states that "it is still possible for tenant applications to be targeted individually". In other words, web applications in Azure should protect themselves against attackers with the same means as on-premises applications. In practice, this means putting a few measures in place:
- collect your logs using WAD and analyze them (sections I and II)
- consider installing IIS Dynamic IP Restrictions (see Install IIS Dynamic IP Restrictions in an Azure Web Role)
- consider using URL Rewrite blocking rules to block unsolicited requests detected by log analysis
- implement authentication in your web application and limit anonymous access whenever possible
- use PaaS ACLs whenever possible to whitelist/blacklist the IP ranges allowed or denied access to the application
- the following PowerShell script can be used to automate HTTPERR log analysis and set up ACLs that blacklist frequently appearing IPs: Script to monitor and protect Azure VM against DOS
- regularly check Windows logs for suspicious activity and malicious requests (see Easily detect and block malicious HTTP requests targeting IIS/ASP.NET using "BLACKIPS")
- if you have specific security needs, such as packet inspection, consider using dedicated security software like Barracuda Firewall in front of your Web Role
- test your application. If you plan to do penetration testing in Azure, make sure to notify the operations team and follow the process described at http://www.windowsazure.com/en-us/support/trust-center/security/?fb=us-us
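To make the Dynamic IP Restrictions idea more concrete, here is a Python sketch of a sliding-window request-rate check: an IP is denied once it sends more than a maximum number of requests within a time interval. It only illustrates the concept behind the module's option that denies an IP based on the number of requests over a period of time; the class name, its parameters and their default values are assumptions for the example, not IIS configuration settings or its internal algorithm.

```python
from collections import defaultdict, deque


class RequestRateLimiter:
    """Deny a client IP once it sends more than max_requests requests within
    interval_ms milliseconds, using a sliding window of timestamps."""

    def __init__(self, max_requests=20, interval_ms=200):
        self.max_requests = max_requests
        self.interval_ms = interval_ms
        self._hits = defaultdict(deque)  # ip -> request timestamps (ms) in the window

    def allow(self, ip, now_ms):
        """Record a request from `ip` at time now_ms; return True if allowed."""
        window = self._hits[ip]
        # Drop timestamps that have slid out of the window.
        while window and now_ms - window[0] >= self.interval_ms:
            window.popleft()
        # Denied requests are recorded too, so a client that keeps flooding
        # stays denied until its request rate drops.
        window.append(now_ms)
        return len(window) <= self.max_requests
```

For instance, with RequestRateLimiter(max_requests=3, interval_ms=1000), a fourth request from the same IP within one second is refused while other IPs are unaffected. A production limiter would also evict idle IPs to bound memory; that is left out here for brevity.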
Although it is unrelated to DoS attacks, it is also worth mentioning a few basic security rules:
- make sure the Guest OS in use is up to date (don't pin a specific Guest OS version unless absolutely necessary) and understand how the Guest OS is updated (see Role Instance Restarts Due to OS Upgrades)
- enable antimalware, which is now available for IaaS and PaaS: Microsoft Antimalware Whitepaper
If you are interested in Azure Security, the following page is a very good central repository of resources: Security in Azure.
I hope you'll find the above information useful, and remember that "forewarned is forearmed"…
Emmanuel