Saturday, April 7, 2012

Monitoring Local Array Hard Drive Status on VMware ESXi

I have had many network admins comment to me that one problem with VMware is that they cannot see the status of an array that is local or directly attached to the ESXi server.  This is certainly a problem.  We need to know immediately when a drive in an array fails so that it can be replaced and the parity rebuilt.  As admins, one of our primary objectives is to maintain data protection from hard drive failure at all times.  But a drive failure can leave you completely vulnerable (depending on RAID type used and hot spare configuration). 
Server vendors have long had solid integration with Windows.  Well, the good news is that VMware has addressed this problem and the solution has been around since ESXi 4.x.  With ESXi, VMware introduced a hypervisor with a much smaller footprint.  To support hardware integration they developed new APIs using the Common Information Model, or CIM.  Hardware vendors are responsible for making the CIM providers, which are known as Host Extensions and can be deployed using the Host Update Manager or the esxcli utility.  HP offers the Offline Bundle, available on the downloads page for the appropriate server platform and OS.  You will need to check your server vendor documentation for where to download the update.

The hard drive status will then be available through the Hardware Status tab of the vSphere Client.  You cannot configure the array from here, but it will reveal whether the array is degraded and you will be able to determine which drive has failed.


Alerts can then be generated through vCenter.  There is a pre-configured alarm at the vCenter level, called “Host storage status”, that can be enabled.  Several actions can be configured for when the alarm triggers, such as traps or emails to team members or other monitoring systems.  Drive failures can then be acted on immediately, greatly reducing the "window of vulnerability".
Note: Standalone ESXi servers don’t support generating alerts, so this would be one good reason to purchase one of the very cost effective Essentials bundles offered by VMware.

Once the CIM provider is installed other third party management systems can also be used to pull that data from the host.  Veeam Monitor is one example.  Review your network management system documentation.

Monday, July 11, 2011

iSCSI Deployment for Small Virtual Infrastructure Environments

Virtualization is no longer a bleeding edge technology; it has become the standard for deploying new infrastructure. In order to support many of the features and efficiencies of what virtualization promises, shared storage must be used. Storage can be virtualized as well and offers some additional features in regards to optimization and availability. This paper is not intended to be a formal "Best Practices" document but rather an informal design guide of what I have found works best in general for deploying virtual infrastructure connected to shared storage using iSCSI.

What is iSCSI?

Shared storage can be connected using many different protocols and interfaces. I will not compare and contrast these options but rather will focus on using iSCSI on gigabit ip infrastructure. It is debatable whether iSCSI is the best performing option today but it is generally accepted that it is the most cost efficient to deploy and maintain.

iSCSI is a block level storage protocol using SCSI commands (the same used in direct attached storage) encapsulated in ip packets. It runs on regular ip network gear and does not require anything special, although it is best to deploy it on gigabit infrastructure. It was initially developed in 1998 and standardized in 2004.

This makes it an excellent choice for smaller virtual deployments under 100 vm's. I personally have deployed infrastructure with 50 vm's with plenty of room for growth. Of course everything is relative and it depends on the virtualized workload and the I/O per second (iops) that can be supported by the san. Again, this is not a sizing document but is meant to provide some assurance that iSCSI is an extremely viable solution in such environments and should not be considered "bleeding edge" or "1.x" technology.

Bandwidth

The first thing we think about when considering iSCSI is bandwidth. Our experience in the server world makes us wonder: if a physical server has a gigabit interface with direct attached storage, how can a virtual host support 20 vm's (a typical # for a single host) using a single iSCSI interface? The key is to understand how much I/O your servers really require. Typically, 30 iops per vm is good enough to use for budgeting. If you multiply out the 20 vm's X 30 iops you get 600 iops.

Now the trick is to turn that # into bandwidth. The formula is this….

IOPS * TransferSizeInBytes = BytesPerSec

Now assuming the TransferSizeInBytes (block size) is 8k, the above example would yield 4,800,000 bytes per second, which is roughly 4.8MB per second. Mathematically, gigabit links support 125MBps. So, as you can see in this example, a single gigabit interface is sufficient. The moral of this story is that the network is usually not the bottleneck at this scale.
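The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names are my own, the 8k block is taken as 8,000 bytes to match the 4,800,000 figure, and the gigabit ceiling is the theoretical line rate with no allowance for protocol overhead.

```python
# Theoretical gigabit line rate: 1,000,000,000 bits/s divided by 8 bits per byte.
GIGABIT_BYTES_PER_SEC = 1_000_000_000 // 8  # 125,000,000 bytes/s


def bytes_per_sec(iops: int, transfer_size_bytes: int) -> int:
    """IOPS * TransferSizeInBytes = BytesPerSec"""
    return iops * transfer_size_bytes


def link_utilization(iops: int, transfer_size_bytes: int,
                     link_bytes_per_sec: int = GIGABIT_BYTES_PER_SEC) -> float:
    """Fraction of the link consumed by the given I/O load."""
    return bytes_per_sec(iops, transfer_size_bytes) / link_bytes_per_sec


vms = 20
iops_per_vm = 30            # budgeting figure from the text
block_size = 8_000          # "8k", taken as 8,000 bytes (assumption)

total_iops = vms * iops_per_vm
throughput = bytes_per_sec(total_iops, block_size)
print(f"{total_iops} IOPS -> {throughput / 1_000_000:.1f} MB/s, "
      f"{link_utilization(total_iops, block_size):.1%} of a gigabit link")
```

Changing `vms`, `iops_per_vm` or `block_size` shows how far the load can grow before a single gigabit interface becomes a concern.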

Something to keep in mind here is that your virtual environment will most likely have multiple hosts. Therefore, I like the idea of having an aggregate of bandwidth on the SAN side to support potential spikes in I/O. For example, 4 virtual hosts with 1gb iSCSI interfaces should connect to a SAN that has multiple gb interfaces bonded together to support 2gb or more. This will depend on your SAN and how many interfaces it supports but it is very common for SAN's today to ship with at least 2 interfaces with the option of adding additional.

Keep in mind that switch configuration should be made to support this as well. You can use either Etherchannel or LACP (aka link aggregation) to set that up on your switches. It goes without saying that a good switch is recommended here. Personally, I think Cisco is the best choice here but there are other options that meet the requirements such as Procurve. I recommend a layer 3 switch but it is not required.

Network Optimizations

Although I mentioned above that the network is not the bottleneck, that is not to say that we should not take steps to optimize the network for storage. The next two items are generally considered to be best practices; however, it always surprises me how many deployments miss them.

Vlans

Storage traffic should be segmented on a separate vlan. This resolves 2 primary problems.

  1. Security – nobody has any business on the storage vlan except the storage devices and administrators. There is no positive benefit for users and storage to share the same vlan.
  2. Broadcast traffic is reduced to the devices on the storage vlan. iSCSI storage is an Ethernet technology and when an Ethernet device broadcasts all devices on that segment listen. Putting storage interfaces on a separate vlan mitigates this traffic overhead.

Jumbo Frames

Ethernet frames carry up to 1500 bytes of payload by default (the MTU). The nature of TCP/IP involves transmit and reply packets. Increasing the frame size reduces the number of TCP/IP packets put on the wire. These larger frames are called Jumbo Frames and typically carry 9000 bytes. Storage traffic can uniquely benefit from this. I have found that file transfer speeds have increased by as much as 50% using Jumbo Frames, although I have not done specific testing to measure the effect on iSCSI traffic.
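To get a feel for how much the packet count drops, here is a rough sketch that counts the frames needed to move 1 MiB at each MTU. The function name is my own, and the 40 bytes of headers per packet (20-byte IPv4 plus 20-byte TCP, no options) is an assumption; iSCSI's own headers are ignored.

```python
IP_TCP_HEADERS = 40  # assumed: 20-byte IPv4 header + 20-byte TCP header, no options


def frames_needed(payload_bytes: int, mtu: int) -> int:
    """Number of frames required to carry payload_bytes at the given MTU."""
    per_frame = mtu - IP_TCP_HEADERS      # data bytes that fit in one frame
    return -(-payload_bytes // per_frame)  # integer ceiling division


transfer = 1_048_576  # 1 MiB
standard = frames_needed(transfer, 1500)
jumbo = frames_needed(transfer, 9000)
print(f"MTU 1500: {standard} frames, MTU 9000: {jumbo} frames")
```

Fewer frames means fewer headers and fewer acknowledgements to process for the same amount of data, which is where the benefit for storage traffic comes from.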

Of note here is the fact that all points in the network traffic flow need to support Jumbo Frames otherwise frame size will fall back to the lowest common denominator. In a virtual infrastructure, this would entail configuring your storage SAN, switches, and virtual hosts to support Jumbo Frames.

Also of note, in a VMware infrastructure, you will need to enable Jumbo Frame support at the cli unless you are running the Enterprise Plus edition of vSphere. If you are running the Enterprise Plus edition then you can configure this using the vSphere Distributed Switch.

Impact of iSCSI HBA's

iSCSI requires some cpu overhead on the virtual host side. There are purpose built Host Bus Adapters (HBA's) for offloading the iSCSI work onto the card. But the cpu overhead is generally very low, at about 3%, and with today's processors this is not a problem. Additionally, VMware support for iSCSI in 4.x has been re-written and improved, yielding better performance than 3.5 or earlier. I have not seen this be a problem but there is nothing wrong with using an iSCSI HBA either.

SAN's, Arrays & Disks

Everyone reading this is familiar with RAID arrays and various expectations of performance using different RAID levels. iSCSI & SAN's do not change these laws. In general, a SAN does not instantly make a 4 disk RAID 5 array perform magically better than it did in DAS. In my testing using iometer, I have found the results to be consistent with what would be expected from the underlying array configuration. Different storage vendors may debate that but let's just keep it an even playing field for the moment and consider that the SAN serves up the LUN as is.

Generally, you will create a "datastore" that is comprised of many more disks spreading the I/O across a larger array and therefore delivering better performance than what you could deliver with multiple smaller arrays. This is another benefit of using shared storage and a SAN.

Therefore, you will want to use an array of as many physical disks as possible and you will want to use the type of disk that will deliver the proper amount of I/O to support your array. 15k-SAS delivers about 160 iops while a 7.2k-SATA is at about 80 iops. You will choose your RAID level based on the balance of redundancy vs performance. I usually use RAID 10 for SATA and RAID 10 or 50 for SAS depending on budget/requirements. Generally, I like RAID 10 overall due to best performance and lowest impact on rebuild times when drives fail.

Check your SAN documentation on how much flexibility you have on configuring your storage as some SAN's will only let you configure one aggregate of storage and RAID level. You can play with various array configurations using this online calculator to see the relation of spindles and RAID and their impact on IOPS.
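In the same spirit, here is a minimal sketch of the kind of estimate such a calculator produces. The per-disk figures come from the numbers above; the RAID write penalties (2 for RAID 10, 4 for RAID 5/50) and the 70/30 read/write mix are common rule-of-thumb assumptions, not measurements, and the function name is my own.

```python
DISK_IOPS = {"15k-SAS": 160, "7.2k-SATA": 80}           # per-disk figures from the text
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID50": 4}  # assumed rule-of-thumb values


def usable_iops(disks: int, disk_type: str, raid: str,
                read_fraction: float = 0.7) -> float:
    """Estimate front-end IOPS for an array.

    Each read costs one back-end I/O; each write costs WRITE_PENALTY
    back-end I/Os. read_fraction is the assumed share of reads.
    """
    raw = disks * DISK_IOPS[disk_type]
    write_fraction = 1.0 - read_fraction
    return raw / (read_fraction + write_fraction * WRITE_PENALTY[raid])


for raid in ("RAID10", "RAID50"):
    print(raid, round(usable_iops(8, "15k-SAS", raid)), "iops from 8 x 15k-SAS")
```

Compare the result against the iops budget for your vm's (for example, the 600 iops for 20 vm's used earlier) to see whether a given spindle count and RAID level leaves headroom.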

Summary

Deploying iSCSI is deceptively easy. I have found that many people run into issues when not fully understanding all the pieces that come into play. The following items should be added to the iSCSI deployment checklist. Doing so will help you avoid many of the most common issues and enable you to get the most out of your iSCSI deployment.

  • Configure storage vlan
  • Enable jumbo frames on all switches with storage vlan, SAN's, VMware or other virtual hosts.
  • Implement link aggregation on SAN for better throughput and availability.
  • Understand the limits of your SAN both in capacity and iops.

Saturday, September 18, 2010

The Modern Firewall

The firewall landscape is fairly cluttered these days. To make things more confusing, most devices in this class offer a very similar feature set. The most common features people look for in a firewall are dynamic nat, static nat, firewall rules, site to site vpn's and vpn server functionality. Almost every firewall vendor offers these features leading most network admins to get the cheapest device thinking that they are all the same.

For some people, this strategy works. But I would argue that these people are most likely running networks that are extremely small, with very few applications and minimal bandwidth, and just generally use the firewall to access the internet. I won't address those networks here. For the rest of us running modern networks under 250 users, the requirements for the internet edge device (firewall) have changed to include the following…

  • Clientless vpn's: Sometimes referred to as ssl vpn's, these vpn's allow users to connect from outside the network from anywhere securely to get to their applications.
  • Advanced Authentication: As networks have grown, it is no longer effective or secure to have local user accounts on devices. Users must be authenticated centrally and the best way is to authenticate against Active Directory (assuming you are running an A.D. domain). I can't tell you how many times I've looked at network devices only to see user accounts that nobody knew who they were for.
  • Advanced application inspection: Applications have gotten more complex today. SIP, which is a protocol used for voice/video, needs more than just standard nat translations for it to work. SIP puts network addresses inside the data portion of the packet, so when the packet passes through the firewall and nat is applied, the application at the remote end of the connection still sees the private address, thereby breaking the connection. Firewalls need to be able to see past the packet headers and translate those addresses as well.
  • Malware prevention: Everybody (anyone with any sense) runs anti-virus on their desktops but we have seen that this doesn't stop everything. We tend to blame the anti-virus products but it's not all their fault. Malware has become very complex today and can circumvent many detection mechanisms. Obviously we need an effective product that has a balance between usability and protection but I don't think that is the complete solution and I don't think deploying additional software on workstations is effective either as it makes management complex and depends on the workstation having all those applications running and updated for it to be secure. What happens if you need to shut the software down for some reason or if you deploy a new pc that hasn't been updated yet? You are exposed! Today's firewalls need to have the ability to do some malware detection/prevention at the edge.
  • Guest user access: Everyone has clients that need internet access. Most people just provide them with the key to their network with the expectation that they will just use the internet. You have also just handed them the keys to the front door. Networks have to be able to support secured logical partitions that allow guests to get to the internet without access to any corporate resources.
  • Mobility support…
    • Devices configured with dns names to access applications often break when they connect inside the firewall from the lan side. Almost every mobile phone today has wifi capability and I have read articles that predict that the primary device for accessing the internet will be the mobile phone in a very short time. Users will want to use their applications on their devices even when inside the network and they are going to expect even better reliability and performance.
    • Remote phones: With the increase of voip deployments (they will become the standard in the very near future), users will expect to be able to take their phone anywhere and connect it up and have it work just like they were in the office.
  • Availability: Internet access is more important for business and bandwidth is cheaper. Therefore it is now an affordable option for even small businesses to have redundant internet access. I see a lot of bonded T1's out there backed up by cable/dsl internet access. Firewalls need to be able to handle this automatically with no IT intervention.
  • Management & Monitoring: With increased emphasis placed on the internet, network admins will need to be able to easily manage and see what's going on in the firewall for troubleshooting, capacity planning, and security. Some networks will need to capture all logs/events at the internet edge for compliance reasons depending on the industry.
  • Throughput! We have seen steady increases in internet bandwidth connectivity both due to the need for B2B applications and the general lower cost of bandwidth. This trend will continue at least in the short term. Firewalls will need to still be able to perform running the same services at higher rates.
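To make the application inspection requirement above concrete, here is a toy sketch of what "seeing past the packet headers" means for SIP. It is not how any particular firewall implements inspection; the function name, the sample message and the addresses (a private 192.168.1.10 and a documentation-range 203.0.113.5) are all made up for illustration.

```python
import re


def rewrite_sip_payload(message: str, private_ip: str, public_ip: str) -> str:
    """Replace a private address inside a SIP message and its SDP body.

    Plain nat only rewrites the IP header; here the addresses carried in
    the SIP headers and SDP body are rewritten too, and Content-Length is
    recomputed because the body length may change.
    """
    # Match the address only when it is not part of a longer dotted number.
    pattern = re.compile(rf"(?<![\d.]){re.escape(private_ip)}(?!\d)")
    head, sep, body = message.partition("\r\n\r\n")
    body = pattern.sub(public_ip, body)
    headers = []
    for line in head.split("\r\n"):
        if line.lower().startswith("content-length:"):
            line = f"Content-Length: {len(body.encode())}"
        else:
            line = pattern.sub(public_ip, line)
        headers.append(line)
    return "\r\n".join(headers) + sep + body


invite = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Contact: <sip:alice@192.168.1.10:5060>\r\n"
    "Content-Type: application/sdp\r\n"
    "Content-Length: 53\r\n"
    "\r\n"
    "v=0\r\n"
    "c=IN IP4 192.168.1.10\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
)
translated = rewrite_sip_payload(invite, "192.168.1.10", "203.0.113.5")
print(translated)
```

Without this step the far end would try to send audio to 192.168.1.10, which is exactly the broken connection described in the bullet.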

As I mentioned before, there are many vendors out there that offer products that support all or most of these requirements. I have heard that Juniper has a very solid product and Sonicwall has improved but I will focus on Cisco here. The PIX firewall appliance, the predecessor of the ASA, was a solid firewall and offered tremendous reliability and consistent performance. But I was never really impressed with it as having much in the way of "bells and whistles". The ASA 5500 series has changed that. This is an impressive appliance that is very well thought out from a hardware & software design perspective. The software configuration and management is consistent across all the models and is much closer to IOS making management that much easier for those Cisco admins out there. Combined with, in my opinion, one of the best support programs in the industry providing true 24x7 support this device is very compelling and can easily justify any upgrade.

On the hardware side, the 5505 comes with 8 switch ports, 2 of which are poe capable. These would be used for access points or phones. The device supports vlans and the switch ports can be configured as access ports or trunks. Guest vlans can be configured for wired ports and when used with Cisco access points, you can create a second wireless network for guest access on the same access point as your production network. The 5505 has an expansion slot that currently supports an Intrusion Prevention (IPS) module that can provide inline attack prevention. Cisco also offers a licensable feature (Botnet Filter) that does dns inspection against known malware sites preventing access to those sites through the firewall. Between these two features, you can provide a fair amount of security at the network edge. One or multiple ports can be configured for redundant WAN connections.

The 5510 supports 4 physical interfaces. In contrast to the 5505 they are not switch ports but are actual separate interfaces that can be assigned ip addresses. Great for datacenters, shared hosting environments, handling redundant wan connections or providing transparent firewall services. Optionally, 2 of the ports can be converted to gigabit through a license upgrade. There is also a separate management port if required. The device also has an SSM slot that supports various modules providing Content Security (web & av) or IPS services.

If you have a CallManager/CallManager Express voice platform, you can easily leverage the Phone Proxy license to allow phones to connect to the CallManager securely from the internet without requiring any special vpn's setup. This is huge both for users and admins. The same applies for the ssl vpn's. Mobility is further enhanced by translating dns requests for public names to private ip addresses making the switch from 3G to wifi simple without breaking applications. SIP trunks will work from any application, pbx or video conference system.

In summary, the ASA 5500 series addresses several evolving issues at the internet edge that we all face today. It is a solid platform with a feature set that continues to grow, and the business case can be made for upgrading now.

For further technical information and comparisons please visit the Cisco site at http://www.cisco.com/go/asa

Tuesday, August 10, 2010

Game Changing Features with Cisco Voice

Cisco has been in the voice game for a while now and is now the #1 IP PBX vendor in the world. There are literally thousands of features rolled up in the Cisco voice solutions portfolio. Sifting through them can be quite a challenge. Below are the features that I consider to be "game changers". I define a game changer as a feature that is not a current requirement of the organization but would considerably change the way the organization works in a positive way, enabling users to be more productive and offer better service/responsiveness.

MeetMe Conferencing: This enables a user to setup his own conference bridge to host a voice conference. Participants then dial in directly from either inside or outside the firm. This can provide significant cost savings as most firms can easily spend $500-$1,000 in conferencing monthly.

Single Number Reach (aka Voice Connect): Rings the desk phone and cell phone simultaneously. The call can be picked up on either and passed back and forth seamlessly. This feature can be controlled with access lists and schedules to control what calls reach the mobile number. The original caller id is preserved so that you know who is calling even on your mobile phone.

Voicemail transcription (aka SpeechView): Typically users receive a copy of their voicemail via email to their smartphone; however, there is no way to tell if it is important or not unless they listen to the message. This feature sends an email with the voicemail transcribed in the message body and the recording included as an attachment. This is especially valuable when the user is not in a position to listen to voicemail but might be able to read a message on their smartphone.

WebEx Connect: This is a small application that is licensed on a per user basis and is a SAAS offering from WebEx. It provides instant messaging, desktop sharing, click to call, and can even be used to start a WebEx Meeting. Advanced configuration also allows it to function as a softphone. This service can be federated meaning that it can be used to connect to other instant messaging clients. Auditing can be configured to address security concerns.

UC Phone Proxy: This feature enables users to connect a Cisco ip phone to any internet connection and register the phone securely with the CCM/CME. Calls are encrypted for security. Takes less than 5 minutes to provision on the administrative end and gives new meaning to "home is where you hang your hat". Very valuable for execs and mobile users or contractors that setup shop temporarily in foreign offices. (Cisco ASA required)

Personalized Auto Attendant: Voicemail boxes can be configured to support caller input so that a caller might have the option to reach an administrative assistant directly.

What’s the difference between CallManager and CallManager Express?

Many people have asked me the difference between Cisco CallManager (CCM) and Cisco CallManager Express (CME). The answer is not that complicated but can be somewhat confusing and understandably so. Most software companies offer an "Express" version of their software where they have reduced the feature set or scalability of the product and therefore the price to make it more viable for smaller businesses. However, in this case, these two products are entirely different platforms, each with their own strengths.

CallManager is a server based product, now referred to by Cisco as an "appliance". It's basically an HP or IBM server with a hardened Linux OS and the CallManager software bundled into it. Appliance is a good description of it since you basically just manage it through the web interface, and even if you were to log in to it at the console level you would only get the customized administrative shell with no access to the underlying OS.

CallManager Express on the other hand is IOS based meaning that it runs on a router, specifically any of the ISR 2800/3800 or the ISR-G2 2900/3900 series routers. They are really amazing boxes as they can literally bundle all network & voice services into one platform. For example, you could deploy a small office with Firewall/VPN/DHCP/PBX/Voicemail/Switching in one box and even throw in redundant WAN connections if you were so inclined. They support traditional PSTN connectivity such as PRI/FXO as well as IP trunks such as SIP or H323. Multiple offices can be interconnected providing 4 digit dialing between sites/offices or for least cost call routing.

Both of these products have been on a very aggressive development roadmap. CallManager is now the IP PBX of choice for enterprises or anyone with a Cisco network. It has the full complement of enterprise features with the scalability and redundancy you would expect. It is really impressive. Similarly, CME has come a long way since its early days, when it did very little other than provide basic voice services (dial tone). It can now provide a full set of features including Single Number Reach, advanced call routing, advanced conferencing, and supplementary services. Both systems use the same phones and provide a similar user experience.

The primary difference is in scalability and redundancy. CallManager was designed for the large enterprise market. The system can be designed with multiple levels of redundancy for call routing and fail over such that there would be no single point of failure, and it could survive multiple failures if that level was required. CallManager servers are deployed in clusters and can support up to 30,000 phones per cluster. Clusters can be inter-connected to provide a seamless voice system. CME on the other hand was designed to be a single site solution and it excels at that. Some enterprises actually use a combination of the two solutions and inter-connect them with IP trunks. CME is also very good for some businesses where network management is localized vs. centralized.

Feature wise the CallManager is king. Administrators have very granular control over call routing. Since we are in the IP world the CallManager software is developed much faster. Cisco makes some additional products (Presence Server/WebEx Connect/Meeting Place/PC based Attendant Console) that can integrate into the CallManager to provide enhanced services. And that goes for third-party software as well. This is a weakness of CME, there is very little additional software support for it.

One exception to this today is a relatively new product called CallManager Business Edition which is an appliance (server) but has limited scalability (500 phones and 20 sites). It is a single server solution and has a price point lower than a CallManager cluster and just a little bit more than CME.

So your choice of these options will be based on design and features required. The design should be put together by whoever is in the role of the network architect while the features should be addressed by the Systems Administrators. Both these individuals should work together and solicit input from managers and executives. The reality in small businesses is that these roles are often rolled up into one person. On the plus side, all of these options upgrade smoothly into the next level. For example a CME router would just turn into a voice gateway and a Business Edition appliance could upgrade the software to CallManager providing an unparalleled level of investment protection.

Cisco's recommendation for any customer is to consult with a Cisco Partner, and that is a sound strategy.