Thursday 18 October 2018

Setup Active / Passive Failover Cluster on ASA 5515X

Firstly ensure both ASAs are identical, i.e. same software version, hardware and license, otherwise the steps below will fail.

For this tutorial we will use a single interface (m0/0 for management), 2 (aggregated) interfaces for the failover link (and stateful replication) and finally 3 (aggregated) interfaces for our data.

ASA1> conf t
hostname ASA1

interface m0/0
management-only
nameif management
security-level 0
ip add 10.0.18.98 255.255.255.0 standby 10.0.18.99
no shutdown
route management 10.0.18.0 255.255.255.0 10.0.18.1

Setup Users / SSH / AAA with:

enable password securepassword
crypto key generate rsa general-keys modulus 2048
username yourusername password yousecurepassword privilege 15
username yourusername attributes
service-type admin
aaa authentication ssh console LOCAL
aaa authentication http console LOCAL
ssh version 2

Enable ICMP for inside networks:

icmp permit any inside

Enable management access with:

http server enable
http 10.0.18.0 255.255.255.0 management
ssh 10.0.18.0 255.255.255.0 management

Configure our data interfaces and their associated etherchannels:

ASA1) int po1
port-channel
vlan 1000
no shut

int gi0/0
channel-group 1 mode active
no shut

int gi0/1
channel-group 1 mode active
no shut

int gi0/2
channel-group 1 mode active
no shut

We'll be serving three client VLANs - so we'll setup the trunking:

int po1.100
description InsideNetwork
vlan 100
ip address 172.16.32.2 255.255.255.248 standby 172.16.32.3
nameif inside
security-level 100
no shut

int po1.101
description OutsidePrimary
vlan 101
ip address 123.123.123.123 255.255.255.240 standby 123.123.123.124
nameif outside
security-level 0
no shut

int po1.102
description OutsideBackup
vlan 102
ip address 192.168.10.1 255.255.255.0 standby 192.168.10.2
nameif dmz
security-level 0
no shut

and on our switch stack:

int po1
switchport mode trunk
switchport trunk native vlan 1000
switchport trunk allowed vlan 100,101,102
no shutdown

int range gi1/0/1-3
channel-protocol lacp
channel-group 1 mode active
spanning-tree portfast trunk # to help speed up convergence
spanning-tree bpduguard enable

int po2
switchport mode trunk
switchport trunk native vlan 1000
switchport trunk allowed vlan 100,101,102
no shutdown

int range gi2/0/1-3
channel-protocol lacp
channel-group 2 mode active

Note: The channel group mode is set to active (LACP) on both ends as the ASA does not support PAgP.

We'll now configure the failover link - for this we'll add redundancy via an etherchannel again:

ASA1> int po2
no shut

int gi0/4
channel-group 2 mode active
no shut

int gi0/5
channel-group 2 mode active
no shut

and then on the switch:

int po3
description failover link
switchport mode access
switchport access vlan 300
description ASA-Master-Failover
no shutdown

int po4
description failover link
switchport mode access
switchport access vlan 300
description ASA-Master-Backup
no shutdown

int range gi1/0/23,gi2/0/23
channel-group 3 mode active
channel-protocol lacp
no shutdown

int range gi1/0/24,gi2/0/24
channel-group 4 mode active
channel-protocol lacp
no shutdown

And now set the failover interface (po2 in our case):

failover lan interface FAIL-OVER po2
failover interface ip FAIL-OVER 192.168.254.1 255.255.255.240 standby 192.168.254.2
failover key strongpassword
failover lan unit primary
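
The active and standby addresses given to 'failover interface ip' (and to each data interface) need to sit in the same subnet. A minimal Python sketch of that sanity check, using only the standard library; the function name same_subnet is made up for illustration and is not part of any Cisco tooling:

```python
import ipaddress


def same_subnet(active: str, standby: str, netmask: str) -> bool:
    """Return True if both addresses fall inside the same network."""
    network = ipaddress.ip_network(f"{active}/{netmask}", strict=False)
    return ipaddress.ip_address(standby) in network


# Address pairs taken from the configuration in this post.
pairs = [
    ("192.168.254.1", "192.168.254.2", "255.255.255.240"),      # FAIL-OVER
    ("172.16.32.2", "172.16.32.3", "255.255.255.248"),          # inside
    ("123.123.123.123", "123.123.123.124", "255.255.255.240"),  # outside
]
for active, standby, mask in pairs:
    print(active, standby, same_subnet(active, standby, mask))
```

Running this before pasting the configuration catches a mistyped standby address early.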

We'll also want to ensure that our subinterfaces (outside, inside and the DMZ) are monitored for link failures:

monitor-interface outside
monitor-interface inside
monitor-interface dmz

And finally enable the failover feature with:

failover
failover link FAIL-OVER

and save:

wri mem

Now on the slave ASA:

Define our failover interface:

int po2
no shut

int gi0/4
channel-group 2 mode active
no shut

int gi0/5
channel-group 2 mode active
no shut

failover lan interface FAIL-OVER po2
failover interface ip FAIL-OVER 192.168.254.1 255.255.255.240 standby 192.168.254.2
failover key strongpassword
failover lan unit secondary
failover

And then to confirm (on either unit):

show failover

If you need to execute commands on the slave you can issue:

failover exec standby show int ip br

or alternatively the current master:

failover exec active show int ip br

Uploading and booting from ROMMON Mode on ASA 5505/5510/5515/5540

Firstly power down the ASA. We'll now need to get into ROMMON mode - hook up to the console and make sure that you also have an ethernet cable / machine plugged into the management port. Proceed by powering on the ASA - you should see a message stating:

'Use BREAK or ESC to interrupt boot.'

Hit ESC - at this point you should be in rommon mode - from the prompt enter the following to configure the IP settings so we will be able to copy the image over the ethernet port (this can also be performed over serial but it can take a long time):

ADDRESS=192.168.10.100
SERVER=192.168.10.254
IMAGE=asa992-smp-k8.bin
PORT=Management0/0
RETRY=3

If the TFTP server is on another subnet add the following (otherwise leave blank):

GATEWAY=192.168.10.1

and finally run the following to initiate the tftp copy:

tftp

Wednesday 17 October 2018

ASA Cluster Setup

Clustering's main advantage in relation to the ASAs is the boost in throughput. It's typically employed in data centres and larger enterprise networks. However it's important to note that this does come at a cost - clustering limits the available feature set.

Clustering is now supported on the 5500 series (5512 and above) from ASA software 9.2+. However you might need to upgrade your license (for free) in order to 'unlock' the functionality.

For this topology we will be using spanned etherchannel mode. Spanned etherchannels allow the ASA cluster to present a single IP address (the master node's IP) - with the exception of the management interface, which operates as an individual interface on each ASA (this comes in very useful when troubleshooting.)

We'll firstly configure this on our ASA's:

ASA1> cluster interface-mode spanned
ASA2> cluster interface-mode spanned

Note: A reboot may be required afterwards.

When configuring our management interface we will need to create an IP pool so that the cluster master node can allocate a management IP to each member:

ip local pool MGMT_POOL 10.0.18.97-10.0.18.99

and then configure our management interfaces on the master node:

Note: The IP pool we created earlier is used to allocate individual IPs to each management interface in the cluster - while the explicit IP defined on the interface configuration below is the main cluster IP (i.e. the shared one.)

ASA1> interface m0/0
management-only
nameif management-pri
security-level 0
ip add 10.0.18.100 255.255.255.0 cluster-pool MGMT_POOL
no shutdown

http server enable
http 10.0.18.0 255.255.255.0 management-pri
ssh 10.0.18.0 255.255.255.0 management-pri

Next we will configure the data links (i.e. all of the traffic you wish to serve) - to do this we will set up a spanned etherchannel. This translates to an etherchannel spanning all of the ASAs (i.e. all ASAs are connected together with a single channel group.) We'll use two physical ports on each ASA for this example:

ASA1) int po10
port-channel span-cluster
vlan 1000
no shut

int gi0/1
channel-group 10 mode active
no shut

int gi0/2
channel-group 10 mode active
no shut

We'll be serving three client VLANs - so we'll setup the trunking:

int po10.100
description InsideNetwork
vlan 100
nameif inside
security-level 100
no shut

int po10.101
description OutsidePrimary
vlan 101
nameif outside1
security-level 0
no shut

int po10.102
description OutsideBackup
vlan 102
nameif outside2
security-level 0
no shut

Note: if you are connecting the etherchannel to a vPC (Cisco Nexus technology) or a VSS (Cisco Catalyst 6500/6800 technology) you'd need to amend as follows:
ASA1) int gi0/1
channel-group 10 mode active vss-id 1
port-channel span-cluster vss-load-balance

int gi0/2
channel-group 10 mode active vss-id 2
port-channel span-cluster vss-load-balance

Note: We don't need to apply this configuration on the slave ASA as the settings will automatically propagate when the CCL is established.

The backend switch stack was a 3650X - for completeness I have included the other side of the spanned etherchannels mentioned above:

SWITCH-STACK> do show run ...

interface Port-channel10
 switchport trunk native vlan 1000
 switchport trunk allowed vlan 100-102
 switchport mode trunk

interface GigabitEthernet1/0/1
 switchport trunk native vlan 1000
 switchport trunk allowed vlan 100-102
 switchport mode trunk
 switchport nonegotiate
 storm-control broadcast level 10.00
 storm-control action trap
 no cdp enable
 channel-protocol lacp
 channel-group 10 mode active

interface GigabitEthernet1/0/2
 switchport trunk native vlan 1000
 switchport trunk allowed vlan 100-102
 switchport mode trunk
 switchport nonegotiate
 storm-control broadcast level 10.00
 storm-control action trap
 no cdp enable
 channel-protocol lacp
 channel-group 10 mode active

interface GigabitEthernet2/0/1
 switchport trunk native vlan 1000
 switchport trunk allowed vlan 100-102
 switchport mode trunk
 switchport nonegotiate
 storm-control broadcast level 10.00
 storm-control action trap
 no cdp enable
 channel-protocol lacp
 channel-group 10 mode active

interface GigabitEthernet2/0/2
 switchport trunk native vlan 1000
 switchport trunk allowed vlan 100-102
 switchport mode trunk
 switchport nonegotiate
 storm-control broadcast level 10.00
 storm-control action trap
 no cdp enable
 channel-protocol lacp
 channel-group 10 mode active

We'll now configure the cluster control links - these are set up in a *device local* etherchannel (i.e. ASA1 --> SW1,SW2 over po11, and ASA2 --> SW1,SW2 over po12.)

As per Cisco's guidance we should try our best to ensure that the CCLs (Cluster Control Links) can handle the same throughput as the data links. By doing this we can ensure failover can happen quickly during congestion.

ASA1> int gi0/4
description Cluster Control Link
channel-group 11 mode active
no shut

int gi0/5
description Cluster Control Link
channel-group 11 mode active
no shut

int po11
description Cluster Control Link
no shut

Again for completeness I have included the backend switch configuration:

int range gi1/0/23,gi2/0/23
desc ASA Primary CCL
channel-protocol lacp
channel-group 11 mode active

int po11
desc ASA Primary CCL
switchport mode access
switchport access vlan 300
no shut

int range gi1/0/24,gi2/0/24
desc ASA Backup CCL
channel-protocol lacp
channel-group 12 mode active

int po12
desc ASA Backup CCL
switchport mode access
switchport access vlan 300
no shut

We're now in a position to enable the cluster - we should do this on our primary ASA firstly:

cluster group HLXCluster
local-unit ASA-Primary
cluster-interface po11 ip 192.168.120.1 255.255.255.252
priority 1
key str0ngp@55word!
enable

Note: The cluster member with the lowest 'priority' will become the master.

To confirm run: sh cluster info

and then the secondary:

cluster group HLXCluster
local-unit ASA-Secondary
cluster-interface po11 ip 192.168.120.2 255.255.255.252
priority 100
key str0ngp@55word!
enable

and finally to confirm run (again):

sh cluster info

Sources

Chapter: Configuring a Cluster of ASAs

Wednesday 10 October 2018

Quickly identify the character encoding of a file in the shell

Using the file command as follows will allow you to identify what character encoding a specific file has. This came in handy when I was reading a file from Python as by default it treats the file as ASCII encoded.

bash> file -i test.txt
test.txt: text/plain; charset=utf-16le
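
Once the charset is known it can be passed explicitly when opening the file. A minimal Python 3 sketch, assuming the file really is UTF-16LE as reported; it writes its own small sample file first so that it is self-contained:

```python
import os
import tempfile

# Create a small UTF-16LE sample so the sketch is self-contained;
# in practice this would be the file inspected with 'file -i'.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-16-le") as handle:
    handle.write("hello world\n")

# Pass the detected charset explicitly rather than relying on the default.
with open(path, encoding="utf-16-le") as handle:
    text = handle.read()

print(repr(text))
```

If the file starts with a byte order mark, the 'utf-16' codec reads it and strips it, which may be the more convenient choice.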

Wednesday 19 September 2018

Using the trace option with the bash shell

Until recently I wasn't aware bash has inbuilt tracing capabilities - which can really help when attempting to troubleshoot a script that is breaking.

Simply add the '-x' switch - for example:

/bin/bash -x /path/to/script.sh


Batch conversion of cer to pem certificates with openssl and bash

While this example can be applied pretty generically - it came in useful when I was tasked with converting several dozen certificates:
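#!/bin/bash
# Convert every DER-encoded .cer file in the current directory to PEM.
for i in *.cer; do
  echo "Converting: $i..."
  # Strip the .cer suffix and append .pem (quoted so names with spaces work).
  outfile="${i%.cer}.pem"
  openssl x509 -inform der -in "$i" -out "$outfile"
done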
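
Where openssl is not to hand, the same DER to PEM conversion can be sketched with Python's standard library. This is an alternative I am suggesting, not the method used above, and the function name convert_der_to_pem is hypothetical. Note that ssl.DER_cert_to_PEM_cert only re-encodes the bytes; unlike openssl x509 it does not parse or validate the certificate:

```python
import pathlib
import ssl


def convert_der_to_pem(directory: str) -> list:
    """Write a .pem next to every DER-encoded .cer file in 'directory'."""
    written = []
    for cer in sorted(pathlib.Path(directory).glob("*.cer")):
        pem = cer.with_suffix(".pem")
        # Base64-encode the DER bytes and add the BEGIN/END CERTIFICATE lines.
        pem.write_text(ssl.DER_cert_to_PEM_cert(cer.read_bytes()))
        written.append(pem)
    return written
```

Calling convert_der_to_pem(".") processes the current directory, mirroring the loop in the shell script.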

Changing / assigning contexts with SELinux (labelling)

I came across an SELinux error the other day when I instructed rsyslog to write radius logs to '/var/log/radius'.

The message was as follows:

'SELinux is preventing /usr/sbin/rsyslogd from write access on the directory /var/log/radius.#012#012*****'

After inspecting the SELinux label:

ls -Z /var/log/radius

drwx------. radiusd radiusd system_u:object_r:unlabeled_t:s0 radacct
-rw-r-----. radiusd radiusd system_u:object_r:unlabeled_t:s0 radius.log
-rw-r-----. radiusd radiusd system_u:object_r:unlabeled_t:s0 radius.log-1234567.gz

It was clear that the typical 'var_log_t' context was absent and hence preventing rsyslog from writing logs.

The 'var_log_t' defines common logging directories / files.

In order to assign a context we can issue the following:

chcon system_u:object_r:var_log_t:s0 /var/log/radius && chcon system_u:object_r:var_log_t:s0 /var/log/radius/*

Warning: Using chcon will not make the change of context permanent - we need to use semanage to ensure changes remain intact after a system relabel or the restorecon command.

semanage fcontext -a -t var_log_t "/var/log/radius(/.*)?"

The last part of the command instructs all existing files (and newly created ones) to be of the 'var_log_t' context within the '/var/log/radius' directory.

Finally confirm our changes (using restorecon as well to ensure changes are permanent):

restorecon -R -v /var/log/radius

ls -Z /var/log/radius

Thursday 13 September 2018

Firewall Port Requirements for DFSR

I decided to compile this list due to the lack of coherent information on the internet - even Microsoft's own documentation listed ports that clearly had no purpose. While these ports are automatically opened up when installing the specific features on the server they commonly need to be added to external firewalls as well.

DCOM TCP/135
SMB TCP/445
RPC: TCP/49152-65535 OR ideally set a static port (dfsrdiag staticRPC /port:<port-number>; net stop dfsr; net start dfsr)

If you require remote DFS management ensure that the following ports are enabled:

WMI and RPC: TCP/49152-65535

You will also need to ensure that ports required for file sharing are present:

ICMP: Echo Request
SMB (as above): TCP/445
LLMNR (Optional - but rarely needed these days): UDP/5355
NETBIOS (Optional - but rarely needed these days): UDP/137, UDP/138, TCP/139

If you require remote file server management you will also need to enable the following ports:

DCOM (as above): TCP/135
SMB (as above): TCP/445
WMI: TCP/49152-65535 (Windows Vista and above)
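
After the firewall rules are in place, the fixed TCP ports can be probed from a remote machine. A rough Python sketch; the function names are made up for illustration, and the dynamic RPC range is deliberately left out because it only makes sense to probe once a static port has been set:

```python
import socket

# Fixed TCP ports from the lists in this post.
DFSR_TCP_PORTS = {"DCOM": 135, "SMB": 445}


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_dfsr_ports(host: str) -> dict:
    """Map each service name to whether its port accepted a connection."""
    return {name: tcp_port_open(host, port)
            for name, port in DFSR_TCP_PORTS.items()}
```

For example check_dfsr_ports("fileserver01.example.com") (a placeholder hostname) returns a dictionary such as {'DCOM': True, 'SMB': False} depending on what the firewall allows.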

Wednesday 5 September 2018

Generating an AWS CMK with external key material (YubiHSM)

AWS provides you with the ability to use your own key material (i.e. generate your own symmetric key) for use with its Key Management Service.

In this tutorial I will demonstrate the complete process from creating the CMK (Customer Master Key) to securing a service such as EBS with it.

Note: A CMK can be generated via the AWS CLI optionally - but for this example we'll stick to the AWS console.

Firstly from the AWS Management Console go to IAM (Identity Access Management.) Proceed by clicking on 'Encryption Keys' in the lower part of the left hand navigational menu.

If this is the first time you have used the service you'll need to skip through the welcome wizard.

Proceed by selecting the appropriate region (as by default this does not correspond with the region you are currently using.)

Hit 'Create Key'. Provide an alias, key description and expand the 'Advanced Options' tab. Here you will be able to define the origin / source of the key material. By default this is generated by Amazon's KMS service - however we'll select 'External' as we wish our own HSM (YubiHSM) to do this for us.

Proceed by setting up tagging, key administrators (i.e. users or roles who can perform administrative functions like deletion of the key through the AWS API) and key usage permissions (i.e. which users or services can use the key for encryption / decryption - in this case EBS.)

You'll finally be presented with a chance to download the wrapping key and import token (note that these expire after 24 hours.) Make sure the 'RSAES_OAEP_SHA_256' algorithm is selected as it's the most secure method currently and fully supported by YubiHSM. The wrapping key is used to secure the symmetric key we will be exporting from YubiHSM and the import token simply authorises you to upload the wrapped key to IAM.

Note: A wrap key is simply a way of securing a private key - typically used when a key is mobile e.g. being exported to another system. If you regularly use Windows systems you will have likely come across PKCS12 which is used to wrap keys.

The next step is to import our wrap key into YubiHSM - this can be performed in one of two ways - here we import it directly from the terminal:

./yubihsm-shell -a put-asymmetric -A aes256-ccm-wrap -c export_wrapped,import_wrapped --delegated=asymmetric_sign_pkcs,asymmetric_decrypt_pkcs,export_under_wrap --in=wrappingKey_wxyz -i 0x150

We can confirm it's been imported with:

./yubihsm-shell 
connect
session open 1
created session 0
list objects 0

We'll generate our symmetric key with:

get random <session-id> <pseudo-bytes> <out-file>

Note: As per the documentation for every 'pseudo byte' you get two bytes of data - so if we are generating a 256 bit key we need 32 bytes (256 / 8.) So in this case we need to generate 16 pseudo bytes:

get random 0 16 key.bin   

The ls output confirms the file is equal to 32 bytes:

ls -l key.bin

-rw-rw-r-- 1 user user 32 Sep  5 12:12 key.bin

or if you are in a test environment (and the following command should only ever be run in one - due to lack of true randomness) you can perform it on a Linux box with:

openssl rand -rand /dev/urandom <bytes>

e.g.

openssl rand -rand /dev/urandom 32 > key.bin

Since urandom takes bytes and we need 256 bits we do 256 / 8 = 32 bytes.
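
The bits to bytes arithmetic, and an equivalent test-only way of producing the key material, can be sketched in Python. This is an illustrative alternative rather than part of the original procedure; the function name is hypothetical and, as with the openssl command, it is meant for test environments only:

```python
import secrets


def generate_key_material(bits: int = 256) -> bytes:
    """Return 'bits' worth of random key material from the OS CSPRNG."""
    if bits % 8 != 0:
        raise ValueError("key size must be a whole number of bytes")
    # 256 bits / 8 bits per byte = 32 bytes
    return secrets.token_bytes(bits // 8)


key = generate_key_material(256)
print(len(key))
```

The resulting bytes would then be written to key.bin in binary mode before wrapping.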

and to wrap the key:

openssl pkeyutl -in key.bin -out key.bin.enc -inkey wrappingKey_wxyz -keyform DER -pubin -encrypt -pkeyopt rsa_padding_mode:oaep -pkeyopt rsa_oaep_md:sha256

Return to the IAM key wizard page and click on the 'I am ready to upload my exported key material' and hit Next. Specify the Key Material (key.bin.enc), the import token (importToken_1234567...) and whether the key expires or not. Finally hit 'Finish.'

Note: You can also perform this operation from the AWS CLI with:

aws kms --region eu-west-1 import-key-material --key-id key-alias123456789 --encrypted-key-material fileb://key.bin.enc --import-token fileb://importToken_1234567... --expiration-model KEY_MATERIAL_DOES_NOT_EXPIRE

We can now create a new encrypted EBS volume. From the AWS Management Console go to EC2 >> Elastic Block Store >> Volumes >> 'Create Volume' and ensure that 'Encrypt this Volume' is ticked. Select the newly created CMK and hit 'Create Volume.'

The last step is to ensure our unencrypted key material (key.bin) is imported into our YubiHSM - this can be done with the 'put opaque' command:

put opaque 0 0 aws-cmk 1 export_wrapped,import_wrapped opaque key.bin

Note: This key should also be included as part of your backup policy in the event that the YubiHSM device is lost / stolen or damaged.

Friday 31 August 2018

Connection reset: Powershell OpenSSH on Windows Server 2012

While Microsoft's Powershell implementation of OpenSSH worked perfectly (as per the instructions) on Windows Server 2016 - you need to go through some additional steps in order to get it running on Server 2012 R2.

I encountered the following message when attempting to connect via my *nix box:

ssh testuser@testbox.com

Connection Reset.

It wasn't a firewall issue since I could retrieve the OpenSSH banner via telnet.

After running the server in debug mode:

sshd -ddd

Everything seemed to work - so it looked like it was a permissions issue of some kind - after a little digging I found the following script that checks host permissions - running this resolved the issue:

PowerShell -ExecutionPolicy Bypass -File .\FixHostFilePermissions.ps1

Note: This script is included in the same package as the OpenSSH installer.

Need to remove the inheritance before repair the rules.
Shall I remove the inheritace?
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [S] Suspend  [?] Help (default is "Y"):
Inheritance is removed from 'C:\ProgramData\ssh\sshd_config'.

'NT AUTHORITY\Authenticated Users' should not have access to 'C:\ProgramData\ssh\sshd_config'..
Shall I remove this access?
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [S] Suspend  [?] Help (default is "Y"): Y
'NT AUTHORITY\Authenticated Users' has no more access to 'C:\ProgramData\ssh\sshd_config'.
      Repaired permissions

  [*] C:\ProgramData\ssh\ssh_host_dsa_key
      looks good

  [*] C:\ProgramData\ssh\ssh_host_dsa_key.pub
      looks good

  [*] C:\ProgramData\ssh\ssh_host_ecdsa_key
      looks good

  [*] C:\ProgramData\ssh\ssh_host_ecdsa_key.pub
      looks good

  [*] C:\ProgramData\ssh\ssh_host_ed25519_key
      looks good

  [*] C:\ProgramData\ssh\ssh_host_ed25519_key.pub
      looks good

  [*] C:\ProgramData\ssh\ssh_host_rsa_key
      looks good

  [*] C:\ProgramData\ssh\ssh_host_rsa_key.pub
      looks good

  [*] C:\Users\svc_adreporting\.ssh\authorized_keys
      looks good

   Done.

I know on the *nix implementation with the 'StrictModes' option the OpenSSH server will not operate if permissions are set incorrectly and I wonder whether something similar had been switched on in the Windows implementation.

There is also a script called FixUserFilePermissions.ps1 to check bits like permissions of the user's .ssh folder and files within. If you are still experiencing problems it might be worth running this as well to ensure your user permissions are correct.

Wednesday 29 August 2018

Mounting file systems contained within a RAW disk image

To do this we will firstly need to identify where the partition we are interested in starts. This can be obtained from fdisk or parted - for example:

parted /path/to/disk.img

u # to change unit to bytes
B

p # to print partition table

Number  Start  End           Size          File system  Flags
 1 ... ... ... ... ...
 2      4238229B     53687091199B  53687091200B  xfs

In this case we are interested in partition 2 - so we'd set the offset in the mount command as 4238229:

mount -o loop,offset=4238229 /path/to/disk.img /mount/part2

The loop option attaches the image to a loop device - a pseudo device that presents a regular file as a block device.

If like me you didn't have the luxury of a partition table to work from you can identify the start sector of the partition with testdisk e.g.:

testdisk /path/to/disk.img

and then perform the conversion of sectors to bytes:

expr <sectors> \* <sector-size>

e.g.:

expr 123456 \* 512
63209472

mount -o loop,offset=63209472 /path/to/disk.img /mount/part2
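
The sector to byte conversion can also be sketched in Python. The function name is made up for illustration, and the 512 byte default is an assumption that should be checked against the sector size the image actually uses:

```python
def partition_offset_bytes(start_sector: int, sector_size: int = 512) -> int:
    """Convert a partition's starting sector into a byte offset for mount."""
    return start_sector * sector_size


offset = partition_offset_bytes(123456)
print(f"mount -o loop,offset={offset} /path/to/disk.img /mount/part2")
```

This prints the same mount command as above, with the offset filled in.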



Friday 24 August 2018

Recovering data from a software (md) RAID array

I was attempting to recover an array of discs from an (inherited) SAN that had failed. Unfortunately there were no backups available so I was on my own! The inner workings of the SAN were locked down - so I knew little about the data structure on the disks themselves - but I knew that the SAN ran on an old Linux kernel at the very least.

The array consisted of 4 x 500GB drives in a RAID5 setup.

After plugging the drives into a server and booting up a live Debian system I firstly attempted to scan for the RAID devices:

sudo apt-get update && sudo apt-get install mdadm -y

mdadm --assemble --scan

This failed - stating

I proceeded by querying the discs for the SMART data with smartctl in case any of the discs had any failures:

sudo apt-get install smartmontools

smartctl -a /dev/sd[abcd]

Unfortunately the last disc failed SMART:

SMART overall-health self-assessment test result: FAILED!

Time was clearly against me.. I proceeded by querying the proc fs to retrieve data about the RAID devices:

cat /proc/mdstat

Personalities : [raid6] [raid5] [raid4] [raid0] 
md0 : inactive sdc4[1](S) sdd4[3](S) sda4[2](S) sdb4[0](S)
      1949109760 blocks super 0.91
       
md101 : inactive sdb5[0](S) sdc5[3](S) sda5[2](S) sdd5[1](S)
      1092096 blocks super 0.91
       
md100 : inactive sda2[3](S) sdd2[2](S) sdb2[0](S) sdc2[1](S)
      2184448 blocks super 0.91

Here we can see the data drive (made up of sda4,sdb4,sdc4,sdd4). Also note the numbers in square brackets - these numbers indicate the order of the discs in the array.

The output indicates the discs are 'inactive' / not initialized.
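
Reading the bracketed numbers by eye is error prone, so here is a small Python sketch that sorts the members of one /proc/mdstat line by that number. The function name and the regular expression are my own, written against the md0 line shown above:

```python
import re


def member_order(mdstat_line: str) -> list:
    """Return member devices sorted by the number in square brackets."""
    members = re.findall(r"(\w+)\[(\d+)\]", mdstat_line)
    return [name for name, index in sorted(members, key=lambda m: int(m[1]))]


line = "md0 : inactive sdc4[1](S) sdd4[3](S) sda4[2](S) sdb4[0](S)"
print(member_order(line))
```

For the md0 line this gives sdb4, sdc4, sda4, sdd4.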

We can also collect additional information about the RAID discs with:

mdadm -E /dev/sda
/dev/sda:
   MBR Magic : aa55
Partition[0] :        32083 sectors at           47 (type 83)
Partition[1] :      1092420 sectors at        32130 (type 83)
Partition[2] :      1092420 sectors at      1124550 (type 05)
Partition[3] :    974555127 sectors at      2216970 (type 83)

I was specifically interested in Partition 3 (xfs) - so we can do:

mdadm -E /dev/sda4

/dev/sda4:
          Magic : a92b4efc
        Version : 0.91.00
           UUID : 6210c5a6:a386fad4:4714843b:49d8ab79
  Creation Time : Mon Jun 18 03:38:13 2007
     Raid Level : raid5
  Used Dev Size : 487277440 (464.70 GiB 498.97 GB)
     Array Size : 1461832320 (1394.11 GiB 1496.92 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

  Reshape pos'n : 0
      New Level : raid0
     New Layout : left-asymmetric
  New Chunksize : 0

    Update Time : Wed Jan  2 01:52:19 2002
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : df64ea6b - correct
         Events : 90

         Layout : left-symmetric
     Chunk Size : 64K
...

This provides some interesting information like the RAID level, members, number of devices in the array (including active ones) and the reshape position.
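
As a cross-check of the figures in that output: RAID5 gives up one member's worth of space to parity, so the array size should be the per-device size multiplied by one less than the number of devices. A quick Python sketch (the function name is hypothetical; sizes are in the 1K blocks that mdadm reports):

```python
def raid5_array_size(used_dev_size: int, raid_devices: int) -> int:
    """Usable RAID5 capacity: one member's worth of space holds parity."""
    return used_dev_size * (raid_devices - 1)


# 'Used Dev Size' and 'Raid Devices' from the mdadm -E /dev/sda4 output.
print(raid5_array_size(487277440, 4))
```

The result, 1461832320, matches the 'Array Size' line, which suggests the superblock's level and device count are consistent with each other.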

After some reading I discovered you can force the assembly through with a bogus backup file - e.g.:

mdadm --assemble --verbose --invalid-backup --backup backup.txt --force /dev/md0 /dev/sd[adbc]4

After checking dmesg I noticed the following error message:

[60967.198812] md/raid:md0: not clean -- starting background reconstruction
[60967.198819] md/raid:md0: unsupported reshape required - aborting.
Since there was zero information about this error on the Internet I ended up looking through the source code found here:


The error message gets triggered when 'mddev->new_level != mddev->level'. mddev is a struct that holds information about a RAID device. So basically it's telling us that if the existing RAID level does not equal the 'new' proposed level an error should be thrown.

This prompted me to go back over the earlier 'mdadm -E' (examine) output again and lo and behold I noticed that although the existing RAID level was set to RAID5 (as expected) - the 'New Level' was set to RAID0!

So clearly it was failing because the conversion of RAID 5 to RAID 0 was not possible. But more importantly I was concerned about why this was happening in the first place!
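
A mismatch like this is easy to miss in a long dump, so here is a rough Python sketch that pulls both fields out of 'mdadm -E' output and flags a difference. The function name and regular expressions are my own and assume the field labels appear exactly as in the output shown earlier:

```python
import re


def reshape_levels(examine_output: str) -> tuple:
    """Return (raid_level, new_level) parsed from 'mdadm -E' output."""
    level = re.search(r"^\s*Raid Level\s*:\s*(\S+)", examine_output, re.M)
    new_level = re.search(r"^\s*New Level\s*:\s*(\S+)", examine_output, re.M)
    return (level.group(1) if level else None,
            new_level.group(1) if new_level else None)


sample = """
     Raid Level : raid5
   Raid Devices : 4
      New Level : raid0
"""
current, proposed = reshape_levels(sample)
if proposed is not None and proposed != current:
    print(f"reshape pending: {current} -> {proposed}")
```

Superblocks without a 'New Level' line simply yield None for the second value.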

I ended up recreating the array with the following (note that this command will not delete the data on the array itself):

sudo mdadm --create /dev/md0 --metadata=0.91 --assume-clean --verbose --level=5 --raid-devices=4 /dev/sd[abcd]4 --chunk=64KB

Note: Ensure that the meta data, drives, raid level and chunk size are specified! You can get this information from the examine switch e.g. mdadm -E /dev/sda4.

Running this command started the RAID device:

mdadm: layout defaults to left-symmetric
mdadm: /dev/sda4 appears to be part of a raid array:
       level=raid5 devices=4 ctime=Mon Jun 18 03:38:13 2007
mdadm: /dev/sdb4 appears to be part of a raid array:
       level=raid5 devices=4 ctime=Mon Jun 18 03:38:13 2007
mdadm: /dev/sdc4 appears to be part of a raid array:
       level=raid5 devices=4 ctime=Mon Jun 18 03:38:13 2007
mdadm: /dev/sdd4 appears to be part of a raid array:
       level=raid5 devices=4 ctime=Mon Jun 18 03:38:13 2007
mdadm: size set to 487277440K
mdadm: automatically enabling write-intent bitmap on large array
Continue creating array? (y/n) y
mdadm: array /dev/md0 started.

cat /proc/mdstat

Personalities : [raid6] [raid5] [raid4] [raid0] 
md0 : active raid5 sdd4[3] sdc4[2] sdb4[1] sda4[0]
      1461832320 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/4 pages [0KB], 65536KB chunk
I then checked the partition table:

parted /dev/md0

However with no luck:

Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p                                                                
Error: /dev/md0: unrecognised disk label
Model: Linux Software RAID Array (md)                                     
Disk /dev/md0: 1497GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
As I was confident that there were partitions on this disk I used a tool called 'testdisk' to help me identify lost partitions:

apt-get install testdisk

As it's an interactive application I have described the process flow below: 

testdisk >> Create >> 'Disk /dev/md0' >> 'Intel / PC' >> 'Analyze' >> 'Quick Search' >> Select the partition and then hit 'Write'.

The above process identified that there was an XFS partition.

Disk /dev/md126 - 1496GB / 1394 GiB - CHS 22841130 32 4
Partition     Start     End     Size in sectors
Linux     4099     0      1     31     4     2922905600

I suspect this disk might have been part of an LVM setup (hence the missing partition table!)

Verify the partition with parted:

parted /dev/md0

GNU Parted 3.2
Using /dev/md0
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p                                                                
Model: Linux Software RAID Array (md)
Disk /dev/md0: 1497GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start  End     Size    Type     File system  Flags
 1      269MB  1497GB  1497GB  primary  xfs          boot
Check the filesystem for errors:

apt-get install xfsprogs

xfs_repair -n /dev/md0p1
Phase 1 - find and verify superblock...
xfs_repair: V1 inodes unsupported. Please try an older xfsprogs.

Darn. So I ended up downloading a really old version of a CentOS live DVD from here:


Firstly identify the RAID device (as this might have changed):

cat /proc/mdstat

and we'll then attempt to repair the filesystem again (note: we are going to perform a read-only check firstly!)

xfs_repair -n /dev/md0p1

This time xfs_repair was failing to find the superblock - so I suspected that (maybe) the start / end sector was incorrect - mostly due to the fact that 'testdisk' wasn't sure about the number of tracks per cylinder (changing this affected the start and end sectors). I ended up running another utility, 'UFS Explorer RAID Recovery', and it identified a different start / end sector:
So I decided to manually re-create the partition table:

fdisk /dev/md0

# delete the partition
d
# create a new primary partition (number 1)
n
p
1
# first sector
524672
# last sector: +2922905600 (this is the size in sectors - NOT the actual end sector)
+2922905600
# write changes
w

and again we attempt to run xfs_repair:

xfs_repair -n /dev/md0p1

Phase 1: Find a verify superblock ... success!
Phase 2:
...

It looks much better now - however from the output I'd clearly lost some files.

Warning: At this point you should have a block level backup of all of the disks in your array (ideally you should do this before doing anything) - as from this point on you can really screw up / easily lose all of your data.

We'll now try a read/write repair on the filesystem:

xfs_repair /dev/md0p1

ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

Well, attempting to mount the filesystem yields:

mkdir -p /mount/recovery
mount -t auto /dev/md0p1 /mount/recovery

mount: Structure needs cleaning

So it looks like we are going to have to discard the log this time:

xfs_repair -L /dev/md0p1

Although this completed successfully, unfortunately the directory structure was not preserved and pretty much everything ended up in a 'lost+found' directory. On the other hand I was only searching for one (gigantic) file - so fortunately locating it was not that hard!

Finally mount it with:

mount -t auto /dev/md0p1 /mount/recovery

Thursday 5 July 2018

Locking down remote PowerShell user access with constrained endpoints

The aim of this tutorial is to ensure that a restrictive session configuration is applied to specific users when they log in remotely to a server via PowerShell, limiting them to specific commands (applying a default deny-all approach). In addition all commands will be executed by a service account, since we do not want our users to possess any administrative rights on the server / domain.

We'll apply this using a PowerShell Session Configuration - this is a type of policy that is executed when you log in via PowerShell / PSSession to a remote computer.

The default session configuration carries very few restrictions and allows the users to run all sorts of potentially dangerous commands - for example: Stop-Service and so on.

We will also make use of 'proxy commands' - these allow us to implement (or expand) our own functions. For example the following command:

Get-Service Spooler | Restart-Service

could be implemented as a custom function:

Restart-SpoolerService

This provides us with two benefits:

- We can write complex / chained commands more concisely

- But more importantly it provides granular control over exactly what can be executed. For example simply allowing a normal cmdlet e.g. Get-User would let a user execute the following:

Get-User -Identity userA
Get-User -Identity userB

While we might want to allow access to userA's information - we don't necessarily want to provide access to userB's information - so instead the proxy command can encapsulate 'Get-User -Identity userA'.

i.e. The proxy command 'Get-UserA' == 'Get-User -Identity userA'.
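
To make the idea a little more concrete, here is a minimal sketch of the same pattern in Python (not PowerShell): the general command stays hidden and only a wrapper with the argument pinned is exposed. Everything here (the get_user stand-in, the sample directory and the invoke dispatcher) is hypothetical and only illustrates the concept.

```python
def get_user(identity):
    """Stand-in for a general cmdlet such as Get-User (hypothetical data)."""
    directory = {"userA": "User A record", "userB": "User B record"}
    return directory[identity]


def get_user_a():
    """Proxy command: the identity is fixed, so the caller cannot change it."""
    return get_user("userA")


# Only the proxy is exposed; the general command is deliberately left out.
EXPOSED = {"Get-UserA": get_user_a}


def invoke(command, *args):
    """Default deny: anything not explicitly exposed is rejected."""
    if command not in EXPOSED:
        raise PermissionError(f"{command} is not available in this session")
    return EXPOSED[command](*args)


print(invoke("Get-UserA"))

try:
    invoke("Get-User", "userB")
except PermissionError as err:
    print("denied:", err)
```

The point is that the caller never gets to supply the identity at all - so there is nothing to validate or filter.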

We'll start by firstly creating a new session configuration:

Note: We can use the 'New-PSSessionConfigurationFile' cmdlet to generate a session configuration file for us - however in this example we will be writing a startup script manually - so first save a file named C:\scripts\restricted_session_config.ps1 with the following:

$RequiredCommands = @("Get-Command", "Get-FormatData", "Out-Default", "Select-Object", "out-file", "Measure-Object", "Exit-PSSession" )
$ExecutionContext.SessionState.Applications.Clear()
$ExecutionContext.SessionState.Scripts.Clear()
Get-Command -CommandType Cmdlet, alias, function | ?{$RequiredCommands -notcontains $_.Name} | %{$_.Visibility="Private"}
$ExecutionContext.SessionState.LanguageMode="RestrictedLanguage"

This will limit the commands to: Get-Command, Get-FormatData, Out-Default, Select-Object, out-file, Measure-Object and Exit-PSSession.
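
The filtering line in the script is easier to reason about when written out long-hand. Below is a rough Python sketch of the same logic: anything not in the required list is marked private. The sample command list is made up, and the case-insensitive match mirrors the fact that PowerShell's -notcontains ignores case by default (which is why 'out-file' in the list still matches Out-File).

```python
REQUIRED_COMMANDS = [
    "Get-Command", "Get-FormatData", "Out-Default", "Select-Object",
    "out-file", "Measure-Object", "Exit-PSSession",
]


def apply_visibility(all_commands, required=REQUIRED_COMMANDS):
    """Return a name -> visibility map, hiding everything not required."""
    allowed = {name.lower() for name in required}  # case-insensitive match
    return {
        name: "Public" if name.lower() in allowed else "Private"
        for name in all_commands
    }


# Hypothetical subset of what Get-Command might return on a server.
sample = ["Get-Command", "Out-File", "Stop-Service", "Restart-Computer"]
visibility = apply_visibility(sample)

for name, state in visibility.items():
    print(f"{name}: {state}")
```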

We'll now need to register our new session config with:

Register-PSSessionConfiguration -Name "RestrictedUser" -StartupScript C:\scripts\restricted_session_config.ps1

Note: If users need to perform administrative operations on the server we will need to apply delegated administration - this simply means we will have a dedicated service account that has the appropriate administrative rights. When users connect to the PowerShell session all operations will be executed under this service account - to do this we'd issue the following instead:

Register-PSSessionConfiguration -Name "RestrictedUser" -StartupScript C:\scripts\restricted_session_config.ps1 -RunAsCredential 'server01\serviceaccount' -Force

If we now remote into the above server and run 'Get-PSSessionConfiguration' we should now see it listed.

The next step is to ensure that our limited user has the appropriate permissions on the session configuration:

Set-PSSessionConfiguration "RestrictedUser" -ShowSecurityDescriptorUI

This opens up a GUI ACL window where we will add our limited user and ensure they ONLY have 'Execute' permissions!

We need to specify the session configuration (from the client side) when connecting, otherwise it will attempt to apply the default session configuration (which is only available to local administrators by default and as a result will deny access to the user):

Enter-PSSession -ConfigurationName "RestrictedUser" <server-name>

or alternatively we can set the local variable 'PSSessionConfigurationName':

$PSSessionConfigurationName = "RestrictedUser"

We can quickly verify the policy has been applied by issuing:

Get-Command

The final step is to create our proxy commands - so we need to append the following to our 'restricted_session_config.ps1' file on the server:

Function Hello-Function{
  Write-Output "Hello World"
}

NOTE: Make sure you add 'Hello-Function' to the 'RequiredCommands' array within the session configuration script as well!

To test, connect to the server:

Enter-PSSession <server-name> -ConfigurationName RestrictedUser

and finally test the function:

Hello-Function

Tuesday 22 May 2018

vSphere Replication 'Not Active' status after initial setup of replication (manually install VIB file)

In my experience this error is caused by the VIB file not being installed on one or more hosts.

Following the below steps will help you get replication up and running again:

1. Firstly identify which ESXi host the VM you are attempting to replicate resides on.

2. Download the VIB file for vSphere Replication:

wget https://<vsphere-replication-address>/vib/vr2c-firewall.vib

3. Enable SSH on the ESXi host and scp the file to it:

scp vr2c-firewall.vib root@<esxi-host>:/tmp

4. Log in to the ESXi host via SSH and install the VIB file:

esxcli software vib install -v /tmp/vr2c-firewall.vib

5. Usually the replication should start working immediately - but if it doesn't, try restarting the vSphere Replication Appliance.

Wednesday 25 April 2018

Changing the baud rate on a Cisco 3650

Unfortunately the only way to do this is from ROMMON mode - so in order to easily access it we can instruct the switch to automatically enter it on the next reload:

conf t
boot enable-break

or alternatively power off the switch, hold down the 'Mode' button and turn the switch on (while still holding the 'Mode' button) for around 15 seconds.

Then set the baud rate appropriately:

set BAUD 9600

boot the IOS image:

flash_init
boot

and finally once it's booted up ensure that we revoke the 'enable-break' command:

conf t
no boot enable-break

Wednesday 4 April 2018

QoS for telephony on the 3650

Below is an example of QoS you can apply for telephony on the 3650. Ingress traffic on gi1/0/1 is marked accordingly and then queued according to the service-policy on gi1/0/24.

# Input QoS

ip access-list extended VOIP
 permit udp any range 16384 32767 any range 16384 32767

ip access-list extended MULTIMEDIA-CONFERENCING
 permit udp any any range 16384 32767

ip access-list extended CALL-SIGNALING
 ! SCCP
 10 permit tcp any any range 2000 2002
 ! SIP
 20 permit tcp any any range 5060 5061
 30 permit udp any any range 5060 5061
 ! H.323
 40 permit udp any any range 1718 1719
 45 permit tcp any any eq 1720
 ! MGCP
 50 permit tcp any any eq 2428
 60 permit tcp any eq 2428 any
 70 permit udp any any eq 2427
 80 permit udp any eq 2427 any

ip access-list extended TRANSACTIONAL-DATA
 10 permit tcp any any eq 443
 20 permit tcp any any eq 1521
 30 permit udp any any eq 1521
 40 permit tcp any any eq 1526
 50 permit udp any any eq 1526
 60 permit tcp any any eq 1575
 70 permit udp any any eq 1575
 80 permit tcp any any eq 1630
 90 permit udp any any eq 1630
 100 permit tcp any any eq 1527
 110 permit tcp any any eq 6200
 120 permit tcp any any eq 3389
 130 permit tcp any any eq 5985
 140 permit tcp any any eq 8080

ip access-list extended BULK-DATA
 10 permit tcp any any eq 22
 20 permit tcp any any eq 465
 30 permit tcp any any eq 143
 40 permit tcp any any eq 993
 50 permit tcp any any eq 995
 60 permit tcp any any eq 1914
 70 permit tcp any any eq ftp
 80 permit tcp any any eq ftp-data
 90 permit tcp any any eq smtp
 100 permit tcp any any eq pop3

ip access-list extended SCAVENGER
 10 permit tcp any any range 2300 2400
 20 permit udp any any range 2300 2400
 30 permit tcp any any range 6881 6999
 40 permit tcp any any range 28800 29100
 50 permit tcp any any eq 1214
 60 permit udp any any eq 1214
 70 permit tcp any any eq 3689
 80 permit udp any any eq 3689
 90 permit tcp any any eq 11999

class-map VOIP
 match access-group name VOIP
class-map MULTIMEDIA-CONFERENCING
 match access-group name MULTIMEDIA-CONFERENCING
class-map CALL-SIGNALING
 match access-group name CALL-SIGNALING
class-map TRANSACTIONAL-DATA
 match access-group name TRANSACTIONAL-DATA
class-map BULK-DATA
 match access-group name BULK-DATA
class-map SCAVENGER
 match access-group name SCAVENGER

policy-map MARKING-POLICY
 class VOIP
  set dscp ef
 class MULTIMEDIA-CONFERENCING
  set dscp af41
 class CALL-SIGNALING
  set dscp cs3
 class TRANSACTIONAL-DATA
  set dscp af21
 class BULK-DATA
  set dscp af11
 class SCAVENGER
  set dscp cs1
 class class-default
  set dscp default

int gi1/0/1
 service-policy input MARKING-POLICY
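
When checking the markings in a packet capture you will see decimal DSCP values rather than the names used above. A small sketch of how the names map to numbers, using the standard encodings (AFxy = 8x + 2y, CSx = 8x, EF = 46); the function names and the MARKINGS table are just for illustration:

```python
def af(af_class, drop):
    """Assured Forwarding AFxy: class x in 1..4, drop precedence y in 1..3."""
    if not (1 <= af_class <= 4 and 1 <= drop <= 3):
        raise ValueError("AF class must be 1-4 and drop precedence 1-3")
    return 8 * af_class + 2 * drop


def cs(selector):
    """Class Selector CSx: x in 0..7."""
    if not 0 <= selector <= 7:
        raise ValueError("class selector must be 0-7")
    return 8 * selector


EF = 46  # Expedited Forwarding

# Markings applied by MARKING-POLICY, as decimal DSCP values.
MARKINGS = {
    "VOIP": EF,
    "MULTIMEDIA-CONFERENCING": af(4, 1),
    "CALL-SIGNALING": cs(3),
    "TRANSACTIONAL-DATA": af(2, 1),
    "BULK-DATA": af(1, 1),
    "SCAVENGER": cs(1),
    "class-default": 0,
}

for name, value in MARKINGS.items():
    print(f"{name}: {value} ({value:06b})")
```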

# Output QoS

class-map match-any VOICE-QUEUE
 match dscp ef
 match dscp cs5
 match dscp cs4
class-map match-all MULTIMEDIA-CONFERENCING-QUEUE
 match dscp af41 af42 af43
class-map match-all MULTIMEDIA-STREAMING-QUEUE
 match dscp af31 af32 af33
class-map match-any NETWORK-CONTROL-QUEUE
 match dscp cs7
 match dscp cs6
class-map match-any SIGNALING-QUEUE
 match dscp cs3
 match dscp cs2
class-map match-all TRANSACTIONAL-DATA-QUEUE
 match dscp af21 af22 af23
class-map match-all BULK-SCAVENGER-DATA-QUEUE
 match dscp af11 af12 af13 cs1

policy-map qos_pm_2P6Q3T_out
 class VOICE-QUEUE
  priority level 1
  police rate percent 10
 class MULTIMEDIA-CONFERENCING-QUEUE
  bandwidth remaining percent 10
  queue-buffers ratio 10
  queue-limit dscp af43 percent 80
  queue-limit dscp af42 percent 90
  queue-limit dscp af41 percent 100
 class MULTIMEDIA-STREAMING-QUEUE
  bandwidth remaining percent 10
  queue-buffers ratio 10
  queue-limit dscp af33 percent 80
  queue-limit dscp af32 percent 90
  queue-limit dscp af31 percent 100
 class NETWORK-CONTROL-QUEUE
  bandwidth remaining percent 7
  queue-buffers ratio 10
 class SIGNALING-QUEUE
  bandwidth remaining percent 3
  queue-buffers ratio 10
 class TRANSACTIONAL-DATA-QUEUE
  bandwidth remaining percent 30
  queue-buffers ratio 10
  queue-limit dscp af23 percent 80
  queue-limit dscp af22 percent 90
  queue-limit dscp af21 percent 100
 class BULK-SCAVENGER-DATA-QUEUE
  bandwidth remaining percent 5
  queue-buffers ratio 10
  queue-limit dscp values af13 cs1 percent 80
  queue-limit dscp values af12 percent 90
  queue-limit dscp values af11 percent 100
 class class-default
  bandwidth remaining percent 25
  queue-buffers ratio 25

int gi1/0/24
 service-policy output qos_pm_2P6Q3T_out
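
If you adjust the percentages in the output policy it is easy to lose track of the totals. The sketch below simply adds up the 'bandwidth remaining percent' and 'queue-buffers ratio' values copied from the policy above (the priority queue is left out as it has neither). The 100 ceiling is my assumption about what the platform will accept - check it against your IOS XE release.

```python
# (bandwidth remaining percent, queue-buffers ratio) per non-priority class,
# copied from qos_pm_2P6Q3T_out.
QUEUES = {
    "MULTIMEDIA-CONFERENCING-QUEUE": (10, 10),
    "MULTIMEDIA-STREAMING-QUEUE": (10, 10),
    "NETWORK-CONTROL-QUEUE": (7, 10),
    "SIGNALING-QUEUE": (3, 10),
    "TRANSACTIONAL-DATA-QUEUE": (30, 10),
    "BULK-SCAVENGER-DATA-QUEUE": (5, 10),
    "class-default": (25, 25),
}

LIMIT = 100  # assumed ceiling for both totals

total_bandwidth = sum(bw for bw, _ in QUEUES.values())
total_buffers = sum(buf for _, buf in QUEUES.values())

print("bandwidth remaining total:", total_bandwidth)
print("queue-buffers ratio total:", total_buffers)

if total_bandwidth > LIMIT or total_buffers > LIMIT:
    print("warning: a total exceeds", LIMIT)
```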