Tuesday 10 January 2017

Excluding strings based on two files with grep

I had a simple scenario where a notification needed to be sent to everyone within a company. Although there was a distribution group encompassing every employee, there were specific distribution groups that needed to be excluded, and given the sheer size of the distribution groups, doing this manually was not an option.

To summarize, we can get the members of the 'global distribution group' from the Exchange shell, e.g.:

Get-DistributionGroupMember everyone@companya.com | Where-Object {$_.RecipientType -eq 'UserMailbox'} | Select-Object Name > companyemails.txt

and the same kind of thing for the other distribution groups we wish to exclude.

We end up with two files: one holding all of the employee emails (companyemails.txt) and one holding the exclusions (exclusions.txt).

Now on our Linux shell we will use grep with an inverted match to return all of our emails with the exclusions removed:

grep -vf exclusions.txt companyemails.txt

I got 'Binary file companyemails.txt matches'. This is apparently because grep has detected a 'NUL' character and as a result considers it a binary file.
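One possible source of NUL characters, offered here as an assumption rather than something confirmed for this particular export, is that the file was written as UTF-16, where every ASCII character is followed by a NUL byte. The sketch below builds a small UTF-16LE sample (the addresses and the scratch directory are illustrative only), counts the NUL bytes, and converts the file to UTF-8 with iconv. A real export may also start with a byte order mark, in which case 'UTF-16' may be a better source encoding than 'UTF-16LE'.

```shell
# Build a small UTF-16LE sample to stand in for the exported file (illustrative data)
workdir=$(mktemp -d)
printf 'alice@companya.com\nbob@companya.com\n' | iconv -f UTF-8 -t UTF-16LE > "$workdir/utf16.txt"

# Count NUL bytes: anything above zero can make grep treat the file as binary
nul_before=$(tr -cd '\000' < "$workdir/utf16.txt" | wc -c)

# Convert to UTF-8, after which the NUL bytes should be gone
iconv -f UTF-16LE -t UTF-8 "$workdir/utf16.txt" > "$workdir/utf8.txt"
nul_after=$(tr -cd '\000' < "$workdir/utf8.txt" | wc -c)

echo "NUL bytes before: $nul_before, after: $nul_after"
```

If the count is zero to begin with, the encoding is probably not the cause and this step can be skipped.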

We should also ensure that Windows line endings are removed, otherwise they will cause problems with grep:

dos2unix /tmp/companyemails.txt
dos2unix /tmp/exclusions.txt

and any trailing spaces or tabs:

sed -i 's/[[:blank:]]*$//' /tmp/companyemails.txt
sed -i 's/[[:blank:]]*$//' /tmp/exclusions.txt
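Where dos2unix is not installed, the same clean-up can be sketched with tr and sed alone. The example below also drops empty lines from the exclusions file, because an empty pattern matches every line and would leave grep -v with nothing to print. The sample data and the 'exclusions.clean.txt' name are made up for illustration.

```shell
# Sample exclusions file with CRLF endings, trailing blanks and an empty line (illustrative data)
workdir=$(mktemp -d)
printf 'bob@companya.com \r\n\r\ncarol@companya.com\t\r\n' > "$workdir/exclusions.txt"

# 1. tr removes the carriage returns left over from Windows line endings
# 2. the first sed expression strips trailing spaces and tabs
# 3. the second sed expression deletes lines that are now empty
tr -d '\r' < "$workdir/exclusions.txt" \
  | sed -e 's/[[:blank:]]*$//' -e '/^$/d' > "$workdir/exclusions.clean.txt"

cleaned=$(cat "$workdir/exclusions.clean.txt")
printf '%s\n' "$cleaned"
```

Writing to a new file rather than editing in place keeps the original export available for comparison.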

Re-running the command with the -a switch (--text) resolved the problem:

grep -vaf exclusions.txt companyemails.txt

-v inverts the match (only lines that do not match are printed) and -f reads the patterns from a file, one per line.
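As a minimal sketch of how these switches behave, the following uses two small made-up files in a scratch directory. By default each line of the pattern file is treated as a regular expression that can match anywhere in a line, so an exclusion can also remove addresses that merely contain it. Adding -F (fixed strings) and -x (whole-line match) is one way to exclude exact addresses only.

```shell
# Illustrative sample data (hypothetical addresses)
workdir=$(mktemp -d)
printf '%s\n' 'alice@companya.com' 'bob@companya.com' 'jimbob@companya.com' > "$workdir/companyemails.txt"
printf '%s\n' 'bob@companya.com' > "$workdir/exclusions.txt"

# Default matching: 'bob@companya.com' also matches inside 'jimbob@companya.com'
loose=$(grep -v -a -f "$workdir/exclusions.txt" "$workdir/companyemails.txt")

# Exact matching: -F treats patterns as plain strings, -x requires the whole line to match
exact=$(grep -v -a -x -F -f "$workdir/exclusions.txt" "$workdir/companyemails.txt")

echo "loose match keeps:"; printf '%s\n' "$loose"
echo "exact match keeps:"; printf '%s\n' "$exact"
```

With this sample data the loose form keeps only alice, while the exact form keeps both alice and jimbob.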

If you are dealing with large numbers of emails it might be worth running wc as well to sanity check figures:

cat /tmp/companyemails.txt | wc -l

grep -vaf exclusions.txt companyemails.txt | wc -l
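The two figures can be cross-checked against the number of lines that were actually removed, which grep -c reports. This is a sketch on made-up sample data. Note that the removed count is the number of company addresses that matched an exclusion, which is not necessarily the number of lines in exclusions.txt.

```shell
# Illustrative sample data: one exclusion is present in the company list, one is not
workdir=$(mktemp -d)
printf '%s\n' 'alice@companya.com' 'bob@companya.com' 'carol@companya.com' > "$workdir/companyemails.txt"
printf '%s\n' 'bob@companya.com' 'zed@companya.com' > "$workdir/exclusions.txt"

total=$(wc -l < "$workdir/companyemails.txt")                                        # all addresses
kept=$(grep -vaf "$workdir/exclusions.txt" "$workdir/companyemails.txt" | wc -l)     # addresses that survive
removed=$(grep -caf "$workdir/exclusions.txt" "$workdir/companyemails.txt")          # addresses that matched an exclusion

echo "total=$total kept=$kept removed=$removed"
[ $((kept + removed)) -eq "$total" ] && echo "counts add up"
```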

Save the output to file:

grep -vaf exclusions.txt companyemails.txt > final.txt

and finally to get all of the emails into a presentable form using 'tr':

cat final.txt | tr '\n' ';'

(This simply replaces each newline character with a semi-colon.)
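One side effect of the tr approach is that the final newline is also replaced, so the list ends with a trailing semi-colon. If that matters, paste is a possible alternative, since it only places the delimiter between lines. A small sketch on made-up data:

```shell
# Illustrative sample data
workdir=$(mktemp -d)
printf '%s\n' 'alice@companya.com' 'carol@companya.com' > "$workdir/final.txt"

# tr replaces every newline, including the last one
with_tr=$(tr '\n' ';' < "$workdir/final.txt")

# paste -s joins all lines into one, -d sets the delimiter used between them
with_paste=$(paste -sd ';' "$workdir/final.txt")

printf '%s\n' "$with_tr"
printf '%s\n' "$with_paste"
```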

