Tuesday, August 19, 2014

sorting strings with BASH

Sorting strings with BASH. Example: email addresses. Say I want to invite some of my old classmates to an event, but I don't have a mailing list. I do, however, have a huge pile of text files from my classes which contain all the authors' email addresses. This is what I will do to extract the email addresses.

First we'll use grep to grab all of the strings that look like email addresses by issuing a command like this:

grep -hrio "\b[a-z0-9.-]\+@[a-z0-9.-]\+\.[a-z]\{2,4\}\b" * > file.txt
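To see what the flags do, here is a small sketch that runs a pattern along the same lines against a throwaway directory. The sample files and addresses are made up for illustration, and it assumes GNU grep, since \b and \+ are GNU extensions.

```shell
# Build a throwaway directory with a couple of sample files (hypothetical data).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/notes"
printf 'Written by Alice <alice@example.edu>\n' > "$demo_dir/essay1.txt"
printf 'Contact: Bob.Smith@Example.com or alice@example.edu\n' > "$demo_dir/notes/essay2.txt"

# -h: omit file names, -r: recurse into subdirectories,
# -i: ignore case, -o: print only the matching part of each line
found=$(grep -hrio "\b[a-z0-9.-]\+@[a-z0-9.-]\+\.[a-z]\{2,4\}\b" "$demo_dir")
printf '%s\n' "$found"

# Clean up the sample directory.
rm -r "$demo_dir"
```

Each match is printed on its own line, duplicates included, which is why the next step is needed.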

Then we'll want to sort them, ignoring case, but more importantly we'll remove all of the duplicate email addresses from the list so that each one appears only once.

sort -f file.txt | uniq -i > newfile.txt
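The deduplication options are easy to mix up, so here is a small sketch on made-up sample data showing how they differ: uniq -u keeps only the lines that occur exactly once, while sort -u keeps one copy of every distinct line.

```shell
# Sample list with a repeated address (hypothetical data).
sample=$(printf '%s\n' bob@example.com alice@example.edu bob@example.com)

# uniq -u prints only lines that are never repeated, so bob disappears entirely.
only_singletons=$(printf '%s\n' "$sample" | sort | uniq -u)

# sort -u keeps one copy of every distinct line.
deduplicated=$(printf '%s\n' "$sample" | sort -u)

printf 'uniq -u:\n%s\n' "$only_singletons"
printf 'sort -u:\n%s\n' "$deduplicated"
```

For a mailing list, the second behavior is the one we want, since someone who shows up in several files should still get an invitation.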

Then we will remove the addresses ending in .mil and .gov for obvious reasons.

sed -i '/\.gov$/d' newfile.txt
sed -i '/\.mil$/d' newfile.txt
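Here is a sketch of the filtering step on made-up sample data. It assumes the intent is to drop only addresses whose domain ends in .gov or .mil, so the dot is escaped and the pattern is anchored to the end of the line. An unescaped /.gov/ would also match "@gov" in the middle of an address.

```shell
# Sample list (hypothetical data); one address merely has "gov" inside its domain.
list=$(printf '%s\n' alice@example.edu carol@agency.gov dave@base.mil eve@govtech.example.com)

# \. matches a literal dot and $ anchors the match to the end of the line,
# so only addresses that actually end in .gov or .mil are deleted.
filtered=$(printf '%s\n' "$list" | sed -e '/\.gov$/d' -e '/\.mil$/d')
printf '%s\n' "$filtered"
```

Both delete expressions can be passed to a single sed call with -e, which saves a second pass over the file.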

Get a line count to see how many email addresses we have.

sed -n '$=' newfile.txt
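As a quick check, the sed one-liner should agree with wc -l. The sketch below uses a temporary file with made-up addresses to compare the two.

```shell
# Write three sample addresses to a temporary file (hypothetical data).
tmp=$(mktemp)
printf '%s\n' a@example.com b@example.com c@example.com > "$tmp"

# sed prints the number of the last line; wc -l counts newline characters.
count_sed=$(sed -n '$=' "$tmp")
count_wc=$(wc -l < "$tmp" | tr -d ' ')

echo "sed: $count_sed, wc: $count_wc"
rm "$tmp"
```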
