UNIX shell scripting: extract the value of XML-tagged data from a text file

UNIX shell scripts are often used to manipulate interface data files (basic validation, renaming, archiving, etc.). This is straightforward when the attributes being manipulated are, say, in the file name. But what if the script needs to grab a value from inside the data file, and the file is in XML format?

For example, the XML data file contains the following line:

<FileSequenceNum >0000015</FileSequenceNum>

…and the value of FileSequenceNum is required by the shell script. There must be loads of ways to do this, but here are three examples:

TAG=FileSequenceNum
FILE=testfile.txt

VALUE=`sed -ne "/$TAG/s/[^0-9]*\([0-9]*\)..*/\1/p" "$FILE"`
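To see what the sed version is doing, here is a minimal sketch that runs it against a throwaway sample file. The file contents and the use of mktemp are assumptions made for illustration. The expression skips everything up to the first digit, captures the run of digits, and discards the rest of the line, so it only works when the value is numeric and is the first number on the line. The case statement is one possible way to add the kind of basic validation mentioned earlier.

```shell
#!/bin/sh
# Hypothetical sample data, modelled on the line shown above.
FILE=$(mktemp)
cat > "$FILE" <<'EOF'
<Header>
  <FileSequenceNum >0000015</FileSequenceNum>
</Header>
EOF

TAG=FileSequenceNum

# On lines containing $TAG: skip leading non-digits, capture the first
# run of digits, drop whatever follows, and print the result.
VALUE=$(sed -ne "/$TAG/s/[^0-9]*\([0-9]*\)..*/\1/p" "$FILE")

# Basic validation: reject an empty or non-numeric result.
case "$VALUE" in
  ''|*[!0-9]*) echo "No numeric $TAG found in $FILE" >&2 ;;
  *)           echo "Value is: $VALUE" ;;
esac

rm -f "$FILE"
```

Note that a tag name containing digits (say, Seq2Num) would confuse this pattern, since the digit in the name would be captured instead of the value.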

Or

VALUE=`grep -w $TAG $FILE | awk -F'>' '{ print $2 }' | awk -F'<' '{ print $1 }'`
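The grep and two awk calls can probably be collapsed into a single awk invocation. The following is only a sketch, assuming the element sits on a line of its own (optionally indented) and that the sample file contents shown here are representative. It splits each line on both angle brackets, so the tag name lands in field 2 and the value in field 3.

```shell
#!/bin/sh
# Hypothetical sample data, modelled on the line shown above.
FILE=$(mktemp)
printf '%s\n' '<FileSequenceNum >0000015</FileSequenceNum>' > "$FILE"

TAG=FileSequenceNum

# Split on '<' and '>': $2 is the opening tag name (possibly with
# trailing spaces), $3 is the text between the tags.
VALUE=$(awk -F'[<>]' -v tag="$TAG" \
  '$2 ~ ("^" tag " *$") { print $3; exit }' "$FILE")

echo "Value is: $VALUE"

rm -f "$FILE"
```

The exit stops after the first match, which is a design choice rather than a requirement: drop it if the file may hold several values and all of them are wanted.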

Or

TAG1="FileSequenceNum "
TAG2="FileSequenceNum"

VALUE=`grep -w $TAG1 dt_test.xml | sed -e 's/^[ \t]*//' | sed "s/<$TAG1>\(.*\)<\/$TAG2.*/\1/"`

echo "Value is: $VALUE"

In the last example, two $TAG variables are used because the data file contains a trailing space in the opening XML tag name, but no space in the closing tag name:

<FileSequenceNum >0000015</FileSequenceNum>
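An alternative to keeping two tag variables is to let the pattern itself tolerate optional spaces before the closing bracket. This is a sketch under the assumption that the opening and closing tags are on the same line. The sample file is made up for the example. Unlike the digit-based sed command, it captures whatever text sits between the tags, numeric or not.

```shell
#!/bin/sh
# Hypothetical sample data: indented line, trailing space in the opening tag.
FILE=$(mktemp)
cat > "$FILE" <<'EOF'
    <FileSequenceNum >0000015</FileSequenceNum>
EOF

TAG=FileSequenceNum

# " *" allows zero or more spaces after the tag name, so one variable
# covers both "<FileSequenceNum >" and "</FileSequenceNum>".
VALUE=$(sed -n "s/.*<$TAG *>\([^<]*\)<\/$TAG *>.*/\1/p" "$FILE")

echo "Value is: $VALUE"

rm -f "$FILE"
```

Because the leading .* absorbs any indentation, the separate whitespace-stripping sed step is not needed here.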