Although there are many ways to achieve this, I chose a simple approach using cp and sed.
We can use sed's substitute command to replace any bad characters. Suppose we have the following directory that we wish to 'cleanse':
ls -la /tmp/test
drwxrwxr-x. 2 limited limited 120 Jul 13 13:45 .
drwxrwxrwt. 40 root root 1280 Jul 13 13:43 ..
-rw-rw-r--. 1 limited limited 0 Jul 13 13:45 'fran^k.txt'
-rw-rw-r--. 1 limited limited 0 Jul 13 13:44 @note.txt
-rw-rw-r--. 1 limited limited 0 Jul 13 13:44 'rubbi'\''sh.txt'
-rw-rw-r--. 1 limited limited 0 Jul 13 13:43 'test`.txt'
We can run a quick test to see what the results would look like by simply printing them to stdout:
#!/bin/bash
cd /tmp/test || exit 1
for file in *
do
    # Replace each single quote, backtick, caret or @ with an underscore
    printf '%s\n' "$file" | sed "s/['\`^@]/_/g"
done
Note: The 'g' option instructs sed to substitute all matches on each line.
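To make the effect of the 'g' option concrete, here is a minimal sketch that runs the same substitution with and without it. The sample name a^b@c.txt is made up for illustration and is not one of the files in /tmp/test.

```shell
#!/bin/bash
# Hypothetical sample name containing two "bad" characters.
name='a^b@c.txt'

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$name" | sed 's/[@^]/_/')

# With g: every match on the line is replaced.
all_matches=$(printf '%s\n' "$name" | sed 's/[@^]/_/g')

echo "$first_only"    # a_b@c.txt
echo "$all_matches"   # a_b_c.txt
```

Without 'g' the @ survives, which is why the option matters for names containing more than one bad character.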
Or an even better approach (adapted from here):
#!/bin/bash
cd /tmp/test || exit 1
for file in *
do
    # Replace every character that is not a letter, digit, dot, underscore or hyphen
    printf '%s\n' "$file" | sed 's/[^a-zA-Z0-9._-]/_/g'
done
A caret (^) normally anchors a match to the beginning of the line in a regular expression. However, when it is the first character inside square brackets ([ ]) it inverts the set, so any character that is not in the specified list is replaced with the underscore character.
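As a small illustration of the inverted set, the sketch below wraps the substitution in a helper function. The function name sanitise and the third sample name are assumptions made for this example. LC_ALL=C is also an addition: it pins the a-z style ranges to plain ASCII, since range behaviour can vary between locales.

```shell
#!/bin/bash
# sanitise is a hypothetical helper name, not part of the original scripts.
# It replaces every character outside a-z, A-Z, 0-9, dot, underscore and
# hyphen with an underscore.
sanitise() {
    printf '%s\n' "$1" | LC_ALL=C sed 's/[^a-zA-Z0-9._-]/_/g'
}

sanitise 'fran^k.txt'        # fran_k.txt
sanitise "rubbi'sh.txt"      # rubbi_sh.txt
sanitise 'my file (1).txt'   # my_file__1_.txt
```

The hyphen is placed last inside the brackets so that it is treated as a literal character rather than as part of a range.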
If we are happy with the results we can get cp to copy the files into our 'sanitised directory':
#!/bin/bash
cd /tmp/test || exit 1
OutputDirectory=/tmp/output/
mkdir -p "$OutputDirectory"
for file in *
do
    # Quote the names so that spaces and glob characters survive intact
    cp "$file" "$OutputDirectory$(printf '%s\n' "$file" | sed 's/[^a-zA-Z0-9._-]/_/g')"
done
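One thing worth checking before copying is whether two different names collapse to the same sanitised name, because the second copy would then silently overwrite the first. The sketch below is one possible pre-check, not part of the original scripts. The function name list_collisions is made up, and it assumes file names do not contain newlines.

```shell
#!/bin/bash
# list_collisions is a hypothetical helper. It prints every sanitised name
# that more than one entry in the given directory would map to.
list_collisions() {
    local dir=$1 file
    for file in "$dir"/*; do
        [ -e "$file" ] || continue      # the glob matched nothing
        printf '%s\n' "${file##*/}"     # strip the leading directory part
    done | LC_ALL=C sed 's/[^a-zA-Z0-9._-]/_/g' | LC_ALL=C sort | uniq -d
}

# Example: list_collisions /tmp/test
```

For the example directory listed earlier this prints nothing, since the four sanitised names are all different. A directory holding both a^b.txt and a@b.txt would report a_b.txt.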
There are some limitations to this, however. For example, the above script does not handle subdirectories properly, so to cater for this we need to make a few changes:
#!/bin/bash
if [ $# -ne 2 ]
then
    echo "Usage: stripbadchars.sh <source-directory> <output-directory>"
    exit 1
fi
SourceDirectory=$1
OutputDirectory=$2
# -mindepth 1 skips the source directory itself; -print0 keeps names containing spaces intact
# Only regular files are copied and only the file name is sanitised; directory names are kept as they are
find "$SourceDirectory" -mindepth 1 -type f -print0 | while IFS= read -r -d '' file
do
    BASENAME=$(basename "$file")
    BASEPATH=$(dirname "$file")
    SANITISEDFNAME=$(printf '%s\n' "$BASENAME" | sed 's/[^a-zA-Z0-9._-]/_/g')
    # cp won't create the directory structure for us - so we need to do it ourselves
    mkdir -p "$OutputDirectory/$BASEPATH"
    echo "Writing file: $OutputDirectory/$BASEPATH/$SANITISEDFNAME"
    cp "$file" "$OutputDirectory/$BASEPATH/$SANITISEDFNAME"
done
Note: A plain shell glob such as * only matches entries in a single directory and does not recurse, so we use the 'find' command to walk the tree for us instead.
Save the script and make it executable:
vi stripbadchars.sh
chmod 700 stripbadchars.sh
and execute with:
./stripbadchars.sh /tmp/test /tmp/output
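After a run it can be reassuring to confirm that nothing with a bad name is left in the output tree. The following is a minimal sketch of such a check. The function name list_bad_names is made up for this example, and -mindepth is an extension found in GNU and BSD find rather than a POSIX option.

```shell
#!/bin/bash
# list_bad_names is a hypothetical helper. It prints every path below the
# given directory whose final component still contains a character outside
# a-z, A-Z, 0-9, dot, underscore and hyphen.
list_bad_names() {
    LC_ALL=C find "$1" -mindepth 1 -name '*[!a-zA-Z0-9._-]*'
}

# Example: list_bad_names /tmp/output
```

Note that find's -name patterns use shell pattern syntax, so the set is inverted with ! rather than with the ^ used in the sed expression. No output means every file and directory name under the output directory is within the allowed set.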