Wednesday, April 8, 2009

Delete lines from a file by line number with sed

Today i decided to make a new monitoring tool, and I needed to make a list of all permutations of 3 in a set of 22. The set happens to be hostnames of a private Tor network. Order is important, as forming 3-hop circuits through Tor is sequential, which is why i need permutations instead of combination's.

22 * 21 * 20 = 9240 permutations

Crap, I'm not really up on my combinatorial number theory, I guess I'll have to hack it up.

First I used an excel plugin to generate all the permutations.

But this ended up giving me 1408 invalid permutations, because the mix of sets had 10647 results. I copied the results into a text file and counted the number of lines as well as obtained the line numbers of the invalid permutations using this script:
----------
#!/bin/sh

tornames=("tornode01" "tornode02" "tornode03" "tornode04" "tornode05" \
"tornode06" "tornode07" "tornode08" "tornode09" "tornode10" \
"tornode11" "tornode12" "tornode13" "tornode14" "tornode15" \
"tornode16" "tornode17" "tornode18" "tornode19" "tornode20" \
"tornode21" "tornode22")

for i in ${tornames[*]};
do
while read line; do echo $line|tr " " "\n"|grep $i |wc -l; done < ./vc_list.bak > ./lines.$i
grep -rn '3\|2' ./lines.$i | cut -d: -f1 > ./lines.$i.ln
done

----------

It saved a bunch of files for me as: lines.[hostname], containing a number on each line indicating the number of times the hostname appears on each line.
Then it grep'd out the lines with a 2 or a 3, asking grep to return the line number, and cut the line number from the output to a file named: lines.[hostname].ln

Then at the command line I did this:

# cat ./lines.*.ln > line.numbers.all
# sed 's/.*/&d/g' ./line.numbers.all > ./delete.sed
# sed -f delete.sed ./file.master >> file.trimmed


Using a sed delete file...finally I had my 9240 valid permutations:

# cat ./file.trimmed | wc
9240 27720 254520


Next I want to make this text list into an array that I can `source` into the monitoring script as an array.

# rsync ./file.trimmed ./perms_array.sh
sed -i -e 's/^\
./perms_array.sh

Almost done, I just need to fill in the array number with another sed expression.

# sed = ./perms_array.sh | sed 'N; s/^// ; s/\nperms\[// ; s/^/perms\[/' > \
./perms_array.final.sh

and now to put quotes around the array value:

# sed -e 's/\=/\=\"/' < ./perms_array.final > ./perms_array.final.new && rsync ./perms_array.final.tmp ./perms_array.final
# sed -e 's/$/\"/' < ./perms_array.final > ./perms_array.final.new && rsync ./perms_array.final.tmp ./perms_array.final


here's what the file looks like:

perms[1]="tornode01 tornode02 tornode03"
perms[2]="tornode01 tornode04 tornode05"
perms[3]="tornode01 tornode06 tornode07"
...

Now I can move on to write an essentially simple script that performs the test of all possible virtual circuits.

No comments: