
Topic: Help Requested - remove superfluous directory names from file list  (Read 4004 times)

Juanito

  • Administrator
  • Hero Member
  • Posts: 14516
As I'm hopeless at scripting, help would be appreciated with a script to remove superfluous directory names from an extension file list.

As an example, to go from this:
Code: [Select]
/usr
/usr/local
/usr/local/include
/usr/local/include/zbuff.h
/usr/local/include/zdict.h
/usr/local/include/zstd.h
/usr/local/include/zstd_errors.h
/usr/local/lib
/usr/local/lib/pkgconfig
/usr/local/lib/pkgconfig/libzstd.pc

..to this:
Code: [Select]
/usr/local/include/zbuff.h
/usr/local/include/zdict.h
/usr/local/include/zstd.h
/usr/local/include/zstd_errors.h
/usr/local/lib/pkgconfig/libzstd.pc

polikuo

  • Hero Member
  • Posts: 714
Re: Help Requested - remove superfluous directory names from file list
« Reply #1 on: November 17, 2017, 01:20:22 AM »
It looks like these lines were generated with the "find" command.
How about:
Code: [Select]
find /usr -not -type d

I'm having dinner at the moment; I'll try to script something with this as soon as possible.
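A minimal sketch of the find idea, for the case where the files exist on disk (for example an extracted or mounted extension) rather than only inside the .tcz. It assumes a POSIX find, where `! -type d` is the portable spelling of `-not -type d`; the function name `list_files` and the demo tree are made up for illustration. Note that empty directories are dropped along with all the others.

```shell
#!/bin/sh
# list_files DIR: print every entry below DIR that is not a directory,
# with the DIR prefix removed so paths start at / (e.g. /usr/local/...)
list_files() {
  # '! -type d' is the POSIX spelling of '-not -type d'
  ( cd "$1" && find . ! -type d ) | sed 's#^\.##' | sort
}

# Demo on a made-up tree shaped like part of the zstd-dev listing
root=$(mktemp -d)
mkdir -p "$root/usr/local/include" "$root/usr/local/lib/pkgconfig"
touch "$root/usr/local/include/zstd.h" "$root/usr/local/lib/pkgconfig/libzstd.pc"
list_files "$root"
rm -rf "$root"
```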

Juanito

  • Administrator
  • Hero Member
  • Posts: 14516
Re: Help Requested - remove superfluous directory names from file list
« Reply #2 on: November 17, 2017, 03:23:11 AM »
They were generated with the "unsquashfs -l -d ' '" command
« Last Edit: November 17, 2017, 04:09:46 AM by Juanito »

mocore

  • Hero Member
  • Posts: 506
  • ~.~
Re: Help Requested - remove superfluous directory names from file list
« Reply #3 on: November 17, 2017, 07:52:24 AM »
Pipe it to awk, like:

Code: [Select]
echo "$data" | awk  '/\./{print $0}'
This selects lines containing a '.' (dot) character.

Quote
echo "$data" | awk  '/\./{print $0}'
/usr/local/include/zbuff.h
/usr/local/include/zdict.h
/usr/local/include/zstd.h
/usr/local/include/zstd_errors.h
/usr/local/lib/pkgconfig/libzstd.pc


However, this will not work with:
- files with no '.ext' extension
- folders containing a '.' character...
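To make those two caveats concrete, here is the same awk filter run on a made-up list (the paths are hypothetical, chosen only for illustration): the extension-less binary is dropped, while the directory with a dot in its name is kept.

```shell
#!/bin/sh
# Made-up listing: a binary without an extension and a directory whose name contains a dot
data='/usr/local/bin
/usr/local/bin/zstd
/usr/local/etc/foo.d
/usr/local/etc/foo.d/foo.conf'

# Keep only lines containing a literal '.'
echo "$data" | awk '/\./{print $0}'
# /usr/local/bin/zstd is lost, while the directory /usr/local/etc/foo.d survives
```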
« Last Edit: November 17, 2017, 07:54:45 AM by mocore »

polikuo

  • Hero Member
  • Posts: 714
Re: Help Requested - remove superfluous directory names from file list
« Reply #4 on: November 17, 2017, 07:54:17 AM »
Quote
They were generated with the "unsquashfs -l -d ' '" command

Save this script as "strip-path.sh"
Code: [Select]
#!/bin/sh
OUTPUT_DIR=/tmp/tcz-list
mkdir -p "$OUTPUT_DIR"
strip_path() {
  awk 'BEGIN {FS="\n";RS=""} {
    print $NF
    for (i=NF;i>1;i--) {
      if ($i !~ $(i - 1)) print $(i - 1)
    }
  }'
}
for TCZ in "$@"; do
  unsquashfs -l "$TCZ" | grep 'squashfs-root/' | cut -d '/' -f 2- | strip_path > "${OUTPUT_DIR}/$(basename "$TCZ").list"
done
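The `BEGIN {FS="\n";RS=""}` line puts awk into paragraph mode: with an empty RS, records are separated by blank lines, and with FS set to a newline each line becomes one field (spaces inside a line do not split it). Since the filtered listing has no blank lines, the whole list arrives as a single record, so `$i` and `$(i - 1)` are neighbouring lines. A small demonstration on made-up input; the function name `show_fields` is only for illustration.

```shell
#!/bin/sh
# Paragraph mode: RS="" splits records on blank lines, FS="\n" makes each line a field
show_fields() {
  awk 'BEGIN {FS="\n"; RS=""} { print "record " NR ": NF=" NF ", last field=" $NF }'
}

# Three lines, no blank line: one record with three fields
printf '%s\n' usr usr/local usr/local/bin | show_fields
# A blank line in the input starts a new record
printf '%s\n' a b '' c | show_fields
```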

To run the script
Code: [Select]
./strip-path.sh /etc/sysconfig/tcedir/optional/zstd*.tcz

Results
Quote
tc@box:/tmp/tcz-list$ ls
zstd-dev.tcz.list  zstd.tcz.list
tc@box:/tmp/tcz-list$ cat zstd-dev.tcz.list
usr/local/lib/pkgconfig/libzstd.pc
usr/local/include/zstd_errors.h
usr/local/include/zstd.h
usr/local/include/zdict.h
usr/local/include/zbuff.h
tc@box:/tmp/tcz-list$ cat zstd.tcz.list
usr/local/bin/zstdmt
usr/local/bin/zstdless
usr/local/bin/zstdgrep
usr/local/bin/zstdcat
usr/local/bin/unzstd

Note that they're in reverse order.
If you're not OK with that, you can flip them back with the "tac" command from "coreutils.tcz", or with this "sed" one-liner:
Code: [Select]
sed '1!G;h;$!d'
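How the sed one-liner works, step by step (the wrapper name `reverse_lines` is just for illustration):

```shell
#!/bin/sh
# reverse_lines: print stdin with the line order reversed, like tac
#   1!G  on every line except the first, append the hold space (the lines seen so far)
#   h    copy the result back into the hold space
#   $!d  delete the pattern space on every line but the last, so only the
#        fully reversed buffer is printed, once, at the end
reverse_lines() {
  sed '1!G;h;$!d'
}

printf '%s\n' first second third | reverse_lines
```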
« Last Edit: November 17, 2017, 08:07:38 AM by polikuo »

Rich

  • Administrator
  • Hero Member
  • Posts: 11178
Re: Help Requested - remove superfluous directory names from file list
« Reply #5 on: November 17, 2017, 08:54:42 AM »
Hi Juanito
Here's my entry:
Code: [Select]
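#!/bin/sh
# Usage: RemoveDirs file.lst
# Writes the list without its parent-directory entries to stripped.lst

SourceFile="$1"
SortedFile="sorted.lst"
DestFile="stripped.lst"
SubString=""

rm -f "$DestFile"

# Sorting puts each directory in front of its contents
# (assumes no sibling name sorts between a directory and its children)
sort -o "$SortedFile" "$SourceFile"

while read -r String
do
  case "$String" in
  "$SubString"/*)
    # Previous entry is a parent directory of this one: discard it.
    # The '/' keeps e.g. /usr/local/bin/svn from being taken for
    # a parent of /usr/local/bin/svnadmin.
    ;;
  *)
    # Nothing lies below the previous entry, so keep it
    [ -n "$SubString" ] && echo "$SubString" >> "$DestFile"
    ;;
  esac
  SubString=$String
done < "$SortedFile"
# The last entry has nothing after it, so it is always kept
[ -n "$SubString" ] && echo "$SubString" >> "$DestFile"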

Here's the result:
Code: [Select]
tc@box:~/remdir$
tc@box:~/remdir$ cat orig.lst
/usr/local/lib
/usr/local/include/zbuff.h
/usr/local/lib/pkgconfig/libzstd.pc
/usr/local/include/zdict.h
/usr/local
/usr/local/include/zstd.h
/usr
/usr/local/lib/pkgconfig
/usr/local/include
/usr/local/include/zstd_errors.h
tc@box:~/remdir$
tc@box:~/remdir$ ./RemoveDirs orig.lst
tc@box:~/remdir$
tc@box:~/remdir$ cat stripped.lst
/usr/local/include/zbuff.h
/usr/local/include/zdict.h
/usr/local/include/zstd.h
/usr/local/include/zstd_errors.h
/usr/local/lib/pkgconfig/libzstd.pc
tc@box:~/remdir$
tc@box:~/remdir$

The list is first sorted. Each line is then tested to see if it's a leading substring of the next line. If you have /usr/local, then local has to be either a file or a subdirectory; it can't be both. So if /usr/local is a leading substring of the next line, then it's a directory and gets discarded.
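Comparing only neighbouring lines relies on the sort putting each directory directly before its contents. A different technique avoids that: first collect every ancestor of every entry, then print only the entries that are nobody's ancestor. This is a sketch assuming plain paths, one per line, with no trailing slashes; the function name `strip_parents` is hypothetical. Input order is preserved and empty directories are kept.

```shell
#!/bin/sh
# strip_parents: read paths (one per line) on stdin and print only those
# that are not a parent directory of some other path. Order-independent.
strip_parents() {
  awk '
    { path[NR] = $0 }
    END {
      for (i = 1; i <= NR; i++) {
        p = path[i]
        # Chop off the last component repeatedly, marking each ancestor
        while (sub("/[^/]*$", "", p) && p != "")
          parent[p] = 1
      }
      # Print, in input order, every entry that is no ancestor of another
      for (i = 1; i <= NR; i++)
        if (!(path[i] in parent))
          print path[i]
    }'
}

# A shuffled subset of the orig.lst example
printf '%s\n' /usr/local/lib /usr/local/include/zbuff.h /usr/local/lib/pkgconfig/libzstd.pc \
  /usr/local /usr /usr/local/lib/pkgconfig /usr/local/include | strip_parents
```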

andyj

  • Hero Member
  • Posts: 1020
Re: Help Requested - remove superfluous directory names from file list
« Reply #6 on: November 17, 2017, 01:23:47 PM »
How about using the double-l switch (-ll) with unsquashfs:

Code: [Select]
unsquashfs -ll -d '' some-extension.tcz | grep -v '^d' | sed 's#.* /#/#'

Juanito

  • Administrator
  • Hero Member
  • Posts: 14516
Re: Help Requested - remove superfluous directory names from file list
« Reply #7 on: November 17, 2017, 11:41:38 PM »
Thanks for all the suggestions 🙂
« Last Edit: November 17, 2017, 11:44:45 PM by Juanito »

Juanito

  • Administrator
  • Hero Member
  • Posts: 14516
Re: Help Requested - remove superfluous directory names from file list
« Reply #8 on: November 28, 2017, 01:57:40 AM »
I like this:
Code: [Select]
$ unsquashfs -ll -d '' svn.tcz | grep -v '^d' | sed 's#.* /#/#'
..but:
Code: [Select]
$ cat svn.tcz.list
Parallel unsquashfs: Using 4 processors
48 inodes (611 blocks) to write

/usr/local/bin/svn
/usr/local/bin/svnadmin
/usr/local/bin/svndumpfilter
/usr/local/bin/svnlook
/usr/local/bin/svnmucc
/usr/local/bin/svnrdump
/usr/local/bin/svnserve
/usr/local/bin/svnsync
/usr/local/bin/svnversion
/usr/local/lib/libsvn_client-1.so -> libsvn_client-1.so.0.0.0
/usr/local/lib/libsvn_client-1.so.0 -> libsvn_client-1.so.0.0.0
...

andyj

  • Hero Member
  • Posts: 1020
Re: Help Requested - remove superfluous directory names from file list
« Reply #9 on: November 28, 2017, 04:18:50 AM »
I knew the links would be that way when I tested it. I didn't know if that would be useful information or a problem, depending on how the resulting file list is used. Adding another regex to sed will clear that up:

Code: [Select]
unsquashfs -ll -d '' svn.tcz | grep -v '^d' | sed -e 's# -> .*##' -e 's#.* /#/#'
A question I also thought of when I was cobbling my script together is whether or not directories should be included if they are empty. As it stands, no directories are listed at all, because grep -v '^d' filters them out.

Juanito

  • Administrator
  • Hero Member
  • Posts: 14516
Re: Help Requested - remove superfluous directory names from file list
« Reply #10 on: November 28, 2017, 04:28:56 AM »
I guess empty directories should be included if they actually exist and are required for things to work, but this would almost never be the case, as empty directories under /var etc. should be created by a startup script.

polikuo

  • Hero Member
  • Posts: 714
Re: Help Requested - remove superfluous directory names from file list
« Reply #11 on: November 29, 2017, 12:01:00 AM »
I've adjusted my script so it won't accidentally drop lines we need.  :)
awk '{print $1}' should drop any "link redirections".

Code: [Select]
#!/bin/sh
OUTPUT_DIR=/tmp/tcz-list
mkdir -p "$OUTPUT_DIR"
strip_path() {
  awk 'BEGIN {FS="\n";RS=""} {
    # compare each line with the next one; print it unless the next line lies below it
    for (i=1;i<NF;i++) {
      if ($(i + 1) !~ $i"/") print $i
    }
    # the last line has nothing after it, so it is always kept
    print $NF
  }'
}
for TCZ in "$@"; do
  unsquashfs -l "$TCZ" | grep 'squashfs-root/' | cut -d '/' -f 2- | awk '{print $1}' | strip_path > "${OUTPUT_DIR}/$(basename "$TCZ").list"
done

Some explanations:
In each iteration of the awk loop, I take two consecutive lines and compare them.
If the next line does not look like something below a directory named by the current line, then the current line is printed.

For instance:
Quote
tc@box:/tmp$ unsquashfs -l svn.tcz | grep 'squashfs-root/' | cut -d '/' -f 2- | awk '{print $1}'
usr
usr/local
usr/local/bin
usr/local/bin/svn
usr/local/bin/svnadmin
...

In the awk function:
Code: [Select]
if ($(i + 1) !~ $i"/") print $i
loop1: if (usr/local !~ usr/) ==> "usr/local" contains "usr/" --> skip
loop2: if (usr/local/bin !~ usr/local/) ==> "usr/local/bin" contains "usr/local/" --> skip
loop3: if (usr/local/bin/svn !~ usr/local/bin/) ==> "usr/local/bin/svn" contains "usr/local/bin/" --> skip
loop4: if (usr/local/bin/svnadmin !~ usr/local/bin/svn/) ==> "usr/local/bin/svnadmin" does not fit "usr/local/bin/svn/"
--> print "usr/local/bin/svn"

And so on ~
By using this technique, empty directories can be preserved.
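One caveat with `$(i + 1) !~ $i"/"`: the current line is used as a regular expression, so dots act as wildcards and paths containing regex characters such as `+` (think `c++`) may be misinterpreted. Below is a sketch of the same neighbour comparison that uses `index()` for a literal "starts with" test instead, reading one line at a time rather than slurping the list as a single record. The function name `strip_path_literal` is hypothetical, and like the script above it assumes each directory is listed directly before its contents, as unsquashfs does.

```shell
#!/bin/sh
# strip_path_literal: print a line unless the next line starts with "line/".
# The prefix test is literal (index), not a regex match.
strip_path_literal() {
  awk '
    NR > 1 && index($0, prev "/") != 1 { print prev }
    { prev = $0 }
    END { if (NR > 0) print prev }
  '
}

printf '%s\n' usr usr/local usr/local/include usr/local/include/c++ \
  usr/local/include/c++/vector usr/local/bin usr/local/bin/svn \
  usr/local/bin/svnadmin | strip_path_literal
```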

P.S.
I've flipped the listing order back to normal, so it's no longer in reverse.  ;)

P.P.S.
IIRC, the leading slash '/' of "/usr/local/bin/svn" should be removed, no?
« Last Edit: November 29, 2017, 12:32:09 AM by polikuo »

Juanito

  • Administrator
  • Hero Member
  • Posts: 14516
Re: Help Requested - remove superfluous directory names from file list
« Reply #12 on: December 20, 2017, 03:44:36 AM »
In the end I've settled on the following, but thanks to all for your help.

Code: [Select]
$ unsquashfs -ll -d '' extension.tcz | grep -v '^d' | sed -e 's# -> .*##' -e 's#.* /#/#' -e 1,3d > extension.tcz.list
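The order of the two substitutions matters once a symlink points at an absolute path: `.* /` is greedy, so it matches up to the last ` /` on the line, which is then inside the link target. Stripping ` -> target` first avoids that. A small sketch using a hypothetical function `ll_to_list` and made-up sample lines in the general `ls -l`-like shape of `unsquashfs -ll` output (the header lines that `1,3d` removes are left out). With the expressions in the other order, the last sample line would come out as /usr/local/bin/foo instead of /usr/local/bin/bar.

```shell
#!/bin/sh
# ll_to_list: turn "unsquashfs -ll -d ''" style lines into a bare file list.
# Symlink targets are stripped first; otherwise the greedy '.* /' would run
# up to the last " /" on the line, which for an absolute link is the target.
ll_to_list() {
  grep -v '^d' | sed -e 's# -> .*##' -e 's#.* /#/#'
}

# Made-up sample lines (not real unsquashfs output)
sample='drwxr-xr-x root/root 27 2017-11-28 01:00 /usr/local/lib
-rwxr-xr-x root/root 1234 2017-11-28 01:00 /usr/local/bin/foo
lrwxrwxrwx root/root 11 2017-11-28 01:00 /usr/local/lib/libfoo.so -> libfoo.so.1
lrwxrwxrwx root/root 18 2017-11-28 01:00 /usr/local/bin/bar -> /usr/local/bin/foo'

echo "$sample" | ll_to_list
```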