Unix tips
From Waisman Brain Imaging Wiki
Handy UNIX Tricks & Tips
What's In This File?
Since UNIX doesn't use icons or filename extensions to identify files like Windows and Macs do, it can sometimes be tricky to figure out what a file is or has in it.
- file - Let UNIX Guess What Type Of File This Is
UNIX has a very clever command called 'file' which can make an educated guess as to the contents of a file. It is usually right, but not always. To have it guess the type of "abc.def", type:
LAN104% file abc.def
- stringer, strings - See The Strings In A Binary File
When 'file' doesn't give you enough information or guesses wrong, you can use the lab command 'stringer' (or the more flexible but less friendly program 'strings'). It will print out every displayable text string in a file:
LAN104% stringer abc.def | more
This can also be handy for extracting info from a partially-corrupted file that otherwise is lost. For example, if Outlook hiccups on your address book, you might still be able to get most or all of your addresses out by running 'stringer' on the address book file.
- hd - View Everything In A Binary File
Finally, if everything else fails, you can use 'hd', which will dump out the entire contents of a binary file in hexadecimal and in ASCII (when printable):
LAN104% hd abc.def | more
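On a machine that doesn't have the lab's 'stringer' or 'hd', you can get similar results from standard tools. The following is only a rough sketch: the function name "printable_strings" and the sample file contents are made up for illustration, and the real 'strings' program is smarter about what it counts as text.

```shell
#!/bin/sh
# printable_strings FILE [MINLEN]: print runs of printable characters
# that are at least MINLEN (default 4) characters long.
# (Hypothetical helper, a crude stand-in for 'strings'/'stringer'.)
printable_strings() {
    # Turn every non-printable byte into a newline, then keep long lines.
    LC_ALL=C tr -c '[:print:]' '\n' < "$1" |
        awk -v min="${2:-4}" 'length($0) >= min'
}

# Build a small mixed binary/text file to experiment with.
sample=$(mktemp)
printf '\000\001Hello, world\002\003ab\004lab data\377' > "$sample"

printable_strings "$sample"     # only the longer text runs
od -c "$sample"                 # every byte, much like 'hd'
```

The short run "ab" is dropped because it is under the 4-character minimum, which is the same trick 'strings' uses to cut down on noise.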
Simple Shell Scripting
Windows lets you build "batch" or "script" programs with DOS ".BAT" files. Macs have AppleScript. What does UNIX have? A lot! UNIX shell scripts, while being similar to ".BAT" files, are much more powerful. In fact, the entire lab web-based scheduling calendar was written in shell scripts.
One of the most common and frustrating things to do in UNIX is to rename a lot of files in bulk. This is usually pretty easy to do under DOS with the 'ren' command and wildcards, but under UNIX, you need to write a shell script to do it. Here's how.
First, you need to create the shell script itself. Here's a template that you can borrow and modify to your needs:
#!/bin/sh
for i in $@
do
    j=`dropext $i`.new
    echo Renaming file $i to $j
    # mv $i $j
done
This shell script will take each file name passed to it, drop the extension from the filename, add ".new" to the end, and then rename the current filename to the ".new" filename. So, if you save this example as 'renamer', then run it:
LAN104% ./renamer myfile.old
it will rename "myfile.old" as "myfile.new".
Once you save it, you need to let UNIX know that it is an executable (program) file, or it won't work. To do this, type:
LAN104% chmod +x renamer
and then it will work. Let's take a quick look at what each line in the shell script does:
#!/bin/sh
This line tells UNIX that this is a (Bourne) shell script, as opposed to a Perl script, 'awk' script, etc.
for i in $@
do
    ...
done
These 3 lines together tell UNIX to read in each argument from the command line (in our example above, there is only one, "myfile.old"). It assigns each argument in turn to the shell variable "i", and then applies the rest of the script to it.
j=`dropext $i`.new
This line is tricky; it does 2 things. First, it runs the command 'dropext' on the shell variable "i" (now referenced as "$i"). 'dropext' simply drops the file extension from a filename. For example,
LAN104% dropext myfile.old
myfile
The line then adds ".new" to the end of the result from the 'dropext' command and assigns the whole mess to the shell variable "j".
Note the funny syntax: the 'dropext' command is enclosed in "`" (backquote or backtick) characters. This tells UNIX to run the 'dropext' command and use whatever it prints out. Also, notice that there is no space in "j=...". This is just a shell gotcha that you need to watch out for.
To use a different file extension than ".new", simply replace ".new" with whatever you want, but be very careful to not change anything else on this line.
echo Renaming file $i to $j
This line is just being nice, and telling you what the script is doing. 'echo' simply prints out whatever you tell it to. It is good to have a lot of 'echo' lines in your scripts.
# mv $i $j
This line does the actual rename of the file. Note that it currently has a "#" character at the beginning. This tells UNIX that the line is simply a comment and can be ignored. To make the script actually do the renaming, remove the "#". It is a good idea to comment out any lines like this that actually do something to files until you're confident that they'll work. Alternatively, you can replace the "#" with "echo" to harmlessly see exactly what the line will do when you run the script.
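If you're on a machine without the lab's 'dropext' command, the shell can drop the extension by itself. Here's a sketch of the same renamer under that assumption. The function names "new_name" and "rename_all" are invented for this example, and the quotes around the variables are an addition so that filenames containing spaces survive. As in the template, the 'mv' line is left commented out.

```shell
#!/bin/sh
# new_name FILE: print FILE with its last extension replaced by ".new".
# "${1%.*}" strips the shortest trailing ".something" from the name.
new_name() {
    printf '%s\n' "${1%.*}.new"
}

# rename_all FILE...: show (and optionally perform) each rename.
rename_all() {
    for i in "$@"
    do
        j=$(new_name "$i")
        echo "Renaming file $i to $j"
        # mv -- "$i" "$j"    # remove the leading "#" to really rename
    done
}

rename_all myfile.old "my other file.old"
```

Note that "$(...)" does the same job as the backquotes in the template; it is just easier to read and to nest.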
FTP The UNIX Way
If you're doing image analysis, you'll eventually need to transfer files from one UNIX computer to another one across campus. The best way to do this without using CDs or tapes is FTP.
Mac users have probably used Fetch at some point to transfer files. Windows users have a lot of choices of FTP programs with nice interfaces that hide the true nature of FTP. UNIX doesn't hide a thing. Under UNIX, FTP is a command-line driven program with lots of arcane features and settings. (Does this sound familiar?)
Just to make things confusing, FTP is fairly (but not completely) standardized. What commands work and exactly how they work depend both on the computer that you're using and the computer that you will connect to.
And to confuse things further still, if you want to use FTP with the Keck computers, you need to use 'sftp' instead of 'ftp'. This is a newer, more secure file transfer program that runs over an encrypted SSH connection. Once you connect, it works pretty much like FTP. Here's how to connect from LAN104:
LAN104% sftp myusername@tezpur.keck.waisman.wisc.edu
Connecting to tezpur.keck.waisman.wisc.edu...
myusername@tezpur.keck.waisman.wisc.edu's password:
sftp>
When using FTP, the computer you're sitting in front of is called the "local" computer; the computer that you're connecting to via FTP is called the "remote" computer. Most of the commands cause something to happen on the remote computer, and the local-specific commands usually start with "l" (like "lcd" to change directories on the local computer).
The basic commands you'll use a lot are:
- help - see what commands are available. They can vary, but the ones listed here are always available.
- open computername - open a connection to a computer. You'll be prompted for a login name and password. You'd use the same login as if you used telnet to connect to the computer.
Here's how you use it to connect to the lab anonymous FTP server on PSYPHZ:
LAN104:joviko% ftp
ftp> open psyphz
Connected to psyphz.
220-Local time is now 14:54 and the load is 0.00.
220 You will be disconnected after 1800 seconds of inactivity.
Name (psyphz:generic): anonymous
230 Anonymous user logged in.
ftp>
- close - close the current connection.
- bye - exit from FTP altogether. "exit" or "quit" work sometimes.
- dir - directory. Show the contents of the current directory on the "remote" computer. 'ls' often also works. Sometimes "ldir" works to show the contents of the current directory on your "local" computer.
- pwd - print working directory. Shows which directory you're currently in on the remote computer.
- cd directoryname - change directory on the remote computer. This works the same as at the UNIX command prompt. There may also be an "lcd" to change directory on your local computer.
- ascii - prepare to send and/or receive ASCII (text) files. This is often the default, and is the cause of many FTP problems. To be safe, run the "binary" command as soon as you open a connection. You very rarely need to use ASCII mode, as transferring text files from one UNIX system to another via "binary" works fine (since they're the same format). You only need ASCII mode when transferring text files (not Word documents!) from Windows or a Mac to UNIX.
- binary - prepare to send and/or receive non-text (binary) files. You should ALWAYS set this unless you have a good reason not to, and you're doubly sure that it is a good reason.
- get filename - transfer a file from the remote computer and store it on your local computer. This command assumes that you know what you are doing: it doesn't check for adequate disk space on your computer, and presumes that you've typed in the filename correctly. This can be tricky sometimes (like when the filename has punctuation in it). Which leads to the lazy version:
- mget filename_pattern - transfer a bunch of files from the remote computer and store them on your local computer. For example,
ftp> mget tiger*
would retrieve "tiger.ps", "tiger_tiger_burning_bright.txt", and "tiger is a big cat.DOC". This is also useful for getting a single file with a really long or tricky filename.
- prompt - switch back and forth from "are you sure?" mode. This command sometimes can be used as "prompt on" or "prompt off", but usually only works as "prompt", which toggles back and forth. The default is usually "prompt on", which means that the computer asks you if you're sure that you want to transfer each individual file when you're getting or putting a bunch. This is handy the very first time you use FTP, but quickly gets annoying.
It doesn't hurt to type in "prompt" right after "binary" just to turn it off.
"prompt" is handy when you want to get one file from a list when they all have long, complex names. Just turn prompt on, do an "mget" of the whole bunch, and only say yes to actually transfer the one file that you want.
- put, mput - Send files from your local computer to the remote computer. These work just like "get" and "mget", only in the opposite direction. If you don't have permissions for creating files on the remote computer in the directory that you've "cd"ed to, "put" and "mput" won't work.
- mkdir directoryname - Create a directory on the remote computer. You can then "cd" into the new directory and "put" files there.
- del or delete filename - Delete file from remote computer.
- rmdir directoryname - Delete directory from remote computer. You must "del" all the files from a directory before you can "rmdir" it.
Here is an example FTP session:
LAN104% ftp psyphz
Connected to psyphz.
220-Local time is now 09:41 and the load is 0.00.
220 You will be disconnected after 1800 seconds of inactivity.
Name (psyphz:generic): anonymous
230 Anonymous user logged in.
ftp> binary
200 TYPE is now 8-bit binary
ftp> prompt
Interactive mode off.
ftp> cd silly
250 Changed to /silly
ftp> pwd
257 "/silly"
ftp> dir
200 PORT command successful
150 Connecting to 144.92.195.80:39394
total 1
-rw-r--r-- 1 102 50 425988 Mar 6 13:03 badday-small.mpg
-rw-r--r-- 1 102 50 5139913 Mar 6 13:02 badday.mpeg
-rw-r--r-- 1 102 50 442831 Mar 6 12:06 catattack.mpg
-rw-r--r-- 1 102 50 380932 Mar 6 12:57 smelly_monkey.mpg
-rw-r--r-- 1 102 50 81252 Mar 22 09:40 tiger.ps
226-Options: -l
226 5 matches total
354 bytes received in 0.015 seconds (23.44 Kbytes/s)
ftp> get tiger.ps
200 PORT command successful
150-Connecting to 144.92.195.80:39395
150 79.3 kbytes to download
226 File written successfully
local: tiger.ps remote: tiger.ps
81252 bytes received in 0.059 seconds (1346.22 Kbytes/s)
ftp> mget ba*
200 PORT command successful
150-Connecting to 144.92.195.80:39397
150 416.0 kbytes to download
226-File written successfully
226 0.203 seconds (measured here), 1.76 Mbytes per second
local: badday-small.mpg remote: badday-small.mpg
425988 bytes received in 0.22 seconds (1911.01 Kbytes/s)
200 PORT command successful
150-Connecting to 144.92.195.80:39398
150 5019.4 kbytes to download
226-File written successfully
226 1.704 seconds (measured here), 2.85 Mbytes per second
local: badday.mpeg remote: badday.mpeg
5139913 bytes received in 1.8 seconds (2748.07 Kbytes/s)
ftp> del tiger.ps
250 Deleted tiger.ps
ftp> dir
200 PORT command successful
150 Connecting to 144.92.195.80:39399
total 1
-rw-r--r-- 1 102 50 425988 Mar 6 13:03 badday-small.mpg
-rw-r--r-- 1 102 50 5139913 Mar 6 13:02 badday.mpeg
-rw-r--r-- 1 102 50 442831 Mar 6 12:06 catattack.mpg
-rw-r--r-- 1 102 50 380932 Mar 6 12:57 smelly_monkey.mpg
226-Options: -l
226 4 matches total
290 bytes received in 0.01 seconds (28.02 Kbytes/s)
ftp> bye
221-Goodbye. You uploaded 0 and downloaded 5515 kbytes.
221 CPU time spent on you: 1.080 seconds.
LAN104%
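Once you're comfortable typing these commands by hand, you can let a shell script type them for you. The sketch below only prints the commands so you can check them first. The function name "ftp_commands" and the placeholder password "guest@example.invalid" are made up for this example, and the "-n" and "-i" options are assumed to behave as they do in the classic command-line 'ftp' client, so check "man ftp" on your own machine.

```shell
#!/bin/sh
# ftp_commands DIR PATTERN: print the FTP commands for an anonymous,
# binary-mode download of every remote file in DIR matching PATTERN.
# (Hypothetical helper; the password is a placeholder, not a real address.)
ftp_commands() {
    printf '%s\n' \
        "user anonymous guest@example.invalid" \
        "binary" \
        "cd $1" \
        "mget $2" \
        "bye"
}

# Preview the commands; nothing is sent over the network here.
ftp_commands silly 'ba*'

# To run them for real (assumes a classic 'ftp' client):
#   -n  do not attempt auto-login, so the "user" line above logs in
#   -i  turn off the per-file "are you sure?" prompt for mget
# ftp_commands silly 'ba*' | ftp -n -i psyphz
```

The pattern is quoted on the command line so that your local shell passes "ba*" through untouched and the remote computer does the matching.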
Symbolic Links
Symbolic links under UNIX are like "shortcuts" under Windows and "aliases" on Macs. They're basically just a tiny file placed in a convenient location that points to the real file or directory.
They can be very useful for setting up a single directory on a small disk that has links to directories on several large disks. This is how the lab's "/exp" directory works. "/exp" is stored on a relatively small disk, and all of the "/exp/studyname" subdirectories actually are stored on large disks. There is a symbolic link in "/exp" to each one.
This makes it easy for researchers to keep track of where a study is: "/exp/mystudy", even if the study actually is stored on the big "/da" disk (or is moved to "/db" at some point).
Symbolic links can be a little confusing, since the "pwd" command usually tells you exactly where you are, and doesn't keep track of when you've moved into a directory via a symbolic link:
LAN104% cd /exp
/exp
LAN104% cd BISPIL
/exp/BISPIL
LAN104% pwd
/da/exp/BISPIL
The important thing to remember is to go ahead and use the convenient shorthand made possible by the links. When you write scripts or give people instructions, always use "/exp/mystudy" rather than "/current_disk_which_might_change_tomorrow/exp/mystudy". For example, use "/exp/BISPIL" rather than "/da/exp/BISPIL". It'll make life easier for everyone.
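If you'd like to see this for yourself without touching the real "/exp", here is a small sketch that builds the same kind of layout in a scratch directory. The "da/exp/BISPIL" names simply mirror the example above, and the variable names are invented for illustration. It relies on the standard "-L" (logical) and "-P" (physical) options of "pwd".

```shell
#!/bin/sh
# Build a throwaway directory tree that mimics the /exp -> /da/exp layout.
base=$(cd "$(mktemp -d)" && pwd -P)     # physical path of a scratch directory
mkdir -p "$base/da/exp/BISPIL" "$base/exp"

# Create the symbolic link: ln -s REAL_LOCATION LINK_NAME
ln -s "$base/da/exp/BISPIL" "$base/exp/BISPIL"

# Move into the study directory through the link.
cd "$base/exp/BISPIL"
logical=$(pwd -L)     # the path as you typed it, link included
physical=$(pwd -P)    # the path with all symbolic links resolved

echo "logical:  $logical"
echo "physical: $physical"
```

The "logical" path is the one to put in scripts and instructions, since it stays the same even if the real directory moves to another disk and the link is updated.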
Using Floppies And CDs Under Linux
UNIX, unlike Windows and Macs, does not treat disks as separate entities. Instead, they are handled as directory subtrees within the overall directory tree which starts at "/" (root). When a disk is connected (or "mount"ed), it is simply attached to a subdirectory. Any subdirectory will work.
One subdirectory that is commonly used for removable disks (such as floppies, CDs, and Zip disks) is "/mnt". The lab Linux PCs are set up so that "/mnt" has 2 or 3 subdirectories: "/mnt/floppy", "/mnt/cdrom", and possibly "/mnt/zip".
To access any of the 3 types of removable disk, the procedure is the same:
- Insert the floppy, CDROM, or Zip disk into the drive.
- Type in the command "mount /mnt/disktype", for example, "mount /mnt/cdrom".
- "cd" to the newly attached disk, for example "cd /mnt/cdrom". Do a "ls" to ensure that the contents are as you expect.
- You can now use the disk just like any other part of the UNIX system. You can copy from it or to it (except for CDROMs which are read-only); you can run "df -k /mnt/zip" to see how much free space is on it; and you can run "du -sk /mnt/cdrom" to see how much disk space its contents use.
- When you are done with the disk, unlike Windows, you can't simply pop it out of the drive. First, "cd" out of the disk's directory. The "umount" command won't work otherwise.
- Now unmount the disk: "umount /mnt/disktype", for example, "umount /mnt/floppy".
- Now you can remove the disk from the drive.
Note that on the lab Linux PCs, the floppies and Zip disks are expected to be PC format. Using Mac-formatted floppies and Zip disks is possible, but you'll need some help from your friendly system administrator. CDROMs are expected to be ISO-9660 format (which is the same format as the lab archive disks, and is fairly common).
If the "mount" command doesn't seem to work, you'll probably need help from your system administrator; it is possible that something isn't set up correctly on the PC that you're using.
If the disk won't come out of the drive when you're done using it, make sure that you've run the "umount" command and that it worked.
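Forgetting to "cd" out of the disk is the most common reason that "umount" fails, so it can be handy to let a script check for you. This is only a sketch: the function names "is_inside" and "leave_and_umount" are invented here, and it only looks at the shell that runs it. Another window or program that is still using the disk will keep "umount" from working.

```shell
#!/bin/sh
# is_inside DIR MOUNTPOINT: succeed if DIR is MOUNTPOINT or lies below it.
is_inside() {
    case "$1/" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# leave_and_umount MOUNTPOINT: step out of the disk if needed, then unmount.
leave_and_umount() {
    if is_inside "$PWD" "$1"; then
        cd / || return 1
    fi
    umount "$1"
}

# Example (commented out because it needs a mounted disk):
# leave_and_umount /mnt/floppy
```

The trailing "/" added in "is_inside" keeps "/mnt/cdrom2" from being mistaken for something inside "/mnt/cdrom".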
Linux Vs. Solaris and Irix: Byteswapping
One of the most subtle "gotchas" that you have to watch out for when working with multiple types of computers is "endian-ness". Some computers, when they store numbers into memory or onto disk, write the "biggest" (or most significant) part of the number first, while others write the smallest part first. This is just a convention that computer makers decided on one way or the other, and it doesn't have any real effect on computer performance. It just makes data stored by some computers incompatible with other computers, and there isn't an easy way to see which way the data is stored.
All Intel-compatible PCs and VAXes are "little-endian". Most of the other machines you'll meet in the lab, including older PowerPC-based Macs, Sun (SPARC) UNIX computers, SGI (MIPS) UNIX computers, and many others, are "big-endian".
So, if you process some data on LAN104 (a Sun), and then decide to do the next step on LAN106 (a Linux PC), you need to be very careful to make sure that the program you're using knows how to deal with data that has the wrong "endian-ness" to it. The program handles the problem by "byteswapping", or reversing the order of the bytes within each multi-byte number.
We in the lab are working hard to make sure that all programs used here for data processing and analysis can do "byteswapping". If you're in doubt about a particular application, check with one of the lab developers; don't just assume that it can.
Note that endianness is not a problem for text files (or any other data where the values are stored in a single byte). Note too that there is no such thing as a "universal translator" for endianness; each application must be able to byteswap the particular file format with which it is familiar.
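To get a feel for what is going on, you can ask the computer you're sitting at which way it stores numbers, and try a simple byteswap by hand. This is a sketch using standard tools; the function names "host_endianness" and "swap16" are made up for this example. Note that "dd conv=swab" only swaps pairs of bytes, so it is only right for 2-byte values; 4-byte and 8-byte numbers need a program that knows the file format.

```shell
#!/bin/sh
# host_endianness: report this machine's byte order by reading the
# bytes 01 00 00 00 back as a single 4-byte integer.
host_endianness() {
    value=$(printf '\001\000\000\000' | od -An -td4 | tr -d '[:space:]')
    case "$value" in
        1)        echo little ;;    # smallest part stored first
        16777216) echo big ;;       # biggest part stored first
        *)        echo unknown ;;
    esac
}

# swap16: swap each pair of bytes on standard input
# (only correct for data made of 2-byte values).
swap16() {
    dd conv=swab 2>/dev/null
}

host_endianness
printf 'ABCD' | swap16; echo      # letters stand in for bytes: prints BADC
```

The same four bytes read as 1 on a little-endian machine and as 16777216 on a big-endian one, which is exactly the kind of silent mix-up that byteswapping is meant to prevent.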

