4. Files and Directories in Unix - Developing Bioinformatics Computer Skills [Book]
Most humans are much better at remembering frequently used patterns than they are at remembering unique character strings, after all. A common convention in file naming is to name the file with a unique name followed by a dot. As you begin working with computers in your research and structuring your data environment, you need to develop your own file-naming conventions, or preferably, find out what naming conventions already exist and use them consistently throughout your project.
There's nothing so frustrating as looking through old data sets and finding that the same type of file has been named in several different ways. Have you found all the data or results that belong together? Can the file you are looking for be named something else entirely? In the absence of conventions, there's no way to know this except to open every unidentifiable file and check its format by eye.
The next section provides a detailed example of how to set up a filesystem that won't have you tearing out your hair looking for a file you know you put there. Here are some good rules of thumb to follow for file-naming conventions: Files of the same type should have the same extension. Files derived from the same source data should have a common element in their unique names. The unique name should contain as much information as possible about the experiment.
Filenames should be as short as is possible without compromising uniqueness. You'll probably encounter preestablished conventions for file naming in your work. For instance, if you begin working with protein sequence and structure datafiles, you will find that families of files with the same format have common extensions.
You may find that others in your group have established local conventions for certain kinds of datafiles and results. You should attempt to follow any known conventions.
An Example
Let's take a look at an example of setting up a filesystem. These are real directory layouts we have used in our work; only the names have been changed to protect the innocent. In this case, we are using a single directory to hold the whole project.
It's useful to think of the filesystem as a family tree, clustering related aspects of a project into branches. The top level of your project directory should contain two text files that explain the contents of the directories and subdirectories.
The first file should contain an outline of the project, with the date, the names of the people involved, the question being investigated, and references to publications related to this project. Per Jambeck, Cynthia Gibas Question: Are there recurrent structural words in the three-dimensional structure of proteins?
Automatic construction of a dictionary of elements of local structure in proteins using entropy maximization-based learning. The second file should be an index file named something readily recognizable like INDEX that explains the overall layout of the subdirectories.
If you haven't really collected much data yet, a simple sketch of the directories with explanations should do. For example, consider the following file hierarchy:
[Figure: tree diagram of the project hierarchy]
In this directory, we've made the first distinction between programs and data: programs contains the software we write, and data contains the information we get from databases, or files the programs generate.
Within each subdirectory, we further distinguish between types of data (in this case, protein structures and protein sequences), results gleaned from running our programs on the data (run on two sets of proteins, the enolase family and the globin superfamily), and some test cases.
Programs are also subdivided according to type, namely whether they are the human-readable program listings (source code), scripts that aid in running the programs, or the binaries of the programs. As we mentioned earlier, when you store data in files, you should try to use a terse and consistent system for naming files. We then used a homegrown Perl program called unique. Thus, we can represent this information economically using the filename PDB-unique for files related to this data set.
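The layout described above can be sketched with a few mkdir commands. This is only an illustration: the text names the programs and data directories and the INDEX file, while the project name, the README filename, and every subdirectory name below are hypothetical choices made for the example.

```shell
# Hypothetical project root inside a scratch directory
project=$(mktemp -d)/structure_words

# data: source data by type, results per protein set, and test cases
# (subdirectory names are assumptions, not taken from the text)
mkdir -p "$project/data/structures" "$project/data/sequences"
mkdir -p "$project/data/results/enolase" "$project/data/results/globin"
mkdir -p "$project/data/test"

# programs: source code, helper scripts, and compiled binaries
mkdir -p "$project/programs/src" "$project/programs/scripts" "$project/programs/bin"

# The two top-level text files: a project outline (README is an assumed name)
# and the INDEX file that explains the layout
: > "$project/README"
: > "$project/INDEX"

# Print the resulting directory tree, one directory per line
find "$project" -type d | sort
```

Creating the whole skeleton up front, even while most directories are still empty, gives the INDEX file something concrete to describe.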
For example, the list of the names of proteins in the set, and the file containing the proteins' sequences in FASTA format (a common text-file format for storing macromolecular sequence data), are stored, respectively, in: For example, the file containing all seven-residue pieces of protein structure derived from the nonredundant set is called PDB-unique
File naming conventions can take you only so far in organizing a project; the simple naming schemes we've laid out here will become more and more confusing as a project grows.
For larger projects, you should consider using a database management system (DBMS) to manage your data. We introduce database concepts in a later chapter.
Commands for Working with Directories and Files
Now that you have the basics of filesystems, let's dig into the specifics of working with files and directories in Unix. In the following sections, we cover the Unix commands for moving around the filesystem, finding files and directories, and manipulating files and directories.
As we introduce commands, we'll show you the format of the command line for each command for example, "Usage: Moving Around the Filesystem When you open a window on a Linux system, you see a command prompt: For example, the following user is using the tcsh shell environment and has configured the command prompt to show the username and current working directory: If you type an instruction at the prompt and press the Enter key, you have given your computer a command.
Unix provides a set of simple navigation commands and commands for searching your filesystem for particular files and programs. We'll discuss the format of commands more thoroughly in Chapter 5. In this chapter, we'll introduce you to basic commands for getting around in Unix. You can think of being "in" a directory in this way: When you log in to the system, your "you are here" pointer is automatically placed in your home directory.
Your home directory is a unique place. It contains the files you use almost every time you log into your system, as well as the directories that you create to store other files.
What if you want to find out where your home directory is in relation to the rest of the system? Typing pwd at the command prompt in your home directory should give output something like: Changing directories with cd Usage: The only argument commonly used with this command is the pathname of a directory. If cd is used without an argument, it changes the current working directory to the user's home directory.
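As a small illustration of pwd and cd, the sketch below moves around a scratch directory. The directory names are hypothetical, and the pathnames printed on your system will differ.

```shell
# Build a small scratch hierarchy to move around in (names are hypothetical)
demo=$(mktemp -d)
mkdir -p "$demo/data/sequences"

cd "$demo/data/sequences"   # change directory using an absolute pathname
pwd                         # print the current working directory
where=$(pwd)

cd ..                       # relative pathname: move up one level
pwd
parent=$(pwd)

# Typing cd with no argument would return you to your home directory.
```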
In order for these "you are here" tools to be helpful, you need to have organized your filesystem in a sensible way in the first place, so that the name and location of the directory that you're in gives you information about what kind of material can be found there. Most of the filesystem of your machine will have been set up by default when you installed Linux, but the organization of your own directories, where you store programs and data that you use, is your responsibility.
Finding Files and Directories Unix provides many ways to find files, from simply listing out the contents of a directory to search programs that look for specified filenames and the locations of executable programs. Listing files with ls Usage: Simply typing the Unix list command, ls, at the prompt gives you a listing of all the files and subdirectories in the current working directory.
You can also give a directory name as an argument to ls. It then prints the names of all files in the named directory. If you have files in a series such as ch1 to ch14, or files with common characters like those ending in.
For example, let's say you're looking for files called seq11, seq25, and seq34 in a directory of files. Instead of scrolling through the list of files by eye, you could find them by typing: You know that text files usually end with.
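The command from the original example is not reproduced here, so the following is only a sketch of two ways to pick out seq11, seq25, and seq34: naming them explicitly, or using a shell bracket pattern. The scratch directory and the range of files created are assumptions made for the demonstration.

```shell
demo=$(mktemp -d)
cd "$demo"

# Create empty placeholder files seq1 ... seq40
i=1
while [ "$i" -le 40 ]; do
    : > "seq$i"
    i=$((i + 1))
done

# Option 1: name the files explicitly
ls seq11 seq25 seq34

# Option 2: a bracket pattern. [123] matches one character from the set,
# so this matches seq11, seq25, and seq34, but also every other name built
# from those digits (seq14, seq15, seq21, and so on).
ls seq[123][145]
matches=$(ls seq[123][145] | wc -l)
```

The bracket pattern is shorter to type but less selective, so it is worth checking what a pattern matches with ls before passing it to a command that changes files.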
The most useful of these are: Filenames beginning with a dot. Hidden files often contain configuration instructions for programs, and it's sometimes necessary to examine or modify them. The content of the current directory is listed, and whenever a subdirectory is reached, its contents are also explicitly included in the listing. This command can create a catalog of files in your filesystem. A single-column listing of all your source datafiles can quickly be turned into a shell script that executes an identical operation on each file, using just a few regular-expression tricks.
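The option names are missing from the text above, so this sketch assumes the standard ls flags that match the descriptions: -a for hidden files, -R for a recursive listing, and -1 for single-column output. The sed step is one possible "regular-expression trick" for turning a listing into a script; the file names and the wc command are hypothetical stand-ins.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir sub
: > a.dat
: > b.dat
: > .hidden
: > sub/c.dat

ls        # plain listing: hidden files are not shown
ls -a     # also lists names beginning with a dot
ls -R     # lists the contents of subdirectories as they are reached

# Turn a single-column listing into a script that runs the same
# command (here, wc -c) on each data file
ls -1 *.dat | sed 's/.*/wc -c "&"/' > run_all.sh
cat run_all.sh
```

Running sh run_all.sh would then execute the generated commands one after another.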
Interpreting ls output ls gives its output in two formats, the short and the long format. The short format is the default. It includes only the name of each file along with information requested using the -F or -s options: The first 10 characters in the line give information about file permissions. The first character describes the file type.
You will commonly encounter three types of files: The next nine characters are actually three sets of three bits containing file permission information. The first three characters following the file type are the file permissions for the user.
The next set are for the user's group, and the final set are for users outside the group. The character string rwxrwxrwx indicates a file is readable (r), writable (w), and executable (x) by any user.
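To see how the 10-character permission field reads in practice, the sketch below sets known permissions on a scratch file and extracts the field from ls -l. The file name is hypothetical, and chmod is used only to put the file in a known state.

```shell
demo=$(mktemp -d)
: > "$demo/notes.txt"

# 640 = read/write for the user, read for the group, nothing for others
chmod 640 "$demo/notes.txt"

# Keep only the first 10 characters of the long listing
perms=$(ls -l "$demo/notes.txt" | cut -c1-10)
echo "$perms"    # -rw-r-----

# For a directory, the first character is d
dir_type=$(ls -ld "$demo" | cut -c1)
echo "$dir_type"
```

Reading the string left to right: the leading - marks a regular file, rw- is the user's set, r-- is the group's set, and --- is the set for everyone else.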
We talk about how to change file permissions and file ownership in Section 4. The next column in the long format file listing tells you how many links a file has; that is, how many directory listings for that file exist on the filesystem.
The same file can be named in multiple directories. In the section Section 4. The next two columns show the ownership of the file.
The owner of the files in the preceding example is jambeck, a member of the group weasel. The next three columns show the size of the file in characters, and the date and time that the file was last modified. The final column shows the name of the file. Finding files with find Usage: There are over 20 different tests that can be used with find; here are a few of the most useful: Changing a file refers to any change, including a change in permissions, whereas modification refers only to changes to the internal text of the file.
Performing two find tests one after another amounts to applying a logical "and" between the tests. A -o between tests indicates a logical "or." Let's say you want a list of every file you have modified in your home directory and all subdirectories in the last week: Now let's go back to the original problem and find executable files. One way to do this with find is to use the following command: Any Unix command can be used as the object of -exec. As always, you need to refer to your manual pages, or manpages, for more details (for more on manpages, see Chapter 5).
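The commands from the original examples are not reproduced above, so the following is a sketch of the same ideas run against a scratch directory rather than your home directory. The file names are hypothetical; -mtime, -perm, -name, -type, -o, and -exec are standard find expressions.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir -p project/bin
: > project/recent.txt
: > project/old.txt
touch -t 200001010000 project/old.txt      # backdate this file to the year 2000
printf '#!/bin/sh\necho hello\n' > project/bin/hello
chmod 755 project/bin/hello

# Two tests in a row are ANDed: regular files modified in the last 7 days
recent=$(find project -type f -mtime -7 | sort)
echo "$recent"

# -o means OR; the parentheses are escaped so the shell leaves them alone
find project \( -name '*.txt' -o -name 'hello' \) -print

# Regular files with the owner's execute bit set
execs=$(find project -type f -perm -100)

# -exec runs a command on each match; {} stands for the current pathname
find project -type f -perm -100 -exec ls -l {} \;
```

To search your own files, replace project with a starting directory such as "$HOME".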
Finding an executable file with which Usage: This is useful if you want to know where a program is located, if, for instance, you want to be sure you're using the right version of the program.
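A minimal sketch of which, using ls as the program to look up. The pathname printed depends on your system and your path settings.

```shell
# Ask which copy of ls would be run, based on the current path
ls_path=$(which ls)
echo "$ls_path"

# List the file that was found to confirm it is there
ls -l "$ls_path"
```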
Finding an executable file with whereis Usage: Unlike which, whereis isn't dependent on your path, but it looks for programs only in a limited set of directories, so it doesn't give a definitive answer about the existence of a program. Manipulating Files and Directories Of course, just as with the stacks of papers on your desk, you periodically need to do some housekeeping on your files and directories to keep everything neat and tidy.
Unix provides commands for moving, copying, and deleting files, as well as creating and removing directories. Copying files and directories with cp Usage: If the destination is a directory, the source can be multiple files, copies of which are placed in the destination directory.
Frequently used options are -R and -r. Both copy recursively; that is, they copy the source directory and all its subdirectories to the destination.
The -R option prevents cp from following symbolic links; only the link itself is copied. The -r option allows cp to follow symbolic links and copy all files it finds. This can cause problems if the symbolic links happen to form a circular path through the filesystem. Normally, new files created by cp get their file ownership and permissions from your shell settings. However, GNU cp provides an -a option that attempts to maintain the original file attributes; the option that POSIX specifies for preserving ownership, permissions, and timestamps is -p. Moving and renaming files and directories with mv Usage: Files and directories can both be either source or destination.
Developing Bioinformatics Computer Skills by Cynthia Gibas, Per Jambeck
If both source and destination are files or both are directories, the result of mv is essentially that the file or directory is renamed.
If the destination is a directory, and the intention is to move already existing files or directories under that directory in the hierarchy, the directory must exist before the mv command is given. Otherwise the destination is created as a regular file, or the operation is treated as a renaming of a directory. One problem that can occur if mv isn't used carefully is when source represents a file list, and destination is a preexisting single file.
When this happens, each member of source is renamed to destination and then promptly overwritten, leaving only the last file of the list intact.
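The cp and mv behaviors described above are easy to try safely in a scratch directory. This is a minimal sketch with hypothetical file and directory names.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir -p src/nested dest
echo "ACGT" > src/nested/seq.txt
echo "notes" > draft.txt

cp -R src backup           # copy a directory and everything beneath it
cp -p draft.txt dest/      # copy into a directory, preserving mode and timestamps

mv draft.txt final.txt     # source and destination are both files: a rename
mv final.txt dest/         # destination is an existing directory: a move

ls -R
```

Note that dest was created before the final mv; had it not existed, that command would have renamed final.txt or failed rather than moving it into a directory.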
That's because, when tar extracts a file before its directory, it temporarily creates a writable directory to hold the file. Then, later, when tar extracts the directory itself, the directory's contents will already have been extracted - so all tar has to do is to set the directory's new unwritable permissions. GNU find understands both of those. It tells find not to descend into directories mounted from other filesystems.
This is handy, for example, to avoid network-mounted filesystems. A more specific test is -fstype, which tests true if a file is on a certain type of filesystem. Different systems have different filesystem names and types, though.
Text Output Early versions of find had basically one choice for outputting a pathname: Later, -ls was added; it gives an output format similar to ls -l. The new -printf action lets you use a C-like printf format. This has the usual format specifiers like the filename and the last-modification date, but it has others specific to find. One simple use for this is to make your own version of ls that gives just the information you want.
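As a rough sketch of the ls-like use of -printf, the command below prints only each file's name and size. It assumes GNU find, since -printf is not available in every find implementation; the file names are hypothetical, and this is not the findc function mentioned in the article.

```shell
demo=$(mktemp -d)
cd "$demo"
printf 'hello' > a.txt     # 5 bytes
printf 'hi' > b.txt        # 2 bytes

# %f = filename without leading directories, %s = size in bytes (GNU find)
listing=$(find . -type f -printf '%f %s\n' | sort)
echo "$listing"
```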
As an example, the following bash function, findc, searches the command-line arguments or, if there are no arguments, the current directory. The longstanding -print action writes a pathname to the standard output, followed by a newline character. If that pathname happens to contain a newline, you get two newlines. A newline is legal in a filename.
Most shells also break command-line arguments into words at whitespace (tabs, spaces, and newlines); this means that command substitution (the backquote operators) could fail if, say, a filename contained spaces. It wasn't too long before programmers fixed this problem by adding the -print0 action; it outputs a pathname followed by NUL (a zero byte).
Because NUL isn't legal in a filename, this pathname delimiter solved the problem - when find's output was piped to the command xargs -0, which accepts NUL as an argument separator.
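The pairing of -print0 with xargs -0 can be sketched as follows, using a hypothetical filename that contains a space. Both options are supported by GNU and BSD versions of these tools.

```shell
demo=$(mktemp -d)
cd "$demo"
: > "plain.txt"
: > "two words.txt"

# With newline- or space-delimited input, xargs would split "two words.txt"
# into two arguments. NUL-delimited pathnames are passed through intact.
find . -type f -name '*.txt' -print0 | xargs -0 ls -l

# Count the files that ls received: one line per pathname
count=$(find . -type f -name '*.txt' -print0 | xargs -0 ls | wc -l)
echo "$count"
```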
Because find can do many different tests as it traverses a filesystem, it's good to be able to choose what should be done in each individual case. For instance, if you run a nightly cron job to clean up various files and directories from all of your disks, it's nice to do all of the tests in a single pass through the filesystem - instead of making another complete pass for each of your tests.
But it's also good to avoid the overhead of running utilities like rm and rmdir over and over, once per file, in a find job like this one using -exec: The new -fprintf and -fprint0 actions can solve this problem. They write a formatted string to a file you specify.
An empty file has no bytes; an empty directory has no entries. One place this is handy is for removing empty directories while you're cleaning a filesystem. Then you can use an expression like the following: These are a lot more efficient than the old methods, -exec false and -exec true, which execute the external commands false(1) and true(1).
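The expression from the original article is not reproduced above; one possible way to remove empty directories is sketched below. It assumes a find that supports the -empty test (GNU and BSD versions do), and the directory names are hypothetical.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir -p keep empty1 empty2/inner
echo data > keep/file.txt

# -depth visits a directory's contents before the directory itself,
# so nested empty directories are handled from the bottom up
find . -depth -type d -empty -exec rmdir {} \;

ls
```

Because rmdir refuses to remove a directory that still has entries, a mistake in the expression cannot delete data files.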
Now -perm also accepts arguments starting with a plus sign. It means "any of these bits are set." Here's what the find man page on my system says about it: It causes find to not descend into the current file. Note, the -prune primary has no effect if the -d option was specified.
Is this a viable replacement for -maxdepth? When I run it the output isn't useful: You can see what I'm working on as this is my desktop. AIX find indeed does not have the -maxdepth flag option. The man page returns this for the -prune flag on my AIX 5. Stops the descent of the current path name if it is a directory. If the -depth flag is specified, the -prune flag is ignored.
Scott at December 17, 2: Baker at February 11, 4: And while it is certainly possible to parse the output for the desired results, this does not stop the "find" command from needless searching. Without a "maxdepth" option, the best way to list all files in a given directory is: Gus Schlachter at March 23, 2: I was doing this on Solaris machine, no maxdepth also, using prune with -d option, wrong, the AIX help where it says, don't use with -d, fixed the problem, works like charm now.
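The command the commenter gave is not reproduced above. A commonly used idiom for limiting find to one level without -maxdepth is sketched below; it is offered as an assumption about what was meant, with hypothetical file names.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir -p sub/deeper
: > top.txt
: > sub/inner.txt

# The starting point "." fails the test "! -name ." and so is not pruned.
# Every other entry passes it, is pruned (find does not descend into it),
# and is then printed.
top_level=$(find . ! -name . -prune -print | sort)
echo "$top_level"
```

As the man page excerpts quoted earlier warn, -prune is ignored when -depth (or -d) is given, so this idiom should not be combined with that option.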
Frankce10 at May 5, 8: Any easy alternative to this? Ankush Jhalani at October 11, 4: How do I use that code you gave me? When I tried to use it - I got I put it in like this: