Working With Files and Directories
Overview
Teaching: 15 min
Exercises: 5 min
Questions
How can I create, copy, and delete files and directories?
Objectives
Create a directory hierarchy that matches a given diagram.
Create files in that hierarchy using an editor or by copying and renaming existing files.
Delete, copy and move specified files and/or directories.
Creating directories
We now know how to explore files and directories, but how do we create them in the first place?
Step one: see where we are and what we already have
Let’s go back to our data-shell directory in our home directory
and use ls -F to see what it contains:
$ pwd
/home/lortizur/data-shell
$ ls -F
creatures/ data/ molecules/ north-pacific-gyre/ notes.txt pizza.cfg solar.pdf writing/
Create a directory
Let’s create a new directory called thesis using the command mkdir thesis
(which has no output):
$ mkdir thesis
As you might guess from its name,
mkdir means ‘make directory’.
Since thesis is a relative path
(i.e., does not have a leading slash, like /what/ever/thesis),
the new directory is created in the current working directory:
$ ls -F
creatures/ data/ molecules/ north-pacific-gyre/ notes.txt pizza.cfg solar.pdf thesis/ writing/
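The relative-path rule can be tried out in a throwaway directory. This minimal sketch assumes mktemp -d is available (it is on typical Linux and macOS systems); the scratch directory and the extra name notes are hypothetical and not part of the lesson data. A relative path is resolved against the current working directory, while an absolute path gives the same result wherever you run it.

```shell
# Throwaway working area so the lesson's data-shell directory is untouched
workdir=$(mktemp -d)
cd "$workdir"

mkdir thesis               # relative path: created inside the current directory
mkdir "$workdir/notes"     # absolute path: starts with /, so the current directory does not matter
ls -F                      # lists notes/ and thesis/
```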
Good names for files and directories
Complicated names of files and directories can make your life painful when working on the command line. But extremely short and cryptic filenames will make your life painful when you return after days (weeks, months…) and try to remember what you were doing. Here we provide a few useful tips for the names of your files.
Don’t use spaces.
Spaces can make a name more meaningful, but since spaces are used to separate arguments on the command line it is better to avoid them in names of files and directories. You can use - or _ instead (e.g. north-pacific-gyre/ rather than north pacific gyre/). You may encounter files with spaces in their names if they have been uploaded from a laptop. To reference such files from the command line, enclose the entire name in quotes ("" or '').
Don’t begin the name with - (dash).
Commands treat names starting with - as options.
Stick with letters, numbers, . (period or ‘full stop’), - (dash) and _ (underscore).
Many other characters have special meanings on the command line. We will learn about some of these during this lesson. There are special characters that can cause your command to not work as expected and can even result in data loss.
If you need to refer to names of files or directories that have spaces or other special characters, you should surround the name in quotes ("").
Since we’ve just created the thesis directory, there’s nothing in it yet:
$ ls -F thesis
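To see the naming advice about spaces in action, here is a minimal sketch run in a throwaway directory (created with mktemp -d, an assumption of this sketch; the file names are hypothetical). Quoting keeps a name with spaces together as one argument; without quotes the shell splits it into three separate arguments.

```shell
workdir=$(mktemp -d)
cd "$workdir"

touch "north pacific gyre.txt"     # quoted: one argument, one file whose name contains spaces
touch north-pacific-gyre.txt       # dashes instead of spaces need no quoting
ls -F

ls "north pacific gyre.txt"        # works: the quotes keep the name together
ls north pacific gyre.txt || echo "unquoted, the name was split into three arguments"
```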
Creating A File
$ touch my_file.txt
- What did the touch command do?
- Use ls -l to inspect the files. How large is my_file.txt?
Solution
The touch command generates a new file called my_file.txt in your current directory.
When you inspect the file with ls -l, note that the size of my_file.txt is 0 bytes. In other words, it contains no data. If you open my_file.txt using your text editor it is blank.
Running touch on an existing file does not change its contents, but it does update the file’s last-modification date/time to the current date/time.
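Both behaviours of touch can be checked with ls -l in a scratch directory. This is a sketch under a few assumptions not covered in this lesson: mktemp -d provides the scratch directory, printf with > redirection is used only to put some text in the file, and touch -t (which takes a timestamp in the form [[CC]YY]MMDDhhmm) backdates the file so the later update is visible.

```shell
workdir=$(mktemp -d)
cd "$workdir"

touch my_file.txt                  # creates an empty file
ls -l my_file.txt                  # the size column shows 0

printf 'some text\n' > my_file.txt # give the file some content
touch -t 202001010000 my_file.txt  # set its time to 1 Jan 2020, 00:00
ls -l my_file.txt
touch my_file.txt                  # no -t: the timestamp becomes "now"
ls -l my_file.txt
cat my_file.txt                    # the contents are unchanged
```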
Moving files and directories
Return to the data-shell directory:
$ cd ~/data-shell/
Let’s create a file called draft.txt in our thesis directory using a text editor:
$ vi thesis/draft.txt
Let’s type a few lines. First press i (for ‘insert’) to start entering text, then type:
It's not "publish or perish" any more,
it's "share and thrive"
To save the file and quit the editor, press the Esc key and then type ZZ (two capital Zs).
In our thesis directory we now have a file draft.txt
that contains a quote. This is not
a particularly informative name,
so let’s change the file’s name using mv,
which is short for ‘move’:
$ mv thesis/draft.txt thesis/quotes.txt
The first argument tells mv what we’re ‘moving’,
while the second is where it’s to go.
In this case,
we’re moving thesis/draft.txt to thesis/quotes.txt,
which has the same effect as renaming the file.
Sure enough,
ls shows us that thesis now contains one file called quotes.txt:
$ ls thesis
quotes.txt
One has to be careful when specifying the target file name, since mv will
silently overwrite any existing file with the same name, which could
lead to data loss. An additional option, mv -i (or mv --interactive),
can be used to make mv ask you for confirmation before overwriting.
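The overwrite risk and the protection from -i can be seen side by side in a scratch directory. In this minimal sketch, mktemp -d and the file names notes.txt and draft.txt are assumptions, printf with > only creates files with known contents, and the answer to the -i prompt is piped in with echo instead of being typed.

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf 'old notes\n' > notes.txt
printf 'new draft\n' > draft.txt

# -i asks before overwriting; the answer "n" is piped in here instead of typed.
# "|| true" ignores whatever exit status this mv reports for a declined prompt.
echo n | mv -i draft.txt notes.txt || true
cat notes.txt                      # old notes (nothing was overwritten)

mv draft.txt notes.txt             # no -i: notes.txt is replaced without asking
cat notes.txt                      # new draft
```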
Note that mv also works on directories.
Let’s move quotes.txt into the current working directory.
We use mv once again,
but this time we’ll use just the name of a directory as the second argument
to tell mv that we want to keep the filename,
but put the file somewhere new.
(This is why the command is called ‘move’.)
In this case,
the directory name we use is the special directory name . that we mentioned earlier.
$ mv thesis/quotes.txt .
The effect is to move the file from the directory it was in to the current working directory.
ls now shows us that thesis is empty:
$ ls thesis
Further,
ls with a filename or directory name as an argument only lists that file or directory.
We can use this to see that quotes.txt is still in our current directory:
$ ls quotes.txt
quotes.txt
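Both uses of mv from this section, moving a file into a directory while keeping its name and renaming a directory, can be replayed in a scratch directory. This is a sketch assuming mktemp -d; the new directory name chapters is hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"

mkdir thesis
touch thesis/quotes.txt
mv thesis/quotes.txt .     # destination is a directory (.), so the name quotes.txt is kept
ls thesis                  # prints nothing: thesis is now empty
ls quotes.txt

mv thesis chapters         # mv renames directories the same way it renames files
ls -F                      # lists chapters/ and quotes.txt
```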
Moving Files to a new folder
After running the following commands, Jamie realizes that she put the files
sucrose.datandmaltose.datinto the wrong folder. The files should have been placed in therawfolder.$ ls -F analyzed/ raw/ $ ls -F analyzed fructose.dat glucose.dat maltose.dat sucrose.dat $ cd analyzedFill in the blanks to move these files to the
raw/folder (i.e. the one she forgot to put them in)$ mv sucrose.dat maltose.dat ____/____Solution
$ mv sucrose.dat maltose.dat ../rawRecall that
..refers to the parent directory (i.e. one above the current directory) and that.refers to the current directory.
Copying files and directories
The cp command works very much like mv,
except it copies a file instead of moving it.
We can check that it did the right thing using ls
with two paths as arguments — like most Unix/Linux commands,
ls can be given multiple paths at once:
$ cp quotes.txt thesis/quotations.txt
$ ls quotes.txt thesis/quotations.txt
quotes.txt thesis/quotations.txt
We can also copy a directory and all its contents by using the
recursive option -r,
e.g. to back up a directory:
$ cp -r thesis thesis_backup
We can check the result by listing the contents of both the thesis and thesis_backup directories:
$ ls thesis thesis_backup
thesis:
quotations.txt
thesis_backup:
quotations.txt
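The -r option matters: without it, cp refuses to copy a directory. A minimal sketch in a scratch directory (mktemp -d and the name thesis_copy are assumptions of this example):

```shell
workdir=$(mktemp -d)
cd "$workdir"

mkdir thesis
printf 'share and thrive\n' > thesis/quotations.txt

cp thesis thesis_copy || echo "without -r, cp refuses to copy a directory"
cp -r thesis thesis_backup      # -r copies the directory and everything inside it
ls thesis thesis_backup
```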
Renaming Files
Suppose that you created a plain-text file in your current directory to contain a list of the statistical tests you will need to do to analyze your data, and named it:
statstics.txt
After creating and saving this file you realize you misspelled the filename! You want to correct the mistake. Which of the following commands could you use to do so?
1. cp statstics.txt statistics.txt
2. mv statstics.txt statistics.txt
3. mv statstics.txt .
4. cp statstics.txt .
Solution
1. No. While this would create a file with the correct name, the incorrectly named file still exists in the directory and would need to be deleted.
2. Yes, this would work to rename the file.
3. No, the period (.) indicates where to move the file, but does not provide a new file name; identical file names cannot be created.
4. No, the period (.) indicates where to copy the file, but does not provide a new file name; identical file names cannot be created.
Moving and Copying
What is the output of the closing ls command in the sequence shown below?
$ pwd
/home/jamie/data
$ ls
proteins.dat
$ mkdir recombine
$ mv proteins.dat recombine/
$ cp recombine/proteins.dat ../proteins-saved.dat
$ ls
1. proteins-saved.dat recombine
2. recombine
3. proteins.dat recombine
4. proteins-saved.dat
Solution
We start in the /home/jamie/data directory, and create a new folder called recombine. Next, mv moves the file proteins.dat to the new folder (recombine). Then cp makes a copy of the file we just moved. The tricky part here is where the file was copied to. Recall that .. means ‘go up a level’, so the copied file is now in /home/jamie. Notice that .. is interpreted with respect to the current working directory, not with respect to the location of the file being copied. So, the only thing that will show using ls (in /home/jamie/data) is the recombine folder.
1. No, see explanation above. proteins-saved.dat is located at /home/jamie
2. Yes
3. No, see explanation above. proteins.dat is located at /home/jamie/data/recombine
4. No, see explanation above. proteins-saved.dat is located at /home/jamie
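The exercise can be replayed safely to confirm where each file ends up. In this sketch a scratch directory from mktemp -d stands in for /home/jamie (the variable name jamie_home is hypothetical), and an empty proteins.dat replaces the real data file.

```shell
jamie_home=$(mktemp -d)        # hypothetical stand-in for /home/jamie
mkdir "$jamie_home/data"
cd "$jamie_home/data"
touch proteins.dat

mkdir recombine
mv proteins.dat recombine/
cp recombine/proteins.dat ../proteins-saved.dat   # .. is relative to the current directory

ls        # shows only recombine
ls ..     # shows data and proteins-saved.dat
```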
Removing files and directories
Returning to the data-shell directory,
let’s tidy up this directory by removing the quotes.txt file we created.
The command we’ll use for this is rm (short for ‘remove’):
$ rm quotes.txt
We can confirm the file has gone using ls:
$ ls quotes.txt
ls: cannot access 'quotes.txt': No such file or directory
Deleting Is Forever
Shells don’t have a trash bin that we can recover deleted files from (most graphical interfaces to Unix do, although interestingly the one on HOPPER does not; it will, however, ask you to confirm before deleting a file from HOPPER’s GUI). Instead, when we delete files, they are unlinked from the file system so that their storage space on disk can be recycled. Tools for finding and recovering deleted files do exist, but there’s no guarantee they’ll work in any particular situation, since the computer may reuse the file’s disk space right away.
Using rm Safely
What happens when we execute rm -i thesis_backup/quotations.txt? Why would we want this protection when using rm?
Solution
$ rm -i thesis_backup/quotations.txt
rm: remove regular file 'thesis_backup/quotations.txt'? y
The -i option will prompt before (every) removal (use Y to confirm deletion or N to keep the file). The shell doesn’t have a trash bin, so all the files removed will disappear forever. By using the -i option, we have the chance to check that we are deleting only the files that we want to remove. Some systems have the -i option enabled by default. HOPPER accessed via an ssh terminal does not!
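Here is a minimal sketch of both answers to the rm -i prompt, run in a scratch directory (mktemp -d is an assumption of this example). The answers are piped in with echo instead of being typed, which rm accepts the same way as keyboard input.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch quotations.txt

# "|| true" ignores whatever exit status this rm reports for a declined prompt
echo n | rm -i quotations.txt || true   # answer "n": the file is kept
ls quotations.txt

echo y | rm -i quotations.txt           # answer "y": the file is deleted for good
ls quotations.txt || echo "quotations.txt is gone"
```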
If we try to remove the thesis directory using rm thesis,
we get an error message:
$ rm thesis
rm: cannot remove `thesis': Is a directory
This happens because rm by default only works on files, not directories.
rm can remove a directory and all its contents if we use the
recursive option -r, and it will do so without any confirmation prompts:
$ rm -r thesis
Given that there is no way to retrieve files deleted using the shell,
rm -r should be used with extreme caution (you might consider adding the interactive option rm -r -i).
If a directory is empty, rmdir (the complement of mkdir) will also remove it.
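The difference between rm and rmdir on directories can be checked in a scratch directory. In this sketch (mktemp -d is an assumption), both commands refuse to remove a directory that still holds a file, and rmdir succeeds once the directory is empty.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir thesis
touch thesis/draft.txt

rm thesis    || echo "rm without -r will not remove a directory"
rmdir thesis || echo "rmdir will not remove a directory that still has files in it"

rm thesis/draft.txt
rmdir thesis                    # succeeds now that thesis is empty
ls -F                           # thesis/ no longer appears
```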
Operations with multiple files and directories
Often one needs to copy or move several files at once. This can be done by providing a list of individual filenames, or by specifying a naming pattern using wildcards.
Copy with Multiple Filenames
For this exercise, you can test the commands in the data-shell/data directory.
In the example below, what does cp do when given several filenames and a directory name?
$ mkdir backup
$ cp amino-acids.txt animals.txt backup/
In the example below, what does cp do when given three or more file names?
$ ls -F
amino-acids.txt animals.txt backup/ elements/ morse.txt pdb/ planets.txt salmon.txt sunspot.txt
$ cp amino-acids.txt animals.txt morse.txt
Solution
If given more than one file name followed by a directory name (i.e. the destination directory must be the last argument), cp copies the files to the named directory.
If given three file names, cp throws an error such as the one below, because it is expecting a directory name as the last argument.
cp: target ‘morse.txt’ is not a directory
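Both cases can be tried without touching data-shell/data. This sketch assumes a scratch directory from mktemp -d and uses empty stand-in files with the same names as the lesson’s files.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch amino-acids.txt animals.txt morse.txt    # empty stand-ins for the data-shell/data files

mkdir backup
cp amino-acids.txt animals.txt backup/         # last argument is a directory: both files go into it
ls backup

cp amino-acids.txt animals.txt morse.txt || echo "the last argument must be a directory"
```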
Using wildcards for accessing multiple files at once
Wildcards
* is a wildcard, which matches zero or more characters. Let’s consider the data-shell/molecules directory: *.pdb matches ethane.pdb, propane.pdb, and every file that ends with ‘.pdb’. On the other hand, p*.pdb only matches pentane.pdb and propane.pdb, because the ‘p’ at the front only matches filenames that begin with the letter ‘p’.
? is also a wildcard, but it matches exactly one character. So ?ethane.pdb would match methane.pdb whereas *ethane.pdb matches both ethane.pdb and methane.pdb.
Wildcards can be used in combination with each other, e.g. ???ane.pdb matches three characters followed by ane.pdb, giving cubane.pdb ethane.pdb octane.pdb.
When the shell sees a wildcard, it expands the wildcard to create a list of matching filenames before running the command that was asked for. As an exception, if a wildcard expression does not match any file, Bash will pass the expression as an argument to the command as it is. For example, typing ls *.pdf in the molecules directory (which contains only files with names ending with .pdb) results in an error message that there is no file called *.pdf. However, commands like wc and ls generally see only the lists of file names matching these expressions, not the wildcards themselves. It is the shell, not the other programs, that deals with expanding wildcards.
There are many other, fancier wildcards as well. For example:
[0-9] matches any single digit
[a-zA-Z] matches any single letter of either case (the shorter [a-Z] is sometimes suggested for this, but whether it works depends on the shell version and locale settings)
[[:lower:]] matches any single lower-case letter
[[:upper:]] matches any single upper-case letter
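Because the shell expands wildcards before the command runs, echo (which just prints its arguments) shows exactly what a command would receive. This sketch assumes a scratch directory from mktemp -d filled with empty files that have the same names as those in data-shell/molecules.

```shell
workdir=$(mktemp -d)
cd "$workdir"
# Empty stand-ins with the same names as the files in data-shell/molecules
touch cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb

echo p*.pdb         # pentane.pdb propane.pdb
echo ?ethane.pdb    # methane.pdb
echo *ethane.pdb    # ethane.pdb methane.pdb
echo ???ane.pdb     # cubane.pdb ethane.pdb octane.pdb
echo [cm]*.pdb      # cubane.pdb methane.pdb
echo *.pdf          # *.pdf (no match, so the pattern is passed through unchanged)
```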
List filenames matching a pattern
When run in the molecules directory, which ls command(s) will produce this output?
ethane.pdb methane.pdb
1. ls *t*ane.pdb
2. ls *t?ne.*
3. ls *t??ne.pdb
4. ls ethane.*
Solution
The solution is 3.
1. shows all files whose names contain zero or more characters (*) followed by the letter t, then zero or more characters (*) followed by ane.pdb. This gives ethane.pdb methane.pdb octane.pdb pentane.pdb.
2. shows all files whose names start with zero or more characters (*) followed by the letter t, then a single character (?), then ne. followed by zero or more characters (*). This will give us octane.pdb and pentane.pdb but doesn’t match anything which ends in thane.pdb.
3. fixes the problems of option 2 by matching two characters (??) between t and ne. This is the solution.
4. only shows files starting with ethane..
More on Wildcards
Sam has a directory containing calibration data, datasets, and descriptions of the datasets:
.
├── 2015-10-23-calibration.txt
├── 2015-10-23-dataset1.txt
├── 2015-10-23-dataset2.txt
├── 2015-10-23-dataset_overview.txt
├── 2015-10-26-calibration.txt
├── 2015-10-26-dataset1.txt
├── 2015-10-26-dataset2.txt
├── 2015-10-26-dataset_overview.txt
├── 2015-11-23-calibration.txt
├── 2015-11-23-dataset1.txt
├── 2015-11-23-dataset2.txt
├── 2015-11-23-dataset_overview.txt
├── backup
│   ├── calibration
│   └── datasets
└── send_to_bob
    ├── all_datasets_created_on_a_23rd
    └── all_november_files
Before heading off to another field trip, she wants to back up her data and send some datasets to her colleague Bob. Sam uses the following commands to get the job done:
$ cp *dataset* backup/datasets
$ cp ____calibration____ backup/calibration
$ cp 2015-____-____ send_to_bob/all_november_files/
$ cp ____ send_to_bob/all_datasets_created_on_a_23rd/
Help Sam by filling in the blanks.
The resulting directory structure should look like this:
.
├── 2015-10-23-calibration.txt
├── 2015-10-23-dataset1.txt
├── 2015-10-23-dataset2.txt
├── 2015-10-23-dataset_overview.txt
├── 2015-10-26-calibration.txt
├── 2015-10-26-dataset1.txt
├── 2015-10-26-dataset2.txt
├── 2015-10-26-dataset_overview.txt
├── 2015-11-23-calibration.txt
├── 2015-11-23-dataset1.txt
├── 2015-11-23-dataset2.txt
├── 2015-11-23-dataset_overview.txt
├── backup
│   ├── calibration
│   │   ├── 2015-10-23-calibration.txt
│   │   ├── 2015-10-26-calibration.txt
│   │   └── 2015-11-23-calibration.txt
│   └── datasets
│       ├── 2015-10-23-dataset1.txt
│       ├── 2015-10-23-dataset2.txt
│       ├── 2015-10-23-dataset_overview.txt
│       ├── 2015-10-26-dataset1.txt
│       ├── 2015-10-26-dataset2.txt
│       ├── 2015-10-26-dataset_overview.txt
│       ├── 2015-11-23-dataset1.txt
│       ├── 2015-11-23-dataset2.txt
│       └── 2015-11-23-dataset_overview.txt
└── send_to_bob
    ├── all_datasets_created_on_a_23rd
    │   ├── 2015-10-23-dataset1.txt
    │   ├── 2015-10-23-dataset2.txt
    │   ├── 2015-10-23-dataset_overview.txt
    │   ├── 2015-11-23-dataset1.txt
    │   ├── 2015-11-23-dataset2.txt
    │   └── 2015-11-23-dataset_overview.txt
    └── all_november_files
        ├── 2015-11-23-calibration.txt
        ├── 2015-11-23-dataset1.txt
        ├── 2015-11-23-dataset2.txt
        └── 2015-11-23-dataset_overview.txt
Solution
$ cp *calibration.txt backup/calibration
$ cp 2015-11-* send_to_bob/all_november_files/
$ cp *-23-dataset* send_to_bob/all_datasets_created_on_a_23rd/
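Sam’s completed commands can be checked against the expected tree in a scratch directory. This sketch assumes mktemp -d and uses empty placeholder files with Sam’s file names; it relies on mkdir creating its arguments in the order given, so each parent directory exists before its children.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Empty placeholders with the same names as Sam's files
touch 2015-10-23-calibration.txt 2015-10-23-dataset1.txt 2015-10-23-dataset2.txt 2015-10-23-dataset_overview.txt \
      2015-10-26-calibration.txt 2015-10-26-dataset1.txt 2015-10-26-dataset2.txt 2015-10-26-dataset_overview.txt \
      2015-11-23-calibration.txt 2015-11-23-dataset1.txt 2015-11-23-dataset2.txt 2015-11-23-dataset_overview.txt
mkdir backup backup/calibration backup/datasets
mkdir send_to_bob send_to_bob/all_datasets_created_on_a_23rd send_to_bob/all_november_files

cp *dataset* backup/datasets
cp *calibration.txt backup/calibration
cp 2015-11-* send_to_bob/all_november_files/
cp *-23-dataset* send_to_bob/all_datasets_created_on_a_23rd/

ls backup/calibration send_to_bob/all_november_files
```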
Organizing Directories and Files
Jamie is working on a project and she sees that her files aren’t very well organized:
$ ls -F
analyzed/ fructose.dat raw/ sucrose.dat
The fructose.dat and sucrose.dat files contain output from her data analysis. What command(s) covered in this lesson does she need to run so that the commands below will produce the output shown?
$ ls -F
analyzed/ raw/
$ ls analyzed
fructose.dat sucrose.dat
Solution
$ mv *.dat analyzed
Jamie needs to move her files fructose.dat and sucrose.dat to the analyzed directory. The shell will expand *.dat to match all .dat files in the current directory. The mv command then moves the list of .dat files to the ‘analyzed’ directory.
Reproduce a folder structure
You’re starting a new experiment, and would like to duplicate the directory structure from your previous experiment so you can add new data.
Assume that the previous experiment is in a folder called ‘2016-05-18’, which contains a data folder that in turn contains folders named raw and processed that contain data files. The goal is to copy the folder structure of the 2016-05-18/data folder into a folder called 2016-05-20 so that your final directory structure looks like this:
2016-05-20/
└── data
    ├── processed
    └── raw
Which of the following sets of commands would achieve this objective? What would the other commands do?
1.
$ mkdir 2016-05-20
$ mkdir 2016-05-20/data
$ mkdir 2016-05-20/data/processed
$ mkdir 2016-05-20/data/raw
2.
$ mkdir 2016-05-20
$ cd 2016-05-20
$ mkdir data
$ cd data
$ mkdir raw processed
3.
$ mkdir 2016-05-20/data/raw
$ mkdir 2016-05-20/data/processed
4.
$ mkdir 2016-05-20
$ cd 2016-05-20
$ mkdir data
$ mkdir raw processed
Solution
The first two sets of commands achieve this objective. The first set uses relative paths to create the top-level directory before the subdirectories. The second set uses cd to move into each new directory before creating the next level inside it.
The third set of commands will give an error because mkdir won’t create a subdirectory of a non-existent directory: the intermediate-level folders must be created first.
The final set of commands generates the ‘raw’ and ‘processed’ directories at the same level as the ‘data’ directory.
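The difference between the first and third sets can be checked in a scratch directory (mktemp -d is an assumption of this sketch): creating a deep path fails while its parents are missing, and succeeds when each level is created parent first.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Set 3 fails: neither 2016-05-20 nor 2016-05-20/data exists yet
mkdir 2016-05-20/data/raw || echo "set 3 fails as expected"

# Set 1: create each level, parent before child
mkdir 2016-05-20
mkdir 2016-05-20/data
mkdir 2016-05-20/data/processed
mkdir 2016-05-20/data/raw
ls -R 2016-05-20
```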
Key Points
cp old new copies a file.
mkdir path creates a new directory.
mv old new moves (renames) a file or directory.
rm path removes (deletes) a file.
* matches zero or more characters in a filename, so *.txt matches all files ending in .txt.
? matches any single character in a filename, so ?.txt matches a.txt but not any.txt.
Use of the Control key may be notated in many ways, including Ctrl-X, Control-X, and ^X.
The shell does not have a trash bin: once something is deleted, it’s really gone. REALLY!
Most files’ names are something.extension. The extension isn’t required, and doesn’t guarantee anything, but is normally used to indicate the type of data in the file.