Authors: Mathabo Malange, Phangoxolo Sishuba
3.5. Data Input/Output#
Python has several functions for creating, reading, updating, and deleting
files. In this section we will look at input/output to and from text
and CSV
files to netCDF files. A built-in function for working with files in Python is
the open()
function. The open()
function takes two arguments: filename
and
mode
. There are four different modes (methods) for opening a file:
“r”: Read - Opens a file for reading, error if the file does not exist
“a”: Append - Opens a file for appending, creates if file does not exist
“w”: Write - Opens a file for writing, creates if file does not exist
“x”: Create - Creates a file, returns an error if the file exists
Besides the above file opening operations, the file can be opened in either text or binary mode.
“t” - Text - Default value. Text mode
“b” - Binary - Binary mode (e.g. images)
3.5.1. Text File Read#
To open a text file for reading it suffice to pass the filename to the open()
function:
>>> fid = open("filename.txt")
By default the file is open in read and text mode so that the “r” and “t” options can be omitted. The above is equivalent to
>>> fid = open("filename.txt", "rt")
It is important to note that the “filename.txt” is visible to the Python
interpreter from where the operation open()
is being performed. What does this
mean? It means, the “filename.txt” indicates a full path to where the file is
located in the disk. Suppose the file is in the Windows documents folder. In
that case we can specify the full path as follows:
>>> fid = open(r"C:\Users\username\Documents\filename.txt", "rt")
Note the “r” before the file path. This is to tell Python to treat the file path as a raw string. What does this imply? Under ordinary circumstances, the Windows file path separator is treated as an escape character in Python meaning that we would get a syntax error after ‘C:’. In this case we used the “r” at the beginning of the string or we could add a second escape character ‘’ to the path so that only one is treated as a path separator as follows.
>>> fid = open(r"C:\\Users\\username\\Documents\\filename.txt", "rt")
Escape characters are used for special operations such as specifying a newline
\n
, a tab \t
, a single quote \'
, a carriage return \r
, etc.
The open()
operation returns an object ‘fid’ which we name file id, and which
has a .read()
method for reading the content of the file.
>>> fid = open("filename.txt", "rt")
>>> print(fid.read())
While the .read()
method returns the whole text, we can pass an integer to
this method to read only a given number of text characters in the file such as
.read(10)
. Besides the read() method, there is a .readline()
method which
returns one line of text per call print(fid.readline())
. By calling the same
method twice, two lines of text are returned. Finally, we can loop through the
contents of the file and process each line at the time.
>>> fid = open("filename.txt", "rt")
>>> for line in fid:
>>> print(line)
Once we are done processing the contents of the file, we should always close the
opened file with the .close()
method. Python has a context manager (with
)
which guarantees the file is properly closed even in the case when an error that
interrupts the normal process occurs.
>>> fid = open("filename.txt", "rt")
>>> print(fid.read())
>>> fid.close()
or
>>> with open("filename.txt", "rt") as fid:
>>> print(fid.read())
3.5.2. Text File Write#
To write a text file in Python, the file needs to be opened in write mode using
the open()
function, which in this case will take: the file name and "w"
for
write mode:
>>> fid = open("filename.txt", "w")
Once the file is open, the .write()
method can be used to write content to the
file. By passing a string as an argument to the .write()
method, the content
will be written to the file. Remember, to write multiple lines, the newline
character \n
can be used as a separator.
>>> fid.write("Hello, world!\n")
>>> fid.write("This is an example file.")
After writing the text to be included in the file, it is important to close the
file using the .close()
method or use the context manager introduced above.
This step ensures that any buffered data is written to the file and frees up
system resources.
3.5.3. netCDF File Read#
NetCDF files are commonly used in various scientific domains, including
meteorology, climate science, oceanography, atmospheric sciences, and
geophysics. They are used to store complex multidimensional data. In Python, you
can use libraries such as netCDF4
or xarray
, which provide convenient
interfaces for reading, writing, and manipulating netCDF
data.
To use the netCDF4
library to read files in Python, the first step is to
import the library:
>>> import netCDF4 as nc
nc
here serves as alias to the netCDF4
library. Remember to ensure that you
have the netCDF4
library installed. You can install it using the pip package
manager by running pip install netCDF4
. The netCDF4
library contains a
function called Dataset()
which is used to open netCDF
files. To open a
netCDF
file, we pass the file path to Dataset()
.
>>> filepath = "C:\\Users\\username\\Documents\\Data\\file.nc"
>>> dataset = nc.Dataset(filepath)
The netCDF4 library also has functions that allow you to access the variables
and attributes within the file once the NetCDF file is open. The variables
attribute provides a dictionary-like object that contains all the variables, and
the ncattrs()
method returns a list of attributes associated with the NetCDF
file. You can use these variables and attributes to retrieve information and
data from the NetCDF file:
>>> variables = dataset.variables
>>> attributes = dataset.ncattrs()
Once the data has been retrieved from the variable, you can perform any
necessary operations or analysis on it using Python’s array manipulation and
data processing libraries such as NumPy
, pandas
or SciPy
.
Similar to working with a text file, it is good practice to close the netCDF
file once you have finished reading or modifying its contents. Use the
.close()
method of the dataset object created above:
>>> dataset.close()