Table Of Contents

Previous topic

Table Sets

Next topic

Custom reading/writing

This Page

Masking and null values

It is often useful to be able to define missing or invalid values in a table. There are currently two ways to do this in ATpy, Null values, and Masking. The preferred way is to use Masking, but this requires at least NumPy 1.4.1 in most cases, and the latest svn version of NumPy for SQL database input/output. Therefore, for version 0.9.4 of ATpy, the default is to use the Null value method. To opt-in to using masked arrays, specify the masked=True argument when creating a Table instance:

>>> t = Table('example.fits.gz', masked=True)

In future, once NumPy 1.5.0 is out, we will switch over to using masked arrays by default, and will slowly phase out the Null value method.

If you want to set the default for masking to be on or off for a whole script, this can be done using the set_masked_default function:

import atpy
atpy.set_masked_default(True)

If you want to set the default for masking on a user-level, create a file named ~/.atpyrc in your home directory, containing:

[general]
masked_default:yes

The set_masked_default function overrides the .atpyrc file, and the masked= argument in Table overrides both the set_masked_default function and the .atpyrc file.

Null values

The basic idea behind this method is to specify a special value in each column that will signify missing or invalid data. To specify the Null value for a column, use the null argument in add_column:

>>> t.add_column('time', time, null=-999.)

Following this, if the table is written out to a file or database, this null value will be stored.

This method is generally unreliable, especially for floating point values, and does not allow users to easily distinguish between invalid and missing values.

Masking

NumPy supports masked arrays, where specific elements of an array can be properly masked by using a mask - a boolean array. There are several advantages to using this:

  • The mask is unrelated to the value in the cell - any cell can be masked, not just all cells with a specific value
  • It is possible to distinguish between invalid (e.g. NaN) and missing values
  • Values can easily be unmasked (although when writing to a file/database, the ‘old’ values are lost for masked elements).
  • NumPy provides masked versions of many functions, for example sum, mean, or median, which means that it is easy to correctly compute statistics on masked arrays, ignoring the masked values.

To specify the mask of a column, use the mask argument in add_column. To do the equivalent to the example in Null values, use:

>>> t.add_column('time', time, mask=time==-999.)

When writing out to certain file/database formats, a masked value has to be given a specific value - this is called a fill value. To set the fill value, simply use the fill argument when adding data to a column:

>>> t.add_column('time', time, mask=time==-999., fill=-999.)

In the above example, if the table is written out to an IPAC table, the value of -999. will be used for masked values.

Note

When implementing this in ATpy, we discovered a few bugs in the masked structured implementation of NumPy, which have now been fixed. Therefore, we recommend using the latest svn version of NumPy if you want to use masked arrays.