Google Summer of Code: Week 9

This week I spent quite a bit of time on mixin column support for indices, where appropriate. After first moving the indices themselves from a Column attribute to a DataInfo attribute (accessed as col.info.indices), I moved most of the indexing code for dealing with column access/modifications to BaseColumnInfo for mixin use in methods like __getitem__ and __setitem__. Since each mixin class has to include proper calls to indexing code, mixins should set a boolean value _supports_indices to True in their info classes (e.g. QuantityInfo). As of now, Quantity and Time support indices, while SkyCoord does not since there is no natural order on coordinate values. I've updated the indexing testing suite to deal with the new mixins.

Aside from mixins (and general PR improvements like bug fixes), I implemented my mentors' suggestion to turn the previous static_indices context manager into a context manager called index_mode, which takes an argument indicating one of three modes to set for the index engine. These modes are currently:

'freeze', indicating that table indices should not be updated upon modification, such as updating values or adding rows. After 'freeze' mode is lifted, each index updates itself based on column values. This mode should come in useful if users intend to perform a large number of column updates at a time.
'discard_on_copy', indicating that indices should not be copied upon creation of a new column (for example, due to calls like "table[2:5]" or "Table(table)").
'copy_on_getitem', indicating that indices should be copied when columns are sliced directly. This mode is motivated by the fact that BaseColumn does not override the __getitem__ method of its parent class (numpy.ndarray) for performance reasons, and so the method BaseColumn.get_item(item) must be used to copy indices upon slicing. When in 'copy_on_getitem' mode, BaseColumn.__getitem__ will copy indices at the expense of a reasonably large performance hit. One issue I ran into while implementing this mode is that, for special methods like __getitem__, new-style Python classes call the type's method rather than the instance's method; that is, "col[[1, 3]]" corresponds to something like "type(col).__getitem__(col, [1, 3])" rather than "col.__getitem__([1, 3])". I got around this by adjusting the actual __getitem__ method of BaseColumn in this context (and only for the duration of the context), but this has the side effect that all columns have changed behavior, not just the columns of the table supplied to index_mode. I'll have to ask my mentors whether they see this as much of an issue, because as far as I can tell there's no other solution.

At this point I see the PR as pretty much done, although I'll spend more time writing documentation (and making docstrings conform to the numpy docstring standard).

Google Summer of Code

Tuesday, July 28, 2015

Week 9

No comments:

Post a Comment