Tuesday, August 18, 2015

Week 12

Since GSOC is finally wrapping up, I pretty much spent this week reading over code in the PR and writing documentation. I introduced a new section on table indexing in the docs after the "Table operations" section, which should give a good introduction to indexing functionality. It also links to an IPython notebook I wrote (http://nbviewer.ipython.org/github/mdmueller/astropy-notebooks/blob/master/table/indexing-profiling.ipynb) that displays some profiling results of indexing by comparing different scenarios, e.g. testing different engines and using regular columns vs. mixins. I also ran the asv benchmarking tool on features relevant to indexing, and fixed an issue with Table sorting in which performance was slowed down while sorting a primary index.

There's not much else to describe in terms of final changes, although I do worry about areas where index copying or relabeling come up unexpectedly and have a negative effect on performance. As an example, using the `loc` attribute on Table is very slow for an indexing engine like FastRBT (which is slow to copy), since the returned rows of the Table are retrieved via a slice that relabels indices. This is necessary if the user wants indices in the returned slice, but I doubt that's usually a real issue. I guess the two alternatives here are either to have `loc` return something else (like a non-indexed slice) or to simply advise in the documentation that using the index mode 'discard_on_copy' is appropriate in such a scenario.

No comments:

Post a Comment