Monday, May 25, 2015

GSOC 2015

This is my first blog entry for the 2015 Google Summer of Code--I'm excited to take part in the program for a second year, and to continue working with Astropy in particular! I just realized that Python students were expected to make a blog post about Community Bonding Period by May 24th, so this is just a bit late.

My project this year will involve the implementation of database indexing for the Table class in Astropy, which is a central Astropy data structure. I had a Google Hangout meeting with my mentors (Michael Droettboom and Tom Aldcroft) last week in which we discussed possibilities for functionality to implement over the summer. Among these are

  • Allowing for multiple indexed columns within a Table, where each "index" is a 1-to-n mapping of keys to values
  • Using indices to speed up existing Table operations, such as "join", "group_by", and "unique"
  • Implement other Table methods like "where", "indices", "sort"
  • A special index called the "primary key" which might yield algorithmic improvements
  • Making sure values in an index can be selected by range
We also noted some open questions to tackle as the project gets further along. My work on the project begins today; so far (during the community bonding period) I've been reading the Astropy docs on current Table operations and looking at Pandas to see how the indexed Series class works under the hood. I discovered that Pandas uses a Python wrapper around a Cython/C hash table engine to implement its Series index; in fact, the same engine is used in the Python ASCII parser I investigated last year (e.g. for a hashmap of NaN values). I haven't figured out yet which data structure is most appropriate for our purposes--candidates include a hash map, B-tree, or a bitmap for certain cases--but that's part of what I'll be looking into this week. I'll also write asv benchmarks for current Table operations and work on passing Table information into a (not yet written) indexing data structure.

No comments:

Post a Comment