Monday, August 10, 2015

Week 11

This week I implemented a nice bit of functionality that Tom suggested, inspired by a similar feature in Pandas: retrieving rows via the Table attributes `loc` and `iloc`. The idea is to provide a mechanism for row retrieval that sits in between a high-level `query()` method and dealing with Index objects directly. Here's an example:
```
In [2]: t = simple_table(10)

In [3]: print t
 a   b    c
--- ---- ---
  1  1.0   c
  2  2.0   d
  3  3.0   e
  4  4.0   f
  5  5.0   g
  6  6.0   h
  7  7.0   i
  8  8.0   j
  9  9.0   k
 10 10.0   l

In [4]: t.add_index('a')

In [5]: t.add_index('b')

In [6]: t.loc[4:9] # 'a' is the implicit primary key
Out[6]:
<Table length=6>
  a      b     c
int32 float32 str1
----- ------- ----
    4     4.0    f
    5     5.0    g
    6     6.0    h
    7     7.0    i
    8     8.0    j
    9     9.0    k

In [7]: t.loc['b', 1.5:7.0]
Out[7]:
<Table length=6>
  a      b     c
int32 float32 str1
----- ------- ----
    2     2.0    d
    3     3.0    e
    4     4.0    f
    5     5.0    g
    6     6.0    h
    7     7.0    i

In [8]: t.iloc[2:4]
Out[8]:
<Table length=2>
  a      b     c
int32 float32 str1
----- ------- ----
    3     3.0    e
    4     4.0    f
```
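The `loc` attribute is used for retrieval by column value, while `iloc` is used for retrieval by position in the sorted order of an index. As the output above shows, a `loc` slice includes both endpoints, whereas an `iloc` slice excludes its stop position like an ordinary Python slice. Both rely on a primary key, which for now is just the first index added to the table; a different index can be selected by passing its column name first, as in `t.loc['b', 1.5:7.0]`. Also, indices can now be retrieved by column name(s):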
```
In [9]: t.indices['b']
Out[9]:
 b   rows
---- ----
 1.0    0
 2.0    1
 3.0    2
 4.0    3
 5.0    4
 6.0    5
 7.0    6
 8.0    7
 9.0    8
10.0    9
```
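To make the difference between `loc` and `iloc` concrete, here's a rough sketch of the idea in plain Python. This is not the code in the PR: the `SortedIndex` class and its method names are made up for illustration, and it just assumes an index is a sorted list of column values paired with row numbers, like the `b`/`rows` listing above.

```python
import bisect


class SortedIndex:
    """Hypothetical toy index: sorted column values paired with row numbers."""

    def __init__(self, values):
        # Sort the values while remembering which row each one came from.
        pairs = sorted((value, row) for row, value in enumerate(values))
        self.keys = [value for value, _ in pairs]
        self.rows = [row for _, row in pairs]

    def loc(self, lo, hi):
        """Row numbers whose value lies in [lo, hi], both ends inclusive."""
        start = bisect.bisect_left(self.keys, lo)
        stop = bisect.bisect_right(self.keys, hi)
        return self.rows[start:stop]

    def iloc(self, start, stop):
        """Row numbers at positions [start, stop) in the sorted order."""
        return self.rows[start:stop]


# An unsorted column, so value order and row order differ.
index = SortedIndex([3, 1, 2, 5, 4])
by_value = index.loc(2, 4)      # rows holding the values 2, 3, 4
by_position = index.iloc(0, 2)  # rows holding the two smallest values
```

Here `loc` does a binary search on the values, while `iloc` only slices the stored row numbers, which is why it depends on the index's sorted order rather than on the table's row order.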
Aside from this, I've been adding in miscellaneous changes to the PR, such as getting `np.lexsort` to work with Time objects, reworking the `SortedArray` class to use a `Table` object instead of a list of ndarrays (for working with mixins), putting `index_mode` in `Table`, etc. Tom noted some performance issues when working with indices, which I've been working on as well.
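For anyone unfamiliar with `np.lexsort`, here's a minimal sketch of what it does with plain arrays (not Time objects, and not the PR's code). The array names are made up for the example; the only thing to remember is that the last key passed in is the primary sort key.

```python
import numpy as np

# Two hypothetical key columns of equal length.
primary = np.array([2, 1, 2, 1])
secondary = np.array([0.5, 3.0, 0.1, 1.0])

# lexsort sorts by the LAST key first, so `primary` decides the order
# and `secondary` only breaks ties.
order = np.lexsort((secondary, primary))

sorted_primary = primary[order]
sorted_secondary = secondary[order]
```

The resulting `order` array holds row numbers in sorted order, which is the same kind of information as the `rows` column of an index.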
