This week I implemented a nice bit of functionality that Tom suggested, inspired by a similar feature in Pandas: retrieving index information via the Table attributes `loc` and `iloc`. The idea is to provide a mechanism for row retrieval that sits in between the high-level `query()` method and dealing with Index objects directly. Here's an example:
```
In [2]: t = simple_table(10)
In [3]: print t
a b c
--- ---- ---
1 1.0 c
2 2.0 d
3 3.0 e
4 4.0 f
5 5.0 g
6 6.0 h
7 7.0 i
8 8.0 j
9 9.0 k
10 10.0 l
In [4]: t.add_index('a')
In [5]: t.add_index('b')
In [6]: t.loc[4:9] # 'a' is the implicit primary key
Out[6]:
<Table length=6>
a b c
int32 float32 str1
----- ------- ----
4 4.0 f
5 5.0 g
6 6.0 h
7 7.0 i
8 8.0 j
9 9.0 k
In [7]: t.loc['b', 1.5:7.0]
Out[7]:
<Table length=6>
a b c
int32 float32 str1
----- ------- ----
2 2.0 d
3 3.0 e
4 4.0 f
5 5.0 g
6 6.0 h
7 7.0 i
In [8]: t.iloc[2:4]
Out[8]:
<Table length=2>
a b c
int32 float32 str1
----- ------- ----
3 3.0 e
4 4.0 f
```
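The `loc` attribute is used for retrieval by column value, while `iloc` is used for retrieval by position in the sorted order of an index. As the output above shows, a `loc` slice includes both of its endpoints (as in Pandas), while an `iloc` slice follows the usual Python convention and excludes the stop position. When no index is named, both attributes use the table's primary key, which for now is just the first index added to the table. Also, indices can now be retrieved by column name(s):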
```
In [9]: t.indices['b']
Out[9]:
b rows
---- ----
1.0 0
2.0 1
3.0 2
4.0 3
5.0 4
6.0 5
7.0 6
8.0 7
9.0 8
10.0 9
```
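To make the difference between the two lookups concrete, here is a rough sketch in plain Python. It is not how the PR implements things: `ToyIndex` and its methods are hypothetical names made up for illustration, and a sorted list plus `bisect` stands in for the real index engine. It only mimics the behaviour shown in the session above.

```python
from bisect import bisect_left, bisect_right


class ToyIndex:
    """Hypothetical stand-in for a sorted index over a single column."""

    def __init__(self, values):
        # Row numbers, ordered by the column value they point to
        self.rows = sorted(range(len(values)), key=values.__getitem__)
        # Column values in that same sorted order
        self.keys = [values[r] for r in self.rows]

    def loc(self, lo, hi):
        """Row numbers whose value lies in [lo, hi], both ends inclusive."""
        start = bisect_left(self.keys, lo)
        stop = bisect_right(self.keys, hi)
        return self.rows[start:stop]

    def iloc(self, start, stop):
        """Row numbers at positions start..stop-1 of the sorted order."""
        return self.rows[start:stop]


a = list(range(1, 11))        # column 'a' from the example table
b = [float(x) for x in a]     # column 'b'
idx_a, idx_b = ToyIndex(a), ToyIndex(b)

print(idx_a.loc(4, 9))        # rows holding a = 4..9
print(idx_b.loc(1.5, 7.0))    # rows holding b = 2.0..7.0
print(idx_a.iloc(2, 4))       # rows at sorted positions 2 and 3

# With unsorted data, iloc follows the index order rather than the table order
shuffled = ToyIndex([30, 10, 20])
print(shuffled.iloc(0, 2))    # rows holding the two smallest values
```

The returned row numbers are what would then be used to slice the table itself; in the example table the column is already sorted, so sorted position and row number happen to coincide.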
Aside from this, I've been adding miscellaneous changes to the PR, such as getting `np.lexsort` to work with Time objects, reworking the `SortedArray` class to use a `Table` object instead of a list of ndarrays (for working with mixins), putting `index_mode` in `Table`, etc. Tom also noted some performance issues with indices, which I've been looking into as well.