[Community] PCL-Referencing weirdness with numpy
Sean Gillies
sgillies at frii.com
Thu Aug 3 06:34:23 EEST 2006
On Aug 2, 2006, at 5:06 PM, Christopher Barker wrote:
> Hi all,
>
> Is this the right list to discuss PCL?
>
> I've been working with the PCL referencing module, and trying to
> figure out how to do what I need. In the process, I encountered an
> oddity if you try to pass numpy arrays into ProjTransform.transform.
>
> If you try to pass in an NX2 array of data, which would naturally
> map to a list of 2-tuples, you get what looks like an empty list as
> a result, but it doesn't act right: if you try:
>
> for p in ListofPoints:
> print p
>
> You get an error.
>
> If you pass in a list of numpy (2,) arrays, you get an answer back,
> but the numbers are wrong and you still get an error after
> iterating through the list.
>
> I now this is a case of:
>
> Patient: "Doctor, it hurts when I do this!"
> Doctor: "Then don't do that!"
>
> But:
>
> A) it would be great if PCL could understand numpy arrays.
> B) If not, you should get an exception, NOT weird results,
> particularly ones that look real.
>
Hi Chris,
I agree. Furthermore, if you tried to persuade me to switch the
implementation of PCL's geometries from a GEOS geom to a numpy array
of coordinates, I'd probably go for it!
> I've enclosed a test file that demonstrates the issues.
>
> I didn't dig deep, but I think I may have identified the problem.
> In _proj4_module.c, I see the code for: _proj4_transform_points
>
> In there, you use a PyObject for each of the arguments, and don't
> type check at all. Then, later on, the code uses:
>
> PyList_Size(arg3)
> and
> coords = PyList_GetItem(arg3, i)
>
> and then
> PyTuple_GetItem(coords, 0)
>
> Without checking that the coords is a tuple
>
> etc.
>
> Unfortunately, none of those calls do any type checking, so if you
> pass in anything other than a list of tuples of doubles, weird
> things can happen!
>
> Possible solutions:
>
>
> * Robust and easy:
> Use PyList_Check and friends to type check before using the input
> list. (and tuple, etc). You could also pass in format strings to
> PyArgParseTuple() to do at least some type checking for you.
>
> * More flexible:
> Use the PySequence* functions, so that the code will work on
> sequences other than lists of tuples (tuples of tuples, even numpy
> arrays). This is the most "pythonic" solution but can slow things
> down.
>
> wxPython does this, through it's wxPointListHelper functions -- it
> takes any sequence of sequences of numbers, and branches to get
> optimum performance for each common sequence type.
>
> You can use PySequence_Fast to help with that too.
>
> * Better:
> Check for numpy arrays, and use the array interface it you get
> them. In addition to one of the above.
I filed a ticket to address this. See http://trac.gispython.org/
projects/PCL/ticket/85.
>
> * Best ?
> Require numpy -- then you can use numpy's Turn-this-arbitrary-input-
> into-a-numpy-array function, and go from there with a known, robust
> C-array of data. Blazingly fast on numpy array inputs, and as fast
> as anything else on list, etc inputs, but you don't have to write
> all the type checking code--the numpy folks have done that.
>
> I call that the best because I think numpy arrays really are a
> natural for this kind of thing. large collections of coordinates,
> and certainly image data.
>
> In fact, if I start using PCL more, I'll probably donate some code
> for handling numpy arrays well. Whether that happens depends a lot
> of how the GDAL "next generation" python wrappers work out. I think
> they are working on building numpy support it.
Yes, like I said, I'm inclined to make as much use of numpy as will
help improve PCL. Our geometries have two main use cases:
1) Support all the 2D operations specified by the OGC simple features
access spec (union, intersection, etc). This is currently achieved by
using the GEOS library.
2) Support efficient rendering into conventional raster maps and
(future) conversion to GML and SVG. Currently this is done with
MapServer -- and GD -- but I'm getting ready to move on to Cairo.
That means PCL geometries must be efficiently turned into Cairo paths.
I used to think that 2 trumped 1, but I'm no longer certain of that.
>
> Another note: I think you've got a memory leak in that function too:
>
> in:
>
> fail:
> return NULL;
>
> you need to decref the PyObjects you've created, which I think is
> the results list.
>
> Also, shouldn't that set an exception and/or return PyNone or
> something?
The tests run leak-free, but maybe that just means that I don't yet
have adequate coverage.
>
> This is really, really hard code to write -- that's why I don't
> write C if I can help it! It's also why people use Pyrex, Swig,
> Boost::python, etc. In this case, I think pyrex may be a really
> good option.
>
> I hope this was helpful,
>
> -Chris
>
Very helpful! I disagree a little bit with your last statements. I
don't find Python's C API any more difficult to use and understand
than SWIG. I used to be very gung ho about SWIG, but now I think it's
just another unnecessary dependency.
I'm going to set you up with a login so that you can have access to
our issue tracker and wiki.
Cheers,
Sean
---
Sean Gillies
http://zcologia.com
More information about the Community
mailing list