[Community] Run time comparisons between C/GEOS and Python/Shapely

Yevgen Antymyrov yevgen.antymyrov at and.com
Fri Mar 26 19:03:56 EET 2010


Guys,

I found this old conversation comparing Python and C, and I have a
question about it. If you don't mind, I'll continue the old thread.

My current task is to go through a big file and, for some particular
features (F1), find all features (F2) in another file that intersect
F1. To speed up the search for F2, I used the F1 shape with
layer.SetSpatialFilterRect().
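
A minimal sketch of that two-pass search, assuming both arguments behave like osgeo.ogr.Layer objects (find_intersections is a hypothetical name used only for illustration). The rectangle filter only narrows F2 down to candidates whose geometry overlaps F1's bounding box, so an exact Intersects() test is still needed to keep the real hits.

```python
def find_intersections(layer1, layer2):
    """Yield (fid1, fid2) for each F1 in layer1 and F2 in layer2 that intersect.

    Hypothetical helper; assumes osgeo.ogr.Layer-like arguments.
    """
    try:
        layer1.ResetReading()
        f1 = layer1.GetNextFeature()
        while f1 is not None:
            g1 = f1.GetGeometryRef()  # owned by f1, used only while f1 is alive
            # OGR envelopes are ordered (minx, maxx, miny, maxy).
            minx, maxx, miny, maxy = g1.GetEnvelope()
            # Coarse pass: the filter rectangle is F1's bounding box, so
            # candidates returned here can still miss F1 itself.
            layer2.SetSpatialFilterRect(minx, miny, maxx, maxy)
            layer2.ResetReading()
            f2 = layer2.GetNextFeature()
            while f2 is not None:
                # Exact pass on the real geometries.
                if g1.Intersects(f2.GetGeometryRef()):
                    yield f1.GetFID(), f2.GetFID()
                f2 = layer2.GetNextFeature()
            f1 = layer1.GetNextFeature()
    finally:
        # Clear the filter so later reads of layer2 see every feature again.
        layer2.SetSpatialFilter(None)
```

Usage would be something like `for fid1, fid2 in find_intersections(big_layer, other_layer): print(fid1, fid2)`. The try/finally also clears the filter if the caller stops iterating early.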

My question has nothing to do with Shapely, only with the OGR Python
bindings. But as you mentioned in this email, "The Python script above
also uses a great deal of memory since it never frees memory allocated
by OGR's GetNextFeature()", so it seems you have run into this. I
understand you didn't mean the "list.append()" problem but something in
OGR's internals. Do you know how this can be avoided? My files are
gigabytes of data and won't fit into the RAM of my 32-bit machine.

What I see is that my script, which basically does "for feature in
get_features_generator():", keeps growing in memory even though I never
store any data; it only prints a line when an intersection is found.
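
One way to make sure no OGR object outlives a single loop iteration is to have the generator hand out only plain Python data. This is a sketch, not a confirmed fix: it assumes the bindings free a feature once its last Python reference is dropped, and iter_fid_wkb is a hypothetical name. It also copies the geometry out as WKB because the object returned by GetGeometryRef() belongs to the feature and should not be used after the feature is gone.

```python
def iter_fid_wkb(layer):
    """Yield (fid, wkb) pairs from an OGR-style layer, one feature at a time.

    Hypothetical helper; assumes features are freed when no Python
    reference to them remains.
    """
    layer.ResetReading()
    feature = layer.GetNextFeature()
    while feature is not None:
        fid = feature.GetFID()
        # Copy the geometry out; the GetGeometryRef() result is owned by
        # the feature and must not outlive it.
        wkb = feature.GetGeometryRef().ExportToWkb()
        # Drop our only reference before handing data to the caller.
        del feature
        yield fid, wkb
        feature = layer.GetNextFeature()
```

The loop then becomes `for fid, wkb in iter_fid_wkb(layer): ...`. If memory still grows with this pattern, the growth is probably not caused by features kept alive by the loop itself.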

> > #=======================================
> > import os,sys,osgeo.ogr
> > from shapely.wkb import loads
> > from shapely.geometry import Polygon
> >
> > source = osgeo.ogr.Open(sys.argv[1])
> > layer = source.GetLayer()
> > objet = layer.GetNextFeature()
> > liste = []
> > while objet:
> >     liste.append(loads(objet.GetGeometryRef().ExportToWkb()))
> >     objet = layer.GetNextFeature()
> > print len(liste)
> >
> >
> > for i in range(len(liste)-1):
> >     ref = liste[i]
> >     if i % 100 == 0:
> >         print  str(i)
> >         os.system("date")
> >     for j in range(i+1,len(liste)):
> >         sec = liste[j]
> >         if ref.disjoint(sec) == False:
> >             print "intersection entre " + str(i) + ' et ' + str(j)
> >
> >
>
> Hi Pascal,
>
> A Python solution shouldn't be 100 times slower than C, especially since
> both OGR and Shapely use the same GEOS library to compute relationships
> between geometries. Is it possible that your other solution uses
> "intersects" instead of "not disjoint" (could be faster), or is more
> approximate?
>
> The Python script above also uses a great deal of memory since it never
> frees memory allocated by OGR's GetNextFeature(), and then copies the
> feature to a Shapely geom (and a GEOS geom). If your computer's free
> memory is low, performance can be poor. Is this a possibility?
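
For the all-pairs loop in the quoted script, a cheap bounding-box comparison can reject most pairs before the exact GEOS predicate runs, and intersects() asks the question directly instead of negating disjoint(). A sketch, assuming Shapely-style geometries with a bounds tuple (minx, miny, maxx, maxy) and an intersects() method; intersecting_pairs is a hypothetical name.

```python
def intersecting_pairs(geoms):
    """Return (i, j) index pairs, i < j, whose geometries intersect.

    Hypothetical helper; assumes Shapely-like objects with ``bounds`` and
    ``intersects()``.
    """
    boxes = [g.bounds for g in geoms]  # (minx, miny, maxx, maxy), computed once
    pairs = []
    for i in range(len(geoms) - 1):
        ax0, ay0, ax1, ay1 = boxes[i]
        for j in range(i + 1, len(geoms)):
            bx0, by0, bx1, by1 = boxes[j]
            # Cheap rejection: geometries with disjoint boxes cannot intersect.
            if ax1 < bx0 or bx1 < ax0 or ay1 < by0 or by1 < ay0:
                continue
            # Exact test, same answer as "not disjoint" but asked directly.
            if geoms[i].intersects(geoms[j]):
                pairs.append((i, j))
    return pairs
```

This still visits every pair, so it only saves the cost of the exact predicate on pairs whose boxes don't overlap; it does nothing for memory use.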


-- 
Yevgen