[Community] Run time comparisons between C/GEOS and Python/Shapely
Yevgen Antymyrov
yevgen.antymyrov at and.com
Fri Mar 26 19:03:56 EET 2010
Guys,
I've found this old conversation about the Python vs. C comparison and I
have a question about it. If you don't mind, I'll continue the old thread.
My current task is to go through a big file and, for some particular
features (F1), find all features (F2) in another file that intersect F1.
To speed up the search for F2, I use the F1 shape together with
layer.SetSpatialFilterRect().
My question has nothing to do with Shapely, only with the OGR Python
bindings. But as you mentioned in this email, "The Python script above
also uses a great deal of memory since it never frees memory allocated by
OGR's GetNextFeature()". So it seems you have experience with this. I
understand that you did not mean the "list.append()" problem, but
something in OGR's internals. Do you know how this could be avoided? My
files are gigabytes of data and will not fit into the memory of my 32-bit
machine.
What I see is that my script, which basically does "for feature in
get_features_generator():", just grows in memory even though I never
store any data and only print when an intersection is found.
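
For illustration, the pattern described above might look like the
following sketch. The names iter_features, find_intersections and wanted
are hypothetical and do not come from the original script; the layer,
feature and geometry methods are the usual OGR ones (GetNextFeature,
SetSpatialFilterRect, GetEnvelope, Intersects). The sketch keeps no
reference to a feature once the loop moves on, but it is only an
assumption that this is enough to stop the memory growth with a given
version of the bindings.

```python
def iter_features(layer):
    """Yield features from an OGR-style layer one at a time."""
    layer.ResetReading()
    feature = layer.GetNextFeature()
    while feature is not None:
        yield feature
        feature = layer.GetNextFeature()


def find_intersections(layer1, layer2, wanted=lambda feature: True):
    """Yield (fid1, fid2) for each F2 in layer2 that intersects a selected F1.

    `wanted` is a hypothetical predicate that picks the "particular"
    F1 features; by default every feature of layer1 is used.
    """
    for f1 in iter_features(layer1):
        if not wanted(f1):
            continue
        geom1 = f1.GetGeometryRef()
        if geom1 is None:
            continue
        # GetEnvelope() returns (minx, maxx, miny, maxy) ...
        minx, maxx, miny, maxy = geom1.GetEnvelope()
        # ... while SetSpatialFilterRect() takes (minx, miny, maxx, maxy).
        layer2.SetSpatialFilterRect(minx, miny, maxx, maxy)
        for f2 in iter_features(layer2):
            geom2 = f2.GetGeometryRef()
            # The rectangle filter is only a coarse test, so confirm
            # each candidate with an exact geometry check.
            if geom2 is not None and geom1.Intersects(geom2):
                yield f1.GetFID(), f2.GetFID()
    # Remove the spatial filter so later reads see the whole layer again.
    layer2.SetSpatialFilter(None)
```

The layers would come from osgeo.ogr.Open(path).GetLayer(), as in the
quoted script below, and the pairs can be printed as they are produced,
for example "for fid1, fid2 in find_intersections(layer1, layer2): print(fid1, fid2)".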
> > #=======================================
> > import os,sys,osgeo.ogr
> > from shapely.wkb import loads
> > from shapely.geometry import Polygon
> >
> > source = osgeo.ogr.Open(sys.argv[1])
> > layer = source.GetLayer()
> > objet = layer.GetNextFeature()
> > liste = []
> > while objet:
> >     liste.append(loads(objet.GetGeometryRef().ExportToWkb()))
> >     objet = layer.GetNextFeature()
> > print len(liste)
> >
> >
> > for i in range(len(liste)-1):
> >     ref = liste[i]
> >     if i % 100 == 0:
> >         print str(i)
> >         os.system("date")
> >     for j in range(i+1,len(liste)):
> >         sec = liste[j]
> >         if ref.disjoint(sec) == False:
> >             print "intersection entre " + str(i) + ' et ' + str(j)
> >
> >
> >
> Hi Pascal,
>
> A Python solution shouldn't be 100 times slower than C, especially since
> both OGR and Shapely use the same GEOS library to compute relationships
> between geometries. Is it possible that your other solution uses
> "intersects" instead of "not disjoint" (could be faster), or is more
> approximate?
>
> The Python script above also uses a great deal of memory since it never
> frees memory allocated by OGR's GetNextFeature(), and then copies the
> feature to a Shapely geom (and a GEOS geom). If your computer's free
> memory is low, performance can be poor. Is this a possibility?
--
Yevgen