Distributed number crunching?

Suggestions for WiGLE/JiGLE/DiGLE

4 posts • Page 1 of 1

Postby konaya » Sun Mar 10, 2013 9:15 pm

I'm very new to WiGLE (but not to wardriving in general). The first thing I noticed about this place is that there are a lot of threads discussing the sometimes slow processing of new data. Is this because the current back-end system is frail/unstable/inefficient, or because there are simply a lot of numbers that need to be crunched?

If the latter, I have an idea. I have no idea what kind of processing you're doing on the server end, but it seems to be either very intensive or very disproportionate to the available resources. The submitted data is already crowdsourced; why not crowdsource some processing cycles as well? Using BOINC or similar, we could all pitch in to tackle that pesky queue! Would something like this be possible, and if so, would the WIGLE addies be interested?

Postby uhtu » Sun Mar 10, 2013 11:32 pm

there is a good amount of crunching, but the overwhelming majority of the processing is I/O and contention limited.
we've not yet come up with an approach that beats the current centralized system (which is quite fast up until it isn't)
across all of the metrics that matter to us.
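
To illustrate why extra donated CPU cycles help little when most of the time goes to I/O and contention, here is a rough sketch using Amdahl's law. The 20% figure for the CPU-bound share is purely an assumption for illustration; the thread gives no actual numbers.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Overall speedup when only `parallel_fraction` of the work scales with workers."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Assumed, not measured: 20% of processing time is CPU work that could be farmed out.
CPU_SHARE = 0.2

for workers in (1, 10, 1000):
    print(workers, round(amdahl_speedup(CPU_SHARE, workers), 3))
```

Under that assumption, even a thousand volunteers would keep the overall speedup below 1.25x, because the I/O-bound share stays on the central system.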

an embarrassingly parallel system (say, a hilarious future world where each of the 2^48 MAC addresses has its own wigle instance) that we can't maintain is of no use to anyone.

-uhtu

Postby hknienh » Tue Mar 12, 2013 8:58 am

You mention a "centralized system". Is the I/O limitation something that could be alleviated by centralized parallelization? After uploading data, I can see that the processing goes through a couple of stages (parsing, triangulating; I think there are more but I don't recall them), which suggests that at least those stages could be pipelined, although the triangulation and the actual database update might both pull data from the same database.
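
As a minimal sketch of what pipelining two such stages could look like, the following connects a parse stage and a position-estimation stage with queues, so the second stage can work on one upload while the first parses the next. The input format (`MAC,lat,lon`) and the `parse` and `locate` functions are hypothetical stand-ins, not WiGLE's actual stages.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream


def stage(worker, inbox, outbox):
    """Run one pipeline stage: apply `worker` to each item until the sentinel arrives."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # pass the shutdown signal downstream
            return
        outbox.put(worker(item))


def parse(line):
    # Hypothetical upload format: "MAC,lat,lon"
    mac, lat, lon = line.split(",")
    return mac, float(lat), float(lon)


def locate(observation):
    # Stand-in for the position-estimation stage; here it only rounds coordinates.
    mac, lat, lon = observation
    return mac, round(lat, 3), round(lon, 3)


def run_pipeline(lines):
    raw, parsed, located = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=stage, args=(parse, raw, parsed)),
        threading.Thread(target=stage, args=(locate, parsed, located)),
    ]
    for t in threads:
        t.start()
    for line in lines:
        raw.put(line)
    raw.put(SENTINEL)

    results = []
    while True:
        item = located.get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Each stage runs in a single thread reading from a FIFO queue, so results come out in upload order. If both stages hit the same database, as suspected above, the queues only hide latency and do not remove that contention.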

How is the database organized at the moment? I would expect a table of observations (location + time) indexed by MAC, and a table of MACs indexed by location for generating maps. That would be easy to parallelize by having separate machines for MAC lookup and for different parts of the world.
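
A rough sketch of the routing such a split would need, assuming MACs are written as colon-separated hex and the world is cut into a fixed grid. The function names, the shard count, and the 10-degree cell size are all made up for illustration; nothing here reflects how the real database is laid out.

```python
import math


def mac_shard(mac, shard_count):
    """Pick a MAC-lookup machine from the 48-bit value of the address."""
    return int(mac.replace(":", ""), 16) % shard_count


def region_cell(lat, lon, cell_degrees=10.0):
    """Map a coordinate to a grid cell; each cell could live on its own machine."""
    return (math.floor(lat / cell_degrees), math.floor(lon / cell_degrees))


# Example routing for one observation (values are illustrative only).
print(mac_shard("00:00:00:00:00:0a", 4))
print(region_cell(59.3, 18.1))
```

A fixed grid like this would load machines very unevenly, since dense cities and open ocean get equally sized cells, so a real split would probably need cells sized by observation count.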

Right now there is a queue of over 1000 and the 'total wifi' counter has been stuck at 88,220,257 for two days. The queue seems to be shrinking but the total counter isn't.
Proud contributor of the 50, 52, 54, 65, and 70 millionth wifi network.

Postby bobzilla » Tue Mar 12, 2013 4:56 pm

It is a very parallel problem, and we have many CPUs and partitions that work on it in parallel, but when we also run large parallel batch background tasks at the same time, performance suffers a bit. Examples include geocoding every network, or retrilaterating large geographic areas after an improvement to the filtering algorithms.
-bobzilla - WiGLE.net just a little bit
