Woohoo, I’ve survived Week 1! 😀
(Background information about the project is on my About page.)
No real coding this week, but a lot of API research, querying, and testing – we were trying to identify the best strategy/APIs for finding Commons categories within a certain radius of given GPS coordinates. Basically, Phase 1 of the project will use the chosen strategy to match an uploaded picture’s GPS coordinates with nearby Commons categories, and then display those categories to the user as suggestions. Currently the only suggestions shown to the user are ‘previously used categories’, so the goal for Phase 1 is to provide more suggestions that are tailored to the place where the picture was taken.
We were initially considering 3 strategies/APIs:
- The WikiData API, which Magnus suggested in a StackOverflow thread started by Nicolas (co-mentor). Unrelated note: I later found out via Stephen (mentor) that Magnus has made a whole lot of other contributions to Wikimedia, and curiosity led me to his Wikipedia page, which says he was one of the people who worked on the original software that all of the wikis are now based on. Oooh. Small world. 🙂
- The Commons API
- The “search for existing pics at that location” strategy, code-named Method C. Instead of querying directly for Commons categories, this strategy searches the Commons database for existing pictures whose GPS coordinates are close to the uploaded picture’s coordinates, and retrieves their categories. Pretty nifty.
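One way to express Method C against the Commons MediaWiki API is the `geosearch` module (from the GeoData extension) used as a generator that feeds `prop=categories`, so a single request returns nearby File: pages together with their visible categories. The Java sketch below only builds that request URL, and is an assumption about how such a query could look rather than the exact query run in the API Sandbox; the class name, sample coordinates, radius, and file limit are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

public class NearbyCategoryQuery {
    // Commons' MediaWiki API endpoint
    static final String API_ENDPOINT = "https://commons.wikimedia.org/w/api.php";

    /**
     * Hypothetical helper: builds a geosearch query for File: pages near
     * (latitude, longitude) and asks for each page's non-hidden categories.
     */
    static String buildQueryUrl(double latitude, double longitude, int radiusMeters, int maxFiles) {
        // geosearch only accepts radii between 10 m and 10,000 m
        if (radiusMeters < 10 || radiusMeters > 10_000) {
            throw new IllegalArgumentException("radius must be 10..10000 m, got " + radiusMeters);
        }
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "query");
        params.put("format", "json");
        // list=geosearch used as a generator: every nearby page is passed on to prop=categories
        params.put("generator", "geosearch");
        params.put("ggscoord", String.format(Locale.ROOT, "%.6f|%.6f", latitude, longitude));
        params.put("ggsradius", Integer.toString(radiusMeters));
        params.put("ggslimit", Integer.toString(maxFiles));
        params.put("ggsnamespace", "6"); // namespace 6 = File: pages, i.e. existing pictures
        params.put("prop", "categories");
        params.put("clshow", "!hidden"); // skip hidden maintenance categories
        params.put("cllimit", "max");

        // Join as key=value pairs, URL-encoding each value ("|" becomes %7C, "!" becomes %21)
        StringJoiner query = new StringJoiner("&", API_ENDPOINT + "?", "");
        for (Map.Entry<String, String> param : params.entrySet()) {
            query.add(param.getKey() + "=" + URLEncoder.encode(param.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Hypothetical coordinates, 500 m radius, up to 20 nearby files
        System.out.println(buildQueryUrl(1.2868, 103.8545, 500, 20));
    }
}
```

A real implementation would still need to parse the JSON response, follow any `continue` results, and merge the categories of all returned files into one suggestion list; the radius parameter is also the natural knob for a radius-manipulating variant such as Method D.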
I eventually found out that the Commons API was not suitable, so we narrowed the comparison down to WikiData vs Method C. Fortunately, both had external tools for running test queries quickly and easily: TABernacle for WikiData and the Commons MediaWiki API Sandbox for Method C. I took 10 sample pictures and categorized them manually (using the categories already assigned by the Commons community, plus any of my own that I saw fit), then ran them through WikiData and Method C to see how many pertinent categories each one picked up. All of the results are on the GitHub wiki.
Long story short, the WikiData API worked out fine – few false positives, but also few good categories suggested; still a huge improvement over the current app’s suggestions. But Method C performed even better, retrieving more good categories despite also returning more false positives. As Nicolas pointed out, though, a false positive isn’t as big a deal as a missing good category: if the right category isn’t in the suggestion list, the user has to try to guess it and potentially scroll through dozens of other ‘false positives’ in the alphabetically ordered list.
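The comparison comes down to a few set operations per sample picture, treating the manually chosen categories as the reference set. A hypothetical Java sketch of that tally (class, method, and category names are placeholders; the actual results are on the GitHub wiki and are not reproduced here):

```java
import java.util.HashSet;
import java.util.Set;

public class SuggestionScore {
    /** Suggested categories that also appear in the manually built reference set. */
    static int goodSuggestions(Set<String> suggested, Set<String> reference) {
        Set<String> hits = new HashSet<>(suggested);
        hits.retainAll(reference);
        return hits.size();
    }

    /** Suggested categories that do not appear in the reference set. */
    static int falsePositives(Set<String> suggested, Set<String> reference) {
        Set<String> extras = new HashSet<>(suggested);
        extras.removeAll(reference);
        return extras.size();
    }

    /** Reference categories the method failed to suggest at all. */
    static int missedCategories(Set<String> suggested, Set<String> reference) {
        Set<String> missed = new HashSet<>(reference);
        missed.removeAll(suggested);
        return missed.size();
    }

    public static void main(String[] args) {
        // Hypothetical placeholder categories, not real results
        Set<String> reference = Set.of("Category A", "Category B", "Category C");
        Set<String> suggested = Set.of("Category A", "Category B", "Category X", "Category Y");
        // prints "good=2 falsePositives=2 missed=1"
        System.out.printf("good=%d falsePositives=%d missed=%d%n",
                goodSuggestions(suggested, reference),
                falsePositives(suggested, reference),
                missedCategories(suggested, reference));
    }
}
```

Under the trade-off described above, a method with a lower `missed` count is preferable even if its `falsePositives` count is higher, since a missing category costs the user more than an extra one.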
So, we’ll probably go with Method C (or its successor Method D, which involves manipulating the radius), but that will likely be decided during our next weekly meeting. The last meeting we had was incredibly productive – brainstorming with Stephen and Nicolas was a huge help in overcoming the “oh no, this doesn’t work! what to do next??” block that I encountered halfway through.
The next task is to find a library that extracts the GPS coordinates from uploaded pictures, and to integrate it into the app. Finding the library itself shouldn’t be too hard, I reckon, but I’m still a little nervous about the implementation. This app is a fair bit more complex than any other app I’ve tinkered with so far, and while I’ve already made some modifications to it for the microtasks required for my application, they were mostly simple activity/UI tweaks or bugfixes – none of them involved getting down and dirty with the actual upload/categorization code. I’ve spent quite a bit of time over the past few weeks reading the existing upload/categorization code and trying to understand it, but I don’t think I’m anywhere close to 100% on it just yet.
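Whichever library is chosen, a photo's EXIF data stores each GPS coordinate as degrees, minutes, and seconds (each an unsigned rational) plus a hemisphere reference (N/S for latitude, E/W for longitude), while geographic searches work with signed decimal degrees. A minimal Java sketch of that conversion, assuming the value arrives as three comma-separated rationals such as `37/1,46/1,3008/100` (assumed here to match the string form Android's `ExifInterface.getAttribute` returns for GPS tags); the class and method names are hypothetical. On Android, `ExifInterface` also offers `getLatLong()`, which already does this conversion, so a hand-rolled version is mainly useful for understanding or testing.

```java
public class ExifGps {
    /** Converts degrees/minutes/seconds plus a hemisphere ref (N/S/E/W) to signed decimal degrees. */
    static double toDecimalDegrees(double degrees, double minutes, double seconds, String ref) {
        double value = degrees + minutes / 60.0 + seconds / 3600.0;
        // South latitudes and West longitudes are negative in decimal notation
        if ("S".equalsIgnoreCase(ref) || "W".equalsIgnoreCase(ref)) {
            value = -value;
        }
        return value;
    }

    /** Parses one rational such as "1800/100" (or a bare number) into a double. */
    static double parseRational(String rational) {
        String[] parts = rational.trim().split("/");
        double numerator = Double.parseDouble(parts[0].trim());
        double denominator = parts.length > 1 ? Double.parseDouble(parts[1].trim()) : 1.0;
        return numerator / denominator;
    }

    /** Parses "deg,min,sec" rationals, e.g. "1/1,17/1,1800/100", into signed decimal degrees. */
    static double parseExifCoordinate(String dms, String ref) {
        String[] parts = dms.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected deg,min,sec rationals: " + dms);
        }
        return toDecimalDegrees(parseRational(parts[0]), parseRational(parts[1]),
                parseRational(parts[2]), ref);
    }

    public static void main(String[] args) {
        // Hypothetical EXIF values: 1°17'18" N and 103°51'0" E
        double lat = parseExifCoordinate("1/1,17/1,1800/100", "N");
        double lon = parseExifCoordinate("103/1,51/1,0/1", "E");
        System.out.println(lat + ", " + lon);
    }
}
```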