#Using StackOverflow to determine growth of Android Developers
StackExchange is an extensive data repository that keeps all its data in relational databases. It:
- Hosts data for 101 Q&A sites,
- Supports 3.3 million users, and
- Is home to 6.1 million questions and 11.3 million answers!

StackOverflow is one of the 101 Q&A sites that StackExchange supports.
###For launching queries against StackExchange:
https://data.stackexchange.com/stackoverflow/queries
The query results can be downloaded as a .csv file.
###Schema:
https://data.stackexchange.com/stackoverflow/query/new (the schema is listed on the right-hand side of the page)
###SQL Query for gathering data:

    -- One row per answer to an android-tagged question (a = question, b = answer)
    SELECT a.Id AS PostId, a.Tags AS PostTags,
           d.Id AS AskerId, d.DisplayName AS AskerName, d.Reputation AS AskerReputation,
           (SELECT COUNT(*) FROM Badges WHERE UserId = a.OwnerUserId) AS NumberOfAskerBadges,
           a.CreationDate AS AskDate,
           e.Id AS AnswererId, e.DisplayName AS AnswererName, e.Reputation AS AnswererReputation,
           (SELECT COUNT(*) FROM Badges WHERE UserId = b.OwnerUserId) AS NumberOfAnswererBadges,
           (SELECT OwnerUserId FROM Posts WHERE Id = a.AcceptedAnswerId) AS BestAnswerById,
           (SELECT CreationDate FROM Posts WHERE Id = a.AcceptedAnswerId) AS BestAnswerGivenAt
    FROM Posts a
    INNER JOIN Posts b ON b.ParentId = a.Id        -- answers to the question
    INNER JOIN Users d ON d.Id = a.OwnerUserId     -- asker
    INNER JOIN Users e ON e.Id = b.OwnerUserId     -- answerer
    WHERE a.PostTypeId = 1                         -- 1 = question
      AND a.Tags LIKE '%android%'                  -- also matches tags such as android-layout
      AND a.CreationDate BETWEEN '2012-10-01 00:00:00' AND '2013-04-01 00:00:00'
    ORDER BY a.Id, b.CreationDate

###Pain-point: Only 50000 rows can be fetched from the database per query, so larger result sets have to be accumulated manually by re-running the query over successive time-frames.
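
Splitting the overall date range into smaller time-frames can be scripted rather than done by hand. The following is a minimal sketch, not part of the original scripts: the helper name `time_windows` and the 30-day window size are assumptions, and the window size has to be tuned so that each run stays under the row limit. It prints half-open date conditions (`>=` and `<`) instead of `BETWEEN`, so that a question created exactly on a boundary is not fetched twice.

```python
from datetime import datetime, timedelta


def time_windows(start, end, days):
    """Yield consecutive half-open (lo, hi) windows that cover [start, end)."""
    step = timedelta(days=days)
    lo = start
    while lo < end:
        hi = min(lo + step, end)  # the last window may be shorter
        yield lo, hi
        lo = hi


if __name__ == "__main__":
    fmt = "%Y-%m-%d %H:%M:%S"
    # Date range taken from the query above; 30 days is an assumed window size.
    for lo, hi in time_windows(datetime(2012, 10, 1), datetime(2013, 4, 1), days=30):
        print("AND a.CreationDate >= '%s' AND a.CreationDate < '%s'"
              % (lo.strftime(fmt), hi.strftime(fmt)))
```

Each printed line replaces the date condition of the query for one run; the .csv file downloaded from each run is then passed to the scripts below.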
##Data-Visualization: Gephi, an excellent tool for building graphical representations
To generate a graphical representation, Gephi needs the data in a specific format (.net).
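
For orientation, .net is the Pajek network format: a `*Vertices` section that numbers the nodes from 1, followed by an `*Arcs` section listing directed edges as `source target weight`. The sketch below shows one way the exported .csv files could be turned into such a file. It is not the actual `code.py`; the function names are hypothetical, and the choice to draw an arc from answerer to asker, weighted by the number of answers, is an assumption. The column names `AskerId` and `AnswererId` come from the query above.

```python
import csv
from collections import Counter


def read_edges(csv_paths):
    """Count (answerer, asker) pairs over all exported .csv files."""
    edges = Counter()
    for path in csv_paths:
        # utf-8-sig tolerates a byte-order mark at the start of the export
        with open(path, newline="", encoding="utf-8-sig") as fh:
            for row in csv.DictReader(fh):
                edges[(row["AnswererId"], row["AskerId"])] += 1
    return edges


def write_pajek(edges, out_path):
    """Write the edge counts as a Pajek .net file with directed, weighted arcs."""
    nodes = sorted({user for pair in edges for user in pair})
    index = {user: i for i, user in enumerate(nodes, start=1)}  # Pajek ids start at 1
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("*Vertices %d\n" % len(nodes))
        for user in nodes:
            out.write('%d "%s"\n' % (index[user], user))
        out.write("*Arcs\n")
        for (src, dst), weight in sorted(edges.items()):
            out.write("%d %d %d\n" % (index[src], index[dst], weight))
```

Calling `write_pajek(read_edges(paths), "android.net")` with the list of downloaded files (the output name is only an example) yields a file that Gephi can open directly.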
###How to generate your own representation?
- python code.py list_of_all_.csv_files (space separated, please!)
- Start Gephi
- Import the .net file (generated by the Python script). Play around with the different network parameters to understand how the network is structured, how it has evolved over time, which contributions are significant, etc.
- The same approach can be used to build network structures for other technologies and to compare how they have evolved.
###Tracking growth of nodes?
- python newNodes.py list_of_all_.csv_files (space separated, please!)
NewNodes.txt will contain the number of unique nodes in each .csv file.
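
As an illustration of the counting step, here is a minimal sketch of how unique nodes per file could be computed. It is not the actual `newNodes.py`: the function names are hypothetical, and it assumes that a node is any user id appearing in the `AskerId` or `AnswererId` column. Besides the unique count per file, it also reports how many of those users did not appear in any earlier file, which is an added assumption about what "growth" could mean when the files are passed in chronological order.

```python
import csv


def nodes_in_file(path):
    """Return the set of user ids that ask or answer in one exported .csv file."""
    users = set()
    with open(path, newline="", encoding="utf-8-sig") as fh:
        for row in csv.DictReader(fh):
            users.add(row["AskerId"])
            users.add(row["AnswererId"])
    return users


def node_growth(csv_paths):
    """Return (path, unique_nodes, first_seen_nodes) per file, in the order given."""
    seen = set()
    report = []
    for path in csv_paths:
        users = nodes_in_file(path)
        # users - seen: ids that occur in this file but in none of the earlier ones
        report.append((path, len(users), len(users - seen)))
        seen |= users
    return report
```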