I finished my playlist recommendation app! First, a user can search for an artist they have in mind. When they click the search button, the app makes an AJAX call to the Spotify API, gets back the artists whose names match the search query, and lists them sorted by popularity. You can then click on the correct one, which adds their name to the list on the left. After you put in a few artists (you can put in up to 10) and click the Find Playlist button, my app looks through all of the playlists in the database but skips ones that have fewer than 3 unique artists. It then calculates the similarity score for each one and keeps the playlists with good scores. Finally, it shows you the five playlists with the highest scores. You can click on the play button (I made that too!) and play a 30 second preview to see if you actually like the songs or not. Because I stored each song's preview URL from Spotify's track object in the database, it came in handy when I made this play button. It's actually using a JavaScript object called Audio. I basically create a new Audio object using the constructor and each track's preview URL, and when you call .play() on it, it plays! Here's what it looks like.
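The play button idea can be sketched roughly like this. It's a minimal sketch, not the app's actual code: `createPreviewPlayer` and its `AudioCtor` parameter are names made up here, and the only real API assumed is the browser's `Audio` constructor with its `play()` and `pause()` methods.

```javascript
// Sketch of a preview button handler. `AudioCtor` defaults to the browser's
// built-in Audio constructor; passing it in as a parameter is an assumption
// made here so the function can also be tried outside a browser.
function createPreviewPlayer(previewUrl, AudioCtor = globalThis.Audio) {
  const audio = new AudioCtor(previewUrl);
  let playing = false;

  return {
    // Flip between playing and paused; returns the new state.
    toggle() {
      if (playing) {
        audio.pause();
      } else {
        audio.play();
      }
      playing = !playing;
      return playing;
    },
  };
}
```

In a page, you would create one player per track from its stored preview URL and call `toggle()` from the button's click handler.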
The code for the recommender ended up being pretty simple. You take the playlist that the user wants to match and loop through each playlist in the database, comparing the two.

For now I have about 80,000 tracks in 2600 playlists. I basically added each user's username (got them from people that I'm friends with on fb) and made an AJAX call to get each playlist and download it into a file, one call every 0.2 seconds, because the Spotify API limits the calls to 10 calls per second. If I get an error during the looping it quits and it's a pain in the butt, so just to be sure I increased the amount of time to sleep between calls. After a couple hours of struggling with it randomly quitting (or my access token expiring) I downloaded all the info I need. Then I loaded it into my SQLite database in Rails, which took another hour.

I then used the code I wrote to calculate the Jaccard similarity coefficient to test how well it predicts using my own playlist. I first started by looking at the song names. I looked at the playlists that had high Jaccard scores (over 0.02). This caused some problems. Some predictions were okay, but others were way off. It also kept on recommending playlists that only had one artist (like their album), or it would recommend playlists where I didn't know any of the songs. This was because I matched on song names, and the same song name can belong to several different artists.

I wasn't satisfied with the results so I decided to change it to look at the artists instead. I predicted that this would at least eliminate the playlists that only had one artist, because intersections and unions don't include duplicates. This was great! Not only did I get more hits, but my prediction about eliminating albums was correct, and about half of the playlists that came up had songs I always listened to. I think I'll stick with this method and go back and fiddle with it later when I have time (like normalizations and creating clusters).
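Here's a minimal sketch of the artist-based scoring, not the actual Rails code. The function names and the `{ name, artists }` playlist shape are assumptions for illustration; the minimum of 3 unique artists and the top-five cutoff are the ones the finished app uses.

```javascript
// Jaccard similarity between two lists of artist names.
// Sets drop duplicates, so an album by a single artist collapses to one element.
function jaccard(artistsA, artistsB) {
  const a = new Set(artistsA);
  const b = new Set(artistsB);
  let intersection = 0;
  for (const artist of a) {
    if (b.has(artist)) intersection += 1;
  }
  const union = a.size + b.size - intersection;
  return union === 0 ? 0 : intersection / union;
}

// Rank stored playlists against the user's artists.
// The { name, artists } shape is hypothetical.
function recommend(userArtists, playlists, { minArtists = 3, limit = 5 } = {}) {
  return playlists
    .filter((p) => new Set(p.artists).size >= minArtists) // skip albums and tiny lists
    .map((p) => ({ name: p.name, score: jaccard(userArtists, p.artists) }))
    .filter((p) => p.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

const results = recommend(["A", "B", "C"], [
  { name: "album", artists: ["A", "A", "A", "A"] }, // 1 unique artist, skipped
  { name: "mix1", artists: ["A", "B", "D", "E"] },  // 2 shared / 5 total = 0.4
  { name: "mix2", artists: ["A", "B", "C", "D"] },  // 3 shared / 4 total = 0.75
  { name: "other", artists: ["X", "Y", "Z"] },      // nothing shared, dropped
]);
console.log(results.map((r) => r.name)); // mix2 first, then mix1
```

Switching from song names to artists only changes what goes into the two sets; the score itself is still intersection size divided by union size.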
One problem I have with this right now is that it takes about 2 minutes to go through all 80,000 tracks. From the user's side, this is not good at all. I could create a visual that shows the percentage done (which I can probably implement by keeping a counter), change my database to PostgreSQL instead of SQLite (SQLite seems to be the slower option here), make one database query (Track.all) instead of one for each playlist (which means I would have to save all the data in a hash and then loop through that), or try to rewrite my algorithm to get a smaller big O. Considering the fact that I only have 3 more days to finish everything, I'll get back to that later. I have to finish the front end first!

There are some common ways to calculate the similarity score. Pearson correlation and Euclidean distance are pretty popular. I could try to fit my data into those, but it just isn't the best way, because for my music data I'm looking at whether the user has songs in common with a playlist or not. Put into numerical values, that's binary: 0 or 1. Those metrics are a better fit when comparing, say, movie ratings from 0-5. I need a different method.
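To make the 0-or-1 point concrete, here's a small sketch that encodes two playlists as binary vectors. The song titles and the `toBinaryVector` helper are made up for illustration.

```javascript
// Encode a playlist as a 0/1 vector over a fixed list of songs:
// 1 if the playlist contains the song, 0 if it doesn't.
function toBinaryVector(allSongs, playlist) {
  const owned = new Set(playlist);
  return allSongs.map((song) => (owned.has(song) ? 1 : 0));
}

const allSongs = ["Song A", "Song B", "Song C", "Song D"];
const mine = toBinaryVector(allSongs, ["Song A", "Song C"]);
const theirs = toBinaryVector(allSongs, ["Song C", "Song D"]);

console.log(mine);   // [1, 0, 1, 0]
console.log(theirs); // [0, 0, 1, 1]

// There is no rating scale here. The only signal is how many
// positions are 1 in both vectors.
const shared = mine.filter((bit, i) => bit === 1 && theirs[i] === 1).length;
console.log(shared); // 1
```

With ratings from 0-5 each position would carry a degree of preference; here every position is just "in the playlist" or "not", so an overlap-based measure is a more natural fit.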
After some Googling, I decided that using the Log-Likelihood metric will be my best bet. What this does is quantify how likely or unlikely it is that an overlap between two datasets is due to chance. We will be comparing two likelihoods and looking at their ratio. We can create a table of counts for four different situations.

1. The number of times both event A and event B happen together (k11)
2. The number of times only event A happens and not B (k12)
3. The number of times only event B happens and not A (k21)
4. The number of times neither A nor B happens (k22)

These can be used to calculate the log-likelihood ratio.

$$\mathrm{LLR} = 2 \, \mathrm{sum}(k) \, \bigl( H(k) - H(\mathrm{rowSums}(k)) - H(\mathrm{colSums}(k)) \bigr)$$

where H is Shannon's entropy, computed as

$$H(k) = \sum_{i,j} \frac{k_{ij}}{\mathrm{sum}(k)} \log \frac{k_{ij}}{\mathrm{sum}(k)}$$

But after some researching, I thought I should start out with something a bit simpler. The Jaccard similarity coefficient looks pretty easy to implement. Not to mention, it is similar to the log-likelihood method in that it works from the same kind of situations (both, only A, only B) to calculate the intersection and the union. It also gives a score between 0 and 1 (1 being most similar), which is perfect. All I have to do to get the Jaccard similarity coefficient is divide the size of the intersection by the size of the union.

Sources I read through:
1. https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/impl/similarity/package-summary.html
2. http://mail-archives.apache.org/mod_mbox/mahout-user/201105.mbox/%[email protected]%3E
3. http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html
4. http://www.slideshare.net/NYCPredictiveAnalytics/building-a-recommendation-engine-an-example-of-a-product-recommendation-engine
5. http://www.slideshare.net/MrChrisJohnson/algorithmic-music-recommendations-at-spotify?related=1&utm_campaign=related&utm_medium=1&utm_source=3
6. http://en.wikipedia.org/wiki/Jaccard_index

6:30 pm Friday

This past weekend I participated in startup weekend in Austin.
The theme was education, so we were required to make something that would solve a problem in the education field. Friday night was when everyone who had ideas pitched for a minute each in front of the whole group. Lindsey and I voted for and chose to join Jake, who is a 16 year old high school student (and the only kid under 18 at the event!). He wanted to make an app that both teachers and students use in the classroom, to help teachers gauge how well the kids are understanding the material and to let students anonymously give feedback to the teacher during the lecture. Jake said that at his high school students are provided with iPads and are allowed to use them. His idea was feasible and the most appealing.

We ended up having 6 people on our team. Lidiya and Drew were in charge of the business model, Lindsey and Justin were in charge of front-end, and Jake was our captain because it was his idea. After we all chose our teams, we stuck around to discuss the idea and what the app was going to look like. We wanted something very simple so that it would not distract the students during class. We also wanted the feedback to come up in real time and color-coded, so that teachers can have a computer screen or iPad propped up and have visuals during the lecture. We settled on a few designs and went home around 11 to prepare for the next day.

9:00am Saturday

We all got back to Capital Factory for another day. Breakfast tacos were served and I started brainstorming about the app. I started with a rough sketch of all the views and laid out all of the tables that would go in the database. I decided not to use any authentication because I only had a day and a half to finish this app. A teacher has an ID and a username. A student has an ID and a username and can belong to many lectures. A lecture belongs to a teacher and has many students. A feedback has an ID, a lecture ID, a student ID, a feedback type, and a timestamp.
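As a rough picture of those tables, here's a plain-object sketch. The field names, the feedback types, and the `countFeedbackByType` helper are guesses for illustration, not the app's actual Rails schema, and the usernames and timestamps are made up.

```javascript
// One array per table. A lecture points at its teacher and lists its students;
// each feedback row points at a lecture and a student.
const teachers = [{ id: 1, username: "teacher_a" }];
const students = [
  { id: 1, username: "student_a" },
  { id: 2, username: "student_b" },
];
const lectures = [{ id: 1, teacherId: 1, studentIds: [1, 2] }];
const feedbacks = [
  { id: 1, lectureId: 1, studentId: 1, feedbackType: "confused", createdAt: "2000-01-01T09:00:00Z" },
  { id: 2, lectureId: 1, studentId: 2, feedbackType: "got_it", createdAt: "2000-01-01T09:01:00Z" },
  { id: 3, lectureId: 1, studentId: 2, feedbackType: "confused", createdAt: "2000-01-01T09:02:00Z" },
];

// Tally feedback for one lecture, the kind of summary a teacher view could color-code.
function countFeedbackByType(allFeedback, lectureId) {
  const counts = {};
  for (const fb of allFeedback) {
    if (fb.lectureId !== lectureId) continue;
    counts[fb.feedbackType] = (counts[fb.feedbackType] || 0) + 1;
  }
  return counts;
}

console.log(countFeedbackByType(feedbacks, 1)); // { confused: 2, got_it: 1 }
```

In Rails the student-to-lecture link would normally live in a join table rather than an array of IDs; the array here just keeps the sketch short.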
Getting this organized up front is the most important part, because once you start making models in Rails and migrating, it might do some wonky stuff if you try to edit them later.

Then I started coding. I decided to use Rails because 1. I wanted to get more practice with it and 2. it was probably the most time-efficient way. I broke off from my team and was wired in from the start all the way up until 5:30am, with a few short breaks for lunch and dinner, and then I took a nap for 2 hours. I was pretty surprised that I didn't lose my mind or start losing focus, but the fact that I was the only backend person on the team really made me want to get it up and running so I wouldn't let anyone down. I'm also surprised that I didn't burn out during my 20 hour sprint. It was probably because I was having so much fun doing it. It's a good sign for when I have to code for 40+ hours a week, right? I slept until 7:30 and then got ready for our last day.

9:00am Sunday

We all gathered back at Capital Factory for our last day. I knew I didn't have time to try to learn and integrate Node.js, so for the real-time part I decided to autorefresh.... you're cringing, I know, but I still had to integrate all of the Foundation and CSS code that Lindsey wrote into the application, and it was solely for the demo. Definitely not a scalable way. By 1:00pm I had to finish everything up so that I could pass it on to Justin for deployment. And it works! Check it out here. I also put a demo under the pictures.

Although we didn't win, this was a great experience! I got a ton of free food (and a sweatshirt) :D. I also met a lot of amazing people that participated (mostly educators that were passionate about education!) and got great advice from mentors. I also made a functioning app in 30 hours! It was probably one of the most fun weekends I've ever experienced and I'll definitely do this again.

Next week we start our first final project.
Basically at MakerSquare, we are supposed to work on 2 final projects with 1 week given to us for each. Then on the third week, we can go back to one of them and polish it up for another week. For my first final project I'm thinking of making a custom generated playlist recommendation app. First I need to have seed data for the computer to use to figure out what songs to recommend. Because my app will depend solely on other people's playlists, I need a bunch of playlist data. I looked into the Spotify API (<- great documentation) and it seems like I'll be able to use it. Not sure I want to use authentication because that'll take up a good chunk of time, but this is great because I can have access to public playlists with track information in JSON format. Just what I need! Then I can convert them and feed them into a database. I also looked into the 8tracks API and the Echo Nest API, but Spotify seems like it will be the best fit for what I want to make.

If I were to use the Spotify API, I would need to learn how to use Node.js. Node is a runtime environment that has a built-in asynchronous I/O library. This is good for HTTP communication, especially if I'm working with a music player, because it allows for non-blocking requests (each request doesn't have to wait for the previous one to finish before it is sent, which makes everything faster!). CodeSchool has really great tutorials and exercises so I started doing them last night, but I need to use callback functions, which I don't like haha.
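Here's a tiny sketch of what non-blocking with callbacks means. `fakeRequest` is a made-up stand-in for a real HTTP call; it only uses `setTimeout` to pretend the response takes a while.

```javascript
const log = [];

// Pretend to send a request: record that it was sent, then hand the
// "response" to the callback after delayMs milliseconds.
function fakeRequest(name, delayMs, callback) {
  log.push(name + " sent");
  setTimeout(function () {
    callback(name + " response");
  }, delayMs);
}

fakeRequest("A", 20, function (res) { log.push(res); });
fakeRequest("B", 10, function (res) { log.push(res); });
log.push("both sent without waiting");

// Print the order of events once both timers have fired.
setTimeout(function () {
  console.log(log);
  // A sent, B sent, both sent without waiting, B response, A response
}, 50);
```

Request B goes out before A's response arrives, and B's faster response is handled first. That is the non-blocking part, and the callbacks are how each response gets picked up later.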
I also started doing some research on the algorithm I need to write for the backend. I had a simple scoring system in my mind but that may not be the best. The iTunes Genius algorithm is a secret, but this article, "How iTunes Genius works", gives a rough idea of how it was made. It makes use of the term frequency-inverse document frequency (TF-IDF) concept. I also started reading some of the proposed algorithms from the Netflix Prize. This was an open competition a while back in which Netflix offered a million dollars to the team that could create a better rating prediction algorithm than what they had. Some concepts in these papers may come in handy, but I don't know if I want to spend time reading through it all.

The BigChaos Solution to the Netflix Grand Prize
The BellKor Solution to the Netflix Grand Prize
The Pragmatic Theory solution to the Netflix Grand Prize

Way also showed me an awesome book called Programming Collective Intelligence (<--- super interesting book!!!! go read it) which will definitely be super helpful. It's focused on machine learning and artificial intelligence. Chapters 2 and 3 talk about exactly what I need to do to find users with similar taste.

These form tag helpers are awesome. creates the html

Also, nested resources are helpful when you want to use resources within resources.
I've been thinking about a new app idea, and it has to do with matching or simple human behavior models. One potential idea is a gambling app in which each user goes through multiple trials and gambles with a set amount of money. Each time, the app will change up the conditions (such as how much the player can win and the percent chance), and the computer will learn from the behavior each time and predict what the next choice will be. At the end it will show how risk averse or risk loving the user is and how accurate the predictions were. I think it's a simple way to dive into machine learning and modeling human decision making, which will be kind of cool.

Another potential idea is predicting whether a song will be liked by a user or not. This will kind of be like Tinder for music, but each time the computer will suggest a song that is more likely to be liked by the user. Wouldn't it be cooler if Tinder actually did that, like started to learn what kind of people you like (say if someone liked people that wear glasses and are skinny) and gave you suggestions based on your type? Anyway, the algorithm for that will be super simplified and based solely on user preferences, not song category or any other predetermined factor. I thought it would be necessary to do some reading, so I found a pretty cool scientific article called Human Matching Behavior in Social Networks: An Algorithm Perspective, which is worth a read.
JavaScript closures..... the first description I read about them made no sense.

"Because an inner function might live past the call to the outer function(s) in which it was created, the outer local variables which the inner function accesses must survive with the inner function. This is called closure: the local variables of the call(s) to the outer function(s) get retained by the inner function."

huh? I did some googling and kind of figured it out. If we had an example:

    function Person(pName) {
      var _name = pName;
      this.getName = function () {
        return _name;
      };
    }

and we call:

    var me = new Person("Rui");
    me            // => Person {getName: function}
    me.getName()  // => "Rui"

Basically, a closure is a function that keeps a reference to a (free) variable that would normally be deleted (fall out of scope) after the parent function has returned. In this example, because _name is declared with var inside of the Person constructor, it is not accessible from outside of that function and would normally only exist while it is running. BUT, if something still has a way to reference the free var after the outer function returns (usually through another function created inside it), the variable persists. By using a closure, we can still access the inner variable! (In the example, getName is the closure.)

A lot happened this weekend. Cohort 6 graduated on Friday and Katrina left the devhouse :( Their goodbye party was fun though, and I really hope everyone from that cohort is successful at finding jobs! Harsh (one of the co-founders of MakerSquare) took a selfie with some cohort 6 people + the instructors + me and Alex :D

On Saturday and Sunday, I mostly worked on Robin.js, which is a JavaScript framework we are creating. MakerSquare's theory is that if you know how these frameworks (like Backbone.js) were built, it'll make them easier to use. JavaScript's objects are based on prototypes instead of classes, so it's pretty confusing coming from Ruby. To understand constructors and the prototype model, I watched this video (below) 6 times and learned it inside out. It's a bit long, but if you compare it to a class-based OOP language like Ruby it'll start to make more sense.
I took notes that compare the classical model and the prototypal model if you're too lazy to watch the video. It's certainly worth watching though.
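For a rough idea of what the prototypal side looks like in code, here's a minimal sketch. It's not taken from the video or the notes; the `Student` constructor and `greet` method are made up for illustration.

```javascript
// Constructor function: sets the per-instance data.
function Person(name) {
  this.name = name;
}

// Methods on the prototype are shared by every instance instead of being
// re-created per object (unlike a function assigned inside the constructor).
Person.prototype.greet = function () {
  return "Hi, I'm " + this.name;
};

// "Inheritance" is done by linking one prototype to another.
function Student(name, school) {
  Person.call(this, name); // run the parent constructor on this object
  this.school = school;
}
Student.prototype = Object.create(Person.prototype);
Student.prototype.constructor = Student;

var rui = new Student("Rui", "MakerSquare");
console.log(rui.greet());                 // Hi, I'm Rui
console.log(rui instanceof Person);       // true
console.log(rui.hasOwnProperty("greet")); // false, greet lives on the prototype
```

In Ruby the equivalent would be a Student class inheriting from a Person class. Here there are no classes involved, just objects that fall back to their prototype when a property isn't found on the object itself.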