Category: Excel

Sep 24 2010

Fellow Netflix customer – need help deciding your next Instant Watch? Excel to the rescue. And how I did it.

I am a movie fanatic.  I think movies are serious art and movie-watching is my number one hobby. 

So Netflix is a good fit for me.  And our family watches far more movies using Instant Watch than we do with Blue-Ray and DVD disks.

But in my opinion Netflix has one major weakness.  IT does VERY LITTLE to help you SELECT your next movie.  And I am indecisive which create in me a a little perfect storm of frustration when I am in the mood for a flic.  No kidding, I have spend half the length of a typical movie just trying to FIND the right movie.  

I like to consider a lot of things when I want pic a flix…

  • What GENRE am I in the mood for or is appropriate for the audience?
  • Given the audience (frequently my wife and/or family) what is the MPAA RATING of the movie? 
  • What is the AVERAGE USER RATING of the movie?  (Help me find one that generally doesn’t suck.)

If you want to sort or filter using these criteria it is very difficult. 

To solve this problem I created an Excel spreadsheet to help.  It list the top 30 movies by GENRE sorted by AVERAGERATING in descending order.   Using this you could use the filtering capability of Microsoft Excel to find any movie you want by by Genre, MPAA Rating, and/or User Rating.  

I created an Office 2007 version and an Office 2003 version.  Here they are:

Office 2007 version.

Office 2003 version.

NOW THE TECHNICAL STUFF

To create this spreadsheet I used a FREE awesome querying tool called LinqPad.  You can get it at http://www.linqpad.net/.  After downloading (notice you don’t have to install it…tt is an executable you can put on your desktop.) Put the following C# Linq statements into a new query. 


var genre = "Thrillers";  
var minRating = 3.0;  
var topNum = 30;  

var query =  
    (from g in Genres  
    from t in g.Titles  
    where g.Name==genre  
    && t.Instant.Available==true  
    && t.AverageRating >= minRating   
    orderby t.AverageRating descending   
    select new {t.Name, 
        t.Rating, 
        t.AverageRating, 
        t.ShortSynopsis, 
        Genre=genre}  
    ).Take(topNum);  
    
query.Dump(string.Format("Top {0} {1} Instant Watch Movies with a " +  
                    "{2:0.0} minimum average rating sorted " +  
                    "by average rating in descending order.",  
                    topNum,genre,minRating));  
                    

Be sure that the Language setting is set to “C# Statement(s)” and that the Database is set to “http://odata.netflix.com/Catalog/”.  That is the OData data source that Netflix provides.

Notice that the query can only return results for one Genre at a time.  (That is a limitation of the OData data source.  There are other Netflix Data source types that are not as limiting, but you can’t use those directly in LinqPad without writing a full program.  )  So I had to run the query once for each Genre and aggregate the results into a single spreadsheet.  It took me about an hour.  Keep in mind that the data in the spreadsheet will go stale (as movies are moved out of Instant Watch and new movies are moved in.)  With this LinqPad query you can refresh the data for the genres that interest you the most.

Enjoy.

Seth

Aug 26 2010

A little ASCII weirdness

Just did a little Excel “hallway” support.  The user had copied and pasted in some data from a web page to excel.  Column A “looked” like a row of numbers but there is nothing I could do to get excel to treat the values like numbers.  Formulas that were expecting numbers showed #VALUE errors.

It seemed that each cell had a SPACE after it.  And that is weird in its own right because in a cell with a number and a space Excel should treat the value like a number.  But sure enough I could delete the “space” from the value and the cell would then begin to act like a number.   Problem was NOT solved though, because the sheet had 4000 rows that needed cleaning up.

The weirdness continued when I tried to use the Excel REPLACE function to remove the spaces.  But Excel could not find a space in the column.  By this time I was weirded out and I asked the user to email the sheet so I could take a closer look.

Once back at my desk it only took a few minutes to solve.  I used the Excel VBA ASC function to tell me what the character was.  In the Excel VBA Editor’s Immediate pane I typed in

?Asc(Mid(Range(“A1”),6,1))

That showed me that the character was NOT a space but was ASCII code 160. 

To fix the user’s data I did CTRL-H (Replace dialog) and in the Find box I typed ALT-0160.  That put in a “space” and I was able to Replace All. 

Turns out that in HTML (remember the data was copied from a web page) that ASCII-160 is treated as a NON-BREAKING space. 

For the curious…here are a couple of links.

http://office.microsoft.com/en-us/excel-help/ascii-character-chart-HA001133136.aspx

http://www.ascii.cl/htmlcodes.htm

Seth

Apr 15 2010

Excel pivot tables do not support COUNT DISTINCT aggregation.

Just got confirmation from MR EXCEL that Excel does not support COUNT DISTINCT aggregation. HERE is my question...
Is there a way to do a count distinct aggregation on a pivot table without messing with the source data. For example. If I do a row labels of Customers and "customerProductsPurchased" like this... CUSTOMERA 10    SHIRT  5    SHORTS  5 CUSTOMERB    12    SHIRT    3    SHORTS    5    POLOSHIRT    4 This gives the total of all of each product purchased and the grand total on the customer header row. What I want is something like this... CUSTOMERA  2    SHIRT    SHORTS CUSTOMERB  3    SHIRT    SHORTS    POLOSHIRT Notice that this tells me the number of distinct products that I sell that the customer purchased. CUSTOMERA is a customer for 2 of my products and CUSTOMERB is a customer for 3 of my products. In the database world this is called a distinct count. Googling seems to indicate that this cannot be done in the pivot table without doing it first in the source. But I thought I would ask here to confirm.
Here is the reply I got...
No, you can't. You have to do it in the source data.
To read the whole thread (there's a bit more)... http://www.mrexcel.com/forum/showthread.php?t=461802 Seth