Assignment 1

 
## Q1.  Draw a plot depicting the superiority of a method in terms of Average quality and number of changes in the quality (average value for all video samples and network profile). ***Each buffer configuration should have a separate single plot.***


In [13]:

    import pandas as pd
    import matplotlib.pyplot as plt

    data = pd.read_csv('./results.csv')
    # show the first rows of the file
    data.head()

Out[13]:

	profile	sample	method	quality	change	inefficiency	stall	numStall	avgStall	overflow	numOverflow	qoe	bufSize
0	p1	v1	Method1	1784.40	8	0.312697	0.714	1	0.714	0.000	0	512299.0	240
1	p1	v1	Method1	2009.71	11	0.327201	0.000	0	0.000	48.781	17	578385.0	120
2	p1	v1	Method1	2616.02	95	0.270815	0.000	0	0.000	55.229	92	622398.0	30/60
3	p1	v1	Method2	2189.01	25	0.518530	0.000	0	0.000	0.000	0	606435.0	240
4	p1	v1	Method2	2878.43	14	0.401456	0.000	0	0.000	0.000	0	823815.0	120

 
1. There are three buffer configurations, so we draw three separate plots.
   - To compare the methods, each method is drawn as one point (number of quality changes, quality) in a scatter plot.
2. For each buffer configuration, we first average the number of changes and the quality over all video samples and network profiles.
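The averaging step described above can be sketched on a small hand-made table. The rows below are invented for illustration and are not taken from `results.csv`; only the column names (`method`, `bufSize`, `change`, `quality`) follow the file.

```python
import pandas as pd

# Toy rows (invented values): two methods, two buffer configurations
toy = pd.DataFrame({
    'method':  ['Method1', 'Method1', 'Method2', 'Method2', 'Method1', 'Method2'],
    'bufSize': ['240', '240', '240', '240', '120', '120'],
    'change':  [8, 12, 20, 30, 11, 14],
    'quality': [1800.0, 2000.0, 2200.0, 2400.0, 2000.0, 2800.0],
})

# One row per (buffer configuration, method): the mean over all rows in the group
averages = (toy.groupby(['bufSize', 'method'])[['change', 'quality']]
               .mean()
               .reset_index())
print(averages)
```

Each row of `averages` corresponds to one point in one of the scatter plots, with `change` on the x-axis and `quality` on the y-axis.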


In [14]:

 
    import itertools

    # rows recorded with a buffer size of 240
    test_240 = data[data.bufSize == '240']
    marker = itertools.cycle(('o', 'v', '^', '<', '>', 's', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X'))

    fig, ax = plt.subplots(figsize=(6, 4))
    # one point per method: mean number of changes vs. mean quality
    for name, group in test_240.groupby('method'):
        plt.scatter(group.change.mean(), group.quality.mean(), marker=next(marker), label=name)

    plt.xlabel('change')
    plt.ylabel('quality')
    plt.ylim(0, 2500)
    plt.legend(bbox_to_anchor=(1.0, 1.05))
    plt.title('Superiority of a method')

 
- For a given buffer size, we average the number of quality changes and the quality over all network profiles and video samples.
- With a buffer size of 240, Method 3 has the lowest average number of changes and Method 5 has the highest average quality.


In [15]:

 
    import itertools

    # rows recorded with a buffer size of 120
    test_120 = data[data.bufSize == '120']
    marker = itertools.cycle(('o', 'v', '^', '<', '>', 's', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X'))

    fig, ax = plt.subplots(figsize=(6, 4))
    # one point per method: mean number of changes vs. mean quality
    for name, group in test_120.groupby('method'):
        plt.scatter(group.change.mean(), group.quality.mean(), marker=next(marker), label=name)

    plt.xlabel('change')
    plt.ylabel('quality')
    plt.ylim(0, 2500)
    plt.legend(bbox_to_anchor=(1.0, 1.05))

 
- For a given buffer size, we average the number of quality changes and the quality over all network profiles and video samples.
- With a buffer size of 120, Method 3 has the fewest changes and Method 5 has the highest quality, while Method 1 is the worst of all methods.


In [16]:

 
    import itertools

    # rows recorded with a buffer size of 30/60
    test_30_60 = data[data.bufSize == '30/60']
    marker = itertools.cycle(('o', 'v', '^', '<', '>', 's', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X'))

    fig, ax = plt.subplots(figsize=(6, 4))
    # one point per method; note that this cell uses the median,
    # whereas the 240 and 120 cells use the mean
    for name, group in test_30_60.groupby('method'):
        plt.scatter(group.change.median(), group.quality.median(), marker=next(marker), label=name)

    plt.xlabel('change')
    plt.ylabel('quality')
    plt.ylim(0, 2500)
    plt.legend(bbox_to_anchor=(1.0, 1.05))

 
- For a given buffer size, we summarise the number of quality changes and the quality over all network profiles and video samples (the cell above uses the median rather than the mean).
- With a buffer size of 30/60, Method 3 has the fewest changes, and Methods 5, 6 and 9 have the highest quality.


 
**In summary, combining the three separate plots into one figure whose subplots share the same y-axis, we can see that `Method 3` is better than the other methods: it has the fewest quality changes in all three buffer configurations.**
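When two criteria are involved, one way to make "better" precise is a dominance check: a method dominates another if it has at least as high a quality and no more changes, and is strictly better on one of the two. This is an assumption added here for illustration, not the criterion used for the plots, and the method names and numbers below are invented.

```python
import pandas as pd

# Invented per-method averages for one buffer configuration
avg = pd.DataFrame({
    'method':  ['MethodA', 'MethodB', 'MethodC'],
    'change':  [10.0, 25.0, 30.0],
    'quality': [1900.0, 2300.0, 2100.0],
}).set_index('method')

def dominates(a, b):
    """True if row a is at least as good as row b on both criteria and strictly better on one."""
    no_worse = a['quality'] >= b['quality'] and a['change'] <= b['change']
    better = a['quality'] > b['quality'] or a['change'] < b['change']
    return bool(no_worse and better)

# Keep the methods that no other method dominates
non_dominated = [
    m for m in avg.index
    if not any(dominates(avg.loc[o], avg.loc[m]) for o in avg.index if o != m)
]
print(non_dominated)
```

In this toy table `MethodC` drops out because `MethodB` has both higher quality and fewer changes, while `MethodA` and `MethodB` trade off against each other and both remain.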


In [17]:

 
    import itertools

    def plot_group(ax, df):
        """Draw one point per method (mean change vs. mean quality) on ax."""
        marker = itertools.cycle(('o', 'v', '^', '<', '>', 's', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X'))
        cycol = itertools.cycle('bgrcmk')
        for key, group in df.groupby('method'):
            group.plot(ax=ax, kind='scatter', x='change', y='quality', label=key,
                       marker=next(marker), c=next(cycol), s=50)
        ax.set_ylim(0, 2500)
        ax.legend(bbox_to_anchor=(1.1, 1.05))

    # mean change and quality per (buffer size, method);
    # a list of columns is needed here, a bare tuple fails in recent pandas
    grouped = data.groupby(['bufSize', 'method'])[['change', 'quality']].mean().reset_index()
    buffer_config = ['240', '120', '30/60']

    fig, ax = plt.subplots(3, sharey=True, figsize=(15, 9))

    # one subplot per buffer configuration
    for index, value in enumerate(buffer_config):
        plot_group(ax[index], grouped[grouped.bufSize == value])

    plt.show()

 
## Q2. Draw a single plot for all buffer configurations showing QoE (average value for all video samples and network profile) for all methods. You are expected to show the comparison of the QoE of a method in all buffer configurations as well as the comparison of all methods using the single plot.


In [18]:

 
    def autolabel(rects, ax):
        """Write each bar's value as a text label (unused; the call below is commented out)."""
        # Get y-axis height to calculate label position from.
        (y_bottom, y_top) = ax.get_ylim()
        y_height = y_top - y_bottom

        for rect in rects:
            height = rect.get_height()
            # Fraction of axis height taken up by this rectangle
            p_height = height / y_height

            # If we can fit the label above the column, do that;
            # otherwise, put it inside the column.
            if p_height > 0.95:  # arbitrary; 95% looked good to me.
                label_position = height - (y_height * 0.05)
            else:
                label_position = height + (y_height * 0.01)

            ax.text(rect.get_x() + rect.get_width() / 2., label_position,
                    '%d' % int(height), color='dimgrey', rotation=45, fontsize=8,
                    ha='center', va='bottom')

    # QoE per (method, buffer size); note that this takes the median, not the mean.
    # Selecting 'qoe' before aggregating avoids aggregating the text columns.
    ax = (data.groupby(['method', 'bufSize'])['qoe'].median()
              .unstack()
              .plot(kind='barh', stacked=True, figsize=(15, 5), width=0.6))
    # ax.set_yscale("log")
    rects = ax.patches
    # autolabel(rects, ax)

    plt.grid()
    plt.title("Average QoE for All Buffer Configurations")
    plt.setp(plt.gca().get_xticklabels(), rotation=0, horizontalalignment='center')
    plt.show()

 
- A bar chart is an intuitive way to compare categories; since each of the ten methods has three buffer configurations, a stacked bar chart is used.
- We plot the QoE of all ten methods under the three buffer configurations, aggregated over all video samples and network profiles (the code above takes the median of each group).
- With the 30/60 and 120 buffer configurations, Method 5 has the highest QoE, whereas with a buffer of 240, Method 10 is better even than Method 5.
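The table behind such a bar chart can be sketched as a pivot of QoE by method and buffer configuration. The rows below are invented and the method names are placeholders; the sketch uses the mean, as the question asks, which is an assumption that differs from the median used in the cell above.

```python
import pandas as pd

# Toy rows (invented values)
toy = pd.DataFrame({
    'method':  ['MethodA', 'MethodA', 'MethodB', 'MethodB', 'MethodA', 'MethodB'],
    'bufSize': ['120', '240', '120', '240', '120', '120'],
    'qoe':     [500.0, 600.0, 800.0, 550.0, 700.0, 900.0],
})

# Rows are methods, columns are buffer configurations, cells are mean QoE
qoe_table = toy.groupby(['method', 'bufSize'])['qoe'].mean().unstack()
print(qoe_table)

# For each buffer configuration, the method with the highest mean QoE
best = qoe_table.idxmax()
print(best)
```

Calling `qoe_table.plot(kind='bar')` without `stacked=True` would draw the buffer configurations side by side for each method, which may make the per-configuration comparison easier to read than stacked segments.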


 
## Q3. Draw plots to show the correlation between inefficiency and quality for all methods in all buffer configurations.


In [19]:

 
    # every row of the results as one point
    plt.scatter(data.inefficiency, data.quality)
    plt.xlabel('inefficiency')
    plt.ylabel('quality')
    plt.title('correlation between inefficiency and quality')
    plt.show()

 
- The plot above shows the relationship between inefficiency and quality; the connection between the two attributes appears weak. There is a cluster of points where inefficiency is between 0.2 and 0.4, which suggests that quality stays within a relatively stable range there. To test whether this holds under the different buffer configurations, we use `seaborn` to segment the points by buffer size; the results are shown in the figures below.
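A numeric companion to the scatter plot is the Pearson correlation coefficient, computed once over all rows and once per buffer configuration. The sketch below uses invented values chosen so that the pooled correlation is zero even though each configuration is perfectly correlated, which is the kind of effect that segmenting by buffer size can reveal.

```python
import pandas as pd

# Toy rows (invented values)
toy = pd.DataFrame({
    'bufSize':      ['240', '240', '240', '120', '120', '120'],
    'inefficiency': [0.1, 0.2, 0.3, 0.1, 0.2, 0.3],
    'quality':      [1000.0, 2000.0, 3000.0, 3000.0, 2000.0, 1000.0],
})

# Pearson correlation over all rows
overall = toy['inefficiency'].corr(toy['quality'])

# Pearson correlation within each buffer configuration
per_buffer = {
    buf: group['inefficiency'].corr(group['quality'])
    for buf, group in toy.groupby('bufSize')
}
print(overall, per_buffer)
```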


In [20]:

 
    import seaborn as sns

    q3 = data[['inefficiency', 'quality', 'bufSize']]
    sns.set(style="ticks", color_codes=True)
    # fill=True shades the KDE curves (older seaborn releases spelled this shade=True)
    g = sns.pairplot(q3, diag_kind="kde", markers="+",
                     plot_kws=dict(s=50, edgecolor="b", linewidth=1),
                     diag_kws=dict(fill=True))

In [21]:

 
    # same pair plot, coloured by buffer configuration
    g = sns.pairplot(q3, hue="bufSize", palette="husl")

 
## Q4. We would like to know the methods which have the minimum number of stalls for video V7 under all network profiles. Draw an appropriate plot for it.


In [22]:

 
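    # rows for video v7 only; in the sample rows above, `stall` looks like the
    # total stall time and `numStall` like the stall count
    q4 = data[data['sample'] == 'v7'][['stall', 'method']]
    # total stall per method over all network profiles and buffer sizes
    sum_stall = q4.groupby('method').sum().reset_index()
    ax = sum_stall.plot(kind='bar', x='method', figsize=(12, 5), color='g')
    ax.set(xlabel='method', ylabel='stalls')
    plt.setp(plt.gca().get_xticklabels(), rotation=0, horizontalalignment='center')
    plt.show()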

 
- An intuitive way to compare stalls across methods is to sum them up; the plot shows that most of the methods have zero stalls regardless of network conditions.

To take a closer look at the distribution of the stalls, box plots and violin plots are both good options.
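The summing step can also be used to list the methods that reach the minimum directly, rather than reading them off the bars. The rows and method names below are invented; only the column names (`sample`, `method`, `stall`) follow the file.

```python
import pandas as pd

# Toy rows (invented values); the v1 row is there to show that it is filtered out
toy = pd.DataFrame({
    'sample': ['v7', 'v7', 'v7', 'v7', 'v1', 'v7'],
    'method': ['MethodA', 'MethodA', 'MethodB', 'MethodB', 'MethodB', 'MethodC'],
    'stall':  [0.7, 0.0, 0.0, 0.0, 3.0, 0.0],
})

# Total stall per method for video v7 only
v7 = toy[toy['sample'] == 'v7']
total_stall = v7.groupby('method')['stall'].sum()

# Every method whose total equals the smallest total
min_methods = total_stall[total_stall == total_stall.min()].index.tolist()
print(min_methods)
```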


In [23]:

 
    test_stalls = data[data['sample'] == 'v7'][['stall', 'method']]
    # one box per method
    ax = test_stalls.boxplot(column='stall', by=['method'], figsize=[15, 7])
    ax.set_ylim(-1, 4)
    ax.xaxis.set_ticks_position('bottom')
    # move the x-axis down to y = -1
    ax.spines['bottom'].set_position(('data', -1))

 
- First we extract all rows for sample V7 regardless of network conditions, then plot the stalls for every method. Methods 2, 5, 6, 7, 8, 9 and 10 all reach the minimum, which is 0.
- We use box plots to show the stalls; since some methods have zero stalls, we move the x-axis down to -1 so that the boxes at zero remain visible.
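The quartiles that a box plot draws can also be printed as a table, which makes a box collapsed at zero easy to confirm. This is a minimal sketch on invented values with placeholder method names.

```python
import pandas as pd

# Toy rows (invented values)
toy = pd.DataFrame({
    'method': ['MethodA'] * 3 + ['MethodB'] * 3,
    'stall':  [0.0, 1.0, 2.0, 0.0, 0.0, 0.0],
})

# count, mean, std, min, quartiles and max of the stalls for each method
summary = toy.groupby('method')['stall'].describe()
print(summary[['min', '25%', '50%', '75%', 'max']])
```

A method whose `max` is 0 has no stalls in any row, so its box, whiskers and median all sit on the zero line.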


In [24]:

 
    # violin plot of the V7 stalls per method (reuses q4 from In [22])
    sns.set_style('ticks')
    fig, ax = plt.subplots()
    # figure size in inches
    fig.set_size_inches(12, 5)
    sns.violinplot(data=q4, x="method", y="stall", inner="points", ax=ax)
    sns.despine()