Utils¶
This chapter documents the Utils: functions and plots to aid in exploratory analysis.
Analysis¶
One-off functions for various analyses.
- first_top5_bottom_stats(doc_filter, col_lst):
Calculate mu, std, var, max, min, skew, kurt for all matches depending on teamPlacement. The intent is for a map_choice and mode_choice to be fed into the DocumentFilter. Does calculations for all matches, regardless of matchID.
- Parameters
doc_filter (DocumentFilter) – Input DocumentFilter.
col_lst (List[str] or str) – Input List of Columns to analyze.
- Returns
Stats, related to the items in col_lst, for winners, top 5 or 10, and bottom.
- Return type
pd.DataFrame
- Example
None
- Note
If Rebirth is selected in the DocumentFilter, will return top 5. If Verdansk, top 10 is returned.
- bucket_stats(doc_filter, placement, col_lst):
Calculate mu, std, var, max, min, skew, kurt for all matches depending on teamPlacement. The intent is for a map_choice and mode_choice to be fed into the DocumentFilter. Does calculations for all matches, taking matchID into account.
- Parameters
doc_filter (DocumentFilter) – Input DocumentFilter.
placement (List[int] or int) – Target placement.
col_lst (List[str] or str) – Input List of Columns to analyze.
- Returns
Stats, related to the items in col_lst, for placement value.
- Return type
pd.DataFrame
- Example
None
- Note
The teamPlacement value is used to filter the data. If two ints are provided, the data is filtered within that range; the first value should be the lower one. For example, [0, 6] returns the top 5 placements.
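The placement filter described in the note can be sketched with plain pandas. This is a minimal illustration, not the library's implementation: the name bucket_stats_sketch, the use of a raw DataFrame in place of a DocumentFilter, and the exclusive range bounds (inferred from the [0, 6] example) are all assumptions.

```python
import pandas as pd


def bucket_stats_sketch(df, placement, col_lst):
    """Summary stats for rows whose teamPlacement matches `placement`.

    Hypothetical helper: `df` stands in for the data held by a DocumentFilter.
    """
    cols = [col_lst] if isinstance(col_lst, str) else list(col_lst)
    if isinstance(placement, int):
        mask = df["teamPlacement"] == placement
    else:
        low, high = placement
        # Assumption: both bounds are exclusive, so [0, 6] keeps placements 1..5.
        mask = (df["teamPlacement"] > low) & (df["teamPlacement"] < high)
    subset = df.loc[mask, cols]
    # One row per analysed column, one column per statistic.
    return pd.DataFrame({
        "mu": subset.mean(),
        "std": subset.std(),
        "var": subset.var(),
        "max": subset.max(),
        "min": subset.min(),
        "skew": subset.skew(),
        "kurt": subset.kurt(),
    })


matches = pd.DataFrame({
    "teamPlacement": [1, 2, 3, 4, 5, 6, 7, 8],
    "kills": [10, 8, 6, 4, 2, 1, 1, 0],
})
top5 = bucket_stats_sketch(matches, [0, 6], "kills")
```

Here top5 has one row ("kills") and one column per statistic, computed only over placements 1 through 5.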
- previous_next_placement(doc_filter):
Calculate the mean (mu) teamPlacement before and after a given teamPlacement. The intent is for a map_choice and mode_choice to be fed into the DocumentFilter.
- Parameters
doc_filter (DocumentFilter) – Input DocumentFilter.
- Returns
Previous and next expected placement based on current placement.
- Return type
pd.DataFrame
- Example
None
- Note
None
- match_difficulty(our_doc_filter, other_doc_filter, mu_lst, sum_lst, test):
Calculate the relative match difficulty based on player and player squad stats.
- Parameters
our_doc_filter (DocumentFilter) – A DocumentFilter with squad and player data only.
other_doc_filter (DocumentFilter) – A DocumentFilter with all other players data.
mu_lst (List[str]) – A list of columns to take the mean (mu) of. Optional
sum_lst (List[str]) – A list of columns to take the sum of. Optional
test (bool) – If True, will use all columns for the analysis. Optional
- Returns
Match difficulty.
- Return type
pd.DataFrame
- Example
None
- Note
The intent is for a map_choice and mode_choice to be fed into both DocumentFilters.
- get_daily_hourly_weekday_stats(doc_filter):
Calculate kills, deaths, wins, top 5s or 10s, match count, and averagePlacement for every day, hour, and weekday.
- Parameters
doc_filter (DocumentFilter) – Input DocumentFilter.
- Returns
3 pd.DataFrames and a dict
- Return type
None
- Example
None
- Note
The intent is for a map_choice and mode_choice to be fed into the DocumentFilter.
- get_weapons(doc_filter):
Calculate the kills, deaths, assists, headshots, averagePlacement and count for each weapon.
- Parameters
doc_filter (DocumentFilter) – Input DocumentFilter.
- Returns
A DataFrame with a player's gun stats.
- Return type
pd.DataFrame
- Example
None
- Note
The intent is for a username to be fed into the DocumentFilter and this will return the information for that specific player.
- find_hackers(doc_filter, y_column, col_lst, std):
Identify suspected hackers using various outlier detection methods.
- Parameters
doc_filter (DocumentFilter) – A DocumentFilter.
y_column (str) – A column to consider for Outlier analysis.
col_lst (List[str]) – A list of columns used for Outlier analysis.
std (int) – The number of standard deviations to use as a threshold, default is 3. Optional
- Returns
Indexes of suspected hackers.
- Return type
List[int]
- Example
None
- Note
The intent is for a map_choice and mode_choice to be fed into the DocumentFilter.
- meta_weapons(doc_filter, top_5_or_10, top_1, col, mu):
Calculate the most popular weapons. A map_choice is required in the DocumentFilter if top_5_or_10 or top_1 is True. If neither top_5_or_10 nor top_1 is True, it will calculate based on all team placements. This will only include loadouts where all attachment slots are filled. The calculation is done on a daily interval.
- Parameters
doc_filter (DocumentFilter) – A DocumentFilter.
top_5_or_10 (bool) – If True, will calculate using only the top 5 or 10 place teams, default is False. Optional
top_1 (bool) – If True, will calculate using only the 1st place or winning team, default is False. Optional
col (str) – If given will use a column as reference, default is None. None will count gun users per day. Optional
mu (bool) – If True, will calculate using mean, default is sum. Optional
- Returns
The first DataFrame is filled with dicts of the form {kills: 0, deaths: 0, count: 0}. The second is the percent of the lobby using each weapon.
- Return type
List[pd.DataFrame]
- Example
None
- Note
None
Base¶
General transformations.
- normalize(arr, multi):
Normalize an Array.
- Parameters
arr (np.ndarray) – Input array.
multi (bool) – If array has multiple columns, default is None. Optional
- Returns
Normalized array.
- Return type
np.ndarray
- Example
None
- Note
Set multi to True, if multiple columns.
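The entry above does not say which normalization is applied. As a rough sketch, assuming min-max scaling to the range [0, 1] and assuming that multi=True means each column is scaled independently, the behaviour could look like the following. The name normalize_sketch is hypothetical and this is not the library's code.

```python
import numpy as np


def normalize_sketch(arr, multi=False):
    """Min-max scale to [0, 1]; per column when `multi` is True (assumed)."""
    arr = np.asarray(arr, dtype=float)
    axis = 0 if multi else None  # None: use the global min and max
    lo = arr.min(axis=axis)
    hi = arr.max(axis=axis)
    # Guard against division by zero for constant input.
    span = np.where(hi - lo == 0, 1.0, hi - lo)
    return (arr - lo) / span


flat = normalize_sketch([2, 4, 6])
per_column = normalize_sketch([[1, 10], [3, 30]], multi=True)
```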
- running_mean(arr, num):
Calculate the running mean over an interval of num elements.
- Parameters
arr (np.ndarray) – Input array.
num (int) – Input int, default is 50. Optional
- Returns
Running mean for a given array.
- Return type
np.ndarray
- Example
None
- Note
None
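A running mean over a window of num elements can be sketched with a convolution. This is an illustration only: running_mean_sketch is a hypothetical name, and returning only the fully covered windows (so the output is shorter than the input) is an assumption about edge handling that the entry above does not specify.

```python
import numpy as np


def running_mean_sketch(arr, num=50):
    """Mean of each consecutive window of `num` elements."""
    arr = np.asarray(arr, dtype=float)
    kernel = np.ones(num) / num
    # mode="valid" keeps only positions where the window fits entirely.
    return np.convolve(arr, kernel, mode="valid")


smoothed = running_mean_sketch([1, 2, 3, 4, 5], num=2)
```

With an input of length n, the result has n - num + 1 entries.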
- cumulative_mean(arr):
Calculate the cumulative mean.
- Parameters
arr (np.ndarray) – Input array.
- Returns
Cumulative mean for a given array.
- Return type
np.ndarray
- Example
None
- Note
None
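The cumulative mean at position i is the mean of all elements up to and including i. A minimal NumPy sketch follows; the function name cumulative_mean_sketch is hypothetical and the library may compute it differently.

```python
import numpy as np


def cumulative_mean_sketch(arr):
    """Mean of arr[0..i] for every index i."""
    arr = np.asarray(arr, dtype=float)
    counts = np.arange(1, arr.size + 1)
    return np.cumsum(arr) / counts


cum = cumulative_mean_sketch([2, 4, 6])
```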
Outlier¶
Various outlier detection functions.
- stack(x_arr, y_arr, multi):
Stacks x_arr and y_arr.
- Parameters
x_arr (np.ndarray) – An array to stack.
y_arr (np.ndarray) – An array to stack.
multi (bool) – If True, will stack based on multiple x_arr columns, default is False. Optional
- Returns
Array with an x column and a y column.
- Return type
np.ndarray
- Example
None
- Note
None
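Stacking an x array and a y array into one two-dimensional array is commonly done with np.column_stack. The sketch below assumes that multi=True means x_arr already has one column per feature and y_arr is appended as the last column; stack_sketch is a hypothetical name, not the library's function.

```python
import numpy as np


def stack_sketch(x_arr, y_arr, multi=False):
    """Combine x and y into a single 2-D array, y as the last column."""
    x = np.asarray(x_arr)
    y = np.asarray(y_arr)
    if multi:
        # x is 2-D (rows x features); y becomes one extra column.
        return np.column_stack((x, y))
    return np.column_stack((x.ravel(), y.ravel()))


pairs = stack_sketch([1, 2, 3], [4, 5, 6])
wide = stack_sketch([[1, 2], [3, 4], [5, 6]], [7, 8, 9], multi=True)
```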
- _cent(x_lst, y_lst):
Calculate Centroid from x and y value(s).
- Parameters
x_lst (List[float]) – A list of values.
y_lst (List[float]) – A list of values.
- Returns
A list of x and y values representing the centroid of two lists.
- Return type
List[float]
- Example
None
- Note
None
- _dis(cent1, cent2):
Calculate Distance between two centroids.
- Parameters
cent1 (List[float]) – An x, y coordinate representing a centroid.
cent2 (List[float]) – An x, y coordinate representing a centroid.
- Returns
A distance measurement.
- Return type
float
- Example
None
- Note
None
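The two helpers above, _cent and _dis, can be illustrated with the standard library alone. The sketch assumes the centroid is the arithmetic mean of each coordinate and the distance is Euclidean; the names cent_sketch and dis_sketch are hypothetical.

```python
import math
from statistics import fmean


def cent_sketch(x_lst, y_lst):
    """Centroid as [mean of x values, mean of y values]."""
    return [fmean(x_lst), fmean(y_lst)]


def dis_sketch(cent1, cent2):
    """Euclidean distance between two [x, y] points."""
    return math.dist(cent1, cent2)


centre = cent_sketch([0, 2], [0, 4])
gap = dis_sketch([0, 0], [3, 4])
```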
- outlier_std(arr, data, y_column, _std, plus):
Calculate Outliers using a simple std value.
- Parameters
arr (np.ndarray) – An Array to get data from. Optional
data (pd.DataFrame) – A DataFrame to get data from. Optional
y_column (str) – A target column. Optional
_std (int) – A std threshold, default is 3. Optional
plus (bool) – If True, will grab all values above the threshold, default is True. Optional
- Returns
An array of indexes.
- Return type
np.ndarray
- Example
None
- Note
If arr not passed, data and respective column names are required.
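A standard-deviation threshold of this kind can be sketched as follows. It assumes that plus=True selects values above mean + _std * std and that plus=False selects values below mean - _std * std; the second half is a guess, since the entry only describes the plus=True case. outlier_std_sketch is a hypothetical name and only the arr input path is shown.

```python
import numpy as np


def outlier_std_sketch(arr, n_std=3, plus=True):
    """Indexes of values more than `n_std` standard deviations from the mean."""
    arr = np.asarray(arr, dtype=float)
    mu, sigma = arr.mean(), arr.std()
    if plus:
        return np.where(arr > mu + n_std * sigma)[0]
    # Assumption: plus=False mirrors the test on the low side.
    return np.where(arr < mu - n_std * sigma)[0]


kills = [10] * 20 + [100]  # one extreme value at index 20
high = outlier_std_sketch(kills)
low = outlier_std_sketch(kills, plus=False)
```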
- outlier_var(arr, data, y_column, per, plus):
Calculate Outliers using a simple var value.
- Parameters
arr (np.ndarray) – An Array to get data from. Optional
data (pd.DataFrame) – A DataFrame to get data from. Optional
y_column (str) – A target column. Optional
per (float) – A percent threshold, default is 0.95. Optional
plus (bool) – If True, will grab all values above the threshold, default is True. Optional
- Returns
An array of indexes.
- Return type
np.ndarray
- Example
None
- Note
If arr not passed, data and respective column names are required.
- outlier_regression(arr, data, x_column, y_column, _std, plus):
Calculate Outliers using regression.
- Parameters
arr (np.ndarray) – An Array to get data from. Optional
data (pd.DataFrame) – A DataFrame to get data from. Optional
x_column (str) – A column for x variables. Optional
y_column (str) – A column for y variables. Optional
_std (int) – A std threshold, default is 3. Optional
plus (bool) – If True, will grab all values above the threshold, default is True. Optional
- Returns
An array of indexes.
- Return type
np.ndarray
- Example
None
- Note
If arr not passed, data and respective column names are required.
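One common way to find outliers with regression is to fit a line and flag points whose residual exceeds a multiple of the residual standard deviation. The sketch below takes that approach with np.polyfit; whether the library uses a linear fit or this exact threshold is not stated above, so treat the details (and the name outlier_regression_sketch) as assumptions.

```python
import numpy as np


def outlier_regression_sketch(x, y, n_std=3, plus=True):
    """Indexes whose residual from a straight-line fit is beyond the threshold."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    resid = y - (slope * x + intercept)
    limit = n_std * resid.std()
    if plus:
        return np.where(resid > limit)[0]
    # Assumption: plus=False flags points far below the fitted line.
    return np.where(resid < -limit)[0]


x = np.arange(20.0)
y = 2 * x + 1
y[7] += 50  # inject one point far above the line
flagged = outlier_regression_sketch(x, y)
```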
- outlier_distance(arr, data, x_column, y_column, _std, plus):
Calculate Outliers using distance measurements.
- Parameters
arr (np.ndarray) – An Array to get data from. Optional
data (pd.DataFrame) – A DataFrame to get data from. Optional
x_column (str) – A column for x variables. Optional
y_column (str) – A column for y variables. Optional
_std (int) – A std threshold, default is 3. Optional
plus (bool) – If True, will grab all values above the threshold, default is True. Optional
- Returns
An array of indexes.
- Return type
np.ndarray
- Example
None
- Note
If arr not passed, data and respective column names are required.
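The entry does not specify which distance is measured. A plausible sketch, assuming the distance of each point from the centroid of all points and the same mean-plus-_std threshold used elsewhere in this section, is shown here. The name outlier_distance_sketch is hypothetical.

```python
import numpy as np


def outlier_distance_sketch(x, y, n_std=3, plus=True):
    """Indexes of points unusually far from (or close to) the centroid."""
    pts = np.column_stack((x, y)).astype(float)
    centroid = pts.mean(axis=0)
    dist = np.linalg.norm(pts - centroid, axis=1)
    mu, sigma = dist.mean(), dist.std()
    if plus:
        return np.where(dist > mu + n_std * sigma)[0]
    # Assumption: plus=False flags points unusually close to the centroid.
    return np.where(dist < mu - n_std * sigma)[0]


xs = [0] * 20 + [100]
ys = [0] * 20 + [100]
far = outlier_distance_sketch(xs, ys)
```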
- outlier_hist(arr, data, x_column, per, plus):
Calculate Outliers using Histogram.
- Parameters
arr (np.ndarray) – An Array to get data from. Optional
data (pd.DataFrame) – A DataFrame to get data from. Optional
x_column (str) – A column for x variables. Optional
per (float) – A percent threshold, default is 0.75. Optional
plus (bool) – If True, will grab all values above the threshold, default is True. Optional
- Returns
An array of indexes.
- Return type
np.ndarray
- Example
None
- Note
If arr not passed, data and respective column names are required.
- outlier_knn(arr, data, x_column, y_column, _std, plus):
Calculate Outliers using KNN.
- Parameters
arr (np.ndarray) – An Array to get data from. Optional
data (pd.DataFrame) – A DataFrame to get data from. Optional
x_column (str) – A column for x variables. Optional
y_column (str) – A column for y variables. Optional
_std (int) – A std threshold, default is 3. Optional
plus (bool) – If True, will grab all values above the threshold, default is True. Optional
- Returns
An array of indexes.
- Return type
np.ndarray
- Example
None
- Note
If arr not passed, data and respective column names are required.
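A k-nearest-neighbour outlier score is often taken as the mean distance from a point to its k closest neighbours. The sketch below computes that score with NumPy broadcasting and applies a mean-plus-_std threshold. The parameter k does not appear in the signature above and is purely an assumption for illustration, as is the name outlier_knn_sketch.

```python
import numpy as np


def outlier_knn_sketch(x, y, k=3, n_std=3, plus=True):
    """Indexes whose mean distance to the k nearest neighbours is extreme."""
    pts = np.column_stack((x, y)).astype(float)
    diff = pts[:, None, :] - pts[None, :, :]
    pairwise = np.sqrt((diff ** 2).sum(axis=2))
    # After sorting, column 0 is each point's distance to itself (zero).
    nearest = np.sort(pairwise, axis=1)[:, 1:k + 1]
    score = nearest.mean(axis=1)
    mu, sigma = score.mean(), score.std()
    if plus:
        return np.where(score > mu + n_std * sigma)[0]
    return np.where(score < mu - n_std * sigma)[0]


xs = [0] * 20 + [100]
ys = [0] * 20 + [100]
isolated = outlier_knn_sketch(xs, ys)
```

The full pairwise matrix costs O(n^2) memory, so this form only suits small inputs.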
- outlier_cooks_distance(arr, data, x_column, y_column, plus, return_df):
Calculate Outliers using Cooks Distance.
- Parameters
arr (np.ndarray) – An Array to get data from. Optional
data (pd.DataFrame) – A DataFrame to get data from. Optional
x_column (str) – A column for x variables. Optional
y_column (str) – A column for y variables. Optional
_std (int) – A std threshold, default is 3. Optional
plus (bool) – If True, will grab all values above the threshold, default is True. Optional
return_df (bool) – If True, will return a DataFrame, default is False. Optional
- Returns
An array of indexes, or a DataFrame if return_df is True.
- Return type
np.ndarray or pd.DataFrame
- Example
None
- Note
If arr not passed, data and respective column names are required.
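Cook's distance measures how much a fitted regression changes when one observation is removed. For a model with p parameters, residual e_i, leverage h_ii and mean squared error s^2, it is D_i = (e_i^2 / (p * s^2)) * h_ii / (1 - h_ii)^2. The sketch below computes this for a straight-line fit and flags points above 4 / n, a widely used rule of thumb. The cutoff and the function names are assumptions; the entry above does not state which threshold the library applies.

```python
import numpy as np


def cooks_distance_sketch(x, y):
    """Cook's distance of every observation for the fit y = a + b * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack((np.ones_like(x), x))  # design matrix with intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    # Diagonal of the hat matrix X (X'X)^-1 X'.
    leverage = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    mse = resid @ resid / (n - p)
    return (resid ** 2 / (p * mse)) * leverage / (1 - leverage) ** 2


def outlier_cooks_sketch(x, y):
    """Indexes whose Cook's distance exceeds 4 / n (assumed cutoff)."""
    d = cooks_distance_sketch(x, y)
    return np.where(d > 4 / len(d))[0]


x = np.arange(20.0)
y = 2 * x + 1
y[7] += 50  # one influential point
influential = outlier_cooks_sketch(x, y)
```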
Plots¶
Various one-off plots.
- personal_plot(doc_filter):
Returns a series of plots.
- Parameters
doc_filter (DocumentFilter) – A DocumentFilter.
- Returns
None
- Example
None
- Note
This is intended to be used with map_choice, mode_choice and a Gamertag inputted into the DocumentFilter.
- lobby_plot(doc_filter):
Returns a series of plots.
- Parameters
doc_filter (DocumentFilter) – A DocumentFilter.
- Returns
None
- Example
None
- Note
This is intended to be used with map_choice and mode_choice inputted into the DocumentFilter.
- squad_plot(doc_filter, col_lst):
Build a Polar plot for visualizing squad stats.
- Parameters
doc_filter (DocumentFilter) – A DocumentFilter.
col_lst (List[str] or str) – Input List of Columns to analyze.
- Returns
None
- Example
None
- Note
This is intended to be used with map_choice and mode_choice inputted into the DocumentFilter.
Scrape¶
Functions for getting and dealing with new data.
- connect_to_api(_id: str):
Connect to Call of Duty API.
- Parameters
_id (str) – A matchID str.
- Returns
A JSON object of lobby data for the specified matchID.
- Return type
Json
- Example
None
- Note
Connects to the Call of Duty API to retrieve lobby information.
- clean_api_data(json_object):
Cleans the JSON output from connect_to_api.
- Parameters
json_object (Json) – Json object.
- Returns
Match information in a table.
- Return type
pd.DataFrame
- Example
None
- Note
Takes a JSON object related to a matchID and constructs a pd.DataFrame with all relevant information. The result needs to be saved (or concatenated to an existing csv) and loaded through _evaulate_df() to work properly in this model.
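The general idea of turning nested match JSON into a table and appending it to existing data can be sketched with pd.json_normalize and pd.concat. The payload below is entirely hypothetical: the real API response layout and the columns the library keeps are not documented here, and clean_api_data_sketch is not the library's function.

```python
import pandas as pd

# Hypothetical payload; field names are made up for illustration only.
payload = {
    "allPlayers": [
        {"matchID": "123", "player": {"username": "alpha"},
         "playerStats": {"kills": 3, "deaths": 1}},
        {"matchID": "123", "player": {"username": "bravo"},
         "playerStats": {"kills": 0, "deaths": 2}},
    ]
}


def clean_api_data_sketch(json_object):
    """Flatten one row per player; nested keys become dotted column names."""
    return pd.json_normalize(json_object["allPlayers"])


new_rows = clean_api_data_sketch(payload)

# Appending to previously stored matches (an empty frame stands in for a csv).
existing = pd.DataFrame(columns=new_rows.columns)
combined = pd.concat([existing, new_rows], ignore_index=True)
```

In practice the combined frame would be written back with DataFrame.to_csv before being loaded by the rest of the package.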