Read xAPI statements stored in a csv file

The methods in this notebook read a collection of xAPI statements stored in a csv file and prepare them for analysis.

The libraries used:

As an example, in this package we provide four csv files, each containing a few hundred xAPI statements.

csv_files = ['../example_statements_1.csv', '../example_statements_2.csv', '../example_statements_3.csv', 
            '../example_statements_4.csv']

Load statements from file

Let’s start by reading the first csv file. It uses ; as a delimiter, although , is the more common choice. We will shortly define a function that takes care of these differences between xAPI statement datasets.

if Path(csv_files[0]).exists():
    statements = pd.read_csv(csv_files[0], index_col=None, delimiter=';').reset_index(drop=True) 
else:
    print("The specified file does not exist. Creating an empty DataFrame...")
    statements = pd.DataFrame()
statements.head()
timestamp lrs_id actor name verb id verb display object id object name result
0 2022-08-02T14:44:34.4429540Z 6148511b448b2d059a63e424 AAA430802 http://activitystrea.ms/schema/1.0/start {'en-us': 'started'} https://wekit-community.org/stepID=TS-ef86ff46... {'en-us': 'Action Step 3'} NaN
1 2022-08-02T14:44:34.0702380Z 6148511b448b2d059a63e424 AAA430802 http://id.tincanapi.com/verb/viewed {'en-us': 'viewed'} http://MirageXR_Image_133039249801494090.jpg NaN NaN
2 2022-08-02T14:44:34.0626310Z 6148511b448b2d059a63e424 AAA430802 http://activitystrea.ms/schema/1.0/listen {'en-us': 'listened_to'} http://characterinfo/TS-ef86ff46-70b6-472a-93d... NaN NaN
3 2022-08-02T14:44:34.0477250Z 6148511b448b2d059a63e424 AAA430802 https://wekit-community.org/verb/met {'en-us': 'met'} resources:// char:Woman_C {'en-us': 'char:Woman_C'} NaN
4 2022-08-02T14:43:28.0360420Z 6148511b448b2d059a63e424 AAA430802 http://activitystrea.ms/schema/1.0/start {'en-us': 'started'} https://wekit-community.org/stepID=TS-a5780c27... {'en-us': 'Action Step 2'} NaN
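
Since the delimiter differs between datasets, it can also be guessed from the first lines of a file instead of being hard-coded. The following is a minimal sketch using csv.Sniffer from the Python standard library; guess_delimiter is a hypothetical helper and is not part of this package.

```python
import csv


def guess_delimiter(sample: str, candidates: str = ";,") -> str:
    """Hypothetical helper: guess the column delimiter from a text sample."""
    # Sniffer inspects how consistently each candidate character occurs per line
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


# A few made-up lines in the style of the first example file
sample = (
    "timestamp;actor;verb\n"
    "2022-08-02T14:44:34Z;AAA430802;started\n"
    "2022-08-02T14:44:35Z;AAA430802;viewed\n"
)
delimiter = guess_delimiter(sample)
print(repr(delimiter))
```

The guessed value could then be passed as the delimiter argument when reading the file.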

Ideally we want to have a more readable version of the content. We want to have three columns named actor, verb and object that contain the statement information in a readable format. We also would like to have the timestamp information as a datetime object, and we do not care about IDs, as they are usually random strings which are not needed in the data analysis. The function import_csv does all of this for us.


import_csv

 import_csv (csv_file:Union[str,pathlib.Path], index_col:int=0,
             delimiter:str=',', quotechar:str='"')

Reads a csv file and performs some processing to make the data easier to read as well as easier to process afterwards. Returns a pandas DataFrame

Type Default Details
csv_file typing.Union[str, pathlib.Path] Filename of the csv with the data
index_col int 0 The index column
delimiter str , the column delimiter
quotechar str " Quoting character. Delimiters enclosed between two quoting characters are ignored
Returns DataFrame The imported dataframe with all the xAPI statements
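
To illustrate the kind of processing described above, here is a minimal sketch. It assumes that the display columns hold language maps stored as strings (such as {'en-us': 'started'}), as in the first example file. The helper readable_label and the toy data are hypothetical, and the actual import_csv may work differently.

```python
import ast

import pandas as pd


def readable_label(cell):
    """Hypothetical helper: extract the first label from a language-map string."""
    if not isinstance(cell, str):
        return None  # missing values (NaN/None) have no label
    try:
        parsed = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return cell  # already a plain string
    if isinstance(parsed, dict) and parsed:
        return next(iter(parsed.values()))
    return cell


# Toy data mimicking the raw columns of the first example file
raw = pd.DataFrame({
    "timestamp": ["2022-08-02T14:44:34.442Z", "2022-08-02T14:44:34.070Z"],
    "lrs_id": ["6148511b448b2d059a63e424", "6148511b448b2d059a63e424"],
    "actor name": ["AAA430802", "AAA430802"],
    "verb display": ["{'en-us': 'started'}", "{'en-us': 'viewed'}"],
    "object name": ["{'en-us': 'Action Step 3'}", None],
})

# Keep only readable columns, parse the timestamps and drop the IDs
tidy = pd.DataFrame({
    "timestamp": pd.to_datetime(raw["timestamp"], utc=True),
    "actor": raw["actor name"],
    "verb": raw["verb display"].map(readable_label),
    "object": raw["object name"].map(readable_label),
})
print(tidy)
```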

Let’s use the function we just defined to reload the dataframe and check that it works as expected:

statements = import_csv(csv_files[0], index_col=None, delimiter=';') 
statements.head()
timestamp actor result verb object
0 2022-08-02T14:44:34.4429540Z AAA430802 NaN started Action Step 3
1 2022-08-02T14:44:34.0702380Z AAA430802 NaN viewed None
2 2022-08-02T14:44:34.0626310Z AAA430802 NaN listened_to None
3 2022-08-02T14:44:34.0477250Z AAA430802 NaN met char:Woman_C
4 2022-08-02T14:43:28.0360420Z AAA430802 NaN started Action Step 2

Just to make sure, let’s repeat the process for the other files. In some cases the function must be called with slightly different arguments, depending on the specific format of the file. For example, for the next file we specify a quote character and use , as the delimiter.

statements = import_csv(csv_files[1], index_col=None, delimiter=',', quotechar='"') 
statements.head()
timestamp actor object description result verb object
0 2022-06-24T08:46:28.169Z 336078 {'en-US': 'Pause/Leave app'} {"completion":true} pause app Pause
1 2022-06-24T08:42:38.636Z 336078 {'en-US': 'Return to app'} {"completion":true} Return to app Return
2 2022-06-24T08:42:30.775Z 336078 {'en-US': 'Pause/Leave app'} {"completion":true} pause app Pause
3 2022-06-24T08:42:10.209Z 336078 {'en-US': 'Account 1s1115 logged in'} {"completion":true} Log In Access app
4 2022-06-23T09:14:19.815Z 370445 {'en-US': 'Pause/Leave app'} {"completion":true} pause app Pause

Another file has additional columns that can be interesting, such as language.

statements = import_csv(csv_files[2], index_col=None, delimiter=',') 
statements.head()
timestamp actor result language verb object
0 2022-07-14T10:58:11.295Z B78BFDBA-9CA9-4787-B2D4-7BD43F042135 {"score":{"raw":0}} Romanian exit main_menu
1 2022-07-14T10:57:56.684Z B78BFDBA-9CA9-4787-B2D4-7BD43F042135 {"score":{"raw":0}} Romanian launched scene_game
2 2022-07-14T10:57:50.804Z B78BFDBA-9CA9-4787-B2D4-7BD43F042135 {"score":{"raw":0}} Romanian exit main_menu
3 2022-07-14T10:57:42.154Z B78BFDBA-9CA9-4787-B2D4-7BD43F042135 {"score":{"raw":0}} Romanian launched scene_maths_game
4 2022-07-14T10:57:28.866Z B78BFDBA-9CA9-4787-B2D4-7BD43F042135 {"score":{"raw":0}} Romanian launched scene_tests

Finally, some files (like the one we open now) store a running statement number in the first column, so we pass index_col=0 to use that column as the index.

statements = import_csv(csv_files[3], index_col=0, delimiter=',') 
statements.head()
timestamp stored actor verb object result
0 2023-03-10 11:45:09.638000+00:00 2023-03-10T11:45:09.638Z Teacher Logged In Salesianos NaN
1 2023-03-10 11:52:00.020000+00:00 2023-03-10T11:52:00.020Z PC006 Logged In Salesianos NaN
2 2023-03-10 11:52:04.063000+00:00 2023-03-10T11:52:04.063Z PC008 Logged In Salesianos NaN
3 2023-03-10 11:52:05.177000+00:00 2023-03-10T11:52:05.177Z Tablet1 Logged In Salesianos {"score":{"raw":0}}
4 2023-03-10 11:52:05.679000+00:00 2023-03-10T11:52:05.679Z PC004 Logged In Salesianos NaN

The three most important columns are actor, verb and object, which create a sentence-like structure. We can see the actions that the app registers from the verb column.


get_all_verbs

 get_all_verbs (df:pandas.core.frame.DataFrame)

Returns a set with all verbs in the dataset

Type Details
df DataFrame The dataset containing the xAPI statements (one statement per row)
Returns typing.Set Set containing all the verbs occurring in the dataset
test_verbs = {'Logged In', 'Placed', 'Swiped', 'Asked', 'Started', 'Logged Out',
       'Accepted', 'Set Turn', 'Suggested', 'Ran Out', 'Sent', 'Checked',
       'Assigned', 'Canceled', 'Ended'}
test_eq(get_all_verbs(statements), test_verbs)
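
Collecting the distinct values of a column only takes a few lines of pandas. The following is a minimal sketch on made-up data; unique_values is a hypothetical name, and dropping missing values is an assumption rather than documented behaviour of get_all_verbs.

```python
import pandas as pd


def unique_values(df: pd.DataFrame, col: str) -> set:
    """Hypothetical helper: return the distinct non-missing values of a column."""
    return set(df[col].dropna().unique())


toy = pd.DataFrame({
    "actor": ["Teacher", "PC006", "PC006", "PC008"],
    "verb": ["Logged In", "Placed", "Placed", "Swiped"],
})
verbs = unique_values(toy, "verb")
print(verbs)
```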

We provide similar functions for actors and objects.


get_all_actors

 get_all_actors (df:pandas.core.frame.DataFrame)

Returns a set with all actors in the dataset

Type Details
df DataFrame The dataset containing the xAPI statements (one statement per row)
Returns typing.Set Set containing all the actors occurring in the dataset
test_actors = {'Teacher', 'PC006', 'PC008', 'Tablet1', 'PC004', 'PC009', 'PC007', 'PC003', 'Iphone 1',
       'PC005', 'iPad2', 'Tablet 2', 'Android1', 'Android2', 'iPad1', 'PC002', 'Android4', 'Android3',
       'iphone 1', 'iPhone 1', 'Ipad1', 'Tablet1 ', 'Ipad2'}
test_eq(get_all_actors(statements), test_actors)

get_all_objects

 get_all_objects (df:pandas.core.frame.DataFrame)

Returns a set with all objects in the dataset

Type Details
df DataFrame The dataset containing the xAPI statements (one statement per row)
Returns typing.Set Set containing all the objects occurring in the dataset

The list of unique objects is quite big, so we will not print it in this example.

As the actor values usually come from user input (for example the username provided when starting the app), it makes sense to clean them so that User1, user1 and user 1 are treated as the same user. The following functions do just that on the desired columns.


remove_whitespaces

 remove_whitespaces (df:pandas.core.frame.DataFrame, cols:List)

Removes whitespaces from the specified columns in the dataframe.

Type Details
df DataFrame The dataset containing the xAPI statements (one statement per row)
cols typing.List the columns on which whitespaces should be removed
Returns DataFrame The dataframe after applying the function

to_lowercase

 to_lowercase (df:pandas.core.frame.DataFrame, cols:List)

Converts to lowercase the elements in the specified columns. The function only applies to columns whose type is str

Type Details
df DataFrame The dataset containing the xAPI statements (one statement per row)
cols typing.List the columns whose content should be made lowercase
Returns DataFrame The dataframe after applying the function
test_actors = {'teacher', 'pc006', 'pc008', 'tablet1', 'pc004', 'pc009', 'pc007', 'pc003', 'iphone1',
               'pc005', 'ipad2', 'tablet2', 'android1', 'android2', 'ipad1', 'pc002', 'android4', 'android3'}
df = remove_whitespaces(statements, ["actor"])
df2 = to_lowercase(df, ["actor"])
test_eq(get_all_actors(df2), test_actors)
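
As an illustration of this kind of cleaning, the sketch below removes every whitespace character and lowercases the values with the pandas string accessor. The function names are hypothetical and the data is a small made-up sample; the package functions may be implemented differently.

```python
import pandas as pd


def strip_all_whitespace(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Hypothetical helper: remove every whitespace character in the given columns."""
    out = df.copy()
    for col in cols:
        out[col] = out[col].str.replace(r"\s+", "", regex=True)
    return out


def lowercase(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Hypothetical helper: lowercase the given columns when they hold strings."""
    out = df.copy()
    for col in cols:
        if pd.api.types.is_string_dtype(out[col]):
            out[col] = out[col].str.lower()
    return out


toy = pd.DataFrame({"actor": ["Iphone 1", "iphone 1", "iPhone 1", "Tablet1 ", "Tablet1"]})
clean = lowercase(strip_all_whitespace(toy, ["actor"]), ["actor"])
print(set(clean["actor"]))
```

Both helpers work on a copy, so the original dataframe is left untouched.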

We may also be interested in removing specific rows from the dataset, for example the ones associated with an actor that opted out of the intervention, or with verbs we do not care about. This could be the case for verbs like Logged In or Logged Out, which provide information about when a user starts and stops the app, but may not be relevant if our analysis only concerns the interactions within the app.


remove_actors

 remove_actors (df:pandas.core.frame.DataFrame, cols:List)

Removes from the dataframe all the rows whose actor is in the specified list

Type Details
df DataFrame The dataset containing the xAPI statements (one statement per row)
cols typing.List the list of actors to remove
Returns DataFrame The dataframe with the specified actors removed
statements = import_csv(csv_files[3], index_col=0, delimiter=',') 
test_actors = {'Teacher', 'PC006', 'PC008', 'Tablet1', 'PC004', 'PC009', 'PC007', 'PC003', 'Iphone 1',
       'PC005', 'iPad2', 'Tablet 2', 'Android1', 'Android2'}
test_df = remove_actors(statements, ['iPad1', 'PC002', 'Android4', 'Android3',
       'iphone 1', 'iPhone 1', 'Ipad1', 'Tablet1 ', 'Ipad2'])
test_eq(get_all_actors(test_df), test_actors)

remove_verbs

 remove_verbs (df:pandas.core.frame.DataFrame, cols:List)

Removes from the dataframe all the rows whose verb is in the specified list

Type Details
df DataFrame The dataset containing the xAPI statements (one statement per row)
cols typing.List the list of verbs to remove
Returns DataFrame The dataframe with the specified verbs removed
test_verbs = {'Placed', 'Swiped', 'Asked', 'Started', 'Accepted', 'Set Turn', 'Suggested', 'Ran Out',
              'Sent', 'Checked', 'Assigned', 'Canceled', 'Ended'}
test_df = remove_verbs(statements, ["Logged In", "Logged Out"])
test_eq(get_all_verbs(test_df), test_verbs)
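
Both removals follow the same pattern: keep only the rows whose value is not in a given list. Here is a minimal sketch of that pattern on made-up data; drop_rows is a hypothetical name and not part of the package.

```python
import pandas as pd


def drop_rows(df: pd.DataFrame, col: str, values: list) -> pd.DataFrame:
    """Hypothetical helper: drop the rows whose value in `col` is listed in `values`."""
    # isin builds a boolean mask; ~ negates it so that matching rows are removed
    return df[~df[col].isin(values)].copy()


toy = pd.DataFrame({
    "actor": ["Teacher", "PC006", "PC008", "PC006"],
    "verb": ["Logged In", "Placed", "Swiped", "Logged Out"],
})
no_logins = drop_rows(toy, "verb", ["Logged In", "Logged Out"])
no_teacher = drop_rows(toy, "actor", ["Teacher"])
print(no_logins)
```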

xAPI statements analysis

Here we present some functions that are typically applied when analysing xAPI statements data. For this, we will use a clean version of the statements dataset, to which some of the functions described above have been applied.

statements = remove_whitespaces(statements, ["actor"])
statements = to_lowercase(statements, ["actor"])
statements = remove_verbs(statements, ["Logged In", "Logged Out"])
statements = remove_actors(statements, ["android3"])
statements.head(5)
timestamp stored actor verb object result
14 2023-03-10 11:52:18.277000+00:00 2023-03-10T11:52:18.277Z iphone1 Placed Earth {"score":{"raw":0}}
15 2023-03-10 11:52:18.847000+00:00 2023-03-10T11:52:18.847Z iphone1 Swiped Left {"score":{"raw":0}}
18 2023-03-10 11:52:29.001000+00:00 2023-03-10T11:52:29.001Z iphone1 Placed Earth {"score":{"raw":0}}
19 2023-03-10 11:52:29.094000+00:00 2023-03-10T11:52:29.094Z android2 Placed Earth {"score":{"raw":0}}
20 2023-03-10 11:52:29.194000+00:00 2023-03-10T11:52:29.194Z iphone1 Swiped Right {"score":{"raw":0}}

A typical check is to evaluate how many interactions are provided by each actor:


count_interactions

 count_interactions (df:pandas.core.frame.DataFrame)

Creates a new dataframe counting the total number of statements associated to each actor

Type Details
df DataFrame The dataset containing the xAPI statements (one statement per row)
Returns DataFrame A dataframe with the number of interactions of each actor

On our toy dataset, it looks like this:

interactions =  count_interactions(statements)
interactions
actor count
0 pc009 6
1 pc006 13
2 pc008 19
3 pc002 21
4 pc004 32
5 pc007 42
6 pc003 43
7 iphone1 86
8 ipad1 87
9 android4 106
10 teacher 112
11 android1 119
12 tablet1 133
13 ipad2 140
14 tablet2 145
15 android2 147
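
A table like this one can be obtained with a groupby. The following is a minimal sketch on made-up data that sorts the counts in ascending order, as in the output above; interactions_per_actor is a hypothetical name, and the actual count_interactions may differ.

```python
import pandas as pd


def interactions_per_actor(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count the statements of each actor, smallest first."""
    return (
        df.groupby("actor")
        .size()                      # number of rows per actor
        .reset_index(name="count")   # back to a two-column dataframe
        .sort_values("count")
        .reset_index(drop=True)
    )


toy = pd.DataFrame({
    "actor": ["pc008", "pc006", "pc009", "pc008", "pc006", "pc008"],
    "verb": ["Placed", "Swiped", "Placed", "Asked", "Placed", "Swiped"],
})
counts = interactions_per_actor(toy)
print(counts)
```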

create_barplot

 create_barplot (df:pandas.core.frame.DataFrame, x:str, y:str,
                 cmap:str='flare')

Creates a horizontal barplot of the data in the dataframe

Type Default Details
df DataFrame The input dataset
x str the column with the numerical variable to be plotted
y str the column with the name associated to each value
cmap str flare the color palette to be used
create_barplot(interactions, 'count', 'actor')

We can also extract the statements associated with a single actor and a single verb.


subset_actor_verb

 subset_actor_verb (df:pandas.core.frame.DataFrame, actor:str, verb:str)

Returns the subset of the original dataframe containing only statements with the specified actor and verb

Type Details
df DataFrame The dataset containing the xAPI statements (one statement per row)
actor str The actor we are interested in
verb str The verb we are interested in
Returns DataFrame A dataframe containing only the statements with a specific actor and verb
subset = subset_actor_verb(statements, "teacher", "Assigned")
subset.head(5)
timestamp stored actor verb object result
316 2023-03-10 12:04:36.832000+00:00 2023-03-10T12:04:36.832Z teacher Assigned 7.72;iPhone_1 NaN
368 2023-03-10 12:05:37.368000+00:00 2023-03-10T12:05:37.368Z teacher Assigned 8.15;Android2 NaN
397 2023-03-10 12:06:24.752000+00:00 2023-03-10T12:06:24.752Z teacher Assigned 7.72;Tablet1 NaN
541 2023-03-10 12:11:20.420000+00:00 2023-03-10T12:11:20.420Z teacher Assigned 7.45;Tablet_2 NaN
582 2023-03-10 12:12:12.001000+00:00 2023-03-10T12:12:12.001Z teacher Assigned 7.72;iPad2 NaN
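
Such a subset boils down to combining two boolean masks. A minimal sketch on made-up data, where rows_for is a hypothetical name rather than the package function:

```python
import pandas as pd


def rows_for(df: pd.DataFrame, actor: str, verb: str) -> pd.DataFrame:
    """Hypothetical helper: keep the rows matching both the actor and the verb."""
    mask = (df["actor"] == actor) & (df["verb"] == verb)
    return df[mask]


toy = pd.DataFrame(
    {
        "actor": ["teacher", "teacher", "ipad2"],
        "verb": ["Assigned", "Started", "Placed"],
        "object": ["7.72;iPhone_1", "Game", "Earth"],
    },
    index=[316, 320, 321],
)
assigned = rows_for(toy, "teacher", "Assigned")
print(assigned)
```

The original row labels are preserved, which is why the index in the output above starts at 316.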

From the subset we could analyse the objects to detect whether there are any interesting patterns. In this example, each object holds two values separated by a semicolon: a score and the actor to whom it was assigned.


split_column

 split_column (df:pandas.core.frame.DataFrame, col:str, col_names:List,
               sep:str=';')

Splits a column of the DataFrame into multiple columns and returns a new DataFrame

Type Default Details
df DataFrame The dataset containing the xAPI statements (one statement per row)
col str The column in the dataset that should be split into multiple columns
col_names typing.List The names of the columns created after split
sep str ; The separator between fields inside the column we want to split
Returns DataFrame A dataframe with the content of col split into several columns
grades = split_column(subset, 'object', ['score', 'student'])
grades.head(5)
score student
316 7.72 iPhone_1
368 8.15 Android2
397 7.72 Tablet1
541 7.45 Tablet_2
582 7.72 iPad2
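
The split itself can be done with the pandas string accessor. The sketch below uses made-up rows in the same score;student format; split_into_columns is a hypothetical name, and it assumes that every row contains as many fields as there are column names.

```python
import pandas as pd


def split_into_columns(df: pd.DataFrame, col: str, col_names: list, sep: str = ";") -> pd.DataFrame:
    """Hypothetical helper: split a string column into several named columns."""
    parts = df[col].str.split(sep, expand=True)  # one new column per field
    parts.columns = col_names
    return parts


toy = pd.DataFrame(
    {"object": ["7.72;iPhone_1", "8.15;Android2", "7.45;Tablet_2"]},
    index=[316, 368, 541],
)
parts = split_into_columns(toy, "object", ["score", "student"])
print(parts)
```

The resulting values are still strings, so numeric fields need to be converted before computing statistics on them.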

average_interactions

 average_interactions (df:pandas.core.frame.DataFrame, avg_col:str,
                       user_col:str='actor')

Similar to count_interactions, but creates a new dataframe with the average of avg_col for each value of user_col

Type Default Details
df DataFrame The dataset containing the xAPI statements (one statement per row)
avg_col str The column on which to compute average
user_col str actor The column to groupby (usually actor)
Returns DataFrame A new dataframe with the average of the interaction per specific value
grades["score"] = grades["score"].astype("float")
avg_grades = average_interactions(grades, 'score', 'student')
avg_grades
student score
2 Android4 2.726667
4 Ipad2 3.000000
6 Tablet1_ 4.000000
11 iphone_1 5.000000
10 iPhone_1 5.035000
0 Android1 6.125000
5 Tablet1 6.860000
1 Android2 7.380000
8 iPad1 7.500000
3 Ipad1 7.660000
9 iPad2 7.720000
7 Tablet_2 8.016667
create_barplot(avg_grades, 'score', 'student', cmap='mako')
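
The averaging step can be sketched with a groupby followed by a mean. The example below uses made-up scores and sorts the result in ascending order, as in the table above; mean_per_group is a hypothetical name, and the actual average_interactions may differ.

```python
import pandas as pd


def mean_per_group(df: pd.DataFrame, value_col: str, group_col: str = "actor") -> pd.DataFrame:
    """Hypothetical helper: average `value_col` for each value of `group_col`."""
    return (
        df.groupby(group_col)[value_col]
        .mean()
        .reset_index()
        .sort_values(value_col)  # lowest average first
    )


toy = pd.DataFrame({
    "student": ["Tablet_2", "iPad2", "Tablet_2"],
    "score": [8.0, 7.72, 9.0],
})
averages = mean_per_group(toy, "score", "student")
print(averages)
```

Note that spelling variants such as iPhone_1 and iphone_1 are averaged as separate students unless the names are normalised first.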