Overview

Dataset statistics

Number of variables: 4
Number of observations: 800
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 31.2 KiB
Average record size in memory: 40.0 B

Variable types

Categorical: 3
Numeric: 1

Warnings

learningActivityTitle has a high cardinality: 184 distinct values (High cardinality)
learnerCom has a high cardinality: 81 distinct values (High cardinality)
learnerIntranetID is highly correlated with learnerCom (High correlation)
learnerCom is highly correlated with learnerIntranetID (High correlation)
duration has 35 (4.4%) zeros (Zeros)

Reproduction

Analysis started: 2021-05-18 10:37:16.216429
Analysis finished: 2021-05-18 10:37:18.985431
Duration: 2.77 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

learningActivityTitle
Categorical

HIGH CARDINALITY

Distinct: 184
Distinct (%): 23.0%
Missing: 0
Missing (%): 0.0%
Memory size: 12.5 KiB

Value | Count
CompTIA A+ 220-1001: Installing Hardware & Display Components | 22
CompTIA A+ 220-1001: Basic Cable Types | 21
CompTIA A+ 220-1001: Connectors | 18
CompTIA A+ 220-1001: TCP & UDP ports | 18
CompTIA A+ 220-1001: Implementing Network Concepts | 17
Other values (179) | 704

Length

Max length: 103
Median length: 41
Mean length: 41.68125
Min length: 10

Characters and Unicode

Total characters: 33345
Distinct characters: 75
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
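
As a minimal sketch of how such character properties can be read, the snippet below tallies the Unicode general category of every character in a list of strings using Python's standard `unicodedata` module. The helper name `category_counts` and the single sample title are illustrative assumptions; this is not necessarily how the report itself computes the figures.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters by Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            # category() returns a two-letter code such as 'Lu' or 'Nd'.
            counts[unicodedata.category(ch)] += 1
    return counts


# One title taken from the report, used only as sample input.
titles = ["CompTIA A+ 220-1001: Connectors"]
counts = category_counts(titles)
for code, count in counts.most_common():
    print(code, count)
```

The two-letter codes correspond to the category names in the tables of this report: `Ll` is Lowercase Letter, `Lu` is Uppercase Letter, `Zs` is Space Separator, `Nd` is Decimal Number, `Po` is Other Punctuation, `Pd` is Dash Punctuation, and `Sm` is Math Symbol.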

Unique

Unique: 107
Unique (%): 13.4%
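
The difference between "Distinct" (184) and "Unique" (107) is easy to miss. The sketch below assumes the usual reading: distinct counts every different value, while unique counts only values that occur exactly once. The function name and the sample list are hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Hypothetical sample: "Connectors" repeats, the other two occur once.
sample = ["Connectors", "Printers", "Connectors", "Troubleshooting"]
distinct, unique = distinct_and_unique(sample)
print(distinct, unique)
```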

Sample

1st row: Working with Data for Effective Decision Making
2nd row: Personal Skills for Effective Business Analysis
3rd row: Business Analysis Overview
4th row: Using Active Listening in Workplace Situations
5th row: Clarity and Conciseness in Business Writing

Common Values

Value | Count | Frequency (%)
CompTIA A+ 220-1001: Installing Hardware & Display Components | 22 | 2.8%
CompTIA A+ 220-1001: Basic Cable Types | 21 | 2.6%
CompTIA A+ 220-1001: Connectors | 18 | 2.2%
CompTIA A+ 220-1001: TCP & UDP ports | 18 | 2.2%
CompTIA A+ 220-1001: Implementing Network Concepts | 17 | 2.1%
CompTIA A+ 220-1001: Resolving Problems | 17 | 2.1%
CompTIA A+ 220-1001: Configuring a Wired/Wireless Network | 17 | 2.1%
CompTIA A+ 220-1001: Printers | 17 | 2.1%
CompTIA A+ 220-1001: Custom PC configuration | 16 | 2.0%
CompTIA A+ 220-1001: Troubleshooting | 16 | 2.0%
Other values (174) | 621 | 77.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
a331
 
7.2%
comptia275
 
6.0%
220-1001275
 
6.0%
data207
 
4.5%
175
 
3.8%
analysis143
 
3.1%
fundamentals121
 
2.6%
with104
 
2.3%
for80
 
1.7%
cybersecurity66
 
1.4%
Other values (366)2820
61.3%

Most occurring characters

ValueCountFrequency (%)
3797
 
11.4%
n2165
 
6.5%
e2073
 
6.2%
i2059
 
6.2%
a1850
 
5.5%
s1693
 
5.1%
t1602
 
4.8%
o1554
 
4.7%
r1347
 
4.0%
l877
 
2.6%
Other values (65)14328
43.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter21425
64.3%
Uppercase Letter4606
 
13.8%
Space Separator3797
 
11.4%
Decimal Number2007
 
6.0%
Other Punctuation562
 
1.7%
Dash Punctuation366
 
1.1%
Math Symbol323
 
1.0%
Open Punctuation129
 
0.4%
Close Punctuation129
 
0.4%
Other Symbol1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n2165
10.1%
e2073
 
9.7%
i2059
 
9.6%
a1850
 
8.6%
s1693
 
7.9%
t1602
 
7.5%
o1554
 
7.3%
r1347
 
6.3%
l877
 
4.1%
m811
 
3.8%
Other values (16)5394
25.2%
Uppercase Letter
ValueCountFrequency (%)
A789
17.1%
C668
14.5%
I541
11.7%
T482
10.5%
D312
 
6.8%
P311
 
6.8%
B182
 
4.0%
S164
 
3.6%
F162
 
3.5%
W149
 
3.2%
Other values (15)846
18.4%
Other Punctuation
ValueCountFrequency (%)
:351
62.5%
&81
 
14.4%
,42
 
7.5%
!32
 
5.7%
?30
 
5.3%
/17
 
3.0%
.4
 
0.7%
#3
 
0.5%
'2
 
0.4%
Decimal Number
ValueCountFrequency (%)
0844
42.1%
1572
28.5%
2552
27.5%
518
 
0.9%
716
 
0.8%
32
 
0.1%
62
 
0.1%
91
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
+275
85.1%
|48
 
14.9%
Space Separator
ValueCountFrequency (%)
3797
100.0%
Dash Punctuation
ValueCountFrequency (%)
-366
100.0%
Open Punctuation
ValueCountFrequency (%)
(129
100.0%
Close Punctuation
ValueCountFrequency (%)
)129
100.0%
Other Symbol
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin26031
78.1%
Common7314
 
21.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
n2165
 
8.3%
e2073
 
8.0%
i2059
 
7.9%
a1850
 
7.1%
s1693
 
6.5%
t1602
 
6.2%
o1554
 
6.0%
r1347
 
5.2%
l877
 
3.4%
m811
 
3.1%
Other values (41)10000
38.4%
Common
ValueCountFrequency (%)
3797
51.9%
0844
 
11.5%
1572
 
7.8%
2552
 
7.5%
-366
 
5.0%
:351
 
4.8%
+275
 
3.8%
(129
 
1.8%
)129
 
1.8%
&81
 
1.1%
Other values (14)218
 
3.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII33344
> 99.9%
Specials1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3797
 
11.4%
n2165
 
6.5%
e2073
 
6.2%
i2059
 
6.2%
a1850
 
5.5%
s1693
 
5.1%
t1602
 
4.8%
o1554
 
4.7%
r1347
 
4.0%
l877
 
2.6%
Other values (64)14327
43.0%
Specials
ValueCountFrequency (%)
1
100.0%

duration
Real number (ℝ≥0)

ZEROS

Distinct: 81
Distinct (%): 10.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 51.29875
Minimum: 0
Maximum: 1800
Zeros: 35
Zeros (%): 4.4%
Negative: 0
Negative (%): 0.0%
Memory size: 12.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 3
Q1: 15
Median: 36
Q3: 69
95th percentile: 92
Maximum: 1800
Range: 1800
Interquartile range (IQR): 54

Descriptive statistics

Standard deviation: 104.3766022
Coefficient of variation (CV): 2.0346812
Kurtosis: 186.7249165
Mean: 51.29875
Median Absolute Deviation (MAD): 26
Skewness: 12.67349445
Sum: 41039
Variance: 10894.47509
Monotonicity: Not monotonic
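
Several of the figures above are tied together by simple identities, which makes them easy to sanity-check. The sketch below recomputes the mean, coefficient of variation, variance and interquartile range from the other reported values. The variable names are illustrative, and the inputs are copied from this report rather than recomputed from the raw data.

```python
# Values copied from the report for the duration variable.
n = 800              # number of observations
total = 41039        # Sum
std = 104.3766022    # Standard deviation
q1, q3 = 15, 69      # first and third quartiles

mean = total / n         # Mean = Sum / number of observations
cv = std / mean          # Coefficient of variation = std / mean
variance = std ** 2      # Variance = std squared
iqr = q3 - q1            # Interquartile range = Q3 - Q1

print(mean, cv, variance, iqr)
```

A CV above 2 together with a maximum of 1800 against a 95th percentile of 92 suggests that a few extreme durations dominate the spread.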
Histogram with fixed size bins (bins=50)
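Common values

Value | Count | Frequency (%)
10 | 54 | 6.8%
15 | 45 | 5.6%
36 | 39 | 4.9%
0 | 35 | 4.4%
5 | 35 | 4.4%
40 | 29 | 3.6%
83 | 24 | 3.0%
65 | 23 | 2.9%
23 | 22 | 2.8%
67 | 21 | 2.6%
Other values (71) | 473 | 59.1%

Smallest values

Value | Count | Frequency (%)
0 | 35 | 4.4%
2 | 1 | 0.1%
3 | 14 | 1.8%
4 | 1 | 0.1%
5 | 35 | 4.4%
6 | 4 | 0.5%
7 | 2 | 0.2%
9 | 15 | 1.9%
10 | 54 | 6.8%
11 | 4 | 0.5%

Largest values

Value | Count | Frequency (%)
1800 | 1 | 0.1%
1520 | 1 | 0.1%
1440 | 1 | 0.1%
600 | 1 | 0.1%
419 | 1 | 0.1%
418 | 1 | 0.1%
277 | 1 | 0.1%
268 | 1 | 0.1%
180 | 7 | 0.9%
164 | 1 | 0.1%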

learnerCom
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 81
Distinct (%): 10.1%
Missing: 0
Missing (%): 0.0%
Memory size: 12.5 KiB

Value | Count
12/3/2020 | 46
4/7/2020 | 41
12/22/2020 | 40
11/29/2020 | 39
12/27/2020 | 30
Other values (76) | 604

Length

Max length: 10
Median length: 9
Mean length: 9.0925
Min length: 8

Characters and Unicode

Total characters: 7274
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 12
Unique (%): 1.5%

Sample

1st row: 12/3/2020
2nd row: 12/3/2020
3rd row: 12/3/2020
4th row: 5/24/2020
5th row: 5/24/2020

Common Values

Value | Count | Frequency (%)
12/3/2020 | 46 | 5.8%
4/7/2020 | 41 | 5.1%
12/22/2020 | 40 | 5.0%
11/29/2020 | 39 | 4.9%
12/27/2020 | 30 | 3.8%
3/20/2020 | 28 | 3.5%
5/6/2020 | 27 | 3.4%
5/23/2020 | 26 | 3.2%
5/21/2020 | 24 | 3.0%
11/9/2020 | 21 | 2.6%
Other values (71) | 478 | 59.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
12/3/202046
 
5.8%
4/7/202041
 
5.1%
12/22/202040
 
5.0%
11/29/202039
 
4.9%
12/27/202030
 
3.8%
3/20/202028
 
3.5%
5/6/202027
 
3.4%
5/23/202026
 
3.2%
5/21/202024
 
3.0%
11/9/202021
 
2.6%
Other values (71)478
59.8%

Most occurring characters

ValueCountFrequency (%)
22313
31.8%
01672
23.0%
/1600
22.0%
1597
 
8.2%
5244
 
3.4%
4230
 
3.2%
3205
 
2.8%
7178
 
2.4%
9102
 
1.4%
678
 
1.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number5674
78.0%
Other Punctuation1600
 
22.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
22313
40.8%
01672
29.5%
1597
 
10.5%
5244
 
4.3%
4230
 
4.1%
3205
 
3.6%
7178
 
3.1%
9102
 
1.8%
678
 
1.4%
855
 
1.0%
Other Punctuation
ValueCountFrequency (%)
/1600
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common7274
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
22313
31.8%
01672
23.0%
/1600
22.0%
1597
 
8.2%
5244
 
3.4%
4230
 
3.2%
3205
 
2.8%
7178
 
2.4%
9102
 
1.4%
678
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII7274
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
22313
31.8%
01672
23.0%
/1600
22.0%
1597
 
8.2%
5244
 
3.4%
4230
 
3.2%
3205
 
2.8%
7178
 
2.4%
9102
 
1.4%
678
 
1.1%

learnerIntranetID
Categorical

HIGH CORRELATION

Distinct: 24
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 12.5 KiB

Value | Count
munkimostra@gmail.com | 101
rajnish610@gmail.com | 75
shwetay629@gmail.com | 72
sagarsharma6970@gmail.com | 57
sanyapandey74@gmail.com | 52
Other values (19) | 443

Length

Max length: 31
Median length: 22
Mean length: 23.175
Min length: 18

Characters and Unicode

Total characters: 18540
Distinct characters: 35
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: simransanjay974@gmail.com
2nd row: simransanjay974@gmail.com
3rd row: simransanjay974@gmail.com
4th row: simransanjay974@gmail.com
5th row: simransanjay974@gmail.com

Common Values

Value | Count | Frequency (%)
munkimostra@gmail.com | 101 | 12.6%
rajnish610@gmail.com | 75 | 9.4%
shwetay629@gmail.com | 72 | 9.0%
sagarsharma6970@gmail.com | 57 | 7.1%
sanyapandey74@gmail.com | 52 | 6.5%
sharmarup830@gmail.com | 48 | 6.0%
ajkumar1308@gmail.com | 46 | 5.8%
himanshugulati138@gmail.com | 44 | 5.5%
priyamagnihotri384@gmail.com | 43 | 5.4%
ap1077679@gmail.com | 37 | 4.6%
Other values (14) | 225 | 28.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
munkimostra@gmail.com101
12.6%
rajnish610@gmail.com75
 
9.4%
shwetay629@gmail.com72
 
9.0%
sagarsharma6970@gmail.com57
 
7.1%
sanyapandey74@gmail.com52
 
6.5%
sharmarup830@gmail.com48
 
6.0%
ajkumar1308@gmail.com46
 
5.8%
himanshugulati138@gmail.com44
 
5.5%
priyamagnihotri384@gmail.com43
 
5.4%
ap1077679@gmail.com37
 
4.6%
Other values (14)225
28.1%

Most occurring characters

ValueCountFrequency (%)
a2512
 
13.5%
m2218
 
12.0%
i1395
 
7.5%
o1007
 
5.4%
g972
 
5.2%
l866
 
4.7%
s811
 
4.4%
.811
 
4.4%
c802
 
4.3%
@800
 
4.3%
Other values (25)6346
34.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter14669
79.1%
Decimal Number2247
 
12.1%
Other Punctuation1611
 
8.7%
Connector Punctuation13
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a2512
17.1%
m2218
15.1%
i1395
9.5%
o1007
 
6.9%
g972
 
6.6%
l866
 
5.9%
s811
 
5.5%
c802
 
5.5%
h721
 
4.9%
r646
 
4.4%
Other values (12)2719
18.5%
Decimal Number
ValueCountFrequency (%)
0306
13.6%
7294
13.1%
1284
12.6%
9257
11.4%
3256
11.4%
6249
11.1%
8237
10.5%
2152
6.8%
4147
6.5%
565
 
2.9%
Other Punctuation
ValueCountFrequency (%)
.811
50.3%
@800
49.7%
Connector Punctuation
ValueCountFrequency (%)
_13
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin14669
79.1%
Common3871
 
20.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a2512
17.1%
m2218
15.1%
i1395
9.5%
o1007
 
6.9%
g972
 
6.6%
l866
 
5.9%
s811
 
5.5%
c802
 
5.5%
h721
 
4.9%
r646
 
4.4%
Other values (12)2719
18.5%
Common
ValueCountFrequency (%)
.811
21.0%
@800
20.7%
0306
 
7.9%
7294
 
7.6%
1284
 
7.3%
9257
 
6.6%
3256
 
6.6%
6249
 
6.4%
8237
 
6.1%
2152
 
3.9%
Other values (3)225
 
5.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII18540
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a2512
 
13.5%
m2218
 
12.0%
i1395
 
7.5%
o1007
 
5.4%
g972
 
5.2%
l866
 
4.7%
s811
 
4.4%
.811
 
4.4%
c802
 
4.3%
@800
 
4.3%
Other values (25)6346
34.2%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
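
The calculation described above can be sketched in a few lines of plain Python. The function name `pearson_r` and the sample data are illustrative assumptions; the report itself presumably relies on its underlying libraries rather than on code like this.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]            # exactly linear in xs
print(pearson_r(xs, ys))

# Shifting and rescaling one variable (with a positive factor) leaves r unchanged.
print(pearson_r([10 * x + 5 for x in xs], ys))
```

The function raises a `ZeroDivisionError` when either variable is constant, because the standard deviation in the denominator is then zero.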

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
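
A minimal sketch of that recipe: replace each value by its rank, then apply the Pearson formula to the ranks. Giving tied values the average of their positions is a common convention and an assumption here, as are the helper names.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upward; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]    # monotonic but not linear
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

For this cubic relationship ρ reaches its maximum because the ordering is perfectly preserved, while Pearson's r stays below 1.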

Kendall's τ

Similar to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
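
The pair-counting definition can be sketched directly. This version follows the wording above literally (often called tau-a) and counts tied pairs as neither concordant nor discordant; the report's own computation may treat ties differently. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # A pair is concordant when both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Two concordant pairs and one discordant pair out of three.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```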

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the phik package documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
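
For illustration, the following sketch computes a bias-corrected Cramér's V from two lists of category labels, using the correction as it is commonly stated for Bergsma's 2013 proposal. The function name and the toy inputs are assumptions, and the report's actual implementation may differ in detail.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long lists of category labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Corrections: shrink phi squared and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table, e.g. a constant variable
    return sqrt(phi2_corr / denom)


# Perfect association: each label in xs maps to exactly one label in ys.
xs = ["a"] * 50 + ["b"] * 50
ys = ["x"] * 50 + ["y"] * 50
print(cramers_v_corrected(xs, ys))
```

In this report, a measure of this kind is what would flag learnerCom and learnerIntranetID as highly correlated, since both are categorical.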

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

index | learningActivityTitle | duration | learnerCom | learnerIntranetID
0 | Working with Data for Effective Decision Making | 23 | 12/3/2020 | simransanjay974@gmail.com
1 | Personal Skills for Effective Business Analysis | 40 | 12/3/2020 | simransanjay974@gmail.com
2 | Business Analysis Overview | 43 | 12/3/2020 | simransanjay974@gmail.com
3 | Using Active Listening in Workplace Situations | 24 | 5/24/2020 | simransanjay974@gmail.com
4 | Clarity and Conciseness in Business Writing | 21 | 5/24/2020 | simransanjay974@gmail.com
5 | Audience and Purpose in Business Writing | 19 | 5/24/2020 | simransanjay974@gmail.com
6 | Effective Team Communication | 23 | 5/24/2020 | simransanjay974@gmail.com
7 | Communicating with impact | 10 | 5/22/2020 | simransanjay974@gmail.com
8 | Learning LinkedIn | 88 | 5/22/2020 | simransanjay974@gmail.com
9 | How To Use LinkedIn For Beginners - 7 LinkedIn Profile Tips | 9 | 5/22/2020 | simransanjay974@gmail.com

Last rows

index | learningActivityTitle | duration | learnerCom | learnerIntranetID
790 | Data Preprocessing | 26 | 12/30/2020 | priyamagnihotri384@gmail.com
791 | Framing Opportunities for Effective Data-driven Decision Making | 24 | 12/30/2020 | rajnish610@gmail.com
792 | Data Preprocessing | 26 | 12/30/2020 | rajnish610@gmail.com
793 | Framing Opportunities for Effective Data-driven Decision Making | 24 | 12/30/2020 | sharmarup830@gmail.com
794 | Data Preprocessing | 26 | 12/30/2020 | sharmarup830@gmail.com
795 | Framing Opportunities for Effective Data-driven Decision Making | 24 | 12/30/2020 | sagarsharma6970@gmail.com
796 | Data Preprocessing | 26 | 12/30/2020 | sagarsharma6970@gmail.com
797 | Framing Opportunities for Effective Data-driven Decision Making | 24 | 12/29/2020 | shwetay629@gmail.com
798 | Data Preprocessing | 26 | 12/29/2020 | shwetay629@gmail.com
799 | Data Preprocessing | 26 | 12/29/2020 | munkimostra@gmail.com