Pandas Profiling Report

Dataset statistics

Number of variables	4
Number of observations	800
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	31.2 KiB
Average record size in memory	40.0 B

Variable types

Categorical	3
Numeric	1

Warnings

`learningActivityTitle` has a high cardinality: 184 distinct values	High cardinality
`learnerCom` has a high cardinality: 81 distinct values	High cardinality
`learnerIntranetID` is highly correlated with `learnerCom`	High correlation
`learnerCom` is highly correlated with `learnerIntranetID`	High correlation
`duration` has 35 (4.4%) zeros	Zeros
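
The percentages in these warnings follow directly from the counts reported in this document. The sketch below recomputes them using only the Python standard library. The `CARDINALITY_THRESHOLD` of 50 is an assumption, chosen only because it is consistent with the warnings above (81 distinct values flagged, 24 not flagged); this report does not state the threshold it used.

```python
# Counts copied from this report.
N_OBSERVATIONS = 800
DISTINCT = {
    "learningActivityTitle": 184,
    "learnerCom": 81,
    "learnerIntranetID": 24,
}
DURATION_ZEROS = 35

# Assumption: hypothetical threshold, not taken from the report.
CARDINALITY_THRESHOLD = 50


def percent(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


# Columns whose distinct-value count exceeds the assumed threshold.
high_cardinality = [
    name for name, n in DISTINCT.items() if n > CARDINALITY_THRESHOLD
]
distinct_pct = {name: percent(n, N_OBSERVATIONS) for name, n in DISTINCT.items()}
zeros_pct = percent(DURATION_ZEROS, N_OBSERVATIONS)

for name in high_cardinality:
    print(f"{name}: {DISTINCT[name]} distinct ({distinct_pct[name]:.1f}%)")
print(f"duration zeros: {DURATION_ZEROS} ({zeros_pct:.1f}%)")
```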

Reproduction

Analysis started	2021-05-18 10:37:16.216429
Analysis finished	2021-05-18 10:37:18.985431
Duration	2.77 seconds
Software version	pandas-profiling v3.0.0
Download configuration	config.json

learningActivityTitle
Categorical

HIGH CARDINALITY

Distinct	184
Distinct (%)	23.0%
Missing	0
Missing (%)	0.0%
Memory size	12.5 KiB

CompTIA A+ 220-1001: Installing Hardware & Display Components	22
CompTIA A+ 220-1001: Basic Cable Types	21
CompTIA A+ 220-1001: Connectors	18
CompTIA A+ 220-1001: TCP & UDP ports	18
CompTIA A+ 220-1001: Implementing Network Concepts	17
Other values (179)	704

Length

Max length	103
Median length	41
Mean length	41.68125
Min length	10

Characters and Unicode

Total characters	33345
Distinct characters	75
Distinct categories	10
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
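
As a rough illustration of this kind of analysis, the sketch below counts the Unicode general category of each character in one title from this report, using Python's standard `unicodedata` module. The two-letter codes correspond to the category names used in this report (for example `Ll` is Lowercase Letter, `Lu` is Uppercase Letter, `Nd` is Decimal Number, `Zs` is Space Separator). The standard library does not expose script or block properties, so those counts are not reproduced here, and the helper name `category_counts` is hypothetical.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters in text by Unicode general category (e.g. 'Lu', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)


title = "CompTIA A+ 220-1001: Connectors"  # a value taken from this report
counts = category_counts(title)
for code, n in counts.most_common():
    print(code, n)
```

The single "Other Symbol" character reported for this column is consistent with U+FFFD (the replacement character), which has category `So` and lies in the Specials block.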

Unique

Unique	107
Unique (%)	13.4%

Sample

1st row	Working with Data for Effective Decision Making
2nd row	Personal Skills for Effective Business Analysis
3rd row	Business Analysis Overview
4th row	Using Active Listening in Workplace Situations
5th row	Clarity and Conciseness in Business Writing

Common Values

Value	Count	Frequency (%)
CompTIA A+ 220-1001: Installing Hardware & Display Components	22	2.8%
CompTIA A+ 220-1001: Basic Cable Types	21	2.6%
CompTIA A+ 220-1001: Connectors	18	2.2%
CompTIA A+ 220-1001: TCP & UDP ports	18	2.2%
CompTIA A+ 220-1001: Implementing Network Concepts	17	2.1%
CompTIA A+ 220-1001: Resolving Problems	17	2.1%
CompTIA A+ 220-1001: Configuring a Wired/Wireless Network	17	2.1%
CompTIA A+ 220-1001: Printers	17	2.1%
CompTIA A+ 220-1001: Custom PC configuration	16	2.0%
CompTIA A+ 220-1001: Troubleshooting	16	2.0%
Other values (174)	621	77.6%

Length

Histogram of lengths of the category

Value	Count	Frequency (%)
a	331	7.2%
comptia	275	6.0%
220-1001	275	6.0%
data	207	4.5%
	175	3.8%
analysis	143	3.1%
fundamentals	121	2.6%
with	104	2.3%
for	80	1.7%
cybersecurity	66	1.4%
Other values (366)	2820	61.3%
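
The table above looks like a word-frequency count over lowercased tokens with surrounding punctuation stripped: "a" would come from "A+", "220-1001" appears without its trailing colon, and the empty value plausibly comes from a lone "&". The sketch below reproduces that behaviour under those assumptions; it is not confirmed to be the tokenization the profiling tool actually uses, and `word_counts` is a hypothetical helper.

```python
import string
from collections import Counter


def word_counts(values):
    """Lowercase, split on whitespace, strip edge punctuation, then count."""
    counter = Counter()
    for value in values:
        for token in value.lower().split():
            # A token made only of punctuation (e.g. "&") becomes "".
            counter[token.strip(string.punctuation)] += 1
    return counter


titles = [
    "CompTIA A+ 220-1001: Connectors",
    "CompTIA A+ 220-1001: TCP & UDP ports",
]
words = word_counts(titles)
print(words.most_common(3))
```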

Most occurring characters

Value	Count	Frequency (%)
	3797	11.4%
n	2165	6.5%
e	2073	6.2%
i	2059	6.2%
a	1850	5.5%
s	1693	5.1%
t	1602	4.8%
o	1554	4.7%
r	1347	4.0%
l	877	2.6%
Other values (65)	14328	43.0%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	21425	64.3%
Uppercase Letter	4606	13.8%
Space Separator	3797	11.4%
Decimal Number	2007	6.0%
Other Punctuation	562	1.7%
Dash Punctuation	366	1.1%
Math Symbol	323	1.0%
Open Punctuation	129	0.4%
Close Punctuation	129	0.4%
Other Symbol	1	< 0.1%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
n	2165	10.1%
e	2073	9.7%
i	2059	9.6%
a	1850	8.6%
s	1693	7.9%
t	1602	7.5%
o	1554	7.3%
r	1347	6.3%
l	877	4.1%
m	811	3.8%
Other values (16)	5394	25.2%

Uppercase Letter

Value	Count	Frequency (%)
A	789	17.1%
C	668	14.5%
I	541	11.7%
T	482	10.5%
D	312	6.8%
P	311	6.8%
B	182	4.0%
S	164	3.6%
F	162	3.5%
W	149	3.2%
Other values (15)	846	18.4%

Other Punctuation

Value	Count	Frequency (%)
:	351	62.5%
&	81	14.4%
,	42	7.5%
!	32	5.7%
?	30	5.3%
/	17	3.0%
.	4	0.7%
#	3	0.5%
'	2	0.4%

Decimal Number

Value	Count	Frequency (%)
0	844	42.1%
1	572	28.5%
2	552	27.5%
5	18	0.9%
7	16	0.8%
3	2	0.1%
6	2	0.1%
9	1	< 0.1%

Math Symbol

Value	Count	Frequency (%)
+	275	85.1%
|	48	14.9%

Space Separator

Value	Count	Frequency (%)
	3797	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	366	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	129	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	129	100.0%

Other Symbol

Value	Count	Frequency (%)
�	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	26031	78.1%
Common	7314	21.9%

Most frequent character per script

Latin

Value	Count	Frequency (%)
n	2165	8.3%
e	2073	8.0%
i	2059	7.9%
a	1850	7.1%
s	1693	6.5%
t	1602	6.2%
o	1554	6.0%
r	1347	5.2%
l	877	3.4%
m	811	3.1%
Other values (41)	10000	38.4%

Common

Value	Count	Frequency (%)
	3797	51.9%
0	844	11.5%
1	572	7.8%
2	552	7.5%
-	366	5.0%
:	351	4.8%
+	275	3.8%
(	129	1.8%
)	129	1.8%
&	81	1.1%
Other values (14)	218	3.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	33344	> 99.9%
Specials	1	< 0.1%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
	3797	11.4%
n	2165	6.5%
e	2073	6.2%
i	2059	6.2%
a	1850	5.5%
s	1693	5.1%
t	1602	4.8%
o	1554	4.7%
r	1347	4.0%
l	877	2.6%
Other values (64)	14327	43.0%

Specials

Value	Count	Frequency (%)
�	1	100.0%

duration
Real number (ℝ_≥0)

ZEROS

Distinct	81
Distinct (%)	10.1%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	51.29875

Minimum	0
Maximum	1800
Zeros	35
Zeros (%)	4.4%
Negative	0
Negative (%)	0.0%
Memory size	12.5 KiB

Quantile statistics

Minimum	0
5-th percentile	3
Q1	15
median	36
Q3	69
95-th percentile	92
Maximum	1800
Range	1800
Interquartile range (IQR)	54

Descriptive statistics

Standard deviation	104.3766022
Coefficient of variation (CV)	2.0346812
Kurtosis	186.7249165
Mean	51.29875
Median Absolute Deviation (MAD)	26
Skewness	12.67349445
Sum	41039
Variance	10894.47509
Monotonicity	Not monotonic
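
Several of these figures are related by simple arithmetic, which makes for a quick consistency check. The sketch below recomputes the mean, coefficient of variation, variance, interquartile range and range from the other values reported for `duration`; the variable names are illustrative only.

```python
# Values copied from the duration summary in this report.
n = 800
total = 41039            # Sum
std = 104.3766022        # Standard deviation
q1, q3 = 15, 69
minimum, maximum = 0, 1800

mean = total / n                   # Sum divided by the number of observations
cv = std / mean                    # Coefficient of variation
variance = std ** 2                # Variance is the squared standard deviation
iqr = q3 - q1                      # Interquartile range
value_range = maximum - minimum    # Range

print(f"mean={mean:.5f} cv={cv:.7f} variance={variance:.5f}")
print(f"iqr={iqr} range={value_range}")
```

A coefficient of variation above 2, together with the large skewness, reflects the few very long durations (up to 1800) listed among the maximum values.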

Histogram with fixed size bins (bins=50)

Value	Count	Frequency (%)
10	54	6.8%
15	45	5.6%
36	39	4.9%
0	35	4.4%
5	35	4.4%
40	29	3.6%
83	24	3.0%
65	23	2.9%
23	22	2.8%
67	21	2.6%
Other values (71)	473	59.1%

Minimum 5 values

Value	Count	Frequency (%)
0	35	4.4%
2	1	0.1%
3	14	1.8%
4	1	0.1%
5	35	4.4%
6	4	0.5%
7	2	0.2%
9	15	1.9%
10	54	6.8%
11	4	0.5%

Maximum 5 values

Value	Count	Frequency (%)
1800	1	0.1%
1520	1	0.1%
1440	1	0.1%
600	1	0.1%
419	1	0.1%
418	1	0.1%
277	1	0.1%
268	1	0.1%
180	7	0.9%
164	1	0.1%

learnerCom
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct	81
Distinct (%)	10.1%
Missing	0
Missing (%)	0.0%
Memory size	12.5 KiB

12/3/2020	46
4/7/2020	41
12/22/2020	40
11/29/2020	39
12/27/2020	30
Other values (76)	604

Length

Max length	10
Median length	9
Mean length	9.0925
Min length	8

Characters and Unicode

Total characters	7274
Distinct characters	11
Distinct categories	2
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	12
Unique (%)	1.5%

Sample

1st row	12/3/2020
2nd row	12/3/2020
3rd row	12/3/2020
4th row	5/24/2020
5th row	5/24/2020

Common Values

Value	Count	Frequency (%)
12/3/2020	46	5.8%
4/7/2020	41	5.1%
12/22/2020	40	5.0%
11/29/2020	39	4.9%
12/27/2020	30	3.8%
3/20/2020	28	3.5%
5/6/2020	27	3.4%
5/23/2020	26	3.2%
5/21/2020	24	3.0%
11/9/2020	21	2.6%
Other values (71)	478	59.8%

Length

Histogram of lengths of the category

Most occurring characters

Value	Count	Frequency (%)
2	2313	31.8%
0	1672	23.0%
/	1600	22.0%
1	597	8.2%
5	244	3.4%
4	230	3.2%
3	205	2.8%
7	178	2.4%
9	102	1.4%
6	78	1.1%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	5674	78.0%
Other Punctuation	1600	22.0%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
2	2313	40.8%
0	1672	29.5%
1	597	10.5%
5	244	4.3%
4	230	4.1%
3	205	3.6%
7	178	3.1%
9	102	1.8%
6	78	1.4%
8	55	1.0%

Other Punctuation

Value	Count	Frequency (%)
/	1600	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	7274	100.0%

Most frequent character per script

Common

Value	Count	Frequency (%)
2	2313	31.8%
0	1672	23.0%
/	1600	22.0%
1	597	8.2%
5	244	3.4%
4	230	3.2%
3	205	2.8%
7	178	2.4%
9	102	1.4%
6	78	1.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	7274	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
2	2313	31.8%
0	1672	23.0%
/	1600	22.0%
1	597	8.2%
5	244	3.4%
4	230	3.2%
3	205	2.8%
7	178	2.4%
9	102	1.4%
6	78	1.1%

learnerIntranetID
Categorical

HIGH CORRELATION

Distinct	24
Distinct (%)	3.0%
Missing	0
Missing (%)	0.0%
Memory size	12.5 KiB

munkimostra@gmail.com	101
rajnish610@gmail.com	75
shwetay629@gmail.com	72
sagarsharma6970@gmail.com	57
sanyapandey74@gmail.com	52
Other values (19)	443

Length

Max length	31
Median length	22
Mean length	23.175
Min length	18

Characters and Unicode

Total characters	18540
Distinct characters	35
Distinct categories	4
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	simransanjay974@gmail.com
2nd row	simransanjay974@gmail.com
3rd row	simransanjay974@gmail.com
4th row	simransanjay974@gmail.com
5th row	simransanjay974@gmail.com

Common Values

Value	Count	Frequency (%)
munkimostra@gmail.com	101	12.6%
rajnish610@gmail.com	75	9.4%
shwetay629@gmail.com	72	9.0%
sagarsharma6970@gmail.com	57	7.1%
sanyapandey74@gmail.com	52	6.5%
sharmarup830@gmail.com	48	6.0%
ajkumar1308@gmail.com	46	5.8%
himanshugulati138@gmail.com	44	5.5%
priyamagnihotri384@gmail.com	43	5.4%
ap1077679@gmail.com	37	4.6%
Other values (14)	225	28.1%

Length

Histogram of lengths of the category

Most occurring characters

Value	Count	Frequency (%)
a	2512	13.5%
m	2218	12.0%
i	1395	7.5%
o	1007	5.4%
g	972	5.2%
l	866	4.7%
s	811	4.4%
.	811	4.4%
c	802	4.3%
@	800	4.3%
Other values (25)	6346	34.2%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	14669	79.1%
Decimal Number	2247	12.1%
Other Punctuation	1611	8.7%
Connector Punctuation	13	0.1%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
a	2512	17.1%
m	2218	15.1%
i	1395	9.5%
o	1007	6.9%
g	972	6.6%
l	866	5.9%
s	811	5.5%
c	802	5.5%
h	721	4.9%
r	646	4.4%
Other values (12)	2719	18.5%

Decimal Number

Value	Count	Frequency (%)
0	306	13.6%
7	294	13.1%
1	284	12.6%
9	257	11.4%
3	256	11.4%
6	249	11.1%
8	237	10.5%
2	152	6.8%
4	147	6.5%
5	65	2.9%

Other Punctuation

Value	Count	Frequency (%)
.	811	50.3%
@	800	49.7%

Connector Punctuation

Value	Count	Frequency (%)
_	13	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	14669	79.1%
Common	3871	20.9%

Most frequent character per script

Latin

Value	Count	Frequency (%)
a	2512	17.1%
m	2218	15.1%
i	1395	9.5%
o	1007	6.9%
g	972	6.6%
l	866	5.9%
s	811	5.5%
c	802	5.5%
h	721	4.9%
r	646	4.4%
Other values (12)	2719	18.5%

Common

Value	Count	Frequency (%)
.	811	21.0%
@	800	20.7%
0	306	7.9%
7	294	7.6%
1	284	7.3%
9	257	6.6%
3	256	6.6%
6	249	6.4%
8	237	6.1%
2	152	3.9%
Other values (3)	225	5.8%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	18540	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
a	2512	13.5%
m	2218	12.0%
i	1395	7.5%
o	1007	5.4%
g	972	5.2%
l	866	4.7%
s	811	4.4%
.	811	4.4%
c	802	4.3%
@	800	4.3%
Other values (25)	6346	34.2%

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
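
A minimal sketch of that calculation in plain Python follows. The function name `pearson_r` and the sample data are illustrative, and the code assumes neither variable is constant (otherwise the denominator is zero).

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n factors of covariance and variances cancel, so they are omitted.
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r_same = pearson_r(x, y)
r_shifted = pearson_r([10 * v + 3 for v in x], y)  # change of location and scale
r_reversed = pearson_r(x, y[::-1])
print(r_same, r_shifted, r_reversed)
```

Rescaling and shifting `x` leaves r unchanged, while reversing the order of `y` flips its sign.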

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
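
The sketch below follows that recipe: replace each value by its rank, then apply the Pearson formula to the ranks. Giving tied values the average of their positions is an assumption of this sketch (a common convention), and the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

For this monotonic but nonlinear relationship, ρ reaches its maximum while Pearson's r stays below it.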

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
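
The pair-counting definition can be sketched directly, as below. This is the simple variant often called tau-a, in which tied pairs count as neither concordant nor discordant; statistical libraries frequently use a tie-adjusted variant instead, and this report does not state which one it computes. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both variables move in the same direction.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
tau_reversed = kendall_tau_a([1, 2, 3], [3, 2, 1])
print(tau, tau_reversed)
```

In the first example, five of the six pairs are concordant and one is discordant, giving (5 - 1) / 6.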

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
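
For illustration, the sketch below computes a bias-corrected Cramér's V from a contingency table in plain Python, using the correction as it is commonly stated: the squared phi coefficient is reduced by (k-1)(r-1)/(n-1) and the table dimensions are shrunk accordingly. This is a sketch under that assumption and is not taken from the profiling tool's source; `cramers_v_corrected` is a hypothetical name.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_corr = k - (k - 1) ** 2 / (n - 1)
    r_corr = r - (r - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


v_perfect = cramers_v_corrected([[10, 0], [0, 10]])
v_independent = cramers_v_corrected([[5, 5], [5, 5]])
print(v_perfect, v_independent)
```

The two example tables sit at the extremes of the scale: one where each row category determines the column category, and one where rows and columns are independent.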

Missing values

Count

A simple visualization of nullity by column.

Matrix

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	learningActivityTitle	duration	learnerCom	learnerIntranetID
0	Working with Data for Effective Decision Making	23	12/3/2020	simransanjay974@gmail.com
1	Personal Skills for Effective Business Analysis	40	12/3/2020	simransanjay974@gmail.com
2	Business Analysis Overview	43	12/3/2020	simransanjay974@gmail.com
3	Using Active Listening in Workplace Situations	24	5/24/2020	simransanjay974@gmail.com
4	Clarity and Conciseness in Business Writing	21	5/24/2020	simransanjay974@gmail.com
5	Audience and Purpose in Business Writing	19	5/24/2020	simransanjay974@gmail.com
6	Effective Team Communication	23	5/24/2020	simransanjay974@gmail.com
7	Communicating with impact	10	5/22/2020	simransanjay974@gmail.com
8	Learning LinkedIn	88	5/22/2020	simransanjay974@gmail.com
9	How To Use LinkedIn For Beginners - 7 LinkedIn Profile Tips	9	5/22/2020	simransanjay974@gmail.com

Last rows

	learningActivityTitle	duration	learnerCom	learnerIntranetID
790	Data Preprocessing	26	12/30/2020	priyamagnihotri384@gmail.com
791	Framing Opportunities for Effective Data-driven Decision Making	24	12/30/2020	rajnish610@gmail.com
792	Data Preprocessing	26	12/30/2020	rajnish610@gmail.com
793	Framing Opportunities for Effective Data-driven Decision Making	24	12/30/2020	sharmarup830@gmail.com
794	Data Preprocessing	26	12/30/2020	sharmarup830@gmail.com
795	Framing Opportunities for Effective Data-driven Decision Making	24	12/30/2020	sagarsharma6970@gmail.com
796	Data Preprocessing	26	12/30/2020	sagarsharma6970@gmail.com
797	Framing Opportunities for Effective Data-driven Decision Making	24	12/29/2020	shwetay629@gmail.com
798	Data Preprocessing	26	12/29/2020	shwetay629@gmail.com
799	Data Preprocessing	26	12/29/2020	munkimostra@gmail.com
