Dataset statistics
Number of variables | 4 |
---|---|
Number of observations | 800 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 31.2 KiB |
Average record size in memory | 40.0 B |
Variable types
Categorical | 3 |
---|---|
Numeric | 1 |
learningActivityTitle has a high cardinality: 184 distinct values | High cardinality |
learnerCom has a high cardinality: 81 distinct values | High cardinality |
learnerIntranetID is highly correlated with learnerCom | High correlation |
learnerCom is highly correlated with learnerIntranetID | High correlation |
duration has 35 (4.4%) zeros | Zeros |
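The alerts above are derived from simple per-column summaries. A minimal sketch of two of them (distinct-value count and share of zeros) is shown below. The toy rows and the threshold value of 50 are assumptions for illustration only; they are not taken from the profiled dataset or from the tool's configuration.

```python
# Hypothetical toy rows (title, duration); not taken from the profiled dataset.
rows = [
    ("Course A", 0),
    ("Course B", 15),
    ("Course A", 36),
    ("Course C", 0),
    ("Course D", 10),
]


def distinct_count(values):
    """Number of distinct values in a column."""
    return len(set(values))


def zero_share(values):
    """Fraction of entries that are exactly zero."""
    values = list(values)
    return sum(1 for v in values if v == 0) / len(values)


titles = [title for title, _ in rows]
durations = [duration for _, duration in rows]

# Assumed cut-off for a "high cardinality" flag; the tool's real setting may differ.
HIGH_CARDINALITY_THRESHOLD = 50

n_distinct = distinct_count(titles)
is_high_cardinality = n_distinct > HIGH_CARDINALITY_THRESHOLD
share = zero_share(durations)

print(n_distinct, is_high_cardinality, share)
```

Applied to the figures in this report, the zero share for duration is 35 / 800, which is the 4.4% shown in the alert.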
Reproduction
Analysis started | 2021-05-18 10:37:16.216429 |
---|---|
Analysis finished | 2021-05-18 10:37:18.985431 |
Duration | 2.77 seconds |
Software version | pandas-profiling v3.0.0 |
Download configuration | config.json |
learningActivityTitle
Distinct | 184 |
---|---|
Distinct (%) | 23.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 12.5 KiB |
CompTIA A+ 220-1001: Installing Hardware & Display Components | 22 |
---|---|
CompTIA A+ 220-1001: Basic Cable Types | 21 |
CompTIA A+ 220-1001: Connectors | 18 |
CompTIA A+ 220-1001: TCP & UDP ports | 18 |
CompTIA A+ 220-1001: Implementing Network Concepts | 17 |
Other values (179) |
Length
Max length | 103 |
---|---|
Median length | 41 |
Mean length | 41.68125 |
Min length | 10 |
Characters and Unicode
Total characters | 33345 |
---|---|
Distinct characters | 75 |
Distinct categories | 10 |
Distinct scripts | 2 |
Distinct blocks | 2 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
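As a rough illustration of how such character properties can be read, the sketch below counts the Unicode general category of each character in one of the titles from this report, using Python's standard `unicodedata` module. It is only an example of the idea, not the profiling tool's own implementation.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters by Unicode general category code (e.g. 'Lu', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)


title = "CompTIA A+ 220-1001"
counts = category_counts(title)

# Lu = Uppercase Letter, Ll = Lowercase Letter, Nd = Decimal Number,
# Zs = Space Separator, Sm = Math Symbol, Pd = Dash Punctuation
for code, n in counts.most_common():
    print(code, n)
```

Here "+" falls under Math Symbol and "-" under Dash Punctuation, which matches the category names used in the tables that follow.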
Unique
Unique | 107 |
---|---|
Unique (%) | 13.4% |
Sample
1st row | Working with Data for Effective Decision Making |
---|---|
2nd row | Personal Skills for Effective Business Analysis |
3rd row | Business Analysis Overview |
4th row | Using Active Listening in Workplace Situations |
5th row | Clarity and Conciseness in Business Writing |
Common Values
Value | Count | Frequency (%) |
CompTIA A+ 220-1001: Installing Hardware & Display Components | 22 | 2.8% |
CompTIA A+ 220-1001: Basic Cable Types | 21 | 2.6% |
CompTIA A+ 220-1001: Connectors | 18 | 2.2% |
CompTIA A+ 220-1001: TCP & UDP ports | 18 | 2.2% |
CompTIA A+ 220-1001: Implementing Network Concepts | 17 | 2.1% |
CompTIA A+ 220-1001: Resolving Problems | 17 | 2.1% |
CompTIA A+ 220-1001: Configuring a Wired/Wireless Network | 17 | 2.1% |
CompTIA A+ 220-1001: Printers | 17 | 2.1% |
CompTIA A+ 220-1001: Custom PC configuration | 16 | 2.0% |
CompTIA A+ 220-1001: Troubleshooting | 16 | 2.0% |
Other values (174) | 621 |
Length
Histogram of lengths of the category
Value | Count | Frequency (%) |
a | 331 | 7.2% |
comptia | 275 | 6.0% |
220-1001 | 275 | 6.0% |
data | 207 | 4.5% |
175 | 3.8% | |
analysis | 143 | 3.1% |
fundamentals | 121 | 2.6% |
with | 104 | 2.3% |
for | 80 | 1.7% |
cybersecurity | 66 | 1.4% |
Other values (366) | 2820 |
Most occurring characters
Value | Count | Frequency (%) |
(space) | 3797 | 11.4% |
n | 2165 | 6.5% |
e | 2073 | 6.2% |
i | 2059 | 6.2% |
a | 1850 | 5.5% |
s | 1693 | 5.1% |
t | 1602 | 4.8% |
o | 1554 | 4.7% |
r | 1347 | 4.0% |
l | 877 | 2.6% |
Other values (65) | 14328 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 21425 | |
Uppercase Letter | 4606 | 13.8% |
Space Separator | 3797 | 11.4% |
Decimal Number | 2007 | 6.0% |
Other Punctuation | 562 | 1.7% |
Dash Punctuation | 366 | 1.1% |
Math Symbol | 323 | 1.0% |
Open Punctuation | 129 | 0.4% |
Close Punctuation | 129 | 0.4% |
Other Symbol | 1 | < 0.1% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
n | 2165 | |
e | 2073 | 9.7% |
i | 2059 | 9.6% |
a | 1850 | 8.6% |
s | 1693 | 7.9% |
t | 1602 | 7.5% |
o | 1554 | 7.3% |
r | 1347 | 6.3% |
l | 877 | 4.1% |
m | 811 | 3.8% |
Other values (16) | 5394 |
Uppercase Letter
Value | Count | Frequency (%) |
A | 789 | |
C | 668 | |
I | 541 | |
T | 482 | |
D | 312 | 6.8% |
P | 311 | 6.8% |
B | 182 | 4.0% |
S | 164 | 3.6% |
F | 162 | 3.5% |
W | 149 | 3.2% |
Other values (15) | 846 |
Other Punctuation
Value | Count | Frequency (%) |
: | 351 | |
& | 81 | 14.4% |
, | 42 | 7.5% |
! | 32 | 5.7% |
? | 30 | 5.3% |
/ | 17 | 3.0% |
. | 4 | 0.7% |
# | 3 | 0.5% |
' | 2 | 0.4% |
Decimal Number
Value | Count | Frequency (%) |
0 | 844 | |
1 | 572 | |
2 | 552 | |
5 | 18 | 0.9% |
7 | 16 | 0.8% |
3 | 2 | 0.1% |
6 | 2 | 0.1% |
9 | 1 | < 0.1% |
Math Symbol
Value | Count | Frequency (%) |
+ | 275 | |
| | 48 | 14.9% |
Space Separator
Value | Count | Frequency (%) |
(space) | 3797 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 366 |
Open Punctuation
Value | Count | Frequency (%) |
( | 129 |
Close Punctuation
Value | Count | Frequency (%) |
) | 129 |
Other Symbol
Value | Count | Frequency (%) |
� | 1 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 26031 | |
Common | 7314 | 21.9% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
n | 2165 | 8.3% |
e | 2073 | 8.0% |
i | 2059 | 7.9% |
a | 1850 | 7.1% |
s | 1693 | 6.5% |
t | 1602 | 6.2% |
o | 1554 | 6.0% |
r | 1347 | 5.2% |
l | 877 | 3.4% |
m | 811 | 3.1% |
Other values (41) | 10000 |
Common
Value | Count | Frequency (%) |
(space) | 3797 | |
0 | 844 | 11.5% |
1 | 572 | 7.8% |
2 | 552 | 7.5% |
- | 366 | 5.0% |
: | 351 | 4.8% |
+ | 275 | 3.8% |
( | 129 | 1.8% |
) | 129 | 1.8% |
& | 81 | 1.1% |
Other values (14) | 218 | 3.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 33344 | |
Specials | 1 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
(space) | 3797 | 11.4% |
n | 2165 | 6.5% |
e | 2073 | 6.2% |
i | 2059 | 6.2% |
a | 1850 | 5.5% |
s | 1693 | 5.1% |
t | 1602 | 4.8% |
o | 1554 | 4.7% |
r | 1347 | 4.0% |
l | 877 | 2.6% |
Other values (64) | 14327 |
Specials
Value | Count | Frequency (%) |
� | 1 |
duration
Distinct | 81 |
---|---|
Distinct (%) | 10.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 51.29875 |
Minimum | 0 |
---|---|
Maximum | 1800 |
Zeros | 35 |
Zeros (%) | 4.4% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 12.5 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 3 |
Q1 | 15 |
median | 36 |
Q3 | 69 |
95-th percentile | 92 |
Maximum | 1800 |
Range | 1800 |
Interquartile range (IQR) | 54 |
Descriptive statistics
Standard deviation | 104.3766022 |
---|---|
Coefficient of variation (CV) | 2.0346812 |
Kurtosis | 186.7249165 |
Mean | 51.29875 |
Median Absolute Deviation (MAD) | 26 |
Skewness | 12.67349445 |
Sum | 41039 |
Variance | 10894.47509 |
Monotonicity | Not monotonic |
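Several of the figures above are tied together by simple identities. The sketch below recomputes the mean, coefficient of variation, variance, interquartile range and range from the other values reported for this variable. The numbers are copied from the tables above; the variable names are illustrative.

```python
# Values copied from the report for the numeric variable.
n = 800                 # number of observations
total = 41039           # Sum
std = 104.3766022       # Standard deviation
q1, q3 = 15, 69         # Q1 and Q3
minimum, maximum = 0, 1800

mean = total / n                  # mean = sum / number of observations
cv = std / mean                   # coefficient of variation = std / mean
variance = std ** 2               # variance = std squared
iqr = q3 - q1                     # interquartile range = Q3 - Q1
value_range = maximum - minimum   # range = maximum - minimum

print(mean, round(cv, 4), round(variance, 2), iqr, value_range)
```

A coefficient of variation above 2 together with a maximum of 1800 against a 95th percentile of 92 indicates that a few very large values dominate the spread.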
Histogram with fixed size bins (bins=50)
Common Values
Value | Count | Frequency (%) |
10 | 54 | 6.8% |
15 | 45 | 5.6% |
36 | 39 | 4.9% |
0 | 35 | 4.4% |
5 | 35 | 4.4% |
40 | 29 | 3.6% |
83 | 24 | 3.0% |
65 | 23 | 2.9% |
23 | 22 | 2.8% |
67 | 21 | 2.6% |
Other values (71) | 473 |
Minimum 10 values
Value | Count | Frequency (%) |
0 | 35 | |
2 | 1 | 0.1% |
3 | 14 | 1.8% |
4 | 1 | 0.1% |
5 | 35 | |
6 | 4 | 0.5% |
7 | 2 | 0.2% |
9 | 15 | 1.9% |
10 | 54 | |
11 | 4 | 0.5% |
Maximum 10 values
Value | Count | Frequency (%) |
1800 | 1 | 0.1% |
1520 | 1 | 0.1% |
1440 | 1 | 0.1% |
600 | 1 | 0.1% |
419 | 1 | 0.1% |
418 | 1 | 0.1% |
277 | 1 | 0.1% |
268 | 1 | 0.1% |
180 | 7 | |
164 | 1 | 0.1% |
learnerCom
Distinct | 81 |
---|---|
Distinct (%) | 10.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 12.5 KiB |
12/3/2020 | 46 |
---|---|
4/7/2020 | 41 |
12/22/2020 | 40 |
11/29/2020 | 39 |
12/27/2020 | 30 |
Other values (76) |
Length
Max length | 10 |
---|---|
Median length | 9 |
Mean length | 9.0925 |
Min length | 8 |
Characters and Unicode
Total characters | 7274 |
---|---|
Distinct characters | 11 |
Distinct categories | 2 |
Distinct scripts | 1 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 12 |
---|---|
Unique (%) | 1.5% |
Sample
1st row | 12/3/2020 |
---|---|
2nd row | 12/3/2020 |
3rd row | 12/3/2020 |
4th row | 5/24/2020 |
5th row | 5/24/2020 |
Common Values
Value | Count | Frequency (%) |
12/3/2020 | 46 | 5.8% |
4/7/2020 | 41 | 5.1% |
12/22/2020 | 40 | 5.0% |
11/29/2020 | 39 | 4.9% |
12/27/2020 | 30 | 3.8% |
3/20/2020 | 28 | 3.5% |
5/6/2020 | 27 | 3.4% |
5/23/2020 | 26 | 3.2% |
5/21/2020 | 24 | 3.0% |
11/9/2020 | 21 | 2.6% |
Other values (71) | 478 |
Length
Histogram of lengths of the category
Value | Count | Frequency (%) |
12/3/2020 | 46 | 5.8% |
4/7/2020 | 41 | 5.1% |
12/22/2020 | 40 | 5.0% |
11/29/2020 | 39 | 4.9% |
12/27/2020 | 30 | 3.8% |
3/20/2020 | 28 | 3.5% |
5/6/2020 | 27 | 3.4% |
5/23/2020 | 26 | 3.2% |
5/21/2020 | 24 | 3.0% |
11/9/2020 | 21 | 2.6% |
Other values (71) | 478 |
Most occurring characters
Value | Count | Frequency (%) |
2 | 2313 | |
0 | 1672 | |
/ | 1600 | |
1 | 597 | 8.2% |
5 | 244 | 3.4% |
4 | 230 | 3.2% |
3 | 205 | 2.8% |
7 | 178 | 2.4% |
9 | 102 | 1.4% |
6 | 78 | 1.1% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 5674 | |
Other Punctuation | 1600 | 22.0% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
2 | 2313 | |
0 | 1672 | |
1 | 597 | 10.5% |
5 | 244 | 4.3% |
4 | 230 | 4.1% |
3 | 205 | 3.6% |
7 | 178 | 3.1% |
9 | 102 | 1.8% |
6 | 78 | 1.4% |
8 | 55 | 1.0% |
Other Punctuation
Value | Count | Frequency (%) |
/ | 1600 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 7274 |
Most frequent character per script
Common
Value | Count | Frequency (%) |
2 | 2313 | |
0 | 1672 | |
/ | 1600 | |
1 | 597 | 8.2% |
5 | 244 | 3.4% |
4 | 230 | 3.2% |
3 | 205 | 2.8% |
7 | 178 | 2.4% |
9 | 102 | 1.4% |
6 | 78 | 1.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 7274 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
2 | 2313 | |
0 | 1672 | |
/ | 1600 | |
1 | 597 | 8.2% |
5 | 244 | 3.4% |
4 | 230 | 3.2% |
3 | 205 | 2.8% |
7 | 178 | 2.4% |
9 | 102 | 1.4% |
6 | 78 | 1.1% |
learnerIntranetID
Distinct | 24 |
---|---|
Distinct (%) | 3.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 12.5 KiB |
munkimostra@gmail.com | |
---|---|
rajnish610@gmail.com | |
shwetay629@gmail.com | |
sagarsharma6970@gmail.com | |
sanyapandey74@gmail.com | |
Other values (19) |
Length
Max length | 31 |
---|---|
Median length | 22 |
Mean length | 23.175 |
Min length | 18 |
Characters and Unicode
Total characters | 18540 |
---|---|
Distinct characters | 35 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | simransanjay974@gmail.com |
---|---|
2nd row | simransanjay974@gmail.com |
3rd row | simransanjay974@gmail.com |
4th row | simransanjay974@gmail.com |
5th row | simransanjay974@gmail.com |
Common Values
Value | Count | Frequency (%) |
munkimostra@gmail.com | 101 | |
rajnish610@gmail.com | 75 | 9.4% |
shwetay629@gmail.com | 72 | 9.0% |
sagarsharma6970@gmail.com | 57 | 7.1% |
sanyapandey74@gmail.com | 52 | 6.5% |
sharmarup830@gmail.com | 48 | 6.0% |
ajkumar1308@gmail.com | 46 | 5.8% |
himanshugulati138@gmail.com | 44 | 5.5% |
priyamagnihotri384@gmail.com | 43 | 5.4% |
ap1077679@gmail.com | 37 | 4.6% |
Other values (14) | 225 |
Length
Histogram of lengths of the category
Value | Count | Frequency (%) |
munkimostra@gmail.com | 101 | |
rajnish610@gmail.com | 75 | 9.4% |
shwetay629@gmail.com | 72 | 9.0% |
sagarsharma6970@gmail.com | 57 | 7.1% |
sanyapandey74@gmail.com | 52 | 6.5% |
sharmarup830@gmail.com | 48 | 6.0% |
ajkumar1308@gmail.com | 46 | 5.8% |
himanshugulati138@gmail.com | 44 | 5.5% |
priyamagnihotri384@gmail.com | 43 | 5.4% |
ap1077679@gmail.com | 37 | 4.6% |
Other values (14) | 225 |
Most occurring characters
Value | Count | Frequency (%) |
a | 2512 | 13.5% |
m | 2218 | 12.0% |
i | 1395 | 7.5% |
o | 1007 | 5.4% |
g | 972 | 5.2% |
l | 866 | 4.7% |
s | 811 | 4.4% |
. | 811 | 4.4% |
c | 802 | 4.3% |
@ | 800 | 4.3% |
Other values (25) | 6346 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 14669 | |
Decimal Number | 2247 | 12.1% |
Other Punctuation | 1611 | 8.7% |
Connector Punctuation | 13 | 0.1% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
a | 2512 | |
m | 2218 | |
i | 1395 | |
o | 1007 | 6.9% |
g | 972 | 6.6% |
l | 866 | 5.9% |
s | 811 | 5.5% |
c | 802 | 5.5% |
h | 721 | 4.9% |
r | 646 | 4.4% |
Other values (12) | 2719 |
Decimal Number
Value | Count | Frequency (%) |
0 | 306 | |
7 | 294 | |
1 | 284 | |
9 | 257 | |
3 | 256 | |
6 | 249 | |
8 | 237 | |
2 | 152 | |
4 | 147 | |
5 | 65 | 2.9% |
Other Punctuation
Value | Count | Frequency (%) |
. | 811 | |
@ | 800 |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 13 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 14669 | |
Common | 3871 | 20.9% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
a | 2512 | |
m | 2218 | |
i | 1395 | |
o | 1007 | 6.9% |
g | 972 | 6.6% |
l | 866 | 5.9% |
s | 811 | 5.5% |
c | 802 | 5.5% |
h | 721 | 4.9% |
r | 646 | 4.4% |
Other values (12) | 2719 |
Common
Value | Count | Frequency (%) |
. | 811 | |
@ | 800 | |
0 | 306 | 7.9% |
7 | 294 | 7.6% |
1 | 284 | 7.3% |
9 | 257 | 6.6% |
3 | 256 | 6.6% |
6 | 249 | 6.4% |
8 | 237 | 6.1% |
2 | 152 | 3.9% |
Other values (3) | 225 | 5.8% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 18540 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
a | 2512 | 13.5% |
m | 2218 | 12.0% |
i | 1395 | 7.5% |
o | 1007 | 5.4% |
g | 972 | 5.2% |
l | 866 | 4.7% |
s | 811 | 4.4% |
. | 811 | 4.4% |
c | 802 | 4.3% |
@ | 800 | 4.3% |
Other values (25) | 6346 |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the angle of the line to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
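The calculation described above can be sketched in a few lines of standard-library Python. This is an illustrative implementation under the stated definition, not the code the report itself runs; the function name `pearson_r` and the sample data are made up for the example.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * v + 1 for v in x]))    # increasing straight line
print(pearson_r(x, [10 * v + 7 for v in x]))   # steeper line, shifted
print(pearson_r(x, [-3 * v for v in x]))       # decreasing straight line
```

The first two calls differ only in slope and offset, which shows the location and scale invariance: both describe a perfect increasing line, so r is the same for each.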
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
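As a sketch of that definition, the example below ranks both variables (giving tied values the average of their positions, which is one common convention and an assumption here) and then applies the Pearson formula to the ranks. All function names are illustrative.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson's r; assumes neither input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]          # monotonic but not linear
print(spearman_rho(x, y), pearson_r(x, y))
```

For the cubic relationship in the usage lines, ρ reaches its maximum while r stays below it, which is the difference between monotonic and linear correlation described above.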
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
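The pair-counting formula above corresponds to the variant usually called τ-a. A minimal sketch follows; note that tied pairs count as neither concordant nor discordant here, whereas tie-corrected variants such as τ-b adjust the denominator and can give different values on data with ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1      # both variables move in the same direction
        elif sign < 0:
            discordant += 1      # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it contributes to neither count
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

In the usage line, five of the six pairs are concordant and one is discordant, so τ is (5 - 1) / 6.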
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for it.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
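Returning to the Cramér's V coefficient described earlier in this section, the sketch below computes it from two categorical columns, with an optional bias correction. The correction follows the commonly cited form of Bergsma's 2013 proposal; whether the report uses exactly this formula is an assumption, and the function name and toy data are illustrative.

```python
import math
from collections import Counter


def cramers_v(xs, ys, bias_correction=True):
    """Cramér's V for two categorical sequences of equal length."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    joint = Counter(zip(xs, ys))
    row_totals, col_totals = Counter(xs), Counter(ys)
    r, k = len(row_totals), len(col_totals)
    if r < 2 or k < 2:
        raise ValueError("each variable needs at least two distinct values")

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected
    phi2 = chi2 / n

    if not bias_correction:
        return math.sqrt(phi2 / min(r - 1, k - 1))

    # Bias-corrected quantities (assumed form of the Bergsma correction).
    phi2_corr = max(0.0, phi2 - (r - 1) * (k - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        raise ValueError("too few observations per category for the correction")
    return math.sqrt(phi2_corr / denom)


# Toy data: the first pair is perfectly associated, the second is independent.
perfect_x = ["a"] * 10 + ["b"] * 10
perfect_y = ["x"] * 10 + ["y"] * 10
indep_x = ["a", "a", "b", "b"] * 5
indep_y = ["x", "y", "x", "y"] * 5
print(cramers_v(perfect_x, perfect_y), cramers_v(indep_x, indep_y))
```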
First rows
Index | learningActivityTitle | duration | learnerCom | learnerIntranetID |
---|---|---|---|---|
0 | Working with Data for Effective Decision Making | 23 | 12/3/2020 | simransanjay974@gmail.com |
1 | Personal Skills for Effective Business Analysis | 40 | 12/3/2020 | simransanjay974@gmail.com |
2 | Business Analysis Overview | 43 | 12/3/2020 | simransanjay974@gmail.com |
3 | Using Active Listening in Workplace Situations | 24 | 5/24/2020 | simransanjay974@gmail.com |
4 | Clarity and Conciseness in Business Writing | 21 | 5/24/2020 | simransanjay974@gmail.com |
5 | Audience and Purpose in Business Writing | 19 | 5/24/2020 | simransanjay974@gmail.com |
6 | Effective Team Communication | 23 | 5/24/2020 | simransanjay974@gmail.com |
7 | Communicating with impact | 10 | 5/22/2020 | simransanjay974@gmail.com |
8 | Learning LinkedIn | 88 | 5/22/2020 | simransanjay974@gmail.com |
9 | How To Use LinkedIn For Beginners - 7 LinkedIn Profile Tips | 9 | 5/22/2020 | simransanjay974@gmail.com |
Last rows
Index | learningActivityTitle | duration | learnerCom | learnerIntranetID |
---|---|---|---|---|
790 | Data Preprocessing | 26 | 12/30/2020 | priyamagnihotri384@gmail.com |
791 | Framing Opportunities for Effective Data-driven Decision Making | 24 | 12/30/2020 | rajnish610@gmail.com |
792 | Data Preprocessing | 26 | 12/30/2020 | rajnish610@gmail.com |
793 | Framing Opportunities for Effective Data-driven Decision Making | 24 | 12/30/2020 | sharmarup830@gmail.com |
794 | Data Preprocessing | 26 | 12/30/2020 | sharmarup830@gmail.com |
795 | Framing Opportunities for Effective Data-driven Decision Making | 24 | 12/30/2020 | sagarsharma6970@gmail.com |
796 | Data Preprocessing | 26 | 12/30/2020 | sagarsharma6970@gmail.com |
797 | Framing Opportunities for Effective Data-driven Decision Making | 24 | 12/29/2020 | shwetay629@gmail.com |
798 | Data Preprocessing | 26 | 12/29/2020 | shwetay629@gmail.com |
799 | Data Preprocessing | 26 | 12/29/2020 | munkimostra@gmail.com |