Data Processing and Interpretation
You might be wondering how the chart data for the Anti-Spam Resources is compiled. Where does it come from? What kind of senders does it represent? Here is a general overview of how to read the chart data and an introduction to our methods of classifying and processing data.
Data Interpretation
Every blacklist chart shows the accumulated weekly statistics
for the last eight weeks. The "SPAM HITS" bar is a
percentage derived from the number of spam-mails that a
particular blacklist correctly classified as spam. The "HAM
HITS" bar is a percentage derived from the number
of ham-mails (non-spam) that a particular blacklist incorrectly
classified as spam.
Example: Assume blacklist "alpha" correctly tagged
70% of spam received (spam hits) and incorrectly tagged 0.1% of
non-spam mail (ham hits). Assume blacklist "beta"
correctly tagged 90% of spam received (spam hits) and incorrectly
tagged 10% of non-spam mail (ham hits). You may conclude that
blacklist "beta" blocks more spam than blacklist "alpha";
however, "beta" incorrectly classifies a
hundred times as much desired mail as spam as "alpha" does.
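The arithmetic behind this example can be sketched in a few lines of Python. The function name `hit_rate` and the absolute mail counts are assumptions chosen only to reproduce the percentages above; they are not taken from the actual database.

```python
def hit_rate(tagged: int, total: int) -> float:
    """Return the percentage of `total` mails that a blacklist tagged."""
    if total == 0:
        return 0.0
    return 100.0 * tagged / total


# Hypothetical counts, picked to match the percentages in the example.
alpha_spam = hit_rate(tagged=6300, total=9000)  # spam hits of "alpha"
alpha_ham = hit_rate(tagged=1, total=1000)      # ham hits of "alpha"
beta_spam = hit_rate(tagged=8100, total=9000)   # spam hits of "beta"
beta_ham = hit_rate(tagged=100, total=1000)     # ham hits of "beta"

# How many times more ham "beta" tags compared to "alpha".
ham_ratio = beta_ham / alpha_ham

print(f"alpha: {alpha_spam:.1f}% spam hits, {alpha_ham:.1f}% ham hits")
print(f"beta:  {beta_spam:.1f}% spam hits, {beta_ham:.1f}% ham hits")
print(f"beta tags about {ham_ratio:.0f} times as much ham as alpha")
```

A higher spam hit rate is therefore only half of the picture: the ham hit rate has to be read alongside it.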
Data Classification
Intra2net’s core database is fed by a cluster of reporting-servers located in Central Europe. We use the real mailstream of a few selected Intra2net business customers; no spam traps, dead email addresses or similar methods are involved. All mails are automatically classified by the mail-server subfolder in which they are located. No additional steps are required, just everyday user interaction.
UNDEFINED | All mails located at the top level of "inbox"
SPAM | All mails located in the subfolder "spam"
HAM | All mails located in any other subfolder
Users are trained to move undetected spam mails from the top level of "inbox" to the "spam" subfolder and to sort out the subfolder "spam suspect" occasionally. Because mails are moved to the corresponding subfolders instead of being deleted, only a minimal change in user behaviour is required. Deleted or collected (POP3) emails are ignored, as is backscatter. The automatic spamfilter is set to medium so that it does not hit ham-mails accidentally.
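The three folder rules from the table above could be sketched as follows. The path format (`"INBOX/spam"`), the separator and the names `Label` and `classify_by_folder` are assumptions for illustration only. The table does not say how the "spam suspect" subfolder is treated, so the sketch does not special-case it; a real setup would need an explicit rule for it.

```python
from enum import Enum


class Label(Enum):
    UNDEFINED = "UNDEFINED"
    SPAM = "SPAM"
    HAM = "HAM"


def classify_by_folder(folder_path: str, sep: str = "/") -> Label:
    """Classify a mail purely by the folder it is stored in."""
    # Normalise the path: split on the separator, ignore case and blanks.
    parts = [p.strip().lower() for p in folder_path.split(sep) if p.strip()]

    # Top level of the inbox: the user has not decided yet.
    if not parts or parts == ["inbox"]:
        return Label.UNDEFINED
    # Subfolder named "spam": the user (or the filter) marked it as spam.
    if parts[-1] == "spam":
        return Label.SPAM
    # Any other subfolder counts as desired mail.
    return Label.HAM


for path in ("INBOX", "INBOX/spam", "INBOX/customers"):
    print(path, "->", classify_by_folder(path).value)
```

Because the label follows from the folder alone, the mail content never has to be inspected.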
Data Processing
All incoming mails are duplicated to a dedicated reporting-server.
This preserves all protocol information, so that network-based
tests can be run. All tests on the reporting-server are processed
completely independently of the main mail-server. Each mail carries
a unique ID, which we use to match its test results with its
classification. Classification is done by user interaction as
described above. Reporting-servers only report final
test results to our core database server.
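One way to picture the matching by unique ID is the minimal sketch below. The class `ReportStore`, its methods and the record layout are hypothetical; the actual reporting-server software is not described here. Test results are stored when a mail arrives, the label is added later once the user has sorted the mail, and only mails that have both are reported.

```python
from dataclasses import dataclass, field


@dataclass
class ReportStore:
    # mail ID -> {blacklist name: was the sender listed?}
    test_results: dict = field(default_factory=dict)
    # mail ID -> "SPAM" or "HAM" (set later by user interaction)
    labels: dict = field(default_factory=dict)

    def record_test(self, mail_id: str, blacklist: str, listed: bool) -> None:
        """Store a network test result at delivery time."""
        self.test_results.setdefault(mail_id, {})[blacklist] = listed

    def record_label(self, mail_id: str, label: str) -> None:
        """Store the classification once the user has sorted the mail."""
        self.labels[mail_id] = label

    def final_results(self) -> list:
        """Return (blacklist, label, listed) rows for fully classified mails."""
        rows = []
        for mail_id, results in self.test_results.items():
            label = self.labels.get(mail_id)
            if label not in ("SPAM", "HAM"):
                continue  # still undefined, deleted or collected: skip
            for blacklist, listed in results.items():
                rows.append((blacklist, label, listed))
        return rows


store = ReportStore()
store.record_test("id-1", "alpha", True)
store.record_test("id-2", "alpha", False)
store.record_test("id-3", "alpha", True)   # never classified by the user
store.record_label("id-1", "SPAM")
store.record_label("id-2", "HAM")
print(store.final_results())
```

Only the joined rows leave the reporting-server in this sketch, which mirrors the statement that just final test results reach the core database.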
On average the core database shows a volume of 90% spam and 10%
ham-mails, which corresponds to the worldwide mailstream. However, we do
have a high bulk mail volume (mailing lists). Our experience has shown
that some network-based tests have problems differentiating
between legitimate and unsolicited bulk mails. We do not count
IP-addresses, only single mails. It makes no difference how many mails
we receive from the same IP-address. Keep in mind: We have no
influence on users accidentally classifying ham-mails as spam or
vice versa, nor do we have access to the content or body of customer mails,
for privacy reasons.
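To illustrate what counting single mails rather than IP-addresses means, here is a small sketch with made-up data. The addresses come from documentation ranges and the function `spam_hit_rate` is hypothetical. Three mails from the same listed sender count three times; a per-IP figure is computed only for contrast and is not what the charts use.

```python
def spam_hit_rate(spam_mails: list) -> float:
    """Percentage of spam mails whose sender was listed.

    `spam_mails` holds one (sender_ip, listed) tuple per mail.
    """
    if not spam_mails:
        return 0.0
    hits = sum(1 for _ip, listed in spam_mails if listed)
    return 100.0 * hits / len(spam_mails)


# Made-up sample: three mails from one listed IP, one mail from an unlisted IP.
mails = [
    ("192.0.2.1", True),
    ("192.0.2.1", True),
    ("192.0.2.1", True),
    ("198.51.100.7", False),
]

per_mail = spam_hit_rate(mails)           # every mail counts
per_ip = spam_hit_rate(list(set(mails)))  # for contrast only: one entry per IP

print(f"per mail: {per_mail:.1f}%  per IP: {per_ip:.1f}%")
```

Counting per mail reflects how much of the actual mail volume a blacklist would have caught, which is why high-volume senders weigh more heavily in the statistics.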
If you have any questions or comments about anything
here or about the Anti-Spam Resources, please don't hesitate
to contact us.
Visit the Blacklist Monitor mainpage for more blacklist statistics.