North American Network Operators Group
Re: How to get a list of research and academic ISP ?
- From: Tom Vest
- Date: Tue Nov 21 04:05:45 2006
You might have a look at:
http://www.caida.org/publications/papers/2006/revealingas/revealingas.pdf
The algorithm produces a lot of false negatives for non-English
speaking countries that don't use .edu uniformly, but is otherwise an
excellent place to start...
TV
On Nov 20, 2006, at 3:59 PM, Marshall Eubanks wrote:
Hello;
On Nov 20, 2006, at 3:13 PM, Maciej Kurant wrote:
Dear All,
Thank you very much for the numerous and quick replies to my email. I
must say that the NANOG list is highly responsive.
I needed some time to digest your comments and try some new ideas.
I share the preliminary results with you now, begging for further
comments.
The problem was (and still is) to find a good heuristic to
distinguish between commercial (COM) and educational/research/
academic (EDU) ASes.
I would suggest you need to think a little about what exactly you want
- a list of _all_ academic ASN ? (that will be tough, and you will
have to deal with corner cases, and you will not fully automate it)
- a list of _some_ academic ASN ? (you have that now - so are you
worried about completeness or size or ... ?)
- a list of _no_ academic ASN ? (again, this will be tough)
or something else ?
Note, too, that these lists will change with time.
*EDU_Abilene*
My first approach (see my original email) was to extract a list of
all destinations announced by Abilene. (The assumption is that
Abilene generally does not announce commercial prefixes.) This
results in a list, call it “EDU_Abilene”, of 1333 ASes.
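A minimal sketch of the extraction step described above, assuming the BGP table has already been reduced to one space-separated AS path per route. The function name `origin_ases` and the sample paths are hypothetical; only AS 11537 (Abilene) comes from this thread, and the other ASNs are documentation-range placeholders.

```python
def origin_ases(as_paths):
    """Return the set of origin (last-hop) ASNs from space-separated AS paths."""
    origins = set()
    for path in as_paths:
        hops = path.split()
        if not hops:
            continue
        last = hops[-1]
        # Paths ending in an AS_SET such as "{64504,64505}" have no single
        # origin, so this sketch skips them rather than guessing.
        if last.isdigit():
            origins.add(int(last))
    return origins


# Hypothetical paths as seen from an Abilene router (AS 11537).
sample_paths = [
    "11537 64500 64501",
    "11537 64502",
    "11537 64500 64501",          # same origin announced twice
    "11537 64503 {64504,64505}",  # AS_SET origin, skipped
]
edu_abilene = origin_ases(sample_paths)
print(sorted(edu_abilene))
```

Collecting every hop instead of only the last one would give the larger "all ASes on the path" list mentioned in the original question.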
*EDU_description*
Some of you suggested looking at the names and descriptions of
ASes. I used the AS list available at:
http://www.multicasttech.com/status/asn_expand.txt
and searched the last column ("Organization") for the following
strings:
"Universit|Univerz|Universida|research|education|science|scientif|
academic|college|institut|laborator|school|ecole|
edu|R&D|library|academy|Etudes"
This approach finds 1796 "educational" ASes, call this set
“EDU_description”.
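The keyword search above could be sketched as follows. This assumes a tab-separated file with the ASN in the first column and the organization in the last, and a case-insensitive match; neither detail is confirmed in this thread, and the sample rows are invented. The function name `edu_by_description` is hypothetical.

```python
import re

# The search strings quoted above, joined into one alternation.
# Case-insensitive matching is an assumption.
PATTERN = re.compile(
    r"Universit|Univerz|Universida|research|education|science|scientif|"
    r"academic|college|institut|laborator|school|ecole|"
    r"edu|R&D|library|academy|Etudes",
    re.IGNORECASE,
)


def edu_by_description(lines):
    """Return ASNs whose last column (organization) matches PATTERN."""
    matches = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        asn, org = fields[0], fields[-1]
        if asn.isdigit() and PATTERN.search(org):
            matches.add(int(asn))
    return matches


# Invented rows; real files may use a different layout.
sample = [
    "64496\tEXAMPLE-U\tExample University",
    "64497\tEXAMPLE-NET\tExample Telecom Inc.",
    "64498\tEXAMPLE-LAB\tNational Laboratory of Examples",
    "64499\tEXAMPLE-CO\tReduced Cost Hosting",  # matches "edu" inside "Reduced"
]
edu_description = edu_by_description(sample)
print(sorted(edu_description))
```

The last sample row shows how a short substring such as "edu" can produce false positives, which is one reason the resulting list needs cross-checking.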
Of course, these two lists overlap, but less than I expected. In
particular:
len(EDU_Abilene)=1333
len(EDU_description)=1796
union(EDU_Abilene, EDU_description)=2269
intersection(EDU_Abilene, EDU_description)=860
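The four figures above are mutually consistent under inclusion-exclusion (1333 + 1796 - 860 = 2269). A small sketch of how such an overlap report might be computed with Python sets; the helper name `overlap_report` and the toy sets are illustrative only.

```python
def overlap_report(a, b):
    """Summarize the overlap between two sets of ASNs."""
    union = a | b
    inter = a & b
    return {
        "len_a": len(a),
        "len_b": len(b),
        "union": len(union),
        "intersection": len(inter),
        # Jaccard index: share of the union that both lists agree on.
        "jaccard": len(inter) / len(union) if union else 0.0,
    }


# Toy sets standing in for EDU_Abilene and EDU_description.
report = overlap_report(set(range(5)), set(range(3, 9)))
print(report)

# Sanity check on the figures reported in this message.
assert 1333 + 1796 - 860 == 2269
```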
For many reasons, these lists are far from precise. For
instance, EDU_Abilene contains AS 7132 (AT&T) and AS 8075
(Microsoft). Therefore I need further data sets or a filtering
methodology. This raises some questions:
1) What other EDU networks (preferably with BGP tables available
on the web) can I take as examples of ASes that (generally) do not
announce commercial prefixes? Based on them I could construct
lists similar in spirit to EDU_Abilene. I guess the more, the better.
There are lots - look at the ones that Abilene peers with
http://international.internet2.edu/partners/
http://abilene.internet2.edu/peernetworks/international.html
2) Do you know of other lists similar to
http://www.multicasttech.com/status/asn_expand.txt ? Maybe a longer
description or a website related to an AS would help the method I use
to create EDU_description. Do you think the strings I use in my
search are appropriate?
Try
http://bgp.potaroo.net/as1221/asnames.txt
Note that there are errors all over the place here; these lists
will not agree perfectly.
My lists come from the rwhois data, but I correct for obvious
errors (some of which I have sent back to the list maintainers).
There are others, I am sure, that I have not caught, and my
corrections are undoubtedly not perfect. I am sure that the other
maintainers of such lists could tell similar tales.
You could start polling rwhois yourself, and I would in doubtful
cases.
*AS relationships*
Another approach is to exploit the AS relationships. Most of you
agree that usually EDU ASes are not providers for COM customers.
This suggests a way to detect false positives in EDU_Abilene and
EDU_description (or in their union). For every EDU node check how
many COM customers it has, i.e., EDU provider --- COM customer
relationship. I used the AS graphs with inferred relationships
provided by CAIDA (http://as-rank.caida.org/data/2006/). This
method works well to find good candidates for false positives, but
they should not be blindly accepted. For instance AS 7132 (AT&T)
has the highest number of COM customers (615) and should obviously
belong to COM (it is a member of EDU_Abilene). In contrast, a big
component of the EDU backbone, AS 11537 (Abilene) has 66 COM
customers! In general there are about 50 EDU nodes with more than
10 COM customers each.
Not a bad approach.
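A rough sketch of the customer-counting check discussed above. It assumes the inferred relationships have already been parsed into (provider, customer) pairs; the actual CAIDA file layout is not described in this thread, so no parser is shown. The function names, the threshold, and the sample links are hypothetical. Only AS 7132 and AS 11537 are taken from the message.

```python
from collections import Counter


def com_customer_counts(p2c_links, edu):
    """Count, for each EDU provider, its customers that are not in the EDU set."""
    counts = Counter()
    for provider, customer in p2c_links:
        if provider in edu and customer not in edu:
            counts[provider] += 1
    return counts


def suspects(counts, threshold=10):
    """EDU ASes with more than `threshold` COM customers, largest first."""
    flagged = [asn for asn, n in counts.items() if n > threshold]
    return sorted(flagged, key=lambda asn: -counts[asn])


# Toy data: provider -> customer links (assumed input shape).
edu = {7132, 11537, 64496}
links = [
    (7132, 64500), (7132, 64501), (7132, 64502),  # three COM customers
    (11537, 64496),                               # EDU customer, not counted
    (11537, 64503),                               # one COM customer
]
counts = com_customer_counts(links, edu)
print(suspects(counts, threshold=2))
```

As noted above, the output is only a list of candidates for manual review: a genuine EDU backbone can legitimately have some COM customers.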
3) What other “automatic” or “manual” approaches would you
suggest? Or improvements of the ones just described?
Again, I don't know what you are trying to do. What I have found
useful is what you are doing - make lots of lists, cross-reference
them, and see what passes multiple tests.
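One possible way to sketch the "passes multiple tests" idea is a simple vote across candidate lists. The list names, the `vote` helper, and the `min_votes` parameter are illustrative assumptions, not something proposed in the thread, and the ASNs are placeholders.

```python
from collections import Counter


def vote(candidate_lists, min_votes=2):
    """Return ASNs that appear in at least `min_votes` of the given lists."""
    tally = Counter()
    for members in candidate_lists.values():
        tally.update(set(members))  # each list votes at most once per ASN
    return {asn for asn, n in tally.items() if n >= min_votes}


# Toy lists standing in for the different heuristics.
candidate_lists = {
    "abilene": {64496, 64497, 64498},
    "description": {64497, 64498, 64499},
    "few_com_customers": {64498, 64499, 64500},
}
likely_edu = vote(candidate_lists, min_votes=2)
strict_edu = vote(candidate_lists, min_votes=3)
print(sorted(likely_edu), sorted(strict_edu))
```

Raising `min_votes` trades completeness for precision, which matches the question of whether one wants all, some, or no academic ASNs.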
I will appreciate even the briefest comments and suggestions,
Maciej Kurant
Hope this helps.
Regards
Marshall
From: Maciej Kurant [mailto:[email protected]]
Sent: Wednesday, 15 November 2006 18:46
To: '[email protected]'
Subject: How to get a list of research and academic ISP ?
Dear all,
I am a PhD student at EPFL, Switzerland. My recent research
interest is in large scale differences between the commercial and
academic parts of the Internet.
Of course, to perform this kind of study I need a way
to distinguish between these two worlds. I’ve learnt that Abilene
does not provide commercial connectivity. This means that BGP
prefixes and AS paths announced by Abilene BGP routers should lead
only to research and academic destinations. I have extracted (from
the BGP tables at http://abilene.internet2.edu/observatory) a list
of all such destinations and obtained 1333 ASes (for data from
July 2006). The number looks reasonable, but I would like to be
sure that I am not making a mistake. Therefore I would be grateful
if you could answer the following questions:
1) Is this approach to obtain a list of research and
academic ISPs correct?
2) Do you maybe know of such lists compiled before?
3) If I keep not only the destination ASes, but also all
ASes on the AS paths towards these destinations, I obtain a list of
about 1400 ASes. How should I understand this? Does it mean that
some research and academic destinations are reachable from Abilene
only by traversing the commercial Internet?
4) Of course, research and academic ASes are often well
connected to the commercial Internet. My guess is that in most
cases their peering relationship is “customer-provider”, where
commercial ASes are providers. Is it possible that an academic AS
is a provider for some commercial ASes? If so, does it happen often?
Thank you in advance for your comments.
Maciej Kurant
=============================================
EPFL IC ISC LCA3
Maciej Kurant
PhD Student
CH-1015 Lausanne, Switzerland
web site: http://lcawww.epfl.ch/kurant
=============================================