Our latest Libraries data release has arrived
by Jeremy Katz
on January 24, 2019
As part of our ongoing work on Libraries.io, we are glad to announce the availability of an updated data set. The new data set captures the state of open source metadata and the graph of dependencies as of the end of 2018. This data is available today as a set of CSV files that you can analyze.
Today’s data set includes information on over 16 million versions of 3.3 million open source packages. The packages are tracked across 37 different package managers, and the data set also includes information about repositories on GitHub, GitLab, and Bitbucket.
Analyzing the data with an analytics tool like Google’s BigQuery surfaces findings such as:
- There are almost twice as many packages released on any given weekday compared to any given weekend day.
- Despite the default license for npm modules created with `npm init` being ISC, there are more than twice as many MIT licensed npm modules as ISC.
- Only 2.1% of all dependencies used by npm packages are pinned to the most recent release.
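If you would rather start from the CSV files than from BigQuery, here is a minimal Python sketch of how the weekday versus weekend comparison might be reproduced. It is an illustration under assumptions, not the query we ran: the column name `published_timestamp` and the `YYYY-MM-DD HH:MM:SS` timestamp format are placeholders, so check the release documentation for the actual header names and formats before using it.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def releases_per_day_type(csv_file, timestamp_column="published_timestamp"):
    """Return (average releases per weekday, average releases per weekend day).

    `timestamp_column` is a hypothetical column name; replace it with the
    header actually used in the versions CSV.
    """
    by_weekday = Counter()
    for row in csv.DictReader(csv_file):
        raw = (row.get(timestamp_column) or "").strip()
        if not raw:
            continue  # skip versions with no recorded publish time
        # Assumed format: "YYYY-MM-DD HH:MM:SS", optionally followed by " UTC".
        published = datetime.strptime(raw[:19], "%Y-%m-%d %H:%M:%S")
        by_weekday[published.weekday()] += 1  # Monday is 0, Sunday is 6
    # Average over the five weekday slots and the two weekend slots.
    weekday_avg = sum(by_weekday[d] for d in range(5)) / 5
    weekend_avg = sum(by_weekday[d] for d in (5, 6)) / 2
    return weekday_avg, weekend_avg


# Tiny made-up sample, only to show the shape of the input.
sample = io.StringIO(
    "name,published_timestamp\n"
    "a,2018-12-03 10:00:00 UTC\n"  # Monday
    "b,2018-12-03 11:30:00 UTC\n"  # Monday
    "c,2018-12-04 09:15:00 UTC\n"  # Tuesday
    "d,2018-12-05 14:00:00 UTC\n"  # Wednesday
    "e,2018-12-06 08:45:00 UTC\n"  # Thursday
    "f,2018-12-08 16:20:00 UTC\n"  # Saturday
)
weekday_avg, weekend_avg = releases_per_day_type(sample)
print(weekday_avg, weekend_avg)
```

The function averages by day-of-week slot, which matches a per-calendar-day comparison only when each day of the week occurs about equally often in the period covered, as it does over a span of several years. To run it on the real data, pass an open file handle for the versions CSV in place of the sample.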
More documentation on the structure of the data can be found on the release page. The data is available under a Creative Commons BY-SA 4.0 license. We would love to hear about anything interesting you find in the data; let us know by tagging @librariesio on Twitter.
Libraries.io, Dependencies