rio - A Swiss-Army Knife for Data I/O
Streamlined data import and export by making assumptions that the user is probably willing to make: 'import()' and 'export()' determine the data format from the file extension, reasonable defaults are used for data import and export, web-based import is natively supported (including from SSL/HTTPS), compressed files can be read directly, and fast import packages are used where appropriate. An additional convenience function, 'convert()', provides a simple method for converting between file types.
Last updated 1 month ago
csv, csvy, data, data-science, excel, io, rio, sas, spss, stata
16.88 score 606 stars 66 dependents 7.5k scripts 31k downloads
rtoot - Collecting and Analyzing Mastodon Data
An implementation of calls designed to collect and organize Mastodon data via its Application Program Interfaces (API), which can be found at the following URL: <https://docs.joinmastodon.org/>.
Last updated 8 days ago
mastodon, mastodon-api
8.58 score 105 stars 67 scripts 825 downloads
oolong - Create Validation Tests for Automated Content Analysis
Intended to create standard human-in-the-loop validity tests for typical automated content analysis such as topic modeling and dictionary-based methods. This package offers a standard workflow with functions to prepare, administer and evaluate a human-in-the-loop validity test. This package provides functions for validating topic models using word intrusion, topic intrusion (Chang et al. 2009, <https://papers.nips.cc/paper/3700-reading-tea-leaves-how-humans-interpret-topic-models>) and word set intrusion (Ying et al. 2021) <doi:10.1017/pan.2021.33> tests. This package also provides functions for generating gold-standard data which are useful for validating dictionary-based methods. The default settings of all generated tests match those suggested in Chang et al. (2009) and Song et al. (2020) <doi:10.1080/10584609.2020.1723752>.
Last updated 23 days ago
textanalysis, topicmodeling, validation
7.55 score 54 stars 22 scripts 232 downloads
minty - Minimal Type Guesser
Port the type guesser from 'readr' (so-called 'readr' first edition parsing engine, now superseded by 'vroom').
Last updated 18 days ago
cpp
6.97 score 5 stars 23 dependents 5 scripts 6.8k downloads
adaR - A Fast 'WHATWG' Compliant URL Parser
A wrapper for 'ada-url', a 'WHATWG' compliant and fast URL parser written in modern 'C++'. Also contains auxiliary functions such as a public suffix extractor.
Last updated 7 days ago
url-parser, cpp
6.84 score 26 stars 2 dependents 11 scripts 307 downloads
rang - Reconstructing Reproducible R Computational Environments
Resolve the dependency graph of R packages at a specific time point based on the information from various 'R-hub' web services <https://blog.r-hub.io/>. The dependency graph can then be used to reconstruct the R computational environment with 'Rocker' <https://rocker-project.org>.
Last updated 1 year ago
reproducibility, reproducible-research
6.31 score 79 stars 13 scripts 223 downloads
webtrackR - Preprocessing and Analyzing Web Tracking Data
Data structures and methods to work with web tracking data. The functions cover data preprocessing steps, enriching web tracking data with external information and methods for the analysis of digital behavior as used in several academic papers (e.g., Clemm von Hohenberg et al., 2023 <doi:10.17605/OSF.IO/M3U9P>; Stier et al., 2022 <doi:10.1017/S0003055421001222>).
Last updated 1 month ago
webtracking
6.08 score 9 stars 8 scripts 548 downloads
grafzahl - Supervised Machine Learning for Textual Data Using Transformers and 'Quanteda'
Duct tape the 'quanteda' ecosystem (Benoit et al., 2018) <doi:10.21105/joss.00774> to modern Transformer-based text classification models (Wolf et al., 2020) <doi:10.18653/v1/2020.emnlp-demos.6>, in order to facilitate supervised machine learning for textual data. This package mimics the behaviors of 'quanteda.textmodels' and provides a function to setup the 'Python' environment to use the pretrained models from 'Hugging Face' <https://huggingface.co/>. More information: <doi:10.5117/CCR2023.1.003.CHAN>.
Last updated 1 month ago
5.79 score 41 stars 3 scripts 200 downloads
sweater - Speedy Word Embedding Association Test and Extras Using R
Conduct various tests for evaluating implicit biases in word embeddings: Word Embedding Association Test (Caliskan et al., 2017) <doi:10.1126/science.aal4230>, Relative Norm Distance (Garg et al., 2018) <doi:10.1073/pnas.1720347115>, Mean Average Cosine Similarity (Manzini et al., 2019) <arXiv:1904.04047>, SemAxis (An et al., 2018) <arXiv:1806.05521>, Relative Negative Sentiment Bias (Sweeney & Najafian, 2019) <doi:10.18653/v1/P19-1162>, and Embedding Coherence Test (Dev & Phillips, 2019) <arXiv:1901.07656>.
Last updated 1 month ago
bias-detection, textanalysis, wordembedding, cpp
4.59 score 28 stars 14 scripts 431 downloads
webbotparseR - Parse html files containing search engine results
Parse search engine results which have been scraped with the 'WebBot' browser extension <https://github.com/gesiscss/WebBot>.
Last updated 2 months ago
browser-extension, search-engine
3.38 score 8 stars 6 scripts